Microdata questions

Hi all! I have a lot of experience using ACS data through data.census.gov and have recently started learning how to use microdata to answer some more detailed research questions. I can do basic tabulations through MDAT (e.g., finding the number of 3-year-olds who live in poverty) but am a little stumped on how to answer more detailed questions that combine household-level and person-level characteristics. (For example, how many households in X state have both an infant under age 1 AND a 3-year-old?) Is this type of question answerable through MDAT or through microdata in general? If so, are there resources you would recommend for building more sophisticated skills in using ACS microdata?

Thanks so much for any insights you can offer!

Sarah

  • Hi Sarah, yes, I agree with Stas. The best way to answer this question is to work with the actual dataset. I know IPUMS is great, but I am more familiar with the regular PUMS dataset from census.gov (https://www.census.gov/programs-surveys/acs/microdata.html). They have example SAS code, which I've used and found helpful. (As Stas mentions, you can also use R or Stata.)

    For the first question (how many households have both an infant under age 1 and a 3-year-old), the variables and logic you would need are something along these lines:

    • download both the person file and the housing file
    • from the person file, keep the variables: serialno, agep, st, pwgtp, pwgtp1 - pwgtp80
    • from the housing file, keep the variables: serialno, st, wgtp, wgtp1 - wgtp80
    • subset the person file to agep == 0 or agep == 3
    • join this subsetted person file to the housing file, using serialno as the match key
    • identify serialno values that have both ages present (at least one agep == 0 row for the infant and at least one agep == 3 row for the 3-year-old); simply counting serialno values that come up twice would also pick up households with two infants or two 3-year-olds
    • group by state (the st variable), and apply the housing weights (wgtp, since the household is the unit being counted) to get your final estimates
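
The steps above can be sketched in plain Python. This is only a minimal illustration on a handful of made-up records, not real ACS data and not the Census Bureau's SAS example: the helper name households_with_both_ages and the toy values are hypothetical, and it assumes the main housing weight wgtp is available alongside the replicate weights. It checks for distinct ages per serialno rather than counting matching rows, so a household with twin infants and no 3-year-old is not counted.

```python
from collections import defaultdict

# Toy stand-ins for the PUMS person and housing files.
# All values are made up for illustration; they are NOT real ACS records.
persons = [
    {"serialno": "A", "agep": 0},
    {"serialno": "A", "agep": 3},
    {"serialno": "B", "agep": 0},
    {"serialno": "B", "agep": 0},   # twins: two matching rows, but no 3-year-old
    {"serialno": "C", "agep": 3},
    {"serialno": "D", "agep": 0},
    {"serialno": "D", "agep": 3},
    {"serialno": "D", "agep": 35},  # an adult, dropped by the age subset
]
households = [
    {"serialno": "A", "st": "06", "wgtp": 120},
    {"serialno": "B", "st": "06", "wgtp": 90},
    {"serialno": "C", "st": "48", "wgtp": 75},
    {"serialno": "D", "st": "48", "wgtp": 110},
]


def households_with_both_ages(persons, households, ages=(0, 3)):
    """Weighted count, by state, of households containing every age in `ages`."""
    wanted = set(ages)

    # Subset the person records to the ages of interest and note,
    # per household, which of those ages actually occur.
    ages_seen = defaultdict(set)
    for person in persons:
        if person["agep"] in wanted:
            ages_seen[person["serialno"]].add(person["agep"])

    # Match to the housing records on serialno and sum the housing weight
    # by state for households where every wanted age is present.
    totals = defaultdict(int)
    for household in households:
        if ages_seen.get(household["serialno"], set()) >= wanted:
            totals[household["st"]] += household["wgtp"]
    return dict(totals)


estimates = households_with_both_ages(persons, households)
print(estimates)
```

With real PUMS files you would read the person and housing CSVs instead of the toy lists (converting agep and the weights to numbers first), and rerun the same tally with each of wgtp1 - wgtp80 to get the replicate estimates used for standard errors.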