Hi all! I have a lot of experience using ACS data through data.census.gov and have recently started trying to learn how to use microdata to answer some more detailed research questions. I can do basic tabulations through MDAT (e.g., finding the number of 3-year-olds who live in poverty) but am a little stumped on how to answer more detailed questions that combine household-level and person-level characteristics (for example, how many households in X state have both an infant under age 1 AND a 3-year-old?). Is this type of question answerable through MDAT or through microdata in general? If so, are there resources you would recommend to build more sophisticated skills in using ACS microdata?
Thanks so much for any insights you can offer!
Sarah
IPUMS USA offers data from the American Community Survey and decennial census harmonized across time and space. Our version of the PUMS is easy to use, has consistent codes for all samples, and we also provide detailed technical documentation and learning resources for users. This video tutorial is a good introduction to how to create a custom data extract from IPUMS USA—a data file that includes just the sample(s) and variables you want. We also have an online data analysis tool that’s great for simple tabulations, but to create the types of variables you’ve described here, you will need to create a data extract and analyze the data in a stats package like R, Stata, SPSS, or SAS.
You can use ACS data provided by IPUMS USA to measure the number of households with an infant under age 1 and a 3-year-old. In IPUMS USA, each household has a SERIAL number that you can use to identify all the members of the household. Within a given sample (such as the 2021 ACS), SERIAL uniquely identifies a household. Within the whole IPUMS USA data collection, SERIAL and SAMPLE together uniquely identify a household across samples. I would create a binary variable indicating that an individual is an infant under age 1 (AGE=0) and another binary variable indicating that an individual is a 3-year-old (AGE=3). Next, I would create two household-level variables, one counting the infants in the household and one counting the 3-year-olds. Finally, I would create a binary variable equal to one if the respondent’s household includes at least one infant and at least one 3-year-old. Below is how I would code this using Stata, though I’m sure there are more elegant ways to do it:
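* Person-level flags: infant (AGE=0) and 3-year-old (AGE=3)
gen infant = 1 if age==0
replace infant = 0 if age!=0
gen threeyo = 1 if age==3
replace threeyo = 0 if age!=3
* Household-level counts, grouping on SAMPLE and SERIAL
egen infant_hh = sum(infant), by(sample serial)
egen threeyo_hh = sum(threeyo), by(sample serial)
* Flag households with at least one infant and at least one 3-year-old
gen targethh = 1 if infant_hh>0 & threeyo_hh>0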
From here, you could count the number of households meeting this criterion simply by summarizing or tabulating the targethh variable while retaining just one member of each household using the filter PERNUM=1. This filter restricts your analysis to just the first person on each household roster. Apply the household weight HHWT to get a representative count of households meeting your specifications.
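As a rough sketch of that last step, the weighted count could look something like the following. It assumes the targethh variable from the code above is already in memory and that your extract includes PERNUM, HHWT, and STATEFIP. The state code 27 is only a placeholder, so substitute the FIPS code for the state you are interested in.

```stata
* Sketch only: assumes targethh was created as shown above and that
* pernum, hhwt, and statefip are in the extract.
* Households that do not meet the criterion are missing on targethh,
* so set them to 0 to make both categories appear in the table.
replace targethh = 0 if missing(targethh)

* Keep one record per household (pernum==1), restrict to one state
* (27 is a placeholder FIPS code), and weight by the household weight.
tabulate targethh if pernum==1 & statefip==27 [iweight=hhwt]
```

The row for targethh equal to 1 gives the estimated number of households with both an infant and a 3-year-old. I used iweight here because tabulate does not accept noninteger frequency weights, but other weighting approaches may suit your analysis better.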
I hope this helps you get started if you choose to use IPUMS for this purpose. Feel free to email us at ipums@umn.edu or post in our user forum as well with any questions.
Thanks so much for this detailed reply, Isabel. I really appreciate it!