Summarizing 5-year PUMS data

Hello,

I have a table from IPUMS for the 2021 5-year ACS. I want to sum the person weights in the PUMS data to create a single county total. The PUMS data distinguishes between the different years in the data product (2017, 2018, etc.). When I compare the results to a standard ACS table with the same variables, the PUMS totals are 2x to 5x higher across all variables. This seems like more than just a difference in methods.

The different years stand out to me. Should I be summing across all 5 years, or doing something else, like summing each year and taking the average? Or something else entirely?

EDIT: SOLVED.  I was comparing the wrong thing.

  • It sounds like you are trying to compare results from a summary of PUMS data to summary file data. Without more information on the specific topic/table you are trying to replicate, I can only provide general comments. Hopefully they help. For questions about IPUMS versions of the data, I would encourage you to post at forum.ipums.org; our staff monitors that site more actively.

    1. The PUMS and summary data both come in 1-year and 5-year files. I would suggest using the same year/multi-year reference period if making a comparison between the two.
    2. The 5-year PUMS files will adjust the weights for the pooling of all five years; however, if you are directly pooling adjacent 1-year samples, you will need to make this adjustment on your own (dividing by the number of years you are pooling should be fine).
    3. The topic/variable will dictate which weight you use (household or person weight). For household-level analyses (e.g., households by household income), you would want to apply the household-level weight and be sure to restrict your PUMS file to only one observation per household (depending on where you are accessing the PUMS file from and the format of your data file, you may not need to restrict your file at all). For person-level tables (e.g., race), you would want to use the person-level weight. There is no family-level weight; I think published estimates apply the household weights for family-level analyses.
    4. The summary files will offer geographic detail not available in the PUMS files. PUMAs are the most detailed geography in the PUMS files; IPUMS USA infers a number of other geographic units where possible, including counties. Depending on the level you are looking at, the unit may not be identifiable in the PUMS or there may be a greater margin of error around estimates for certain geographic units (though I wouldn't expect this at the magnitude you are describing).
    5. I wouldn't expect your estimates from the PUMS files to be 2-5x off from published counts, but I also wouldn't expect them to match exactly. The published estimates are calculated from a restricted-use version of the data, which includes more detail and more records. I would expect your results to be within the margin of error, but not identical. Note that the published estimates may also apply universe restrictions that can be a bit buried in the documentation.
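For anyone finding this thread later, here is a minimal Python sketch of points 2 and 3 (stdlib only). It assumes a rectangular, person-level IPUMS USA extract using the variables PERWT, HHWT, PERNUM, STATEFIP, COUNTYFIP, and MULTYEAR (check these against your own codebook). The records are made up, and the helper names (in_county, person_total, household_total, pooled_person_total) are hypothetical. In a 5-year file you sum the weights across every survey year with no averaging; for household counts you keep one record per household; and only when you pool separate 1-year files yourself do you divide by the number of years.

```python
# Hypothetical toy records standing in for rows of an IPUMS USA extract.
# Field names follow IPUMS USA conventions; confirm them in your own codebook.
records = [
    # Household 1, surveyed in 2017 (assumed: MULTYEAR holds the survey year within the 5-year sample)
    {"SERIAL": 1, "PERNUM": 1, "MULTYEAR": 2017, "STATEFIP": 6, "COUNTYFIP": 1, "PERWT": 20.0, "HHWT": 20.0},
    {"SERIAL": 1, "PERNUM": 2, "MULTYEAR": 2017, "STATEFIP": 6, "COUNTYFIP": 1, "PERWT": 22.0, "HHWT": 20.0},
    # Household 2, surveyed in 2021
    {"SERIAL": 2, "PERNUM": 1, "MULTYEAR": 2021, "STATEFIP": 6, "COUNTYFIP": 1, "PERWT": 18.0, "HHWT": 18.0},
]


def in_county(row, state, county):
    # COUNTYFIP is only unique within a state, so match on both codes.
    return row["STATEFIP"] == state and row["COUNTYFIP"] == county


def person_total(rows, state, county):
    """Person-level count: sum PERWT over every person record in every survey year.
    The 5-year weights already account for pooling, so no averaging is applied."""
    return sum(r["PERWT"] for r in rows if in_county(r, state, county))


def household_total(rows, state, county):
    """Household-level count: sum HHWT over one record per household (PERNUM == 1),
    because a person-level extract repeats HHWT on every member's record."""
    return sum(r["HHWT"] for r in rows if in_county(r, state, county) and r["PERNUM"] == 1)


def pooled_person_total(one_year_samples, state, county):
    """Pooling separate 1-year samples yourself: each file's weights sum to roughly a
    full population, so divide the combined total by the number of years pooled."""
    total = sum(person_total(rows, state, county) for rows in one_year_samples)
    return total / len(one_year_samples)


print(person_total(records, 6, 1))     # 60.0
print(household_total(records, 6, 1))  # 38.0

# Two hypothetical 1-year samples whose county totals are 100 and 110 persons.
y1 = [{"PERNUM": 1, "STATEFIP": 6, "COUNTYFIP": 1, "PERWT": 100.0}]
y2 = [{"PERNUM": 1, "STATEFIP": 6, "COUNTYFIP": 1, "PERWT": 110.0}]
print(pooled_person_total([y1, y2], 6, 1))  # 105.0
```

Summing each year separately and then adding the yearly subtotals gives the same county total as summing the whole 5-year file at once; averaging those subtotals instead would undercount.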