Summarizing 5 year PUMS data


I have a table from IPUMS for the 2021 5-year ACS. I want to sum the person weight in the PUMS data to create a single county total. The PUMS data distinguishes between the different years in the data product (2017, 2018, etc.). When I compare the results to a standard ACS table with the same variables, the PUMS totals are 2x to 5x higher across all variables. This seems like more than just a difference in methods.

The different years stand out to me. Should I be summing up all 5 years, or doing something else like summing each year and taking the average? Or something else entirely?

EDIT: SOLVED.  I was comparing the wrong thing.

  • It sounds like you are trying to compare results from a summary of PUMS data to summary file data. Without more information on the specific topic/table you are trying to replicate, I can only provide general comments. Hopefully they help. For questions about IPUMS versions of the data, I would encourage you to post on the IPUMS forum; our staff monitors that site more actively.

    1. The PUMS and summary data both come in 1-year and 5-year files. I would suggest using the same year/multi-year reference period if making a comparison between the two.
    2. The 5-year PUMS files will adjust the weights for the pooling of all five years; however, if you are directly pooling adjacent 1-year samples, you will need to make this adjustment on your own (dividing by the number of years you are pooling should be fine).
    3. The topic/variable will dictate which weight you use (household or person weight). For household-level analyses (e.g., households by household income), you would want to apply the household-level weight and be sure to restrict your PUMS file to only one observation per household (depending on where you are accessing the PUMS file from and the format of your data file, you may not need to restrict your file at all). For person-level tables (e.g., race), you would want to use the person-level weight. There is no family-level weight; I think published estimates apply the household weights for family-level analyses.
    4. The summary files will offer geographic detail not available in the PUMS files. PUMAs are the most detailed geography in the PUMS files; IPUMS USA infers a number of other geographic units where possible, including counties. Depending on the level you are looking at, the unit may not be identifiable in the PUMS or there may be a greater margin of error around estimates for certain geographic units (though I wouldn't expect this at the magnitude you are describing).
    5. I wouldn't expect your estimates from the PUMS files to be 2-5x off of published counts, but I also wouldn't expect to match them exactly. The published estimates are calculated from a restricted-use version of the data, which includes more detail and more records. I would expect your results to be within the margin of error, but not identical. Note that the published estimates may also implement universe restrictions that can be a bit buried in the documentation.
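
    To make points 2 and 3 concrete, here is a minimal sketch in plain Python. The column names (MULTYEAR, SERIAL, PERNUM, PERWT, HHWT) are assumed from IPUMS USA naming, the records are made up, and picking the PERNUM == 1 row as the one record per household is an assumption about the extract layout; check your own codebook before relying on any of it.

```python
# Toy person-level records shaped like an IPUMS USA 5-year extract.
# Column names are assumptions based on IPUMS naming; values are invented.
records = [
    {"MULTYEAR": 2017, "SERIAL": 1, "PERNUM": 1, "PERWT": 20, "HHWT": 18},
    {"MULTYEAR": 2017, "SERIAL": 1, "PERNUM": 2, "PERWT": 22, "HHWT": 18},
    {"MULTYEAR": 2019, "SERIAL": 2, "PERNUM": 1, "PERWT": 30, "HHWT": 25},
    {"MULTYEAR": 2021, "SERIAL": 3, "PERNUM": 1, "PERWT": 15, "HHWT": 12},
    {"MULTYEAR": 2021, "SERIAL": 3, "PERNUM": 2, "PERWT": 13, "HHWT": 12},
]


def weighted_total(rows, weight, n_pooled_years=1):
    """Sum a weight column over all rows.

    Leave n_pooled_years at 1 for a 5-year file (its weights are already
    adjusted). Set it to the number of samples only when you have stacked
    separate 1-year files yourself.
    """
    return sum(row[weight] for row in rows) / n_pooled_years


def household_total(rows):
    """Sum HHWT over one record per household (assumed: the PERNUM == 1 row)."""
    return sum(row["HHWT"] for row in rows if row["PERNUM"] == 1)


persons = weighted_total(records, "PERWT")      # person-level estimate
households = household_total(records)           # household-level estimate
overcounted = weighted_total(records, "HHWT")   # HHWT summed over every person

print(persons, households, overcounted)
```

    In a 5-year file, the person total is the plain sum of PERWT across every survey year. Averaging the yearly sums would shrink the estimate to roughly a fifth, and summing HHWT over every person record inflates household counts.
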
  • Thank you for your help! I didn't realize there was an IPUMS forum. In this case, I realized I was comparing the wrong variables as well as comparing 5-year to 1-year. Once I compared apples to apples, the results were fine.

  • Glad you were able to solve the issue!

  • Here is some information that you might find useful.

    You can check your calculation by creating variables in the PUMS dataset that correspond to the categories in the detailed ACS table. You can then create a table that corresponds to an ACS detailed (or subject or data profile) table. Look up the published ACS table, selecting the PUMA (Public Use Microdata Area) geography that matches the PUMA in the PUMS data. If you define everything correctly, you should get a result that is close to the published one. To get a "margin of error" for the PUMS estimate, use the replicate weights; the published MoE is in the ACS table. In general, PUMAs and counties do not correspond directly: a PUMA may include several counties, or a county may contain several PUMAs. If you want data that gives you the correspondence, use GEOCORR.
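
    As a rough sketch of the replicate-weight step: the ACS uses 80 replicate weights, the standard error is the square root of 4/80 times the sum of squared differences between each replicate estimate and the full-sample estimate, and the published 90 percent MoE is 1.645 times that standard error. The column names below (PERWT, REPWTP1 through REPWTP80) are assumed from IPUMS naming (the Census Bureau files use PWGTP and PWGTP1 through PWGTP80), and the two records are invented.

```python
import math

N_REPLICATES = 80  # the ACS PUMS ships 80 replicate weights per record


def weighted_total(rows, weight):
    """Sum one weight column over the rows that fall in the table cell."""
    return sum(row[weight] for row in rows)


def replicate_moe(rows, full_weight="PERWT", rep_prefix="REPWTP", z=1.645):
    """Return (estimate, margin of error) using the replicate weights."""
    estimate = weighted_total(rows, full_weight)
    squared_diffs = 0.0
    for r in range(1, N_REPLICATES + 1):
        replicate_estimate = weighted_total(rows, f"{rep_prefix}{r}")
        squared_diffs += (replicate_estimate - estimate) ** 2
    standard_error = math.sqrt(4 / N_REPLICATES * squared_diffs)
    return estimate, z * standard_error


# Invented records: person A's replicate weights wobble by +/-5 around 60,
# person B's replicate weights all equal the full-sample weight of 40.
person_a = {"PERWT": 60}
person_b = {"PERWT": 40}
for r in range(1, N_REPLICATES + 1):
    person_a[f"REPWTP{r}"] = 60 + (5 if r % 2 else -5)
    person_b[f"REPWTP{r}"] = 40

estimate, moe = replicate_moe([person_a, person_b])
print(estimate, round(moe, 2))
```

    Filter the rows to the table cell you care about (for example, one PUMA and one category) before calling the function, so the estimate and the MoE describe the same universe as the published cell.
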

    This is a useful way to check your PUMS data calculations/computer code.

    Make sure to use the same "vintage" and period (1 year or 5 year) for both the PUMS and tables.
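
    For the PUMA-to-county correspondence mentioned above, a crosswalk can be applied roughly as follows. This is only a sketch: the allocation-factor field (called afact here, which I believe matches GEOCORR output, but treat the name as an assumption), the PUMA and county codes, and the totals are all illustrative. It also assumes the characteristic is spread across counties in proportion to the allocation factor, which is an approximation.

```python
from collections import defaultdict

# Weighted PUMS totals already computed per PUMA (invented numbers).
puma_totals = {"00101": 1000.0, "00102": 500.0}

# Crosswalk rows: share of each PUMA's population that falls in each county.
# "afact" is the assumed name of the allocation-factor column.
crosswalk = [
    {"puma": "00101", "county": "A", "afact": 1.0},
    {"puma": "00102", "county": "A", "afact": 0.4},
    {"puma": "00102", "county": "B", "afact": 0.6},
]


def allocate_to_counties(totals_by_puma, crosswalk_rows):
    """Split each PUMA total across counties by its allocation factor."""
    county_totals = defaultdict(float)
    for row in crosswalk_rows:
        puma_total = totals_by_puma.get(row["puma"], 0.0)
        county_totals[row["county"]] += puma_total * row["afact"]
    return dict(county_totals)


county_estimates = allocate_to_counties(puma_totals, crosswalk)
print(county_estimates)
```

    When a county is made up of whole PUMAs, every factor is 1 and this reduces to adding up the PUMA totals. When a PUMA is split across counties, the result is only as good as the proportional assumption.
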