Hello, I am accessing the 5-year estimates for 2022 but am not seeing PUMAs as an option under "Geography". Any ideas how to get this?
Update: IPUMS USA has now added PUMA identifiers to its version of the 2022 5-year sample through a single PUMA variable. Almost all the other geographic variables that we derive from PUMA information…
In New York the names are very long. My guess is they edited the PUMAs, but did not edit all of the names. PUMAs now are almost always a set of tracts. In the distant past they were created by Summary…
Yes, in MDAT, for the 2022 5-year PUMS, you'll find PUMA information through two _variables_, not geographies. The PUMA10 variable identifies 2010 PUMA codes for respondents from 2018 through 2021. PUMA20 identifies 2020 PUMA codes for respondents from 2022. This two-variable system will most likely continue for 5-year PUMS until the 2026 release when, once again, the 5-year PUMS will use only one set of PUMA definitions for the entire 5-year period. (MDAT uses the same setup for the 2012 through 2015 5-year PUMS releases, which also used two sets of PUMA definitions.)
For IPUMS USA, we're working on providing PUMA codes for the 2022 5-year sample through a single "PUMA" variable (as we already do for the 2012 through 2015 5-year samples). We aim to release that update sometime in the next couple weeks. We have several other resources related to PUMAs and PUMA changes through our Geographic Tools & Resources page.
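For anyone working with the raw PUMS file, here is a minimal Python sketch of how the two-variable setup (PUMA10 and PUMA20) might be collapsed into a single code plus a vintage label. The placeholder value `"-0009"` for "not applicable" and the sample records are assumptions for illustration; please check the PUMS data dictionary for the actual codes before relying on this.

```python
# Assumption: "-0009" is the placeholder written to whichever PUMA variable
# does not apply to a record. Verify this against the PUMS data dictionary.
NOT_APPLICABLE = "-0009"

def puma_with_vintage(record):
    """Return (vintage, code) for one PUMS record stored as a dict."""
    puma10 = record.get("PUMA10", NOT_APPLICABLE)
    puma20 = record.get("PUMA20", NOT_APPLICABLE)
    if puma20 != NOT_APPLICABLE:
        return ("2020", puma20)
    if puma10 != NOT_APPLICABLE:
        return ("2010", puma10)
    return (None, None)  # neither variable carries a usable code

# Hypothetical records, one per PUMA vintage.
records = [
    {"SERIALNO": "A", "PUMA10": "03701", "PUMA20": "-0009"},
    {"SERIALNO": "B", "PUMA10": "-0009", "PUMA20": "04103"},
]
labeled = [puma_with_vintage(r) for r in records]
```

Keeping the vintage next to the code matters because the same five-digit code can refer to different areas under the 2010 and 2020 definitions.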
Hi Jonathan,
I'm curious why the Census Bureau does not simply regeocode the 2018 to 2021 respondents into the 2020 PUMA boundaries? This way all respondents would be in the same geographies. This would take very little time and would make the dataset much easier to use.
Thanks, Charles
Geocorr (Glenn Rice) has crosswalks between various vintages of tracts and PUMAs. Between decennial censuses, the tract boundaries may change as well as the PUMA boundaries. Geocorr also has crosswalks involving the "new 2020" boundaries (put into effect in 2022 for PUMAs).
https://mcdc.missouri.edu/applications/geocorr.html
For tracts, PUMAs, counties, or any combination of geographies and vintages, when the boundaries of one geography split another (such as 2019 tracts and 2022 tracts), Geocorr gives an "allocation factor" based on the populations in the split geographies. I think that Geocorr disaggregates down to the block level and then regroups up to the higher geography to get the allocation factors. I'm not sure how Geocorr works when the block boundaries change between vintages, for example when a block is split.
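To make the allocation factor idea concrete, here is a small Python sketch that spreads counts tabulated by a source geography across target geographies. The crosswalk rows, PUMA codes, and counts are made up for illustration, and the tuple layout is an assumption rather than the format of an actual Geocorr file.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source 2010 PUMA, target 2020 PUMA, allocation factor).
# The factors for each source geography sum to 1.
crosswalk = [
    ("00100", "00101", 0.75),
    ("00100", "00102", 0.25),
    ("00200", "00102", 1.0),
]

# Hypothetical estimates tabulated by source geography.
source_counts = {"00100": 120000, "00200": 110000}

def reallocate(counts, xwalk):
    """Spread each source count across its targets in proportion to the factors."""
    out = defaultdict(float)
    for source, target, afact in xwalk:
        out[target] += counts.get(source, 0) * afact
    return dict(out)

target_counts = reallocate(source_counts, crosswalk)
```

Because each source's factors sum to 1, the overall total is preserved; only its distribution across the target geographies changes.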
Hope this helps. I use R to download and analyze PUMS data (including the geocorr files).
The short answer is that I use a GIS to intersect the older geography boundaries with the current block boundaries.
All geographies within a single vintage of Geocorr must use the same "atoms" (e.g. 2020 blocks for Geocorr 2022). In order to add an earlier vintage of geography (e.g. 2010-vintage PUMAs) to Geocorr 2022, I have to redefine it in terms of 2020 blocks.
This is tedious, as you might expect, so I do this only for geo types that I consider most useful to compare across decennial vintages.
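As a rough illustration of the "atoms" idea (not Geocorr's actual code), the Python sketch below assumes each 2020 block has already been assigned to one 2010 PUMA and one 2020 PUMA, for example through a GIS overlay, and then aggregates block populations into population-based allocation factors. All block IDs, populations, and PUMA codes are hypothetical.

```python
from collections import defaultdict

# Hypothetical 2020 blocks: (block id, 2020 population, 2010 PUMA, 2020 PUMA).
blocks = [
    ("b1", 400, "00100", "00101"),
    ("b2", 100, "00100", "00102"),
    ("b3", 500, "00100", "00101"),
    ("b4", 250, "00200", "00102"),
]

def allocation_factors(block_rows):
    """Share of each source geography's population that falls in each target."""
    pair_pop = defaultdict(int)    # population by (source, target) pair
    source_pop = defaultdict(int)  # population by source geography
    for _block, pop, source, target in block_rows:
        pair_pop[(source, target)] += pop
        source_pop[source] += pop
    return {
        pair: pop / source_pop[pair[0]]
        for pair, pop in pair_pop.items()
        if source_pop[pair[0]] > 0  # skip sources with no population
    }

factors = allocation_factors(blocks)
```

Since every geography is built from the same blocks, any pair of geographies can be compared this way once each one is expressed as a list of blocks.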
Thank you very much David, I'll check this out.
I'm still curious as to why the respondents in 2010 PUMAs are not geocoded to 2020 PUMAs in this latest file.
Charles,
Since my original post here, the Census Bureau has done something very much like what you suggest. Their first release of the 2019-2023 5-year PUMS identified 2020 PUMAs for all respondents. Later, they released a new version of the 2018-2022 5-year PUMS that identifies 2020 PUMAs for all respondents, but they later removed that. (We at IPUMS discovered some 2018 records in the 2018-2022 updated release that had invalid 2020 PUMA IDs. We let the Bureau know about these errors, and they temporarily removed the entire updated version of the PUMS. In this errata note, they state that they plan to release a corrected version at an unspecified later date.)
We’ve integrated the 2019-2023 5-year release into IPUMS USA, and we will also add the updated 2018-2022 PUMS if/when they're again available.
That said, as I understand, the approach they've taken is not exactly what you suggest--"regeocoding"--and there's a very good reason for that: it would violate respondent privacy. If they had simply regeocoded, then for the many households that appear in both the 2017-2021 5-year PUMS (using 2010 PUMAs) and in the 2018-2022 5-year PUMS (using 2020 PUMAs), it would typically be possible for a user to identify both the 2010 and 2020 PUMAs where each household resided. If a household resided in an area affected by a small PUMA boundary change--maybe encompassing a few thousand residents or less--then we could determine the location of that household MUCH more precisely than if we only knew which 2010 PUMA the household is in. (PUMAs are required to have at least 100,000 residents in order to prevent users from locating respondents more precisely.)
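The privacy concern can be sketched in a few lines of Python: if both the 2010 and 2020 PUMA of a household were known, the household would be located within the overlap of the two areas, which can be far smaller than the 100,000-resident minimum. The overlap populations and PUMA codes below are hypothetical.

```python
MIN_PUMA_POP = 100_000  # minimum PUMA population mentioned above

# Hypothetical overlay of 2010 and 2020 PUMAs:
# (2010 PUMA, 2020 PUMA) -> residents living in the overlap.
overlaps = {
    ("00100", "00101"): 117000,
    ("00100", "00102"): 3000,
    ("00200", "00102"): 110000,
}

def risky_overlaps(overlap_pop, threshold=MIN_PUMA_POP):
    """Return the overlaps whose population falls below the threshold."""
    return sorted(pair for pair, pop in overlap_pop.items() if pop < threshold)

flagged = risky_overlaps(overlaps)
```

In this made-up case, a household known to be in 2010 PUMA 00100 and 2020 PUMA 00102 would be narrowed down to an area of about 3,000 residents.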
To my knowledge, the Bureau has yet to provide exact details on how they allocated 2020 PUMA identifiers to 2018-2021 responses in these new PUMS releases, but the documentation they have provided indicates that it was not by regeocoding. Rather, they used some kind of crosswalk from 2010 PUMAs to 2020 PUMAs, as you might get from Geocorr. I’ve confirmed that the allocation is not based on areal weighting (simplistically assuming that the likelihood of a 2010 PUMA resident living in a 2020 PUMA is equal to the proportion of the 2010 PUMA’s area in the 2020 PUMA). But beyond that, we don’t know what the technique was. They may have used population allocation factors like those provided in Geocorr files, but I can't verify that. Unfortunately, whatever allocation technique they used--if it was not in fact regeocoding--will have introduced some allocation errors, so it'd be helpful if at some point they could provide more information about their approach and/or the potential scope and impact of any resultant PUMA misidentifications.
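To show how much areal weighting and population weighting can disagree, here is a short Python sketch for a single 2010 PUMA split between two 2020 PUMAs. The areas and populations are invented, and neither calculation is claimed to be the Bureau's method, which remains undocumented.

```python
# Hypothetical pieces of one 2010 PUMA after overlay with 2020 PUMAs,
# one row per 2020 PUMA: (2020 PUMA, land area in sq km, population).
pieces = [
    ("00101", 10.0, 90000),
    ("00102", 30.0, 30000),
]

def weights(rows, index):
    """Share of the total held by each row, using the column at `index`."""
    total = sum(row[index] for row in rows)
    return {row[0]: row[index] / total for row in rows}

areal = weights(pieces, 1)          # proportion of the 2010 PUMA's area
by_population = weights(pieces, 2)  # proportion of the 2010 PUMA's residents
```

Here the dense piece holds a quarter of the area but three quarters of the residents, so areal weighting would send most respondents to the wrong 2020 PUMA.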
Thanks so much for the detailed explanation about why they did not regeocode the older respondents. That makes sense (and did not occur to me). We are going to proceed with using the 2019-2023 5-year since it has been "corrected", but I agree that it would be helpful to know how they allocated 2010 PUMA respondents to 2020 PUMAs.
Thanks, Charles