Hello, I am accessing the 5-year estimates for 2022 but not seeing PUMAs as an option under "Geography". Any ideas how to get this?
In New York the names are very long. My guess is they edited the PUMAs, but did not edit all of the names. PUMAs now are almost always a set of tracts. In the distant past they were created by Summary…
Yes, in MDAT, for the 2022 5-year PUMS, you'll find PUMA information through two _variables_, not geographies. The PUMA10 variable identifies 2010 PUMA codes for respondents from 2018 through 2021. PUMA20 identifies 2020 PUMA codes for respondents from 2022. This two-variable system will most likely continue for 5-year PUMS until the 2026 release when, once again, the 5-year PUMS will use only one set of PUMA definitions for the entire 5-year period. (MDAT uses the same setup for the 2012 through 2015 5-year PUMS releases, which also used two sets of PUMA definitions.)
For IPUMS USA, we're working on providing PUMA codes for the 2022 5-year sample through a single "PUMA" variable (as we already do for the 2012 through 2015 5-year samples). We aim to release that update sometime in the next couple of weeks. We have several other resources related to PUMAs and PUMA changes through our Geographic Tools & Resources page.
Thank you, Jonathan. I found the variable, but there is no detail with it. So, there is no telling which PUMA each row is.
As Jonathan indicated, the 2018-2022 PUMS data is a "mixed geography" file. It has two variables that don't exist in the 2017-2021 file: PUMA10 and PUMA20. The file is 4/5ths PUMA10 records and 1/5th PUMA20 records. This makes the 2022 5-year file pretty useless. You are better off using the 2022 1-year PUMS data. For the API, with the 2018-2022 file you can't use the "&for=public use microdata:PUMAFIPS" construction. You need to use "&for=state:STATEFIPS" and then subset on PUMA10 or PUMA20 FIPS code records.
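For anyone trying the API route described above, here is a rough Python sketch of the idea: request a whole state, then subset on PUMA10 or PUMA20 yourself. The endpoint pattern, the variable list, the PUMA codes, and the sample rows are all assumptions for illustration, so check them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern for the ACS 5-year PUMS API; verify before use.
BASE_URL = "https://api.census.gov/data/{year}/acs/acs5/pums"


def build_state_query(year, state_fips, variables):
    """Build a URL that requests a whole state instead of a PUMA geography."""
    params = {"get": ",".join(variables), "for": f"state:{state_fips}"}
    # Keep commas and colons readable rather than percent-encoding them.
    return BASE_URL.format(year=year) + "?" + urlencode(params, safe=",:")


def subset_by_puma(rows, column, code):
    """Keep the header plus the records whose PUMA column equals `code`.

    `rows` follows the API's JSON layout: a header row, then data rows.
    """
    header, records = rows[0], rows[1:]
    idx = header.index(column)
    return [header] + [r for r in records if r[idx] == code]


url = build_state_query(2022, "53", ["PWGTP", "PUMA10", "PUMA20"])

# Hypothetical response rows (not real data); the codes are made up.
sample = [
    ["PWGTP", "PUMA10", "PUMA20", "state"],
    ["12", "11601", "-9", "53"],
    ["30", "-9", "23301", "53"],
    ["8", "11601", "-9", "53"],
]
puma10_records = subset_by_puma(sample, "PUMA10", "11601")
```

Because each record carries only one usable PUMA code, the PUMA10 and PUMA20 subsets have to be pulled separately, and they describe different boundary vintages.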
From the people at ACSO (American Community Survey Office):
Thank you, David. I need to use the 5-year estimates because I want to go down to the PUMA level (I actually want county data). But this does not seem to be an option with this file. As I've stated, I used the PUMA20 or PUMA10 variable, but it's pretty useless the way I'm using it. The PUMA value does not show on the table.
Any ideas of how to accurately get to the county level for 2022 data?
Thank you,
Lorna
Dear Lorna,
If you look at my post about a Small Area Estimation (SAE) program that I wrote, you can use the program to get county data. The problem is that a county may contain several PUMAs. This situation is pretty easy to handle: you can just "add up" the tables for the relevant PUMAs. For PUMAs that cross county lines, which may contain parts of several counties, you are stuck. I wrote the SAE program to handle this situation. I start with PUMS data for all the relevant PUMAs (the large area) and then I create PUMS-like tract-level data. Next I "stack" the tracts for the county that I want. There are many potential issues with this approach, but it seems reasonable. You can then create any county table that you want using the synthetic data.
The current version of the program does not produce useful MOEs. I'm working on an extension that produces replicate weights. You can produce MOEs using the replicate weights. To do all this you need to be able to use R. Do you have any experience with R? If you work for a nonprofit 501(c)(3) or government, you can get free support through my foundation, dorerfoundation.org.
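To illustrate the easy case above, where a county is made up of whole PUMAs, here is a small Python sketch of "adding up" weighted counts. This is not the SAE program; the PUMA codes, county names, and records are hypothetical, and PUMAs that cross county lines are simply skipped.

```python
from collections import defaultdict

# Hypothetical PUMA -> county assignments. Only PUMAs that sit entirely
# inside one county belong here; PUMAs crossing county lines are left out.
PUMA_TO_COUNTY = {"00101": "County A", "00102": "County A", "00201": "County B"}

# Hypothetical person records: (PUMA code, person weight, in group of interest?).
records = [
    ("00101", 20, True),
    ("00101", 15, False),
    ("00102", 30, True),
    ("00201", 25, True),
    ("00999", 40, True),  # a PUMA with no county assignment
]


def county_totals(records, puma_to_county):
    """Sum weighted counts over all PUMAs nested in each county."""
    totals = defaultdict(lambda: {"in_group": 0, "population": 0})
    for puma, weight, in_group in records:
        county = puma_to_county.get(puma)
        if county is None:
            continue  # no whole-PUMA assignment, so it cannot simply be added up
        totals[county]["population"] += weight
        if in_group:
            totals[county]["in_group"] += weight
    return dict(totals)


totals = county_totals(records, PUMA_TO_COUNTY)
```

The same pattern applies to any table cell: compute it per PUMA with the person or household weight, then sum across the PUMAs that make up the county.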
Dave
Thanks again, David. I think I will switch over and use the PUMS data with SAS. I need demographics by county/zip for 200% FPL. I'm VERY interested in using the Supplemental poverty data though. Can you get me started? I work for the state of Washington, DSHS.
I haven't used SAS in years and years, so I don't recall how transferable this resource is. These two links were extremely helpful with using PUMS data in R.
https://walker-data.com/census-r/introduction-to-census-microdata.html
https://walker-data.com/tidycensus/articles/pums-data.html
Meghan
Go to dorerfoundation.org and look for the "Contact Us" tab across the top. Send an email to the address and we can communicate via email.
Best,
Jonathan, thanks for your note. Will there be an announcement when IPUMS releases the single PUMA geography?
We will announce to registered IPUMS users by email, but sometimes there's a week or two delay between release and the email. You can also occasionally check the Revision History page on the site for current status.
Some insider info: we're on track for a release this Thursday or Friday. It's fully prepped but there are some technical hold-ups on the deployment. I'm not sure, but it sounds like our IT team will get those worked out soon.
That's great to hear, I'm excited to use the new PUMAs in the 2022 5yr survey for the older data for a question I'm working on. I have a tangential question.
In my work I've been using the tidycensus, survey, and srvyr packages in R and calling the data through the ACS API. I'm using them because of the resources available on how to use these to get margins of error as recommended by the Census Bureau (primarily: https://walker-data.com/tidycensus/articles/pums-data.html and https://walker-data.com/census-r/introduction-to-census-microdata.html).
My question is, do you know of similar resources that walk a novice R user through similar manipulations and summarizations of IPUMS microdata that include error estimates? Or perhaps a crosswalk of how IPUMS data should be handled differently from the data extracted from ACS to get the margin of error using replicate weights (since the survey and srvyr packages do that automatically)?
If I am understanding your question correctly, you are interested in sample code for applying replicate weights to the ACS PUMS available from IPUMS USA to generate empirically derived standard errors for your estimates. IPUMS USA offers both the household (REPWT) and the person (REPWTP) replicate weights through their data access system. This IPUMS USA replicate weights summary page provides a bit of background information as well as sample code for applying replicate weights in R with the srvyr package.
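To show what the replicate weights are doing under the hood, here is a minimal Python sketch of the usual ACS replicate-weight calculation: the estimate is recomputed with each of the 80 replicate weights, and the variance is 4/80 times the sum of squared differences from the full-sample estimate. The records are made up, and the column names (PERWT, REPWTP1 through REPWTP80) are assumed to follow the IPUMS naming. The sample code on the IPUMS page is the better reference for real work.

```python
import math

N_REPS = 80  # number of ACS replicate weights


def weighted_total(records, weight_key):
    """Weighted count of the records flagged as being in the group of interest."""
    return sum(r[weight_key] for r in records if r["in_group"])


def replicate_se(records, full_weight="PERWT", rep_prefix="REPWTP"):
    """Return (estimate, standard error, 90% margin of error)."""
    estimate = weighted_total(records, full_weight)
    squared_diffs = sum(
        (weighted_total(records, f"{rep_prefix}{i}") - estimate) ** 2
        for i in range(1, N_REPS + 1)
    )
    se = math.sqrt(4.0 / N_REPS * squared_diffs)
    return estimate, se, 1.645 * se


def make_record(perwt, rep_offset, in_group):
    """Build a hypothetical record whose replicate weights all equal perwt + rep_offset."""
    record = {"PERWT": perwt, "in_group": in_group}
    for i in range(1, N_REPS + 1):
        record[f"REPWTP{i}"] = perwt + rep_offset
    return record


# Hypothetical records, for illustration only.
records = [make_record(10, 1, True), make_record(20, 0, True), make_record(5, 3, False)]
estimate, se, moe = replicate_se(records)
```

For household-level estimates the same calculation would use the household weight and the household replicate weights instead.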
Update: IPUMS USA has now added PUMA identifiers to its version of the 2022 5-year sample through a single PUMA variable. Almost all the other geographic variables that we derive from PUMA information have also been added, identifying counties, cities, metropolitan areas, percent metro population, and metropolitan / principal city status (where possible).
We also extended three more geographic variables to both the 2022 1-year and 5-year samples: DENSITY, METPOP10, and HOMELAND. (We hadn't yet updated these for 2020 PUMAs, so unlike the other variables mentioned above, they weren't yet available in the 2022 1-year sample until today.)
While you were working on generating the single PUMA variable, did you uncover any mislabeled 2020 PUMAs in the Census Bureau Reference Maps?
In Arkansas, for example, 0500800 is labeled as "White, Lonoke & Woodruff Counties" but it does not include any part of Lonoke County, which now resides in 0500900.
It's not a huge deal, but I was hoping not to have to check and manually relabel all the PUMAs before we publish our data dashboard.
To KatherinRPhillips: We didn't notice any PUMA naming issues, but IPUMS USA doesn't do much with PUMA names. In our versions of the microdata, we provide PUMA codes but not names. Out of curiosity, I looked into the naming issue you describe, and I agree that PUMA 0500800 is apparently misnamed. In the Census Bureau's relationship file between 2020 tracts and 2020 PUMAs, all of the tracts in Lonoke County (state 05, county 085) are in PUMA 0500900 and none are in 0500800.
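For anyone who wants to run the same kind of check on other PUMA names, here is a rough Python sketch that lists which PUMAs contain tracts from each county. The column names (STATEFP, COUNTYFP, TRACTCE, PUMA5CE) are an assumption about the relationship file layout, and the rows below are a made-up excerpt, so the tract codes are not real.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt in the assumed layout of the 2020 tract-to-PUMA
# relationship file. Tract codes are invented for illustration.
SAMPLE = """STATEFP,COUNTYFP,TRACTCE,PUMA5CE
05,085,020101,00900
05,085,020102,00900
05,145,070100,00800
05,147,490100,00800
"""


def pumas_by_county(text):
    """Map (state FIPS, county FIPS) to the set of 7-digit PUMA codes it touches."""
    result = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        county = (row["STATEFP"], row["COUNTYFP"])
        result[county].add(row["STATEFP"] + row["PUMA5CE"])
    return dict(result)


lookup = pumas_by_county(SAMPLE)
lonoke_pumas = lookup[("05", "085")]  # PUMAs containing Lonoke County tracts
```

Comparing each PUMA's name against the counties that actually appear in it would flag cases like the one described here without checking every label by hand.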
Thanks for the quick response. Looks like I will have to hand-check each label, as I suspect the shapefiles contain the same issue.