5-Yr PUMA Geography

Hello, I am accessing the 5-Yr estimates for 2022 but not seeing PUMAs as an option under "Geography". Any ideas how to get this?

  • Yes, in MDAT, for the 2022 5-year PUMS, you'll find PUMA information through two _variables_, not geographies. The PUMA10 variable identifies 2010 PUMA codes for respondents from 2018 through 2021. PUMA20 identifies 2020 PUMA codes for respondents from 2022. This two-variable system will most likely continue for 5-year PUMS until the 2026 release when, once again, the 5-year PUMS will use only one set of PUMA definitions for the entire 5-year period. (MDAT uses the same setup for the 2012 through 2015 5-year PUMS releases, which also used two sets of PUMA definitions.)

    For IPUMS USA, we're working on providing PUMA codes for the 2022 5-year sample through a single "PUMA" variable (as we already do for the 2012 through 2015 5-year samples). We aim to release that update sometime in the next couple weeks. We have several other resources related to PUMAs and PUMA changes through our Geographic Tools & Resources page.
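A minimal sketch of how the two-variable setup described above might be collapsed into one PUMA code plus a vintage flag. It assumes the inapplicable variable on each record carries a not-applicable placeholder (written here as `-0009`; check the PUMS data dictionary for the actual code). The function name and sample records are hypothetical.

```python
# Assumed placeholder for "not applicable"; verify against the PUMS data dictionary.
NOT_APPLICABLE = "-0009"


def puma_for_record(record):
    """Return (vintage, code) for a record carrying PUMA10 and PUMA20 fields.

    Exactly one of the two fields is expected to hold a real PUMA code.
    """
    p10 = str(record.get("PUMA10", NOT_APPLICABLE)).strip()
    p20 = str(record.get("PUMA20", NOT_APPLICABLE)).strip()
    has10 = p10 not in ("", NOT_APPLICABLE)
    has20 = p20 not in ("", NOT_APPLICABLE)
    if has10 == has20:
        # Either both are filled or neither is; the record needs a manual look.
        raise ValueError(f"cannot determine PUMA vintage for record: {record!r}")
    return ("2010", p10) if has10 else ("2020", p20)


# Hypothetical records: one collected before 2022, one collected in 2022.
older = {"PUMA10": "00902", "PUMA20": NOT_APPLICABLE}
newer = {"PUMA10": NOT_APPLICABLE, "PUMA20": "00902"}
print(puma_for_record(older))
print(puma_for_record(newer))
```

Keeping the vintage next to the code matters because the same five-digit code can refer to different areas under the 2010 and 2020 definitions.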

  • Thank you, Jonathan.  I found the variable, but there is no detail with it.  So, there is no telling which PUMA each row is.

  • As Jonathan indicated, the 2018-2022 PUMS data is a "mixed geography" file. It has two variables that don't exist in the 2017-2021 file, PUMA10 and PUMA20. The file is 4/5ths PUMA10 records and 1/5th PUMA20 records. This makes the 2022 5-year file pretty useless. You are better off using the 2022 1-year PUMS data. With the API for 2018-2022, you can't use the "&for=public use microdata:PUMAFIPS" construction. You need to use "&for=state:STATEFIPS" and then "subset" on PUMA10 or PUMA20 FIPS code records.

    From the people at ACSO (American Community Survey Operations) :

    Hi David, 
    I think what you are running into is the dual-PUMA issue.
    This is what I  found in the 2022 PUMS 5-year User Guide:  The current PUMA boundaries are based on Census 2020 definitions, while records from 2021 and earlier use boundaries based on Census 2010 definitions. Therefore, multi-year files for 2022 will contain PUMA codes created from both Census 2010 and Census 2020. PUMA codes defined using Census 2010 are called PUMA10, while the newer PUMA codes defined from Census 2020 are called PUMA20. Each record on the PUMS files will contain either the PUMA10 or PUMA20 code, based on which year the record’s data were collected. Due to disclosure concerns, it is not possible to update the PUMA codes for the records from 2021 and earlier to 2020-based PUMAs by using their detailed geographic locations. Data users will need to crosswalk their data to obtain a single PUMA geography using other means, such as using allocation rates using GEOCORR.  

    I have reached out to the PUMS subject matter experts and have received the following responses:

    There's an error in the first API URL you provided, for the housing file call.  It's retrieving records where PUMA20=00902 regardless of state, while the person API call is restricting to PUMA20=00902 and state=25.

    Dual PUMAs were used for the DY22 5-year PUMS, as PUMA20 is only on the 2022 records, while PUMA10 is on the 2018 through 2021 records.  The information available is on p.12 of the ACS 5-year PUMS User Guide: https://www.census.gov/programs-surveys/acs/microdata/documentation.html

    You may benefit from using the PUMA10 and PUMA20 Variables to narrow down the search with the universe of 00902 below or could use the state link and do the same thing. 
    This might be helpful if you want to only be in that PUMA area for the Geography:


    I have attached the PUMS Data Dictionary for you as well. 


    I hope this helps.  Let me know if you have any other questions. 

    Vicki
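The state-plus-subset approach discussed in this reply can be sketched as a small URL builder. This is an illustration under assumptions, not an official recipe: the endpoint path and the use of `PUMA20` / `PUMA10` as query predicates are assumed from the discussion above, `build_pums_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2018-2022 ACS 5-year PUMS.
BASE_URL = "https://api.census.gov/data/2022/acs/acs5/pums"


def build_pums_url(variables, state_fips, puma20=None, puma10=None):
    """Build a query that restricts by state and, optionally, by PUMA code.

    Always including the state avoids pulling records that share a PUMA
    code in other states.
    """
    params = [("get", ",".join(variables)), ("for", f"state:{state_fips}")]
    if puma20 is not None:
        params.append(("PUMA20", puma20))
    if puma10 is not None:
        params.append(("PUMA10", puma10))
    # Keep commas and colons readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=",:")


# Records with PUMA20 = 00902 in state 25 only.
print(build_pums_url(["PWGTP", "PUMA20"], "25", puma20="00902"))
```

Building the housing and person calls through the same helper keeps the state restriction consistent between them, which is the mismatch the subject matter experts pointed out.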
  • Thank you, David. I need to use the 5-yr estimates because I want to go down to the PUMA level (I actually want county data). But this does not seem to be an option with this file. As I've said, I used the PUMA20 and PUMA10 variables, but they're pretty useless the way I'm using them. The PUMA value does not show on the table.

    Any ideas of how to accurately get to the county level for 2022 data?

    Thank you,

    Lorna

  • Dear Lorna,

    If you look at my post about a Small Area Estimation (SAE) program that I wrote, you can use the program to get county data. The problem is that a county may contain several PUMAs. This situation is pretty easy to handle: you can just "add up" the tables for the relevant PUMAs. For PUMAs that cross county lines, which may contain parts of several counties, you are stuck. I wrote the SAE program to handle this situation. I start with PUMS data for all the relevant PUMAs (the large area) and then create PUMS-like tract-level data. Next I "stack" the tracts for the county that I want. There are many potential issues with this approach, but it seems reasonable. You can then create any county table that you want using the synthetic data.

    The current version of the program does not produce useful MOEs. I'm working on an extension that produces replicate weights. You can produce MOEs using the replicate weights. To do all this you need to be able to use R. Do you have any experience with R? If you work for a nonprofit 501(c)(3) or government, you can get free support through my foundation, dorerfoundation.org

    Dave
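The easy case described above (a county made up of whole PUMAs) can be sketched as follows. This is not the SAE program; it only separates PUMAs that lie entirely inside a county from those that cross its boundary, then adds up estimates for the former. The function names, PUMA labels, and counts are hypothetical, and margins of error cannot be combined by simple addition.

```python
def split_pumas_for_county(puma_to_counties, county):
    """Return (inside, straddling) lists of PUMAs that touch the county."""
    inside, straddling = [], []
    for puma, counties in sorted(puma_to_counties.items()):
        if county not in counties:
            continue
        if counties == {county}:
            inside.append(puma)       # wholly within the county
        else:
            straddling.append(puma)   # shared with at least one other county
    return inside, straddling


def add_up(tables):
    """Sum estimate tables (dicts of category -> count) cell by cell."""
    total = {}
    for table in tables:
        for category, count in table.items():
            total[category] = total.get(category, 0) + count
    return total


# Hypothetical geography: P1 and P2 sit inside county X, P3 crosses into county Y.
puma_to_counties = {"P1": {"X"}, "P2": {"X"}, "P3": {"X", "Y"}}
puma_tables = {
    "P1": {"below_200_fpl": 100, "at_or_above": 300},
    "P2": {"below_200_fpl": 50, "at_or_above": 150},
}

inside, straddling = split_pumas_for_county(puma_to_counties, "X")
print(add_up(puma_tables[p] for p in inside))
print("needs another method:", straddling)
```

Any PUMA in the second list is where a simple sum stops working and an approach such as the one Dave describes is needed.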

  • Thanks again, David. I think I will switch over and use the PUMS data with SAS. I need demographics by county/zip for 200% FPL. I'm VERY interested in using the Supplemental poverty data though. Can you get me started? I work for the state of Washington, DSHS.

  • I haven't used SAS in years and years, so I don't recall how transferable this resource is. These two links were extremely helpful for using PUMS data in R.

    https://walker-data.com/census-r/introduction-to-census-microdata.html

    https://walker-data.com/tidycensus/articles/pums-data.html

    Meghan

  • Dear Lorna,

    Go to dorerfoundation.org and look for the "Contact Us" tab across the top. Send an email to the address and we can communicate via email.

    Best,

    Dave

  • Jonathan, thanks for your note. Will there be an announcement when IPUMS releases the single PUMA geography? 

  • We will announce to registered IPUMS users by email, but sometimes there's a week or two delay between release and the email. You can also occasionally check the Revision History page on the site for current status.

    Some insider info: we're on track for a release this Thursday or Friday. It's fully prepped but there are some technical hold-ups on the deployment. I'm not sure, but it sounds like our IT team will get those worked out soon.

  • That's great to hear, I'm excited to use the new PUMAs in the 2022 5yr survey for the older data for a question I'm working on. I have a tangential question.

    In my work I've been using the tidycensus, survey, and srvyr packages in R and calling the data through the ACS API. I'm using them because of the resources available on how to use them to get margins of error as recommended by the Census Bureau (primarily: https://walker-data.com/tidycensus/articles/pums-data.html and https://walker-data.com/census-r/introduction-to-census-microdata.html).

    My question is, do you know of similar resources that walk a novice R user through similar manipulations and summarizations of IPUMS microdata, including error estimates? Or perhaps a crosswalk of how IPUMS data should be handled differently from data extracted from the ACS API to get the margin of error using replicate weights (since the survey and srvyr packages do that automatically)?

    Thank you,

    Meghan

  • If I am understanding your question correctly, you are interested in sample code for applying replicate weights to the ACS PUMS available from IPUMS USA to generate empirically derived standard errors for your estimates. IPUMS USA offers both the household (REPWT) and the person (REPWTP) replicate weights through their data access system. This IPUMS USA replicate weights summary page provides a bit of background information as well as sample code for applying replicate weights in R with the srvyr package.
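For readers who want to see the arithmetic behind replicate-weight standard errors, here is a minimal sketch in plain Python. It assumes 80 person replicate weights named `REPWTP1` through `REPWTP80` and a full-sample weight named `PERWT`, and it uses the successive-difference formula SE = sqrt(4/80 × Σ(replicate estimate − estimate)²) with a 90 percent margin of error of 1.645 × SE. The function names and toy records are hypothetical; packages such as srvyr handle this automatically.

```python
import math

N_REPS = 80  # number of ACS replicate weights assumed here


def weighted_count(records, weight_key, predicate):
    """Sum one weight column over the records that satisfy the predicate."""
    return sum(rec[weight_key] for rec in records if predicate(rec))


def estimate_with_moe(records, predicate, weight_key="PERWT", rep_prefix="REPWTP"):
    """Return (estimate, standard error, 90% margin of error)."""
    estimate = weighted_count(records, weight_key, predicate)
    squared_diffs = 0.0
    for r in range(1, N_REPS + 1):
        rep_estimate = weighted_count(records, f"{rep_prefix}{r}", predicate)
        squared_diffs += (rep_estimate - estimate) ** 2
    se = math.sqrt(4.0 * squared_diffs / N_REPS)
    return estimate, se, 1.645 * se


# Toy records: two people, each with weight 10 and every replicate weight 11.
person = {"PERWT": 10, "AGE": 30}
person.update({f"REPWTP{r}": 11 for r in range(1, N_REPS + 1)})
records = [dict(person), dict(person)]

est, se, moe = estimate_with_moe(records, lambda rec: rec["AGE"] >= 18)
print(est, se, round(moe, 2))
```

For household-level estimates the same pattern would apply with the household weight and the REPWT replicate weights.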
  • Update: IPUMS USA has now added PUMA identifiers to its version of the 2022 5-year sample through a single PUMA variable. Almost all the other geographic variables that we derive from PUMA information have also been added, identifying counties, cities, metropolitan areas, percent metro population, and metropolitan / principal city status (where possible).

    We also extended three more geographic variables to both the 2022 1-year and 5-year samples: DENSITY, METPOP10, and HOMELAND. (We hadn't yet updated these for 2020 PUMAs, so unlike the other variables mentioned above, they weren't yet available in the 2022 1-year sample until today.)

  • While you were working on generating the single PUMA variable, did you uncover any mislabeled 2020 PUMAs in the Census Bureau Reference Maps?

    In Arkansas, for example, 0500800 is labeled as "White, Lonoke & Woodruff Counties" but it does not include any part of Lonoke County, which instead lies in 0500900.

    It's not a huge deal, but I was hoping not to have to check and manually relabel all the PUMAs before we publish our data dashboard. 

  • To KatherinRPhillips: We didn't notice any PUMA naming issues, but IPUMS USA doesn't do much with PUMA names. In our versions of the microdata, we provide PUMA codes but not names. Out of curiosity, I looked into the naming issue you describe, and I agree that PUMA 0500800 is apparently misnamed. In the Census Bureau's relationship file between 2020 tracts and 2020 PUMAs, all of the tracts in Lonoke County (state 05, county 085) are in PUMA 0500900 and none are in 0500800.
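The check described in this reply can be sketched as a lookup over a tract-to-PUMA relationship table. The column names (`STATEFP`, `COUNTYFP`, `TRACTCE`, `PUMA5CE`) are assumed to match the Census Bureau's relationship file, and the rows below are placeholders written to mirror the finding above, not real tract records. `pumas_for_county` is a hypothetical helper.

```python
import csv
import io

# Placeholder rows for illustration only; tract codes are not real.
SAMPLE = """STATEFP,COUNTYFP,TRACTCE,PUMA5CE
05,085,000001,00900
05,085,000002,00900
05,145,000001,00800
05,147,000001,00800
"""


def pumas_for_county(rows, statefp, countyfp):
    """Return the set of PUMA codes containing at least one tract of the county."""
    return {
        row["PUMA5CE"]
        for row in rows
        if row["STATEFP"] == statefp and row["COUNTYFP"] == countyfp
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# State 05, county 085 is Lonoke County per the reply above.
print(pumas_for_county(rows, "05", "085"))
```

Running the same lookup for every county named in a PUMA label, against the full relationship file, would flag labels that mention a county with no tracts in that PUMA.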