The 2025 ACS Data Users Conference will be held on May 29, 2025 (virtual day) and June 3, 2025 (in-person in DC)
American Community Survey Data Users Group
Use PUMS data for PUMAS belonging only in one county
Galateia
over 10 years ago
Hello! I am trying to use the 2008-2012 5-year PUMS data, and I have run into something strange: when I sum the weights (variable PWGTP in the dataset) for the PUMAs that belong to a specific county, I do not get the known total population of that county (compared with the population from the ACS 2008-2012 estimates).
Could anyone advise me whether I am using the weights properly? The PUMS data cover the PUMAs of one whole state, and I am trying to extract only the PUMAs in the county that I need. Is that possible, or is there a problem because the PUMS data are defined at the state level?
Cliff Cook
over 10 years ago
My experience working with a single PUMA is that the weighted total for population derived from PUMS data is close to but not an exact match with the published population total in the ACS.
Galateia
over 10 years ago
Thank you! My estimate is very different, but it seems that I made a mistake somewhere: I found out that PUMA boundaries generally follow county limits, so there is no reason for such a difference.
Tim Gilbert
over 10 years ago
Are you taking into account the fact that the 2008-2012 ACS 5-Year PUMS dataset has two different PUMA variables, PUMA00 and PUMA10? PUMA00 will have a value for cases in which the data was collected in 2008-2011, and PUMA10 will have a value for cases in which the data was collected in 2012. Also, the PUMA boundaries might be different, and PUMAs do not necessarily match up to counties.
Galateia
over 10 years ago
Thank you very much for your response!
Yes, I create a new column with values from PUMA10 when PUMA00 is equal to -999. But how can we estimate the total population from the given variables? Is it OK to add up the weights given in variable PWGTP to get the total population of the area of interest?
Beth Jarosz
over 10 years ago
Tim has a good point. It may be (and in fact is likely) that the actual PUMA boundaries changed between 2000 and 2010, so simply creating one field may not be sufficient.
The county was likely defined by one set of PUMAs in 2000 and one set in 2010. So for the records that have a PUMA00, you'd need to select based on the PUMA numbers for that county in 2000, and for records with a PUMA10 you'd need to select the PUMA numbers for that county based on the re-numbered PUMAs in 2010.
Galateia
over 10 years ago
You are totally right! But currently I am working on Oklahoma County, which, according to the 2012 TIGER shapefile for PUMA10 for the state of Oklahoma, includes 6 PUMAs (I also checked this against other sources). So I downloaded the .csv file for the 2008-2012 PUMS for Oklahoma, and in this file there are two columns, one for PUMA00 and one for PUMA10. Whether or not I join the two columns into one to get all the existing PUMAs, the odd thing in my estimates is that:
1) when I sum all the weights in the whole dataset, I correctly reproduce the total population of the state, but
2) when I extract a small number of PUMAs (6) and sum the weights (PWGTP variable) for only the six PUMAs I am interested in (Oklahoma County), I get a population that is totally wrong compared with other official data for this county.
So I am now trying to understand whether:
a) I have made a mistake in extracting the 6 PUMAs, or
b) the PUMS data are defined at the state level and are not representative at the scale of a single PUMA.
I am sorry for being tiresome, but your help is really valuable to me!
Thank you in advance!
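One way to check possibility (a) is to tabulate the weighted population separately for every PUMA code under each vintage column, and then compare the resulting code list with the county's PUMAs for that vintage. The following is only a minimal Python sketch. It assumes the records were read as strings (for example with csv.DictReader) and that -999 marks the unused PUMA column, as reported in this thread; the function name and the sample rows are made up for illustration, so check the sentinel value against your file's data dictionary.

```python
from collections import defaultdict

# Assumption: sentinel for "this vintage does not apply", as reported in the thread.
MISSING = "-999"

def weighted_pop_by_puma(records, missing=MISSING):
    """Sum PWGTP per (vintage column, PUMA code) without merging the two vintages."""
    totals = defaultdict(int)
    for rec in records:
        if rec["PUMA10"] != missing:
            key = ("PUMA10", rec["PUMA10"])
        else:
            key = ("PUMA00", rec["PUMA00"])
        totals[key] += int(rec["PWGTP"])
    return dict(totals)

# Made-up rows, only to show the shape of the input.
sample = [
    {"PUMA00": "01301", "PUMA10": "-999", "PWGTP": "20"},
    {"PUMA00": "01301", "PUMA10": "-999", "PWGTP": "15"},
    {"PUMA00": "-999", "PUMA10": "01001", "PWGTP": "30"},
]

by_puma = weighted_pop_by_puma(sample)
state_total = sum(by_puma.values())

for (vintage, code), pop in sorted(by_puma.items()):
    print(vintage, code, pop)
print("state total:", state_total)
```

Keeping the vintage in the key makes it visible when the same code number appears under both PUMA00 and PUMA10, where it may refer to different areas.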
Beth Jarosz
over 10 years ago
Apologies if my explanation was unclear.
There may be 6 PUMAs in the 2012 TIGER shapefile, but there may have been 4... or 7... (or who knows how many?) in 2000. And the numbering probably changed between 00 and 10.
For example, in Oklahoma the PUMA that was PUMA 100 from 2000-2011 (i.e. labeled in PUMA00 as "100") has become PUMA 500 (i.e. labeled in PUMA10 as "500").
You can create a list showing the changes using the MABLE/GeoCorr correspondence table creator at:
mcdc.missouri.edu/.../geocorr12.html
and you can find out more about how the PUMAs changed at
www.census.gov/.../puma.html
I hope that helps!
Tom Bell
over 10 years ago
There are text files that show which PUMAs are in which counties. For 2000, Oklahoma County, OK is made up of three 5% PUMAs (these are the PUMAs in the ACS public use file). These PUMA codes are 01301, 01302, 01400.
See
www2.census.gov/.../PUMEQ5-OK.TXT
for this information.
For 2010, Oklahoma County is made up of 6 PUMAs, 01001 - 01006.
Unless you are doing extensive cross tabs and need additional sample, using the 2012 1-year PUMS would make this easier, as then you could use the 2010-based PUMA definitions.
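For anyone who stays with the 5-year file, the selection described above can be sketched roughly as follows: match PUMA00 against the county's 2000 codes and PUMA10 against its 2010 codes, then sum PWGTP over the matching records. This is a sketch under assumptions, not a definitive method. The code sets are taken from this post (reading "01001 - 01006" as six consecutive codes), the codes are assumed to be stored as zero-padded strings, and the function name and sample rows are hypothetical.

```python
# Code sets as given in this thread for Oklahoma County, OK.
OK_COUNTY_PUMA00 = {"01301", "01302", "01400"}
OK_COUNTY_PUMA10 = {"01001", "01002", "01003", "01004", "01005", "01006"}

def county_population(records, puma00_codes, puma10_codes):
    """Sum PWGTP for records whose PUMA matches the county in its own vintage."""
    total = 0
    for rec in records:
        # Each column is only compared with the code set of the same vintage,
        # so a 2000 code that happens to equal a 2010 code is not picked up.
        if rec["PUMA00"] in puma00_codes or rec["PUMA10"] in puma10_codes:
            total += int(rec["PWGTP"])
    return total

# Made-up rows; -999 as the "not applicable" value is an assumption from the thread.
sample = [
    {"PUMA00": "01301", "PUMA10": "-999", "PWGTP": "20"},  # 2000 vintage, in county
    {"PUMA00": "01001", "PUMA10": "-999", "PWGTP": "50"},  # 2000 code 01001: not in the 2000 set
    {"PUMA00": "-999", "PUMA10": "01003", "PWGTP": "30"},  # 2010 vintage, in county
    {"PUMA00": "-999", "PUMA10": "01301", "PWGTP": "40"},  # 2010 code 01301: not in the 2010 set
]

county_total = county_population(sample, OK_COUNTY_PUMA00, OK_COUNTY_PUMA10)
print("county total:", county_total)
```

If the PUMA columns were read as integers, the leading zeros will be gone and the string comparison will silently match nothing, so it is worth printing a few raw values first.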
Galateia
over 10 years ago
Thank you so much both of you!!!
Beth, now I understand exactly what you meant about the different PUMA numbers! The website you suggested is perfect!
Tom, the change is now totally clear, and I am working out how to use the information for the PUMA00 and PUMA10 codes of interest...
If the variables are the same in the 5-year and 1-year PUMS, I could use the 2012 1-year data for the trials that I want. I selected the 5-year data only because I needed to compare their use with the aggregated 2008-2012 5-year ACS estimates for census tracts. The purpose is to determine whether I need individual-level data to assess human vulnerability in my study.
Thank you a lot!
Doug Hillmer
over 10 years ago
I'm chiming in a bit late to this thread on Galateia's question. However, I want to make one point which I haven't seen stated explicitly so far: You should never expect estimates based on PUMS data to match the published estimates for the same time period, even if the PUMAs being summed up do match the county (or city) boundaries exactly. This is because the PUMS is a subsample of the selected sample used as a basis for the full sample estimates. The details on PUMS weighting are in www.census.gov/.../2008_2012AccuracyPUMS.pdf. There are a few instances in which the PUMS estimate for a characteristic in a PUMA should match the estimate based on the full sample for the same time period, but, based on my reading of this document, total population is not one of those characteristics.
Doug
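Because the PUMS is a subsample, a PUMS total carries its own sampling error, and the replicate weights in the file give a rough sense of how large it is. The sketch below uses the replicate-weight formula from the ACS PUMS accuracy documentation as I understand it: the standard error is the square root of 4/80 times the sum of squared differences between each of the 80 replicate totals and the full-weight total. The function name and the sample record are hypothetical, and the replicate column names (PWGTP1 to PWGTP80) are an assumption; some files use lowercase names, so check your file's header.

```python
import math

def pums_total_with_se(records, weight="PWGTP", rep_prefix="PWGTP"):
    """Return (weighted total, standard error) using the 80 person replicate weights."""
    estimate = sum(int(r[weight]) for r in records)
    rep_totals = [
        sum(int(r[f"{rep_prefix}{i}"]) for r in records)
        for i in range(1, 81)
    ]
    variance = (4.0 / 80.0) * sum((t - estimate) ** 2 for t in rep_totals)
    return estimate, math.sqrt(variance)

# One made-up record: every replicate weight equals the main weight except the first.
rec = {"PWGTP": "10"}
rec.update({f"PWGTP{i}": "10" for i in range(1, 81)})
rec["PWGTP1"] = "20"

estimate, se = pums_total_with_se([rec])
print("estimate:", estimate, "standard error:", round(se, 3))
```

A difference between the PUMS total and the published figure that is small relative to this standard error is consistent with ordinary sampling variation; a very large one points to a selection problem such as the PUMA00/PUMA10 mix-up discussed above.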
Galateia
over 10 years ago
Thank you very much! That makes sense! At the beginning I had huge differences because of the confusing PUMA00/PUMA10 issue. Now that I am computing the sum correctly, you are totally right: the estimates can be close in some cases but never match exactly!
Thank you for this clarification!
Beth Jarosz
over 10 years ago
Doug is absolutely right.
I skipped past that issue because Cliff mentioned it briefly above, and Galateia noted that her estimates were FAR off the mark. It didn't seem like a matter of rounding/weighting error in this case.
But the point is a good reminder for all. PUMS data will not exactly match published ACS.