Using PUMS data for the PUMAs within a single county

Hello! I am trying to use the 2008-2012 5-year PUMS data, and I have run into something strange: when I sum the weights (variable PWGTP in the dataset) for the PUMAs that belong to a specific county, the result does not match the known total population of that county (compared with the 2008-2012 ACS estimates).
Could anyone advise me whether I am using the weights properly? The PUMS data cover the PUMAs of one whole state, and I am trying to extract only the PUMAs in the county I need. Is that possible, or is there a problem because the PUMS data are defined at the state level?
  • My experience working with a single PUMA is that the weighted total for population derived from PUMS data is close to but not an exact match with the published population total in the ACS.
  • Thank you! My estimate is very different, but it seems I made a mistake somewhere, because I found out that PUMA boundaries generally follow county boundaries, so there is no reason for such a difference.
  • Are you taking into account the fact that the 2008-2012 ACS 5-Year PUMS dataset has two different PUMA variables, PUMA00 and PUMA10? PUMA00 will have a value for cases in which the data was collected in 2008-2011, and PUMA10 will have a value for cases in which the data was collected in 2012. Also, the PUMA boundaries might be different, and PUMAs do not necessarily match up to counties.
  • Thank you very much for your response!
    Yes, I created a new column that takes the value of PUMA10 when PUMA00 is equal to -999. But how can we estimate the total population from the given variables? Is it OK to add up the weights in the PWGTP variable to get the total population of the area of interest?
  • Tim has a good point. It may be (and in fact is likely) that the actual PUMA boundaries changed between 2000 and 2010, so simply creating one field may not be sufficient.

    The county was likely defined by one set of PUMAs in 2000 and one set in 2010. So for the records that have a PUMA00, you'd need to select based on the PUMA numbers for that county in 2000, and for records with a PUMA10 you'd need to select the PUMA numbers for that county based on the re-numbered PUMAs in 2010.
  • You are totally right! But I am currently working on Oklahoma County, which, according to the 2012 TIGER shapefile of PUMA10 for the state of Oklahoma, includes 6 PUMAs (I also checked this against other sources). So I downloaded the .csv file of the 2008-2012 PUMS for Oklahoma, and this file has two columns, one for PUMA00 and one for PUMA10. Whether or not I join the two columns into one to get all the existing PUMAs, the odd thing about my estimates is that:
    1) when I sum all the weights in the whole dataset, I correctly reproduce the total population of the state, but
    2) when I extract a small number of PUMAs (6) and sum the weights (PWGTP variable) for only the six PUMAs I am interested in (Oklahoma County), I get a completely wrong population figure compared with other official data for this county.
    So I am now trying to understand whether:
    a) I made a mistake in extracting the 6 PUMAs, or
    b) the PUMS data are defined at the state level and are not representative at the scale of a single PUMA.
    I am sorry for being tiresome, but your help is really valuable to me!
    Thank you in advance!
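
One way to check possibility (a) is to tabulate the weighted totals separately for each PUMA variable, so it is visible which codes actually occur under PUMA00 and which under PUMA10. The following is only a minimal sketch: the rows and PUMA codes are made up, the -999 missing-value marker follows the description earlier in this thread and may differ in the real file, and only the column names PUMA00, PUMA10 and PWGTP come from the discussion.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for a state PUMS person file (hypothetical codes).
# "-999" is assumed to mark "no value for this vintage".
SAMPLE = """PUMA00,PUMA10,PWGTP
00101,-999,20
00101,-999,15
00102,-999,30
-999,00201,25
-999,00202,10
"""

def weighted_totals_by_puma(rows):
    """Sum PWGTP for every (PUMA variable, code) pair found in the rows."""
    totals = defaultdict(int)
    for row in rows:
        for vintage in ("PUMA00", "PUMA10"):
            totals[(vintage, row[vintage].strip())] += int(row["PWGTP"])
    return dict(totals)

totals = weighted_totals_by_puma(csv.DictReader(io.StringIO(SAMPLE)))
for (vintage, code), total in sorted(totals.items()):
    print(vintage, code, total)
```

If the six codes of interest show up only under PUMA10, the sum covers only the records that carry a PUMA10 value, which would explain a county total that is far too low.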
  • Apologies if my explanation was unclear.
    There may be 6 PUMAs in the 2012 TIGER shapefile, but there may have been 4... or 7... (or who knows how many?) in 2000. And the numbering probably changed between 00 and 10.

    For example, in Oklahoma the PUMA that was PUMA 100 from 2000-2011 (i.e. labeled in PUMA00 as "100") has become PUMA 500 (i.e. labeled in PUMA10 as "500").

    You can create a list showing the changes using the MABLE/GeoCorr correspondence table creator at: mcdc.missouri.edu/.../geocorr12.html
    and you can find out more about how the PUMAs changed at
    www.census.gov/.../puma.html

    I hope that helps!
  • There are text files that show which PUMAs are in which counties. For 2000, Oklahoma County, OK is made up of three 5% PUMAs (these are the PUMAs in the ACS public use file). Their PUMA codes are 01301, 01302, and 01400.

    See www2.census.gov/.../PUMEQ5-OK.TXT for this information.

    For 2010, Oklahoma County is made up of 6 PUMAs, 01001 - 01006.

    Unless you are doing extensive cross tabs and need the additional sample, using the 2012 1-year PUMS would make this easier, as you could then use the 2010-based PUMA definitions.
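
With the 5-year file, the selection described in this thread can be sketched roughly as follows: each record is tested against the 2000 code list through PUMA00 and against the 2010 code list through PUMA10, and the PWGTP weights of the matching records are summed. The code lists are the ones quoted above for Oklahoma County; the sample rows, the -999 missing marker, and the helper names `norm` and `county_population` are illustrative assumptions, not part of any Census tool.

```python
import csv
import io

# Oklahoma County PUMA codes as listed in this thread.
OK_COUNTY_PUMA00 = {"01301", "01302", "01400"}
OK_COUNTY_PUMA10 = {f"{n:05d}" for n in range(1001, 1007)}  # 01001..01006

def norm(code):
    """Zero-pad numeric codes to 5 digits so '1301' and '01301' compare equal."""
    code = code.strip()
    return code.zfill(5) if code.isdigit() else code

def county_population(rows):
    """Sum PWGTP over records whose PUMA00 or PUMA10 code is in the county."""
    total = 0
    for row in rows:
        in_2000 = norm(row["PUMA00"]) in OK_COUNTY_PUMA00
        in_2010 = norm(row["PUMA10"]) in OK_COUNTY_PUMA10
        if in_2000 or in_2010:
            total += int(row["PWGTP"])
    return total

# Made-up rows; the last one puts a 2000-style code in the PUMA10 column
# and must not be counted.
SAMPLE = """PUMA00,PUMA10,PWGTP
1301,-999,20
01302,-999,15
00100,-999,40
-999,01001,25
-999,1006,10
-999,00500,30
-999,01301,50
"""

population = county_population(csv.DictReader(io.StringIO(SAMPLE)))
print(population)
```

Keeping the two code lists separate matters because a code number can refer to different areas under the 2000 and 2010 definitions, so merging PUMA00 and PUMA10 into one column and filtering on a single list can pick up the wrong records.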
  • Thank you so much both of you!!!
    Beth, now I realize exactly what you meant about the different PUMA numbers! The website you suggested is perfect!
    Tom, the change is now totally clear, and I will try to see how I can use the information for the PUMA00 and PUMA10 codes of interest...
    If the variables are the same in the 5-year and 1-year PUMS, I could use the 2012 1-year data for the trials I want to run. I only selected the 5-year data because I needed to compare its use with that of the aggregated 2008-2012 5-year ACS estimates for census tracts. The goal is to decide whether I need individual-level data to assess human vulnerability in my study.
    Thank you a lot!
  • I'm chiming in a bit late to this thread on Galatia's question. However, I want to make one point which I haven't seen stated explicitly so far: You should never expect estimates based on PUMS data to match the published estimates for the same time period even if the PUMAs being summed up do match the county (or city) boundaries exactly. This is because the PUMS is a subsample of the selected sample used as a basis for the full sample estimates. The details on PUMS weighting are in www.census.gov/.../2008_2012AccuracyPUMS.pdf. There are a few instances in which the PUMS estimate for a characteristic in a PUMA should match the estimate based on the full sample for the same time period, but, based on my reading of this document, total population is not one of those characteristics.

    Doug
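
To judge whether a PUMS total is "close enough" to a published figure, it can help to attach a standard error to it. The ACS PUMS person files carry 80 replicate weights alongside PWGTP, and the usual replicate formula is SE = sqrt(4/80 * sum over r of (X_r - X)^2), where X is the estimate from PWGTP and X_r the estimate from the r-th replicate weight. The sketch below applies that formula to made-up records; the replicate column names PWGTP1..PWGTP80 are an assumption about how the file was read (some CSV releases use lower-case names), and `total_and_se` is a hypothetical helper.

```python
import math

def total_and_se(records, n_reps=80):
    """Return the weighted total and its replicate-based standard error."""
    estimate = sum(r["PWGTP"] for r in records)
    squared_diffs = 0.0
    for i in range(1, n_reps + 1):
        rep_total = sum(r[f"PWGTP{i}"] for r in records)
        squared_diffs += (rep_total - estimate) ** 2
    return estimate, math.sqrt(4.0 / n_reps * squared_diffs)

# Two made-up person records. The first has replicate weights alternating
# between 11 and 9; the second has replicate weights equal to its full weight.
rec1 = {"PWGTP": 10}
rec2 = {"PWGTP": 20}
for i in range(1, 81):
    rec1[f"PWGTP{i}"] = 9 if i % 2 == 0 else 11
    rec2[f"PWGTP{i}"] = 20

estimate, se = total_and_se([rec1, rec2])
print(estimate, se)
```

Published ACS margins of error are at the 90 percent level, so 1.645 times this standard error gives a comparable margin for the PUMS total.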
  • Thank you very much! That makes sense! At the beginning I had huge differences because of the PUMA00/PUMA10 confusion. Now that I am summing correctly, you are totally right: the estimates can be close in some cases but never match exactly!
    Thank you for this clarification!
  • Doug is absolutely right.
    I skipped past that issue because Cliff mentioned it briefly, above, and Galatia noted that her estimates were FAR off the mark. It didn't seem like a matter of rounding/weighting error in this case.

    But the point is a good reminder for all. PUMS data will not exactly match published ACS.