American Community Survey Data Users Group
Mar 17, 2016 11:39 AM
I'm finally starting to run tabulations from the new 5-year file.
I've uploaded (to the PUMS group) three tabulations from the 2010-2014 PUMS file, as follows. In the first (MergedMigration), the need was to see where people were moving to and from, by age. The complication is that the 5-year file contains two vintages of Migsp (migsp05 and migsp12). I merged the two vintages into a new dimension called Migsp, which is a lot easier to use, but caution is needed if your analysis is affected by the differences between the vintages. For example, if you are interested in people who came from Israel, note that Israel was only specified in the migsp05 item, so it will appear underrepresented in this tabulation: some of the people from Israel are rolled into another category in the records coded with migsp12. If you are only interested in the US states, there are no differences between the two vintages, and the custom Migsp dimension can be used comfortably.
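A minimal Python sketch of this kind of merge, assuming each person record carries whichever of migsp05 or migsp12 applies to its survey year and leaves the other blank. The code tables are hypothetical placeholders, not the real PUMS code lists. The point is that a place only one vintage breaks out (Israel, in this example) ends up in a broader category on records coded with the other vintage.

```python
from typing import Optional

# Hypothetical code tables mapping vintage-specific codes to shared labels.
# These values are placeholders for illustration, not the real PUMS code lists.
MIGSP05_LABELS = {"012": "Florida", "013": "Georgia", "999": "Israel"}
MIGSP12_LABELS = {"012": "Florida", "013": "Georgia", "998": "Other Asia"}


def merged_migsp(record: dict) -> Optional[str]:
    """Return a single Migsp label from whichever vintage is populated."""
    code05 = record.get("migsp05")
    code12 = record.get("migsp12")
    if code05:
        return MIGSP05_LABELS.get(code05, "Unrecognized code")
    if code12:
        return MIGSP12_LABELS.get(code12, "Unrecognized code")
    return None  # both blank: the person did not migrate
```

With these placeholder tables, a migsp12 record for someone from Israel surfaces as "Other Asia". That is the undercount described above.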
In the second (IntroFields), I'm introducing two new data items. The first (ResHBCty) labels everyone who lives in Hillsborough County, FL as “Yes”, and labels everyone else as “No”. The second (MigHBCty) labels everyone who migrated from Hillsborough County, FL as “Mig From HB”, people who migrated from somewhere else as “Mig Other”, and people who did not migrate as “No Mig”. By cross-referencing these two variables we can see some interesting facts: for example, about 71,000 people moved out of the county, about 81,000 moved into the county, and about 148,000 moved from one part of the county to another. I also added a volume of personal income, so we can see that the people who moved into the county made about $1.9 billion, while the people who moved out made about $1.6 billion.
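The two new items and the cross-reference can be sketched in Python as below. This is an illustration, not MAST's implementation. The Florida codes and the Hillsborough PUMA sets are hypothetical placeholders. A single puma/migpuma field stands in for the PUMA00/PUMA10 and migpuma00/migpuma10 pairs, and migsp is assumed to be the merged field from the first tabulation.

```python
from collections import defaultdict

# Hypothetical placeholders: substitute the real Florida codes and the
# PUMA / migration-PUMA codes that cover Hillsborough County.
FL_ST = "12"
FL_MIGSP = "012"
HB_PUMAS = {"05701", "05702"}
HB_MIGPUMAS = {"05700"}


def res_hb_cty(p):
    """ResHBCty: does the person live in Hillsborough County?"""
    return "Yes" if p["st"] == FL_ST and p["puma"] in HB_PUMAS else "No"


def mig_hb_cty(p):
    """MigHBCty: did the person migrate, and if so, from Hillsborough County?"""
    if not p.get("migsp"):
        return "No Mig"
    if p["migsp"] == FL_MIGSP and p.get("migpuma") in HB_MIGPUMAS:
        return "Mig From HB"
    return "Mig Other"


def cross_tab(people):
    """Sum person weight (pwgtp) and the income volume (pincp * pwgtp) per cell."""
    cells = defaultdict(lambda: [0, 0])
    for p in people:
        key = (res_hb_cty(p), mig_hb_cty(p))
        cells[key][0] += p["pwgtp"]
        cells[key][1] += p["pincp"] * p["pwgtp"]
    return dict(cells)
```

Reading the cells: ("No", "Mig From HB") is people who moved out of the county, ("Yes", "Mig Other") is people who moved in, and ("Yes", "Mig From HB") is people who moved within it. The second number in each cell is the income volume.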
The third tabulation is the same as the first, except that the two new dimensions have been added to it to allow “zooming in” on the people moving to, from, or within Hillsborough County without losing any of the other detail.
If anyone is interested in validating these tabulations, and you would like to have the serial numbers and person numbers that went into a particular cell (hopefully a small one!) let me know and I'll send them to you.
– John Grumbine
Mar 22, 2016 11:44 AM
Wow, this is great information, John! I like how you created two new variables for easy access to see who is moving to/from Hillsborough County by age group. I just have a couple of questions: 1) Is there any way to include household incomes for adults by age group (hincp_vwa)? Or would the numbers be too low to calculate? 2) Which column makes the most sense to use when calculating the total number of people who moved to/from Hillsborough County?
Mar 22, 2016 8:51 PM
The second question is easier to answer: use the Pwgtp column to calculate the total number of people. For each line in the spreadsheet, Pwgtp can be compared to LoVal and HiVal to see how accurate the number is (LoVal and HiVal are just Pwgtp +/- the margin of error). Whenever I run tabulations I also like to include the other columns (People, the unweighted person count, and PL_Wgtp, the person-level weighted household count), because if I leave them out, I often find that I needed them. Remember that if you want to add up lines in the spreadsheet, you can add Pwgtp and People from one line to another, but not PL_Wgtp, HiVal, or LoVal.
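The thread doesn't say how MAST computes the margin of error. One standard approach for PUMS is the Census Bureau's replicate-weight method. You recompute the estimate with each of the 80 replicate person weights (PWGTP1 through PWGTP80), take SE = sqrt(4/80 × Σ(replicate estimate − full estimate)²), and use MOE = 1.645 × SE for 90% confidence. The sketch below assumes that method; the function name and inputs are hypothetical.

```python
import math

N_REPLICATES = 80  # ACS PUMS supplies replicate weights PWGTP1 through PWGTP80


def estimate_with_bounds(full_weights, replicate_weights, z=1.645):
    """Return (Pwgtp, LoVal, HiVal) for the people in one tabulation cell.

    full_weights: pwgtp for each person in the cell.
    replicate_weights: for each person (same order), a list of 80 replicate weights.
    """
    estimate = sum(full_weights)
    # Re-estimate the cell total once per replicate weight.
    rep_totals = [
        sum(person[r] for person in replicate_weights) for r in range(N_REPLICATES)
    ]
    variance = (4 / N_REPLICATES) * sum((t - estimate) ** 2 for t in rep_totals)
    moe = z * math.sqrt(variance)
    return estimate, estimate - moe, estimate + moe
```

This also shows why HiVal and LoVal can't be added across lines. The bounds for a combined line come from summing the replicate totals first and then recomputing the MOE. The result is at most the sum of the individual MOEs and usually smaller.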
The first question is going to take a bit more thought. I'm confident that we can get what you need; the challenge is making sure I really understand what that is. The tabulations that I uploaded a few days ago are all person level. Household income is a household-level data item, meaning that a single household income might apply to, e.g., 6 different people in the tabulation because they all live in the same household. Since people in the same household can appear on different lines of this tabulation, we can't put household income on each person and still have a meaningful tabulation. (A while ago I did one for you where I categorized every household based on the age of the reference person, which is one way to get household incomes based on a person-level characteristic, so I did that again here.)
As a starting point, I've uploaded a new tabulation (HHBMergedMigration.csv) that categorizes every household based on 3 criteria: the number of people who moved from Hillsborough County (HBMig, values 0-0 and 1+), the number of people who live in Hillsborough County (HBRes, values 0-0 and 1+), and the age of the reference person. (Obviously, if one person in the household lives in Hillsborough County, the whole household lives there; this is just an easy way to categorize the households.)
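Below is a sketch of how the three household criteria could be derived from person records. It takes hypothetical boolean person flags mig_hb and res_hb as input. It also assumes that person number 1 (sporder == 1) is the household's reference person, as in the ACS PUMS person records. The age groups mirror the 0-17 / 18-34 / 35+ groups discussed in this thread, and the "0-0"/"1+" labels follow the values named above.

```python
from collections import defaultdict


def ref_age_group(age):
    """Hypothetical grouping matching the age groups used in this thread."""
    if age < 18:
        return "0-17"
    if age < 35:
        return "18-34"
    return "35+"


def categorize_households(persons):
    """Map serialno -> (HBMig, HBRes, reference-person age group).

    persons: records with serialno, sporder, agep, and boolean flags
    mig_hb (migrated from Hillsborough) and res_hb (lives in Hillsborough).
    """
    mig_counts = defaultdict(int)
    res_counts = defaultdict(int)
    ref_age = {}
    for p in persons:
        s = p["serialno"]
        mig_counts[s] += p["mig_hb"]  # True counts as 1
        res_counts[s] += p["res_hb"]
        if p["sporder"] == 1:  # assumption: person 1 is the reference person
            ref_age[s] = p["agep"]
    return {
        s: (
            "1+" if mig_counts[s] else "0-0",
            "1+" if res_counts[s] else "0-0",
            ref_age_group(ref_age[s]),
        )
        for s in ref_age
    }
```

Because the categories are attached to the household rather than to each person, household income stays counted once per household.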
A tabulation like this has two sections: an upper, household-level section and a lower, person-level section. All household-level dimensions flow through to the person level, because each person inherits all attributes of the household that they live in (dimensions are inherited; volumes, like household income, are a different issue). Dividing Hincp_vwa by Wgtp in the household section gives you average household income.
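A minimal sketch of the two-level layout. Its input hh_dims is a hypothetical mapping from serialno to a tuple of household-level dimension values, such as HBMig, HBRes, and reference-person age group. Households are summed into the upper section (Wgtp and the Hincp_vwa volume). Each person inherits the household's dimension tuple and adds a person-level dimension. Average income is Hincp_vwa / Wgtp. For brevity it skips the ADJINC income adjustment and any margin-of-error columns, and it is not MAST's actual code.

```python
from collections import defaultdict


def two_level_tab(households, persons, hh_dims):
    """Build a household section, a person section, and average household income.

    households: records with serialno, wgtp, hincp.
    persons: records with serialno, pwgtp, mig_hb_cty.
    hh_dims: serialno -> tuple of household-level dimension values (hypothetical).
    """
    hh_section = defaultdict(lambda: {"wgtp": 0, "hincp_vwa": 0})
    for h in households:
        cell = hh_section[hh_dims[h["serialno"]]]
        cell["wgtp"] += h["wgtp"]
        cell["hincp_vwa"] += h["hincp"] * h["wgtp"]  # income volume

    person_section = defaultdict(int)
    for p in persons:
        # Person rows inherit the household dimensions, then add their own.
        key = hh_dims[p["serialno"]] + (p["mig_hb_cty"],)
        person_section[key] += p["pwgtp"]

    averages = {k: v["hincp_vwa"] / v["wgtp"] for k, v in hh_section.items()}
    return dict(hh_section), dict(person_section), averages
```

A household flagged "1+" for HBMig can still contribute person rows labeled "No Mig". That is the kind of line discussed further down the thread.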
There is a lot of information in this tabulation – but please ask any questions that you have. And remember you can ask for something different if this doesn't suit your needs – it is likely to be just a starting point. We found this multi-dimensional, two level approach very valuable at the telephone company, and there it was definitely worth the learning curve required to understand it.
I started explaining the tabulation in detail, but my note got too long – I think I'll just explain one part of one line, then give you an opportunity to review the tabulation and ask any questions you have (if you have a lot, that's fine) and if you need something different, let me know.
As an example to help you understand the tabulation, line 13,426 shows 21,934 people who live in 9,393 households where someone migrated from HB County. These 21,934 people live in HB County, yet they are classified as “No Mig” (they did not migrate). They lived in their HB County households for over a year, and during that year someone who had not lived there the year before moved into their house from elsewhere in HB County. This type of analysis (two-level, multidimensional) allows us to study people based on the other people in their household, as this line of data shows.
– John Grumbine
Mar 23, 2016 8:59 AM
Thanks so much for the detail that went into explaining how to understand the tabulation in the new file based on people age 35+ in Hillsborough County (FL). I am amazed how much information can be extracted from PUMS. I would definitely be interested in a tutorial on PUMS if one is available.
Mar 23, 2016 11:23 AM
The 0-17 and 18-34 groups (18-34 is, I think, what you were most interested in?) are also in there. For example, if you look at lines 13,389 to 13,425 you'll see people who live in households where the reference person is 18-34, at least 1 person migrated from Hillsborough Cty, FL, and the household is in Hillsborough Cty, FL. One artifact of creating a 7-dimensional tabulation is that you get a lot of lines, which is why I collapsed the age groups from the previous tabulation.
The data is sorted by the categorizations from left to right, which creates 4 'pockets' of 18-34 data in the person-level section that correspond to lines 3, 6, 8, and 11 in the household-level section. Each of those 'pockets' shows the migration of the people who made the household income that is aggregated in the top section. The 26,615 millennial households in line 11 (household level) made 1.28 billion dollars, and the migration of the people who live in those households is described in lines 13,389-13,425. You can see that the vast majority of them migrated from FL: 58,942 from Hillsborough County (line 13,426) and 2,728 from other parts of FL (line 13,393), while 9,823 stayed in their previous Hillsborough County homes (line 13,389). That last group is in the 'migrated household' category because someone in their household migrated into their house, though they themselves did not. (There are also 3 other 'pockets' of 18-34 data in the person section; I haven't really looked at them, but they probably contain some interesting facts.)
There is a lot of documentation on PUMS. There was a recorded introductory webinar a month or so ago that may now be on YouTube; I attended and thought it was well worthwhile. There is also a document that lists the data items that are available, and I use it daily.
- John Grumbine
Mar 26, 2016 1:26 PM
I just uploaded a fairly simple 4 dimensional, 1 level (person) tabulation to the PUMS group that categorizes everyone in the US and PR based on age, whether or not they live in Hillsborough County, FL, whether or not they migrated from Hillsborough County, FL, and what state or country they migrated from. (To get average person income, divide pincp_vwa by pwgtp.)
This most recent upload might be the final one in this cycle of analysis, so I think at this point it's worth a brief recap of what happened, because the cycle has been somewhat typical of the tabulations that are created by MAST (explained at OneGuyOnTheInternet.com).
The Census Bureau stores the PUMS information at a very low level. This is part of what makes the PUMS such a great data source: the lower the storage level, the more flexibility is available to us. However, trying to use the data directly at that low storage level is awkward to the point of being impossible (imagine visually scanning a PUMS file to understand a characteristic of the nation), so we use computers to add up numbers based on various criteria in the file. By taking that concept (adding up low-level numbers to create a high-level total) a step further, we can also create new, higher-level data items from existing low-level data items to further reduce the awkwardness inherent in low-level data storage.
In this case I created migsp by merging the migsp05 and migsp12 fields (with a warning that this resulting field may not be suitable for all users), then created a field called ResHBCty that indicates whether or not someone lives in Hillsborough County (the field was created based on ST, PUMA00 and PUMA10), then created a field called MigHBCty that indicates whether or not a person migrated from Hillsborough County (created based on migsp05, migsp12, migpuma00, and migpuma10).
The first three created data items were all person level. I used two of them to create two new household level data items – HBMig indicates if anyone in the household migrated from Hillsborough County and HBRes indicates whether anyone in the household lives in Hillsborough County (if anyone in the household does, all in the household do – this was just an easy way to create a household level field indicating that the household was in Hillsborough County).
The cycle of MAST tabulations goes like this:
1. Create a tabulation. Don't expect that this tabulation is going to satisfy the need.
2. Get feedback from the user to refine the requirements (the requirements change based on what was learned from the tabulation, & the user's increased understanding of MAST & PUMS, & my increased understanding of the user's needs).
3. Go back to step 1 & repeat.
Tabulations tend to get larger and more complex as we probe into the data, gaining a greater understanding of what is in the data and how we can best use it to satisfy the user's need. In this case, once we gained that understanding, the tabulation was reduced in complexity to the simple 4 dimensional, single level one that I uploaded today.
- John Grumbine