I'm finally starting to run tabulations from the new 5-year file...

I've uploaded three tabulations from the 2010-2014 PUMS file to the PUMS group.

In the first (MergedMigration), the goal was to see where people were moving to and from, broken down by age. The difficulty is that the 5-year file contains two vintages of Migsp. To build this tabulation I merged the two vintages (migsp05 and migsp12) into a new dimension called Migsp. It is much easier to use, but be careful if your analysis is affected by the differences between the vintages. For example, if you are interested in people who came from Israel, note that Israel is broken out only in migsp05, so it will appear underrepresented in this tabulation: in the data coded with migsp12, some of the people from Israel are rolled into another category. If you are only interested in U.S. states, the two vintages do not differ, and the merged Migsp dimension can be used with confidence.
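To illustrate the Israel caveat, here is a minimal sketch of merging the two vintages into one Migsp dimension. The codes "A", "B", "C", their labels, and the helper name merged_migsp are hypothetical placeholders, not the official PUMS code lists; the point is only that a country broken out in one vintage gets folded into a broader category in the other.

```python
# Hypothetical sketch: the codes "A", "B", "C" and their labels are
# placeholders, NOT the official PUMS MIGSP05/MIGSP12 code lists.
from collections import Counter

# The older vintage breaks out a country that the newer vintage
# folds into a broader category.
MIGSP05_LABELS = {"A": "Florida", "B": "Israel", "C": "Other Asia"}
MIGSP12_LABELS = {"A": "Florida", "C": "Other Asia"}


def merged_migsp(record):
    """Return one migration-origin label from whichever vintage is populated."""
    if record.get("migsp05") is not None:
        return MIGSP05_LABELS[record["migsp05"]]
    if record.get("migsp12") is not None:
        return MIGSP12_LABELS[record["migsp12"]]
    return None  # neither vintage populated (e.g., the person did not move)


# Toy records: each person carries exactly one vintage.
people = [
    {"migsp05": "B", "migsp12": None, "pwgtp": 20},  # older vintage, from Israel
    {"migsp05": None, "migsp12": "C", "pwgtp": 25},  # newer vintage, also from Israel
    {"migsp05": "A", "migsp12": None, "pwgtp": 30},
    {"migsp05": None, "migsp12": "A", "pwgtp": 35},
]

# Weighted person counts by merged Migsp label.
weighted = Counter()
for p in people:
    weighted[merged_migsp(p)] += p["pwgtp"]

print(dict(weighted))
```

In this toy data 45 weighted people came from Israel, but only the 20 coded with the older vintage show up under "Israel"; the rest land in "Other Asia". State-level rows such as "Florida" are unaffected.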

In the second (IntroFields), I introduce two new data items. The first (ResHBCty) labels everyone who lives in Hillsborough County, FL as “Yes” and everyone else as “No”. The second (MigHBCty) labels everyone who migrated from Hillsborough County, FL as “Mig From HB”, people who migrated from somewhere else as “Mig Other”, and people who did not migrate as “No Mig”. Cross-referencing these two variables reveals some interesting facts: about 71,000 people moved out of the county, 81,000 moved into the county, and 148,000 moved from one part of the county to another. I also added a volume of personal income, which shows that the people who moved into the county had about $1.9 billion in income, while the people who moved out had about $1.6 billion.
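A minimal sketch of how the two items might be derived and cross-referenced. It assumes each person record has a current-residence area code, a migration-origin area code (None for people in the same house a year ago), a person weight, and personal income. The sets HB_RES_AREAS and HB_MIG_AREAS are hypothetical placeholders for whatever geography codes identify Hillsborough County, and treating the income "volume" as the sum of income times person weight is also an assumption.

```python
# Hypothetical sketch: HB_RES_AREAS and HB_MIG_AREAS stand in for whatever
# residence and migration geography codes identify Hillsborough County;
# they are placeholders, not real PUMA/MIGPUMA values.
from collections import defaultdict

HB_RES_AREAS = {"R1", "R2"}
HB_MIG_AREAS = {"M1"}


def res_hb_cty(p):
    """ResHBCty: 'Yes' if the person currently lives in Hillsborough County."""
    return "Yes" if p["res_area"] in HB_RES_AREAS else "No"


def mig_hb_cty(p):
    """MigHBCty: where the person lived one year ago, relative to the county."""
    if p["mig_area"] is None:  # same house one year ago
        return "No Mig"
    return "Mig From HB" if p["mig_area"] in HB_MIG_AREAS else "Mig Other"


def crosstab(people):
    """Weighted person counts and weighted personal income per (ResHBCty, MigHBCty) cell."""
    counts = defaultdict(int)
    income = defaultdict(int)
    for p in people:
        key = (res_hb_cty(p), mig_hb_cty(p))
        counts[key] += p["pwgtp"]
        income[key] += p["pincp"] * p["pwgtp"]
    return counts, income


people = [
    {"res_area": "R1", "mig_area": None, "pwgtp": 20, "pincp": 30000},  # stayed put
    {"res_area": "R2", "mig_area": "M1", "pwgtp": 15, "pincp": 40000},  # moved within
    {"res_area": "R1", "mig_area": "M9", "pwgtp": 10, "pincp": 50000},  # moved in
    {"res_area": "R9", "mig_area": "M1", "pwgtp": 12, "pincp": 45000},  # moved out
]
counts, income = crosstab(people)

moved_out = ("No", "Mig From HB")
moved_in = ("Yes", "Mig Other")
moved_within = ("Yes", "Mig From HB")
for label, key in [("out", moved_out), ("in", moved_in), ("within", moved_within)]:
    print(label, counts[key], income[key])
```

The three cells of interest fall out of the cross-reference directly: moved out is ResHBCty “No” with MigHBCty “Mig From HB”, moved in is “Yes” with “Mig Other”, and moved within is “Yes” with “Mig From HB”.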

The third tabulation is the same as the first, except that the two new dimensions have been added, allowing you to “zoom in” on people moving to, from, or within Hillsborough County without losing any of the other detail.

If anyone is interested in validating these tabulations and would like the serial numbers and person numbers that went into a particular cell (hopefully a small one!), let me know and I'll send them.

– John Grumbine
  • Hi Robin,

    The second question is the easiest to answer: use the Pwgtp column to calculate the total number of people. For each line in the spreadsheet, Pwgtp can be compared to LoVal and HiVal to see how precise the estimate is (LoVal and HiVal are simply Pwgtp minus and plus the margin of error). I also like to include the other columns (People, the unweighted person count, and PL_Wgtp, the person-level weighted household count) whenever I run tabulations, because if I leave them out I often find that I needed them. Remember that if you want to add up lines in the spreadsheet, you can add Pwgtp and People from one line to another, but not PL_Wgtp, HiVal, or LoVal: the same household can be counted in PL_Wgtp on more than one line, and margins of error do not add linearly.
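    As a sketch of combining lines safely: Pwgtp and People are summed, while the margin of error is recovered from each line as (HiVal - LoVal) / 2 and combined with the root-sum-of-squares approximation the Census Bureau suggests for sums of ACS estimates. The function name combine_lines and the dict-per-line layout are assumptions for illustration; rerunning the tabulation at the combined level (so the MOE is computed directly) will generally be more accurate than this approximation.

```python
import math


def combine_lines(lines):
    """Combine spreadsheet lines into one aggregate row (hypothetical helper).

    Pwgtp and People are additive. The margin of error is recovered from
    each line as (HiVal - LoVal) / 2 and combined with the root-sum-of-squares
    approximation for sums; HiVal/LoVal themselves are never added.
    PL_Wgtp is left out on purpose, since one household can appear on
    several lines.
    """
    pwgtp = sum(line["Pwgtp"] for line in lines)
    people = sum(line["People"] for line in lines)
    moe = math.sqrt(sum(((line["HiVal"] - line["LoVal"]) / 2) ** 2 for line in lines))
    return {"Pwgtp": pwgtp, "People": people, "LoVal": pwgtp - moe, "HiVal": pwgtp + moe}


lines = [
    {"Pwgtp": 300, "People": 3, "LoVal": 270, "HiVal": 330},  # MOE 30
    {"Pwgtp": 400, "People": 4, "LoVal": 360, "HiVal": 440},  # MOE 40
]
print(combine_lines(lines))
```

    Adding LoVal and HiVal directly would give 630 to 770; the approximation gives a combined MOE of 50, i.e. 650 to 750 around a total Pwgtp of 700.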

    The first question will take a bit more thought. I'm confident that we can get what you need; the challenge is for me to understand exactly what that is. The tabulations that I uploaded a few days ago are all person level. Household income is a household-level data item, meaning that a single household income might apply to, for example, six different people in the tabulation because they all live in the same household. Since people in the same household can fall on different lines of this tabulation, we can't attach household income to each person and still get a meaningful tabulation. (A while ago I did one for you where I categorized every household by the age of the reference person, which is one way to get household income by a person-level characteristic, so I've done that again here.)
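    A small sketch of the double-counting problem and of the reference-person workaround, using toy data. It assumes person records already carry the household items WGTP and HINCP, and that the reference person is the record with SPORDER == 1; the age bands are hypothetical and may not match the uploaded tabulation.

```python
# Hypothetical sketch with toy data. Assumes person records already carry
# the household-level items WGTP and HINCP, and that the reference person
# is the record with SPORDER == 1.
from collections import defaultdict

persons = [
    {"serialno": "H1", "sporder": 1, "agep": 45, "pwgtp": 10, "wgtp": 10, "hincp": 60000},
    {"serialno": "H1", "sporder": 2, "agep": 43, "pwgtp": 10, "wgtp": 10, "hincp": 60000},
    {"serialno": "H1", "sporder": 3, "agep": 12, "pwgtp": 10, "wgtp": 10, "hincp": 60000},
    {"serialno": "H2", "sporder": 1, "agep": 70, "pwgtp": 8, "wgtp": 8, "hincp": 30000},
]

# Wrong: summing household income over person records counts H1's income three times.
naive_total = sum(p["hincp"] * p["pwgtp"] for p in persons)


def age_band(age):
    """Illustrative age bands; the real tabulation may use different ones."""
    if age < 35:
        return "<35"
    if age < 65:
        return "35-64"
    return "65+"


def household_income_by_ref_age(persons):
    """Weighted household income, counted once per household via its reference person."""
    totals = defaultdict(int)
    for p in persons:
        if p["sporder"] == 1:  # one record per household
            totals[age_band(p["agep"])] += p["hincp"] * p["wgtp"]
    return dict(totals)


print(naive_total)
print(household_income_by_ref_age(persons))
```

    Here the person-level sum is 2,040,000, while counting each household once through its reference person gives 840,000 (600,000 for the 35-64 band and 240,000 for 65+).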

    As a starting point, I've uploaded a new tabulation (HHBMergedMigration.csv) that categorizes every household by three criteria: the number of people who moved from Hillsborough County (HBMig, values 0-0 and 1+), the number of people who live in Hillsborough County (HBRes, values 0-0 and 1+), and the age of the reference person. (Obviously, if one person in the household lives in Hillsborough County, the whole household lives there; this is just an easy way to categorize the households.)
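    The household categories can be sketched as a per-household count that is then collapsed into the two values 0-0 and 1+. This assumes person records already carry the person-level ResHBCty/MigHBCty values (here named res_hb and mig_hb) plus serialno, sporder and agep, and that SPORDER == 1 marks the reference person; the function names are hypothetical.

```python
from collections import defaultdict


def bucket(n):
    """Collapse a count into the two categories used in the tabulation."""
    return "0-0" if n == 0 else "1+"


def classify_households(persons):
    """Assign HBMig, HBRes and reference-person age to each household (hypothetical helper)."""
    mig = defaultdict(int)  # people in the household who moved from HB County
    res = defaultdict(int)  # people in the household who live in HB County
    ref_age = {}
    for p in persons:
        hh = p["serialno"]
        mig[hh] += p["mig_hb"] == "Mig From HB"
        res[hh] += p["res_hb"] == "Yes"
        if p["sporder"] == 1:  # assumed reference person
            ref_age[hh] = p["agep"]
    return {
        hh: {"HBMig": bucket(mig[hh]), "HBRes": bucket(res[hh]), "RefAge": ref_age.get(hh)}
        for hh in mig
    }


persons = [
    {"serialno": "H1", "sporder": 1, "agep": 50, "res_hb": "Yes", "mig_hb": "No Mig"},
    {"serialno": "H1", "sporder": 2, "agep": 22, "res_hb": "Yes", "mig_hb": "Mig From HB"},
    {"serialno": "H2", "sporder": 1, "agep": 30, "res_hb": "No", "mig_hb": "Mig Other"},
]
print(classify_households(persons))
```

    Household H1 gets HBMig 1+ because one member moved from elsewhere in the county, even though its reference person did not move.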

    A tabulation like this has two sections: an upper, household-level section and a lower, person-level section. All household-level dimensions flow through to the person level, because each person inherits all attributes of the household they live in (dimensions are inherited; volumes, such as household income, are a different issue). In the household section, dividing Hincp_vwa by Wgtp gives average household income.
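    A minimal sketch of the two-level layout, assuming Hincp_vwa is the sum of household income weighted by WGTP (an interpretation of the column name, not confirmed here). Household dimensions are stored once per household and looked up for each person, which is the "inheritance" described above; the function name and record layout are hypothetical.

```python
from collections import defaultdict


def two_level_table(households, persons):
    """Build household and person sections keyed by household dimensions (hypothetical helper).

    households: {serialno: {"dims": tuple, "wgtp": int, "hincp": int}}
    persons: list of {"serialno": str, "pwgtp": int}
    """
    hh_section = defaultdict(lambda: {"Wgtp": 0, "Hincp_vwa": 0})
    person_section = defaultdict(lambda: {"Pwgtp": 0})
    for hh in households.values():
        row = hh_section[hh["dims"]]
        row["Wgtp"] += hh["wgtp"]
        row["Hincp_vwa"] += hh["hincp"] * hh["wgtp"]  # assumed meaning of Hincp_vwa
    for p in persons:
        dims = households[p["serialno"]]["dims"]  # person inherits household dimensions
        person_section[dims]["Pwgtp"] += p["pwgtp"]
    return dict(hh_section), dict(person_section)


households = {
    "H1": {"dims": ("1+", "1+"), "wgtp": 10, "hincp": 60000},
    "H2": {"dims": ("1+", "1+"), "wgtp": 20, "hincp": 30000},
    "H3": {"dims": ("0-0", "0-0"), "wgtp": 5, "hincp": 80000},
}
persons = [
    {"serialno": "H1", "pwgtp": 10},
    {"serialno": "H1", "pwgtp": 10},
    {"serialno": "H2", "pwgtp": 20},
    {"serialno": "H3", "pwgtp": 5},
]
hh_section, person_section = two_level_table(households, persons)
for dims, row in hh_section.items():
    average_income = row["Hincp_vwa"] / row["Wgtp"]
    print(dims, row["Wgtp"], average_income, person_section[dims]["Pwgtp"])
```

    Household income stays in the household section only; the person section carries the inherited dimensions and person weights, so income is never repeated per person.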

    There is a lot of information in this tabulation, so please ask any questions you have. And remember that you can ask for something different if this doesn't suit your needs; it is likely to be just a starting point. We found this multidimensional, two-level approach very valuable at the telephone company, where it was definitely worth the learning curve.

    I started explaining the tabulation in detail, but my note got too long. Instead, I'll explain one part of one line and give you a chance to review the tabulation and ask any questions you have (if you have a lot, that's fine). If you need something different, let me know.

    As an example to help you understand the tabulation, line 13,426 shows 21,934 people who live in 9,393 households where someone migrated from HB County. These 21,934 people live in HB County, yet they are classified as “No Mig” (they did not migrate). They had lived in their HB County homes for over one year, and during that year someone who had not lived there the year before moved into their home from elsewhere in HB County. This type of two-level, multidimensional analysis lets us study people based on the other people in their household, as this line of data shows.
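    A line like this pairs a person count (Pwgtp) with a household count (PL_Wgtp). Reading PL_Wgtp as "each household in the cell counted once, at its household weight" is an interpretation consistent with the 21,934 people / 9,393 households example; the sketch below assumes that reading and uses toy data with hypothetical field names.

```python
def cell_counts(persons, households, hb_mig, hb_res, mig_hb):
    """Return (Pwgtp, PL_Wgtp) for one person-level cell (hypothetical helper).

    Pwgtp sums person weights. PL_Wgtp counts each household in the cell
    once, using its household weight, no matter how many members qualify.
    """
    pwgtp = 0
    pl_wgtp = 0
    seen = set()
    for p in persons:
        hh = households[p["serialno"]]
        if hh["HBMig"] == hb_mig and hh["HBRes"] == hb_res and p["mig_hb"] == mig_hb:
            pwgtp += p["pwgtp"]
            if p["serialno"] not in seen:
                seen.add(p["serialno"])
                pl_wgtp += hh["wgtp"]
    return pwgtp, pl_wgtp


households = {
    "H1": {"HBMig": "1+", "HBRes": "1+", "wgtp": 10},
    "H2": {"HBMig": "1+", "HBRes": "1+", "wgtp": 12},
}
persons = [
    {"serialno": "H1", "pwgtp": 10, "mig_hb": "No Mig"},
    {"serialno": "H1", "pwgtp": 11, "mig_hb": "No Mig"},
    {"serialno": "H1", "pwgtp": 9, "mig_hb": "Mig From HB"},  # the in-county mover
    {"serialno": "H2", "pwgtp": 12, "mig_hb": "No Mig"},
    {"serialno": "H2", "pwgtp": 12, "mig_hb": "Mig From HB"},
]
print(cell_counts(persons, households, "1+", "1+", "No Mig"))
print(cell_counts(persons, households, "1+", "1+", "Mig From HB"))
```

    Both households appear in both the “No Mig” and “Mig From HB” cells, so adding PL_Wgtp across the two lines would report 44 weighted households instead of 22. That is why PL_Wgtp should not be summed across lines.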

    – John Grumbine