I'm finally starting to run tabulations from the new 5 year file ...

I've uploaded (to the PUMS group) 3 tabulations from the 2010-2014 PUMS file, as follows. The first (MergedMigration) was built to show where people were moving to/from based on age. The problem is that the 5 year file contains two vintages of Migsp. I merged the two vintages (migsp05 and migsp12) into a new dimension called Migsp, which is a lot easier to use, but anyone affected by the differences between the vintages should use it with caution. E.g. if you are interested in people who came from Israel, be aware that Israel was only specified in the migsp05 item, so it will appear underrepresented in this tabulation: in the data that uses migsp12, people from Israel are rolled into another category. If you are only interested in the US states, there are no differences between the two vintages, and the custom Migsp dimension can be used comfortably.
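As a rough sketch of the merge idea (this is not MAST's actual code), the two vintage fields can be collapsed by taking whichever one applies to a given record. The function name `merge_migsp`, the integer parsing of the fields, and the `-9` "vintage not applicable" marker are all assumptions for illustration; check the PUMS data dictionary for the real codes.

```python
# Sketch only: NA and the sample codes below are illustrative assumptions.
NA = -9  # assumed marker meaning "this vintage does not apply to this record"

def merge_migsp(migsp05, migsp12):
    """Collapse the two vintage-specific fields into one Migsp value."""
    if migsp05 != NA:
        return migsp05   # record uses the older vintage
    if migsp12 != NA:
        return migsp12   # record uses the newer vintage
    return NA            # neither vintage carries a value

# Made-up records: (migsp05, migsp12)
records = [(12, NA), (NA, 12), (NA, NA)]
merged = [merge_migsp(m05, m12) for m05, m12 in records]
print(merged)
```

Note that this passes codes through unchanged, so a category that exists in only one vintage (the Israel case above) is still not comparable across the merged field.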

In the second (IntroFields), I'm introducing two new data items. The first (ResHBCty) labels everyone who lives in Hillsborough County, FL as “Yes”, and everyone else as “No”. The second (MigHBCty) labels everyone who migrated from Hillsborough County, FL as “Mig From HB”, people who migrated from somewhere else as “Mig Other”, and people who did not migrate as “No Mig”. By cross-referencing these two variables we can see some interesting facts: about 71,000 people moved out of the county, 81,000 people moved into the county, and 148,000 people moved from one part of the county to another. I also added a volume of personal income, so we can see that the people who moved into the county made about $1.9 billion, while the people who moved out made about $1.6 billion.
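To make the cross-referencing concrete, here is a minimal sketch of a weighted two-way tabulation. The person records are made up (they are not PUMS data), and the names `tabulate` and `income_volume` are hypothetical; I am assuming the income volume is each person's income multiplied by the person weight and summed per cell.

```python
from collections import defaultdict

# Made-up person records for illustration only.
people = [
    {"ResHBCty": "Yes", "MigHBCty": "Mig From HB", "pwgtp": 20, "pincp": 30000},
    {"ResHBCty": "Yes", "MigHBCty": "Mig Other",   "pwgtp": 15, "pincp": 40000},
    {"ResHBCty": "No",  "MigHBCty": "Mig From HB", "pwgtp": 10, "pincp": 25000},
    {"ResHBCty": "Yes", "MigHBCty": "No Mig",      "pwgtp": 50, "pincp": 35000},
    {"ResHBCty": "Yes", "MigHBCty": "Mig Other",   "pwgtp": 5,  "pincp": 60000},
]

def tabulate(rows):
    """Sum person weights and weighted income for each (ResHBCty, MigHBCty) cell."""
    cells = defaultdict(lambda: {"pwgtp": 0, "income_volume": 0})
    for row in rows:
        cell = cells[(row["ResHBCty"], row["MigHBCty"])]
        cell["pwgtp"] += row["pwgtp"]
        cell["income_volume"] += row["pincp"] * row["pwgtp"]
    return dict(cells)

table = tabulate(people)
moved_within = table[("Yes", "Mig From HB")]  # lives in the county, came from the county
moved_in = table[("Yes", "Mig Other")]        # lives in the county, came from elsewhere
moved_out = table[("No", "Mig From HB")]      # lives elsewhere, came from the county
```

Reading the three cells this way is how the moved within / moved in / moved out counts above fall out of the two fields.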

The third tabulation is the same as the first, except that the two new dimensions have been added to it to allow “zooming in” on the people moving to/from/within Hillsborough County without losing any of the other detail.

If you are interested in validating these tabulations and would like the serial numbers and person numbers that went into a particular cell (hopefully a small one!), let me know and I'll send them to you.

– John Grumbine
Parents
  • I just uploaded a fairly simple 4 dimensional, 1 level (person) tabulation to the PUMS group that categorizes everyone in the US and PR based on age, whether or not they live in Hillsborough County, FL, whether or not they migrated from Hillsborough County, FL, and what state or country they migrated from. (To get average person income, divide pincp_vwa by pwgtp.)
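The division mentioned above can be sketched as follows. I am assuming that pincp_vwa is the weighted income total for a cell and pwgtp is the weighted person count for the same cell; the helper name `average_income` and the numbers are made up.

```python
def average_income(pincp_vwa, pwgtp):
    """Average person income for one cell: weighted income total / weighted person count."""
    if pwgtp == 0:
        return None  # empty cell: no people, so no average
    return pincp_vwa / pwgtp

cell = {"pwgtp": 20, "pincp_vwa": 900_000}  # made-up cell values
avg = average_income(cell["pincp_vwa"], cell["pwgtp"])
print(avg)
```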

    This most recent upload might be the final one in this cycle of analysis, so I think at this point it's worth a brief recap of what happened, because the cycle has been somewhat typical of the tabulations that are created by MAST (explained at OneGuyOnTheInternet.com).

    The Census Bureau stores information on the PUMS in a very low-level manner. This is part of what makes the PUMS such a great data source, because the lower the storage level, the more flexibility is available to us. However, trying to directly use the data at that low storage level is awkward to the point of being impossible (imagine visually scanning a PUMS file to understand a characteristic of the nation), so we use computers to add up numbers based on various criteria in the file. By taking that concept (adding up low-level numbers to create a high-level total) to the next level, we can also create new higher-level data items based on existing low-level data items to further reduce the awkwardness inherent in low-level data storage.

    In this case I first created Migsp by merging the migsp05 and migsp12 fields (with a warning that the resulting field may not be suitable for all users). I then created a field called ResHBCty that indicates whether or not someone lives in Hillsborough County (based on ST, PUMA00 and PUMA10), and a field called MigHBCty that indicates whether or not a person migrated from Hillsborough County (based on migsp05, migsp12, migpuma00, and migpuma10).
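A minimal sketch of how the two derived fields could be computed from those inputs (this is not MAST's actual code). It assumes the fields are already parsed to integers, that `-9` marks a vintage that does not apply to the record, and that a blank migration state (`None`) means the person did not move. The PUMA and migration PUMA codes are hypothetical placeholders, not the real Hillsborough County codes.

```python
FL = 12   # Florida's state FIPS code
NA = -9   # assumed "vintage not applicable" marker

# Hypothetical placeholder codes, NOT the real Hillsborough County values.
HB_PUMA00 = {9001, 9002}
HB_PUMA10 = {9101, 9102}
HB_MIGPUMA00 = {9000}
HB_MIGPUMA10 = {9100}

def res_hb_cty(st, puma00, puma10):
    """'Yes' if the person's current PUMA (in either vintage) is in the county."""
    in_county = st == FL and (puma00 in HB_PUMA00 or puma10 in HB_PUMA10)
    return "Yes" if in_county else "No"

def mig_hb_cty(migsp05, migsp12, migpuma00, migpuma10):
    """Classify where the person lived a year ago, relative to the county."""
    # Use whichever vintage carries a value for this record.
    if migsp05 not in (NA, None):
        state, from_county = migsp05, migpuma00 in HB_MIGPUMA00
    elif migsp12 not in (NA, None):
        state, from_county = migsp12, migpuma10 in HB_MIGPUMA10
    else:
        return "No Mig"  # assumed: no migration state recorded means no move
    return "Mig From HB" if state == FL and from_county else "Mig Other"
```

For example, under these placeholder codes `mig_hb_cty(NA, 12, NA, 9100)` gives "Mig From HB" and `mig_hb_cty(NA, 36, NA, 9100)` gives "Mig Other", because the state check has to pass as well as the PUMA check.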

    The first three created data items were all person level. I used two of them to create two new household level data items – HBMig indicates if anyone in the household migrated from Hillsborough County and HBRes indicates whether anyone in the household lives in Hillsborough County (if anyone in the household does, all in the household do – this was just an easy way to create a household level field indicating that the household was in Hillsborough County).
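The person-to-household rollup described above can be sketched like this. The records are made up, the `serialno` key stands in for the household serial number, and representing HBMig and HBRes as booleans is my assumption (the actual labels are not stated here).

```python
from collections import defaultdict

# Made-up person records; serialno identifies the household.
persons = [
    {"serialno": "A", "MigHBCty": "No Mig",      "ResHBCty": "Yes"},
    {"serialno": "A", "MigHBCty": "Mig From HB", "ResHBCty": "Yes"},
    {"serialno": "B", "MigHBCty": "Mig Other",   "ResHBCty": "No"},
]

def household_flags(rows):
    """Set each household flag to True if any person in the household qualifies."""
    flags = defaultdict(lambda: {"HBMig": False, "HBRes": False})
    for row in rows:
        hh = flags[row["serialno"]]
        hh["HBMig"] = hh["HBMig"] or row["MigHBCty"] == "Mig From HB"
        hh["HBRes"] = hh["HBRes"] or row["ResHBCty"] == "Yes"
    return dict(flags)

hh_flags = household_flags(persons)
print(hh_flags)
```

Household A ends up flagged on both items because one of its two people migrated from the county, while household B is flagged on neither.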

    The cycle of MAST tabulations goes like this:
    1. Create a tabulation. Don't expect that this tabulation is going to satisfy the need.
    2. Get feedback from the user to refine the requirements (the requirements change based on what was learned from the tabulation, & the user's increased understanding of MAST & PUMS, & my increased understanding of the user's needs).
    3. Go back to step 1 & repeat.

    Tabulations tend to get larger and more complex as we probe into the data, gaining a greater understanding of what is in the data and how we can best use it to satisfy the user's need. In this case, once we gained that understanding, the tabulation was reduced in complexity to the simple 4 dimensional, single level one that I uploaded today.

    - John Grumbine