This is my first time using ACS PUMS data and I am trying to create an estimate of the Type of Unit variable from the housing file(s) of the 2015 1-yr ACS. I am using the a, b, and Puerto Rico files (ss15husa.csv, ss15husb.csv, ss15hpr.csv). I am using R with Thomas Lumley's survey package.
When I calculate the raw tabulation of the variable TYPE I get the results below indicating that there are records for all three types (Housing, Ins. GQ, and Non-Ins GQ) of units.
      1       2       3
1363661   71728   77922
However, when I run the svytotal function on the housing survey design object I created I get zeroes for the gq levels:
> svytotal(~factor(TYPE), hou_prof_design)
                  total     SE
factor(TYPE)1 136367197 5089.4
factor(TYPE)2         0    0.0
factor(TYPE)3         0    0.0
This doesn't make sense to me, and I am looking for some advice on how to rectify it. A colleague of mine has suggested that I use the person weight, but I am unclear on how to associate the person weights with the records in the housing file that correspond to the GQ levels of the TYPE variable. Any help will be greatly appreciated.
Hi,

A few thoughts:

1. If the svytotal command in R pulls in the "wgtp" (housing weight) variable, that weight is 00000 for group quarters because it's a placeholder (see www2.census.gov/.../PUMSDataDict15.pdf; pages 1-31 have all the Housing Unit variables). If you're wondering, "Why is the housing weight zero for group quarters?", it's because the ACS isn't trying to capture characteristics of group quarters the same way it captures characteristics of housing units (e.g., plumbing, heating, rent/housing value, internet access). The ACS does capture characteristics of the people who live in group quarters.

2. I agree with your colleague: use the person weights. Unfortunately, this means you'll need additional csv files: ss15pusa, ss15pusb, and ss15ppr. The "h" or "p" after the 15 tells you whether it's a housing unit file or a person file. These files (population records as well as housing records) can be found here: factfinder.census.gov/.../productview.xhtml. Link the population records to the housing unit records using the "serialno" variable as your matching key. Then the "pwgtp" variable is your person weight. (Data Dictionary, page 32+ has all the Person Record variables.)

Hopefully this helps,
Diana
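The linkage described above can be sketched in base R on a few made-up records (not real PUMS data). The column names SERIALNO, TYPE, WGTP, and PWGTP follow the thread, written in upper case on the assumption that the CSV headers use it. The data frame names `hou` and `per` and every value are invented for illustration. The sketch only gives point estimates of people by unit type. Standard errors would still need the replicate weights and a survey design object.

```r
# Toy housing records. TYPE: 1 = housing unit, 2 = institutional GQ,
# 3 = non-institutional GQ. WGTP is 0 for group quarters, as described above.
hou <- data.frame(
  SERIALNO = c(101, 102, 103, 104),
  TYPE     = c(1, 1, 2, 3),
  WGTP     = c(80, 120, 0, 0)
)

# Toy person records: one row per person, PWGTP is the person weight.
per <- data.frame(
  SERIALNO = c(101, 101, 102, 103, 104),
  PWGTP    = c(75, 85, 110, 40, 60)
)

# Housing-weighted totals by TYPE: the GQ levels come out as 0.
hu_totals <- tapply(hou$WGTP, factor(hou$TYPE, levels = 1:3), sum)

# Attach TYPE to each person record via SERIALNO, then total the person weights.
per_typed <- merge(per, hou[, c("SERIALNO", "TYPE")], by = "SERIALNO")
person_totals <- tapply(per_typed$PWGTP,
                        factor(per_typed$TYPE, levels = 1:3), sum)

print(hu_totals)
print(person_totals)
```

Note that the second table counts weighted people living in each type of unit, not weighted units, so it answers a different question than the housing-weight tabulation.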
Beth - can't thank you enough for your thoughtful reply. Here's the background. The question I have been given is: How many Vietnam War Era Veterans live in Group Quarters? This question is part of a whole slew of others that together will form a Profile of Vietnam War Veterans. As such, this is not an analysis but a collection of "facts" that can describe those Veterans.
Another related question with respect to Housing Units vs. Group Quarters: I have noticed that for many of the variables in the housing file the "b" level is coded as NA (GQ/Vacant). I have been treating these as NA, that is, dropping them. Is this enough to prevent the mixing of the figurative apples and oranges? Or do I need to subset on the TYPE variable, selecting Housing Units only? Cheers!
Diana - I am grateful for your reply, as it gives me a definitive path forward. At the same time I'm grinding my teeth because of the work I have ahead of me. You may have read in my reply to Beth that I am working on creating a body of "facts" to describe Vietnam War Era Veterans. So I have already set up the Person file data, created a survey design object for this subpopulation, and then, using SERIALNO, identified their corresponding Housing file records, etc. Now, just to make sure I'm thinking straight: I have identified the records in the Housing file that represent Vietnam War Veterans. From those Housing records I will identify the ones where TYPE is GQ, then using those SERIALNOs I will go back to the Person records to pull in the weights. Is this logic right? P.S. Too embarrassed to say how much time I spent thinking about and researching "Why is the housing weight zero for group quarters?" Thanks a bunch :)
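For illustration, here is one way the steps above could look in base R, again on invented toy records. The `viet_vet` flag is a hypothetical column standing in for whatever veteran-period variable identifies the subpopulation, and `hou` and `per` are made-up data frame names. The sketch assumes TYPE is merged onto the person records so that the weights stay in the person file. It is not the only way to order the steps.

```r
# Toy housing records carrying only the key and the unit type.
hou <- data.frame(
  SERIALNO = c(201, 202, 203, 204),
  TYPE     = c(1, 2, 3, 3)
)

# Toy person records. viet_vet is a hypothetical flag, not a PUMS variable.
per <- data.frame(
  SERIALNO = c(201, 201, 202, 203, 204),
  PWGTP    = c(90, 95, 30, 45, 55),
  viet_vet = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Bring TYPE onto the person records; the person weights never leave this file.
per_typed <- merge(per, hou, by = "SERIALNO")

# Flag veterans living in group quarters (TYPE 2 or 3) and total their weights.
in_gq <- per_typed$viet_vet & per_typed$TYPE %in% c(2, 3)
gq_vet_total <- sum(per_typed$PWGTP[in_gq])

print(gq_vet_total)
```

With a survey design object, the same idea would probably mean adding TYPE to the person data before building the design and then using `subset()` on the design, so that the standard errors are computed over the full sample rather than a pre-filtered file.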