Estimate of Type of Unit (TYPE) Results in a value of 0 for Institutional & Non-Institutional GQ.

This is my first time using ACS PUMS data and I am trying to create an estimate of the Type of Unit variable from the housing file(s) of the 2015 1-yr ACS. I am using the a, b, and Puerto Rico files (ss15husa.csv, ss15husb.csv, ss15hpr.csv). I am using R with Thomas Lumley's survey package.

When I calculate the raw tabulation of the variable TYPE I get the results below indicating that there are records for all three types (Housing, Ins. GQ, and Non-Ins GQ) of units.

      1       2       3
1363661   71728   77922

However, when I run the svytotal function on the housing survey design object I created I get zeroes for the gq levels:

> svytotal(~factor(TYPE), hou_prof_design)
                  total     SE
factor(TYPE)1 136367197 5089.4
factor(TYPE)2         0    0.0
factor(TYPE)3         0    0.0

This doesn't make sense to me, and I am looking for advice on how to rectify it. A colleague of mine has suggested that I use the person weight, but I am unclear on how to associate the person weights with the records in the housing file that correspond to the GQ levels of the TYPE variable. Any help will be greatly appreciated.

  • I have two responses...
    The immediate, practical comment is that you can join the person and household records by st and serialno to get person weight attached to the housing record. (Note: you'd also need to add additional processing to make sure that you're counting housing units only once, because joining housing and person records will "expand" the housing records for each person in a household.)
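    A minimal sketch of that join in base R, using made-up toy rows rather than real PUMS records. The data frame names (housing, person, joined, units_once) are hypothetical; ST, SERIALNO, TYPE, WGTP, and PWGTP are the variable names discussed in this thread.

```r
# Toy stand-ins for the PUMS housing and person files (values are made up).
housing <- data.frame(
  ST       = c(1, 1, 1),
  SERIALNO = c(101, 102, 103),
  TYPE     = c(1, 1, 2),
  WGTP     = c(50, 80, 0)
)
person <- data.frame(
  ST       = c(1, 1, 1, 1),
  SERIALNO = c(101, 101, 102, 103),
  PWGTP    = c(55, 60, 75, 20)
)

# Joining on ST and SERIALNO repeats each housing row once per resident.
joined <- merge(person, housing, by = c("ST", "SERIALNO"))
nrow(joined)      # one row per person, not one per housing record

# Keep a single row per housing record before counting units.
units_once <- joined[!duplicated(joined[c("ST", "SERIALNO")]), ]
nrow(units_once)  # one row per housing record that has at least one person
```

    Note that merge() defaults to an inner join, so housing records with no matching person record (vacant units, for example) drop out of the joined table.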

    The bigger picture response requires more discussion.
    Could you please provide a bit more context about your goal with this analysis? What are you trying to accomplish by defining housing and group quarters "types?"

    The reason I ask is that, to me (at first glance), combining housing unit and group quarters records seems a bit like mixing apples and oranges. Housing unit records have a clear, countable definition. Group quarters records, unlike housing unit records, do not reflect a "building," or an "institution," or even a capacity count. They (if my memory serves) act more like a placeholder for a person record. Group quarters (population) counts can rise or fall within an institution simply because people move into (or out of) the facility. So even appending the person weight will only give you the number of residents represented by that record, in that GQ facility, at a given point in time. It does not reflect facility capacity. (There have also, historically, been issues with respect to the weighting, by GQ type, at a sub-state level... but that's another thread for another time.)

    So if you're trying to compare housing inventory with a similar measure of group quarters inventory (number of beds, number of spaces available, etc...) you might want to take these limitations into account.
  • Hi,

    A few thoughts:

    1. If the svytotal command in R pulls in the "wgtp" (housing weight) variable, note that it is 00000 for group quarters because it's a placeholder (see the Data Dictionary; pages 1-31 have all the Housing Unit variables). If you're wondering, "Why is the housing weight zero for group quarters?" it's because the ACS isn't trying to capture characteristics of group quarters the same way it tries to capture characteristics of housing units (e.g., plumbing, heating, rent/housing value, internet access). The ACS does capture characteristics of the people who live in group quarters.

    2. I agree with your colleague: use the person weights. Unfortunately this means you'll need additional csv files: ss15pusa, ss15pusb, and ss15ppr. The "h" or "p" after the 15 tells you whether it's a housing unit file or a person file. These files can be found at the PUMS download site (population records as well as housing records). Link the population records to the housing unit records using the "serialno" variable as your matching key. Then the "pwgtp" variable is your person weight. (Data Dictionary, page 32+ has all the Person Record variables).
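    Both points can be checked with a small base R sketch. All rows below are made up and the data frame names are hypothetical. It only illustrates why a total weighted by WGTP comes out as zero for the GQ levels while the raw record counts do not, and how PWGTP gives resident totals instead. It produces point estimates only, with no standard errors.

```r
# Made-up housing records; real WGTP values come from the ss15h*.csv files.
housing <- data.frame(
  SERIALNO = 1:6,
  TYPE     = c(1, 1, 1, 2, 3, 3),
  WGTP     = c(90, 110, 70, 0, 0, 0)
)

# Unweighted record counts: every TYPE level is present.
raw_counts <- table(housing$TYPE)

# Totals weighted by the housing weight: the GQ levels sum to zero.
wgtp_totals <- tapply(housing$WGTP, housing$TYPE, sum)

# Made-up person records for the same SERIALNOs.
person <- data.frame(
  SERIALNO = c(1, 1, 2, 3, 4, 5, 6),
  PWGTP    = c(95, 100, 105, 75, 30, 25, 40)
)
linked <- merge(person, housing[c("SERIALNO", "TYPE")], by = "SERIALNO")

# Totals weighted by the person weight: estimated residents by TYPE.
pwgtp_totals <- tapply(linked$PWGTP, linked$TYPE, sum)
```

    Comparing raw_counts, wgtp_totals, and pwgtp_totals side by side should reproduce the pattern in the original question on a small scale.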

    Hopefully this helps,

  • In reply to Beth Jarosz:

    Beth - can't thank you enough for your thoughtful reply. Here's the background. The question I have been given is: How many Vietnam War Era Veterans live in Group Quarters? This question is part of a whole slew of others that together will form a Profile of Vietnam War Veterans. As such, this is not an analysis but a collection of "facts" that can describe those Veterans.

    Another related question with respect to Housing Units vs. Group Quarters: I have noticed that for many of the variables in the housing file the "b" level is coded as NA (GQ/Vacant). I have been treating them as NA, which is to say I drop them. Is this enough to prevent the mixing of the figurative apples and oranges? Or do I need to subset on the TYPE variable, selecting Housing Units only? Cheers!

  • In reply to Diana Lavery:

    Diana - I am grateful for your reply as it gives me a definitive path forward. At the same time I'm grinding my teeth because of the work I have ahead of me. You may have read in my reply to Beth that I am working on creating a body of "facts" to describe Vietnam War Era Veterans. So I have already set up the Person file data, created a survey design object for this subpopulation, then used SERIALNO to identify their corresponding Housing file records, etc.

    Now just to make sure I'm thinking straight: I have identified the records in the Housing file that represent Vietnam War Veterans. From those Housing records I will identify the ones where TYPE is GQ, then using those SERIALNOs I will go back to the Person records to pull in the weights. Is this logic right?

    P.S. Too embarrassed to say how much time I spent thinking about and researching "Why is the housing weight zero for group quarters?" Thanks a bunch :)

  • In reply to Mihir Iyer:

    Yeah, if you've already identified the list of "serialno" values for Vietnam veterans, you can match those to the housing files and get the distribution of housing "type" and use "pwgtp" to get the estimated number of Vietnam veterans who lived in group quarters in 2015.
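    As a rough base R sketch of that matching step, with made-up rows: the viet_vet column is a hypothetical flag standing in for however you identified Vietnam-era veterans in the person file, and the data frame names are placeholders too.

```r
# Made-up person records; viet_vet is a hypothetical veteran flag.
person <- data.frame(
  SERIALNO = c(11, 11, 12, 13, 14, 15),
  PWGTP    = c(40, 45, 60, 20, 35, 15),
  viet_vet = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)
# Made-up housing records carrying the TYPE variable.
housing <- data.frame(
  SERIALNO = c(11, 12, 13, 14, 15),
  TYPE     = c(1, 1, 2, 2, 3)
)

# Attach TYPE to each veteran's person record.
vets <- merge(person[person$viet_vet, ], housing, by = "SERIALNO")

# Estimated veterans by TYPE, weighted by the person weight.
vets_by_type <- tapply(vets$PWGTP, vets$TYPE, sum)

# Estimated veterans living in group quarters (TYPE 2 or 3).
vets_in_gq <- sum(vets$PWGTP[vets$TYPE %in% c(2, 3)])
```

    This gives point estimates only. Standard errors would need the replicate weights and a proper survey design object.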

    You won't be able to get much info about the physical/structural characteristics of the group quarters. As you noticed with the "b", group quarters are "not in universe" for a lot of the housing characteristics variables (e.g., plumbing, heating, year built); the group quarters respondents most likely were simply not asked these questions.

    You can, however, subset your population records to only Vietnam veterans who lived in group quarters - SAS code would be "if type in ('2', '3')" - and then run various descriptive statistics on the population variables/population characteristics as of 2015 (e.g., age, race, sex, marital status, education, labor force participation, disability, commute).
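    Since the original poster is working in R, here is a hedged base R counterpart of that SAS filter, again on made-up rows. The vets data frame is hypothetical and assumed to already hold veterans' person records with TYPE attached. AGEP and SEX are used as example person variables. TYPE is compared as a number here because R typically reads it from the csv as numeric, whereas the SAS snippet treats it as character.

```r
# Made-up veteran person records, already linked to TYPE.
vets <- data.frame(
  TYPE  = c(1, 2, 2, 3, 3),
  PWGTP = c(50, 20, 30, 10, 40),
  AGEP  = c(68, 70, 66, 72, 65),
  SEX   = c(1, 1, 2, 1, 1)
)

# R counterpart of the SAS filter: if type in ('2', '3')
gq_only <- subset(vets, TYPE %in% c(2, 3))

# Person-weighted mean age of veterans in group quarters.
mean_age <- weighted.mean(gq_only$AGEP, gq_only$PWGTP)

# Person-weighted share of each SEX code among veterans in group quarters.
sex_share <- prop.table(tapply(gq_only$PWGTP, gq_only$SEX, sum))
```

    If you are working with a survey package design object, subsetting the design itself rather than the raw data frame is generally the safer route for standard errors.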


    P.S. Don't worry at all! There are so many details and intricacies with ACS data. That's what this whole ACS Data Users Group site is for!