Looking for veterans statistics joined with household data in PUMS - Help appreciated!

Hi guys,

I'm new to PUMS and would appreciate some assistance.

I’m looking for household size for households that include veterans in 0 to 200 poverty. Here is what I did; please let me know if it is correct.

I pulled a list of veterans in 0–200 poverty from the population table, then joined that table with the housing table using the serial number as the key.

Some serial numbers are duplicated, since there can be more than one veteran per household, so I kept only the unique serial numbers. Then I added up the housing weights of the matching housing records, grouped by the NP field (number of persons in unit).

Does that sound right? Am I off in my weight calculation, or does this give the number of households at each household size for those with veterans in 0-200 poverty?



  • Hi Julie,

    In PUMS, there's a population file and a housing units file. Housing units refer to the actual structure of the house/dwelling, which is why the variables in the housing units file are all about the building itself (how many rooms, year built, questions about plumbing, etc.). Household information would actually be in the population file (so use person weights), in the variables "relp" and "hht" (see the PUMS Data Dictionary here: www.census.gov/.../documentation.html). Relp = relationship, hht = household/family type.

    Using the housing unit serialno, and the relp variable, you can get at household characteristics (sounds like you're interested in subsetting to those for whom vps is not equal to bb, and povpip is between 0 and 200). FYI, the "reference person" in the relp variable is just the person who filled out the survey.

    Housing unit weights should only be used if you want to produce estimates for characteristics of the actual structure/building/dwelling unit. If you're producing estimates of veterans or veteran households, sounds like you should use person weights.

    Hope this helps!
  • In reply to Diana Lavery:

    Thanks. I'm looking for the number of people in the household (or housing unit?). Wouldn't that be in the housing unit file, as NP (number of person records following this housing record)? If so, shouldn't I be summing the housing unit weight?
    Also, if I want to know whether this household (with veterans) has children, would that be a field like HUPAC (HH presence and age of children) or FPARC (family presence and age of related children)? If so, would I add up the unduplicated housing weights?

    Thanks again!
  • In reply to Julie Leung:

    Sounds like you are using the right variables for what you want. In general, I try to use the housing unit weights for variables I'm estimating that are explicitly listed in the housing unit variable list, and use the person weight for doing estimates using variables in the person variable list. If NP is from the housing unit file, then it sounds like you're doing the right thing.
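
    For anyone following along, here is a minimal sketch of the procedure Julie described: find the unique serial numbers that have at least one veteran in 0-200 poverty, then sum the housing weight by NP. The records are made-up toy values, and using None to stand in for a blank ("bb") vps is my assumption for illustration, not how the PUMS files encode it.

```python
from collections import defaultdict

# Toy person records (hypothetical values). VPS is None for non-veterans,
# standing in for the blank "bb" code mentioned in this thread.
persons = [
    {"SERIALNO": "A", "VPS": 1, "POVPIP": 150},
    {"SERIALNO": "A", "VPS": 3, "POVPIP": 150},     # second veteran, same household
    {"SERIALNO": "A", "VPS": None, "POVPIP": 150},
    {"SERIALNO": "B", "VPS": 2, "POVPIP": 320},     # veteran, but above 200
    {"SERIALNO": "C", "VPS": 4, "POVPIP": 80},
    {"SERIALNO": "C", "VPS": None, "POVPIP": 80},
    {"SERIALNO": "D", "VPS": None, "POVPIP": 40},   # in poverty, but no veteran
]

# Toy housing records (hypothetical values), one per serial number.
housing = [
    {"SERIALNO": "A", "NP": 3, "WGTP": 100},
    {"SERIALNO": "B", "NP": 1, "WGTP": 80},
    {"SERIALNO": "C", "NP": 2, "WGTP": 120},
    {"SERIALNO": "D", "NP": 1, "WGTP": 90},
]


def households_by_size(persons, housing):
    """Sum the housing weight by NP for households with a veteran in 0-200 poverty."""
    # A set keeps each serial number once, even with several veterans in a household.
    qualifying = {
        p["SERIALNO"]
        for p in persons
        if p["VPS"] is not None
        and p["POVPIP"] is not None
        and 0 <= p["POVPIP"] <= 200
    }
    totals = defaultdict(int)
    for h in housing:
        if h["SERIALNO"] in qualifying:
            totals[h["NP"]] += h["WGTP"]
    return dict(totals)


print(households_by_size(persons, housing))
```

    Household A has two veterans but its weight is counted only once, which is the point of removing the duplicate serial numbers before summing.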

    Side note: total "households" should be the same as total occupied housing units. There are often more housing units than households due to vacancies (housing units that are for sale/rent, seasonal/vacation homes, etc.)

    Good luck with your project!
  • Hi Julie,

    I ran a tabulation through MAST that may answer your questions, if the 2014 data is acceptable to you. Unfortunately, when I tried to upload it, I got a message that it is either too large or an unacceptable file type. It was .csv, then I tried .ods. And it's not a large file. Anyway, the top half of the spreadsheet is the household level data and the bottom half is the person level data. Each person inherits all the characteristics of the household, so all household dimensions appear in the person section in addition to the household section.

    The columns in the household level section are:
    VetsInPov – tells how many vets in poverty live in each household, either 0 or 1+
    HHSz – tells how many people live in the households. It is presented as a range, because that's how MAST works, but in your case each range is limited to a single number.
    Wgtp, as you know, is the weighted household count.
    UW_hdrs is the unweighted header count. Since you have the background of having worked with the file, I thought there might be some value in including this. Note that group quarters are counted as '1' here, whereas in the weighted count, they are treated as zero.
    Hincp_vwa is the weighted, adjusted, volumetric portion of household income. Given that you are looking for children who are living with vets who are in poverty, I'm guessing that this number might be important to you.
    LoVal/HiVal are Wgtp +/- margin of error.
    Ave HH Income is Hincp_vwa/Wgtp. This column was created in the spreadsheet, not by MAST.

    Using line 26 as an example, we see households that have at least one vet (based on Vps) with a Povpip of 0-200, and 3 other people live with the vet. There are 3,701 actual records in the 2014 file with this situation, and they are weighted to represent 351,937 households. The average reported income of these households is about $38,000, which can be compared with other households that have 4 people in them (line 6) which report about $103,000 income.

    In the person level section, we see all the dimensions of the household level section, plus information about the people who live in those households.

    New columns in the person level section are:
    Banded age: since you are interested in the presence of children, I dimensionalized the occupants of the household into those 0-17 and 18+.
    Pwgtp, as you know, is weighted person count.
    People is unweighted person count.
    PL_Wgtp is weighted household count at the person level. Sticking with the example of households that have 4 people and one+ veteran with a Povpip of 0-200, lines 92-93 can be used in conjunction with line 26. Line 26 said that there were 351,937 households with this condition. Line 93 tells us that every one of those households has a person 18+ in it, and line 92 tells us that 278,893 of those households have children in them.
    LoVal/HiVal are Pwgtp +/- margin of error.

    Using the unweighted counts, you can do some cross-checks that might be useful. For example, line 26 told us that there are 3,701 unweighted households in the condition that we are examining, and each of those households has 4 people. Lines 92-93 tell us that there are a total of 5014+9790 people living in those households. 3701*4=14,804. 5014+9790=14,804.
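
    The same cross-check can be written as a few lines of Python, using only the figures quoted above; the variable names are just labels I made up for this sketch. It also works out the share of these households that have children, from the two weighted counts.

```python
# Figures quoted in this post for 4-person households with 1+ veteran at Povpip 0-200.
unweighted_households = 3_701        # line 26, unweighted header count
household_size = 4
people_line_92 = 5_014               # unweighted person count, line 92
people_line_93 = 9_790               # unweighted person count, line 93

weighted_households = 351_937        # line 26, Wgtp
households_with_children = 278_893   # line 92, PL_Wgtp

# Every household in this cell has exactly 4 people, so the two totals must agree.
expected_people = unweighted_households * household_size
observed_people = people_line_92 + people_line_93
print(expected_people, observed_people)   # 14804 14804

# Share of the weighted households that have at least one child.
share_with_children = households_with_children / weighted_households
print(f"{share_with_children:.1%}")       # 79.2%
```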

    On another note, today I released an Android app in the Google Play Store that allows anyone with an Android device to run MAST tabulations. However, the app is in its infancy, and it wouldn't be able to create this tabulation because I haven't yet given it the capability to create household level dimensions based on person level data. That capability is essential to solving your problem, and is no problem for MAST, but the app doesn't have that portion of the interface built yet.
  • It looks like you are coming from the SQL tradition rather than the statistics tradition.


    table (SQL) = file (statistics)

    key (SQL) = unique ID variable (statistics)

    join (SQL) = merge (statistics)


    For statisticians, a table is a summary, so it would say something like


    HH size    |   % of population
    1          |        20%
    2          |        40%
    3          |        25%
    4          |        10%
    5+         |        5%


    0 to 200 refers to 0% to 200% of the federal poverty line.

    This was just to set the record straight, so that everybody can follow.

    Depending on which version of the PUMS data you are using, you can have different workflows. My personal favorite is the data from IPUMS.org; it comes with both HH-level and person-level variables, including weights. My workflow would be based solely on that combined household + person level file, and it would look like this:

    1. By HH ID (serialno): create a 0/1 indicator that at least one person in the household is a veteran

    2. By HH ID (serialno): create a 0/1 indicator that the household is below 200% of the poverty line

    3. By HH ID (serialno): create the household size variable = number of records for that serialno

    4. Subset the data by the "presence of veteran(s)" and "below 200% poverty line" indicators, then tabulate the HH size variable using the household weights (hhwt). (Working with the person-level file, you need to be careful not to double- or triple-count the records in HH with more than 1 person; I usually avoid this by subsetting to relate==1, i.e., the household head.)
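
    A rough sketch of these four steps in plain Python, on made-up toy records. The field names is_veteran and poverty_pct are hypothetical stand-ins rather than real variable names, and treating a household as below the line when any member is at or under 200 is my assumption for step 2; adjust both to your extract.

```python
from collections import defaultdict

# Toy person-level file (hypothetical values): one row per person, with the
# household weight repeated on every row of the same household.
rows = [
    {"serialno": 1, "relate": 1, "hhwt": 100, "is_veteran": True,  "poverty_pct": 150},
    {"serialno": 1, "relate": 2, "hhwt": 100, "is_veteran": False, "poverty_pct": 150},
    {"serialno": 2, "relate": 1, "hhwt": 80,  "is_veteran": False, "poverty_pct": 90},
    {"serialno": 3, "relate": 1, "hhwt": 60,  "is_veteran": False, "poverty_pct": 180},
    {"serialno": 3, "relate": 2, "hhwt": 60,  "is_veteran": True,  "poverty_pct": 180},
    {"serialno": 3, "relate": 3, "hhwt": 60,  "is_veteran": False, "poverty_pct": 180},
    {"serialno": 4, "relate": 1, "hhwt": 70,  "is_veteran": True,  "poverty_pct": 400},
    {"serialno": 5, "relate": 1, "hhwt": 40,  "is_veteran": True,  "poverty_pct": 120},
    {"serialno": 5, "relate": 2, "hhwt": 40,  "is_veteran": True,  "poverty_pct": 120},
]


def household_size_table(rows):
    """Weighted count of households by size, for veteran households under 200%."""
    has_veteran = defaultdict(bool)   # step 1
    low_income = defaultdict(bool)    # step 2 (assumed rule: any member <= 200)
    hh_size = defaultdict(int)        # step 3
    for r in rows:
        hh = r["serialno"]
        has_veteran[hh] = has_veteran[hh] or r["is_veteran"]
        low_income[hh] = low_income[hh] or (0 <= r["poverty_pct"] <= 200)
        hh_size[hh] += 1

    # Step 4: keep one row per household (relate == 1) so that hhwt is not
    # counted once per person, then sum hhwt by household size.
    table = defaultdict(int)
    for r in rows:
        hh = r["serialno"]
        if r["relate"] == 1 and has_veteran[hh] and low_income[hh]:
            table[hh_size[hh]] += r["hhwt"]
    return dict(table)


print(household_size_table(rows))
```

    Household 5 has two veterans and household 3 has three people, but each contributes its hhwt only once because only the relate==1 row is tabulated.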

    There are also fine points left out like group quarters (GQ), age (adults 18+; working age 18 to 65; or whatever version you are interested in).

    Finally, veteran status is self-reported, and may not always coincide with presence in the VA rosters.