Looking for veterans statistics joined with household data in PUMS - Help appreciated!

Hi guys,

I'm new to PUMS usage, and I would appreciate some assistance.

I’m looking for household size for households that include veterans at 0 to 200 poverty. The following is what I did; please let me know if it is correct.

I pulled a list of veterans in 0 – 200 poverty from the population table, then joined that table with the housing table using the serial number as the key.

Some serial numbers appear more than once, since there can be more than one veteran per household, so I kept only one record per serial number. Then I summed the housing weights from the housing records, grouped by the NP field (number of persons in unit).

Does that sound right? Or am I off in my weight calculation? Would the result be the number of households of each household size among those with veterans in 0-200 poverty?

 

Thanks!

Parents
  • It looks like you are coming from the SQL tradition rather than the statistics tradition.

     

    table (SQL) = file (statistics)

    key (SQL) = unique ID variable (statistics)

    join (SQL) = merge (statistics)

     

    For statisticians, a table is a summary, so it would say something like

     

    HH size    |   % of population
    -----------+------------------
    1          |        20%
    2          |        40%
    3          |        25%
    4          |        10%
    5+         |        5%

     

    0 to 200 refers to 0% to 200% of the federal poverty line.

    This was just to get the terminology straight, so that everybody can follow.

    Depending on which version of the PUMS data you are using, your workflow may differ. My personal favorite is the data from IPUMS.org; the extracts come with both household-level and person-level variables, including weights. My workflow would be based solely on that combined household + person level file, and it would go like this:

    1. By HH ID (serialno): create a 0/1 indicator that at least one person in the household is a veteran

    2. By HH ID (serialno): create a 0/1 indicator that the household is below 200% of the poverty line

    3. By HH ID (serialno): create the household size variable = number of records for that serialno

    4. Subset the data by the "presence of veteran(s)" variable and the "below 200% poverty line" variable, then tabulate the HH size variable using the household weights (hhwt). (Working with the person-level file, you need to be careful not to double- or triple-count households with more than one person; I usually handle this by subsetting to relate==1, i.e., the household head.)
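
    The four steps above could be sketched in Python with pandas roughly as follows. This is only an illustration under stated assumptions, not a tested recipe for any particular extract: the column names (serialno, relate, hhwt, vetstat, poverty) are placeholders, the codes vetstat == 2 for "veteran" and poverty == 0 for "not applicable" follow my understanding of the IPUMS coding and should be checked against your codebook, and treating the household head's poverty ratio as the household's is a simplifying choice. The toy data are made up.

```python
import pandas as pd

# Toy person-level file (one row per person); all values are invented.
persons = pd.DataFrame({
    "serialno": [1, 1, 2, 3, 3, 3, 4],            # household ID
    "relate":   [1, 2, 1, 1, 2, 3, 1],            # 1 = household head (assumed coding)
    "hhwt":     [100, 100, 50, 80, 80, 80, 60],   # household weight, repeated per person
    "vetstat":  [2, 1, 2, 1, 2, 1, 1],            # 2 = veteran (assumed coding)
    "poverty":  [150, 150, 320, 90, 90, 90, 120], # % of poverty line, 0 = N/A (assumed)
})

# Step 1: household-level flag, true if at least one member is a veteran.
persons["is_vet"] = persons["vetstat"] == 2
persons["hh_has_vet"] = persons.groupby("serialno")["is_vet"].transform("any")

# Step 2: flag for 1% to 200% of the poverty line (0 excluded as N/A).
persons["low_pov"] = persons["poverty"].between(1, 200)

# Step 3: household size = number of person records per serialno.
persons["hh_size"] = persons.groupby("serialno")["serialno"].transform("size")

# Step 4: keep one record per household (the head) so each household
# weight is counted once, subset, and sum weights by household size.
heads = persons[persons["relate"] == 1]
subset = heads[heads["hh_has_vet"] & heads["low_pov"]]
table = subset.groupby("hh_size")["hhwt"].sum()

print(table)
```

    Summing hhwt over heads only gives the weighted number of households per household size; dividing by table.sum() would turn that into shares.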

    There are also fine points left out, like group quarters (GQ) and age (adults 18+; working age 18 to 65; or whatever definition you are interested in).

    Finally, veteran status is self-reported, and may not always coincide with presence in the VA rosters.

    HTH.
