Calculating household incomes when household and person files are merged

Anise Vance over 7 years ago

Hi all,

I'm brand new to the ACS Data Users Group and very excited to be a part of it! What an incredible wealth of resources.

I actually have a challenge that I've been running into while using PUMS. I've been using FactFinder for years, but have only started using PUMS in the last six months, so please forgive my lack of knowledge.

We (my research center, Boston Indicators at the Boston Foundation) are trying to find the average (either mean or median) income of same-sex couples in Massachusetts disaggregated by race and ethnicity. This is not possible through FactFinder or other ACS sources (like NHGIS) that pre-can tables. So, we thought we'd use PUMS to get at this problem. The best we think we can do is finding the race/ethnicity of the primary householder of same-sex couples, which is not ideal, but still better than nothing.

One challenge, as I'm sure you've run across in your own work, was that the desired data was recorded in two different datasets. Data on same-sex couples and household income is in the household record while data on race and ethnicity is in the person record. We solved for this by doing a simple merge of the data and then subsetting to include only people who are primary householders (or, in PUMS' lingo, the "reference person"). So, theoretically, we should have a dataset of primary householders that includes: the primary householder's race/ethnicity, same-sex couples, and household income.
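
The merge-then-subset step described above can be sketched as follows. This is only an illustration on invented toy rows, not the original R script. It assumes a PUMS vintage in which the reference person is coded RELP == 0 (check the data dictionary for your year); the helper name merge_reference_persons is hypothetical.

```python
# Toy stand-ins for the PUMS household and person files (all values invented).
households = [
    {"SERIALNO": "001", "WGTP": 50, "HINCP": 120000},
    {"SERIALNO": "002", "WGTP": 80, "HINCP": 45000},
]
persons = [
    {"SERIALNO": "001", "RELP": 0, "RAC1P": 1},   # reference person
    {"SERIALNO": "001", "RELP": 1, "RAC1P": 2},   # another household member
    {"SERIALNO": "002", "RELP": 0, "RAC1P": 6},   # reference person
]


def merge_reference_persons(households, persons):
    """Attach household fields to each reference-person record.

    Assumes RELP == 0 marks the reference person (vintage-dependent).
    Returns one merged row per household that has a reference person.
    """
    hh_by_serial = {h["SERIALNO"]: h for h in households}
    merged = []
    for p in persons:
        if p["RELP"] != 0:
            continue  # keep only the reference person
        hh = hh_by_serial.get(p["SERIALNO"])
        if hh is None:
            continue  # person with no matching household record
        merged.append({**hh, **p})
    return merged


merged = merge_reference_persons(households, persons)
for row in merged:
    print(row["SERIALNO"], row["RAC1P"], row["HINCP"], row["WGTP"])
```

Because exactly one person per household is kept, each row still represents one household, which is why the household weight remains the natural weight afterwards.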

The second challenge is around weighting. Because we merged the two datasets, I am a little unclear on how to appropriately use the PUMS weights. I've chosen to use the household weights (WGTP) because we are looking for household income. After adjusting the household income using ADJINC, I went ahead and calculated the mean income for each subset population (e.g. same-sex couples with a white primary householder) using the following process: I multiplied each household income by its weight (let's call this "whi") and summed all the weighted household incomes (let's call this "s.whi"). Then, I separately summed all the weights (let's call this "sw") to get what is essentially an estimate of the total number of households in the subset. Then, I divided s.whi by sw to get average income (avg income = s.whi/sw).
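
For reference, the calculation described above (avg income = s.whi/sw) can be written out like this. It is a minimal sketch on invented numbers, not the original script. The ADJINC value is hypothetical, and the code assumes ADJINC is stored as an integer with six implied decimal places.

```python
def adjusted_income(hincp, adjinc):
    """Apply the income adjustment factor (ADJINC has six implied decimals)."""
    return hincp * adjinc / 1_000_000


def weighted_mean(values, weights):
    """Sum of value * weight (s.whi) divided by the sum of weights (sw)."""
    s_whi = sum(v * w for v, w in zip(values, weights))
    sw = sum(weights)
    if sw == 0:
        raise ValueError("weights sum to zero")
    return s_whi / sw


# Toy households: (HINCP, WGTP). Note the zero-income household is kept.
records = [(100000, 10), (50000, 30), (0, 10)]
ADJINC = 1_010_000  # hypothetical factor, i.e. 1.01

incomes = [adjusted_income(hincp, ADJINC) for hincp, _ in records]
weights = [wgtp for _, wgtp in records]

mean_income = weighted_mean(incomes, weights)
print(round(mean_income, 2))
```

The sum of the weights (50 here) doubles as the estimated number of households in the subset, which is worth checking against a published count.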

When I finished the script, there was a glaring problem: the results were obviously incorrect. For example, the mean income for same-sex couples was about $157,000 and the mean income for different-sex couples was about $139,000. Based on research done by the Williams Institute, we would expect the mean income for same-sex couples in Massachusetts to be higher than different-sex couples, but neither of those mean incomes should be anywhere close to $139,000 or $157,000.

I am happy to share the R script if anyone is interested!

Any thoughts on why the figures are so out of whack? Am I weighting incorrectly? Is there some quirk of PUMS data I'm missing?

Any insight you could give would be much appreciated!

Vincent Palacios over 7 years ago

Hi Anise,

I'm happy to take a look. What you propose sounds roughly correct (keeping the householder record, using the household weight, using a weighted average). The most likely problem I can think of is that you're dropping cases with 0 income. Another possible problem is not adjusting for the number of years of data, if by chance you combined more than one year.
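
To make the two suspected problems concrete, here is a small sketch on invented numbers. The first part shows how dropping zero-income households inflates a weighted mean. The second assumes that several 1-year files were stacked, in which case the weights sum to roughly the number of years times the household count and need dividing by the number of years. The helper names are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(value * weight) / sum(weight)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Four toy households with equal weights; two report zero income.
incomes = [0, 0, 60000, 140000]
weights = [25, 25, 25, 25]

mean_all = weighted_mean(incomes, weights)

# Same data after (mistakenly) filtering out the zero-income cases.
kept = [(y, w) for y, w in zip(incomes, weights) if y != 0]
mean_nonzero = weighted_mean([y for y, _ in kept], [w for _, w in kept])

print(mean_all, mean_nonzero)


def pooled_household_count(stacked_weights, n_years):
    """Estimated households when n_years of 1-year files are stacked.

    Assumption: each year's weights sum to that year's household total,
    so the stacked sum must be divided by the number of years.
    """
    return sum(stacked_weights) / n_years


print(pooled_household_count([100, 100, 100, 100], n_years=2))
```

Rescaling every weight by the same constant leaves a weighted mean unchanged, so the multi-year adjustment mainly matters for counts and totals.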

As a new PUMS user, I highly recommend downloading the verification files and attempting to recreate the weighted estimates therein. Even better if you can reproduce the replicate weight standard errors. (See PUMS Estimates for User Verification: www.census.gov/.../documentation.html)
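
For the replicate-weight standard errors mentioned above, the usual ACS PUMS approach is to recompute the estimate with each of the 80 replicate weights (WGTP1 to WGTP80 for households) and take the square root of 4/80 times the sum of squared deviations from the full-weight estimate. A minimal sketch on invented data follows; the function names are hypothetical, and the verification files remain the real check.

```python
import math

N_REPLICATES = 80


def weighted_mean(values, weights):
    """Weighted mean: sum(value * weight) / sum(weight)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_standard_error(values, full_weights, replicate_weights):
    """SE = sqrt(4/80 * sum over replicates of (estimate_r - estimate)^2).

    replicate_weights is a list of 80 weight lists, one per replicate.
    """
    if len(replicate_weights) != N_REPLICATES:
        raise ValueError("expected 80 sets of replicate weights")
    full_estimate = weighted_mean(values, full_weights)
    squared_devs = sum(
        (weighted_mean(values, rw) - full_estimate) ** 2
        for rw in replicate_weights
    )
    return math.sqrt(4 / N_REPLICATES * squared_devs)


# Toy data: two households and artificially simple replicate weights.
values = [40000, 100000]
full_weights = [10, 10]
replicate_weights = [[15, 5]] * 40 + [[5, 15]] * 40

se = replicate_standard_error(values, full_weights, replicate_weights)
print(se)
```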

From there, I'd then try to reproduce estimates of mean household income for all households in Massachusetts and compare those against what you see in American FactFinder. This is a good way to benchmark your approach, though your PUMS estimate will differ slightly from the pre-tabulated results because you're using a smaller (public use) sample and, I think, incomes are rounded to protect confidentiality. (See Subject Definitions: www.census.gov/.../code-lists.html)

I'll see what I get when I get a moment, but you can send your code to palacios@cbpp.org as well.

Vincent

Beth Jarosz over 7 years ago in reply to Vincent Palacios

Related... You may want to check to see how your code is handling negative income levels. Reported household income can be negative or positive.
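
One quick way to check this is to tabulate the weighted number of households by the sign of income before any filtering, then confirm the total still matches after each processing step. A sketch on invented values (the function name is hypothetical):

```python
def income_sign_summary(incomes, weights):
    """Weighted household counts for negative, zero, and positive income."""
    summary = {"negative": 0, "zero": 0, "positive": 0}
    for income, weight in zip(incomes, weights):
        if income < 0:
            summary["negative"] += weight
        elif income == 0:
            summary["zero"] += weight
        else:
            summary["positive"] += weight
    return summary


# Toy households: one loss, one zero, two positive incomes.
incomes = [-5000, 0, 80000, 20000]
weights = [10, 20, 30, 40]

summary = income_sign_summary(incomes, weights)
print(summary)
```

A filter such as "income > 0" would silently discard the first two groups, so their weighted counts show how much of the population a step like that removes.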

Vincent Palacios over 7 years ago in reply to Beth Jarosz

From what I can tell, these numbers are not that unreasonable. For 2016 1-year ACS data, table S1901 shows mean incomes by household type. (See: factfinder.census.gov/.../0400000US25)

From table S1901:
There are 2,579,398 households with a mean household income of $101,911.
There are 1,633,661 family households with a mean HHI of $122,310.
There are 1,194,726 married-couple family households with a mean HHI of $143,966.
There are 945,737 non-family households with a mean HHI of $62,500.

With weighted 2016 1-year ACS PUMS for MA I get:
There are 2,579,453 households with a mean household income of $101,593.
There are 1,634,746 family households with a mean HHI of $124,355.
There are 1,195,011 married-couple family households with a mean HHI of $144,204.
There are 944,707 non-family households with a mean HHI of $62,207.

-And for comparison, with PUMS-
There are 17,396 same-sex couple households with a mean HHI of $181,920.
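
The benchmark comparison between the S1901 figures and the PUMS figures quoted above can be summarized as percent differences. The sketch below uses only the mean incomes listed in this post; the dictionary keys and function name are illustrative.

```python
# Mean household incomes quoted above (2016 1-year ACS, Massachusetts).
published = {  # table S1901
    "all households": 101_911,
    "family": 122_310,
    "married-couple family": 143_966,
    "non-family": 62_500,
}
pums = {  # weighted PUMS estimates
    "all households": 101_593,
    "family": 124_355,
    "married-couple family": 144_204,
    "non-family": 62_207,
}


def percent_difference(estimate, benchmark):
    """Signed difference of estimate from benchmark, in percent."""
    return 100 * (estimate - benchmark) / benchmark


diffs = {
    group: percent_difference(pums[group], published[group])
    for group in published
}
for group, pct in diffs.items():
    print(f"{group}: {pct:+.2f}%")
```

Every group differs by less than 2 percent, which fits the expectation that PUMS estimates track the pre-tabulated results closely without matching them exactly.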

Stata code:
...load data...
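* Adjust household income using ADJINC (stored with six implied decimals)
gen hincp_adj = hincp * adjinc/1000000
* Indicator variables for each household type
gen hh = 1
gen fam_hh = inrange(hht, 1, 3)
gen mcfam_hh = hht == 1
gen nonfam_hh = inrange(hht, 4, 7)
gen ssmc_hh = inlist(ssmc, 1, 2)
* Weighted household counts and mean adjusted income for each indicator
tab1 hh fam_hh mcfam_hh nonfam_hh ssmc_hh [fw=wgtp], sum(hincp_adj)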

Cliff Cook over 7 years ago in reply to Vincent Palacios

Speaking qualitatively, as someone familiar with the situation on the ground in Massachusetts I do not find the results here surprising. We have the fifth highest mean household income in the country and the fourth highest family median income. Two adults in a household who both have professional or union jobs could certainly earn salaries commensurate with these mean values. This is reflected to some extent in our ever soaring real estate values. The issue in Massachusetts lies more with the distribution of incomes rather than a lack of income overall.