Calculating household incomes when household and person files are merged

Hi all,

I'm brand new to the ACS Data Users Group and very excited to be a part of it! What an incredible wealth of resources. 

I actually have a challenge that I've been running into while using PUMS. I've been using FactFinder for years, but have only started using PUMS in the last six months, so please forgive my lack of knowledge. 

We (my research center, Boston Indicators at the Boston Foundation) are trying to find the average (either mean or median) income of same-sex couples in Massachusetts disaggregated by race and ethnicity. This is not possible through FactFinder or other ACS sources (like NHGIS) that pre-can tables. So, we thought we'd use PUMS to get at this problem. The best we think we can do is finding the race/ethnicity of the primary householder of same-sex couples, which is not ideal, but still better than nothing. 
 
One challenge, as I'm sure you've run across in your own work, was that the desired data was recorded in two different datasets. Data on same-sex couples and household income is in the household record while data on race and ethnicity is in the person record. We solved for this by doing a simple merge of the data and then subsetting to include only people who are primary householders (or, in PUMS' lingo, the "reference person"). So, theoretically, we should have a dataset of primary householders that includes: the primary householder's race/ethnicity, same-sex couples, and household income. 
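To make the merge-and-subset step concrete, here is a minimal sketch in plain Python using invented toy records. It assumes the household and person files link on SERIALNO and that the reference person is coded RELP == 0, which is how I understand the 2014-era files; the relationship variable and its codes differ in other data years, so check the data dictionary for yours. The function name is hypothetical.

```python
# Toy household records keyed by SERIALNO (all values invented for illustration).
households = {
    "001": {"HINCP": 90000, "WGTP": 20},
    "002": {"HINCP": 150000, "WGTP": 35},
}

# Toy person records: several people per household, one reference person each.
persons = [
    {"SERIALNO": "001", "RELP": 0, "RAC1P": 1},
    {"SERIALNO": "001", "RELP": 1, "RAC1P": 2},
    {"SERIALNO": "002", "RELP": 0, "RAC1P": 6},
    {"SERIALNO": "002", "RELP": 2, "RAC1P": 6},
    {"SERIALNO": "002", "RELP": 13, "RAC1P": 1},
]


def reference_person_households(households, persons, ref_code=0):
    """Join person records to household records, keeping reference persons only.

    ref_code=0 is an assumption about the RELP coding; verify it for your year.
    """
    merged = []
    for person in persons:
        if person["RELP"] != ref_code:
            continue  # skip spouses, children, roommates, etc.
        household = households.get(person["SERIALNO"])
        if household is None:
            continue  # person record with no matching household record
        merged.append({**household, **person})
    return merged


merged = reference_person_households(households, persons)

# Sanity check: each household should appear exactly once after subsetting.
serials = [row["SERIALNO"] for row in merged]
assert len(serials) == len(set(serials))
```

The uniqueness check at the end is the useful part: if a household shows up more than once after the merge, its income and weight are counted repeatedly in any later average.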
 
The second challenge is weighting. Because we merged the two datasets, I am unsure how to use the PUMS weights appropriately. I chose the household weights (WGTP) because we are looking at household income. After adjusting household income using ADJINC, I calculated the mean income for each subset population (e.g. same-sex couples with a white primary householder) as follows: I multiplied each household income by its weight (let's call this "whi") and summed all the weighted household incomes (let's call this "s.whi"). Then I separately summed all the weights (let's call this "sw"), which is essentially an estimate of the total number of households in the subset. Finally, I divided s.whi by sw to get the average income (avg income = s.whi/sw). 
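The calculation described above can be sketched in plain Python as follows, again with invented values and hypothetical function names. One assumption is labeled in the code: as I understand the data dictionary, ADJINC is stored with six implied decimal places (so a stored value like 1050000 means a factor of 1.05), and the sketch divides by 1,000,000 before applying it. Please verify that against the documentation for your data year. Since a median was also an option, a simple weighted median is included, defined here as the lowest income at which the cumulative weight reaches half the total weight.

```python
def adjusted_income(row):
    """Household income in adjusted dollars.

    Assumption: ADJINC carries six implied decimals, hence the division.
    """
    return row["HINCP"] * (row["ADJINC"] / 1_000_000)


def weighted_mean_income(rows):
    """Mean household income weighted by WGTP (avg income = s.whi / sw)."""
    s_whi = 0.0  # sum of weight * adjusted income
    sw = 0.0     # sum of weights, i.e. estimated number of households
    for row in rows:
        s_whi += adjusted_income(row) * row["WGTP"]
        sw += row["WGTP"]
    return s_whi / sw


def weighted_median_income(rows):
    """Lowest adjusted income whose cumulative weight reaches half the total."""
    pairs = sorted((adjusted_income(row), row["WGTP"]) for row in rows)
    half = sum(weight for _, weight in pairs) / 2
    cumulative = 0.0
    for income, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return income


# Toy subset: one row per household (values invented for illustration).
subset = [
    {"HINCP": 100000, "WGTP": 10, "ADJINC": 1050000},
    {"HINCP": 50000, "WGTP": 30, "ADJINC": 1050000},
]

mean_income = weighted_mean_income(subset)
median_income = weighted_median_income(subset)
```

Both functions expect exactly one row per household. If the input still contains one row per person, larger households are counted several times and the averages shift accordingly.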
 
When I finished the script, there was a glaring problem: the results were obviously incorrect. For example, the mean income for same-sex couples was about $157,000 and the mean income for different-sex couples was about $139,000. Based on research done by the Williams Institute, we would expect the mean income for same-sex couples in Massachusetts to be higher than different-sex couples, but neither of those mean incomes should be anywhere close to $139,000 or $157,000. 
 
I am happy to share the R script if anyone is interested! 
 
Any thoughts on why the figures are so out of whack? Am I weighting incorrectly? Is there some quirk of PUMS data I'm missing? 
 
Any insight you could give would be much appreciated!
  • I ran a tabulation that breaks households down by state (so you can just look at MA, or compare MA to other states), SSMC, and whether or not the reference person is white. I tried to attach it to this post, but .csv files are rejected as invalid, so I've put it on Google Drive here: drive.google.com/open

    It's 2014 data, so expect some differences if you are using newer data. Line 135 shows that in MA, with a white reference person, the average household income for a same-sex couple is about $164,000. With a non-white reference person (line 132), the average household income is $151,000. However, the sample size for line 132 is tiny: there are only 10 (UW_hdrs) actual households in the sample in that category. For non-same-sex couples with a white householder, my numbers are quite a bit lower than yours ($76,000 for family income and $100,000 for household income, where you had $139,000 for household income). The tabulation has two levels (sections). The upper one holds household-level information. If you want to know a bit about the people who live in those households, scroll down to the person-level section (line 438 for MA).

    If you have any questions (most people aren't used to working with two-level MAST tabulations), feel free to ask.
  • John, this is great. Thanks for taking the time to pull that sheet together. It's interesting to see how MA compares to other states in your tabulation--very useful overview.