Calculating household incomes when household and person files are merged

Hi all,

I'm brand new to the ACS Data Users Group and very excited to be a part of it! What an incredible wealth of resources. 

I actually have a challenge that I've been running into while using PUMS. I've been using FactFinder for years, but have only started using PUMS in the last six months, so please forgive my lack of knowledge. 

We (my research center, Boston Indicators at the Boston Foundation) are trying to find the average (either mean or median) income of same-sex couples in Massachusetts disaggregated by race and ethnicity. This is not possible through FactFinder or other ACS sources (like NHGIS) that pre-can tables. So, we thought we'd use PUMS to get at this problem. The best we think we can do is finding the race/ethnicity of the primary householder of same-sex couples, which is not ideal, but still better than nothing. 
 
One challenge, as I'm sure you've run across in your own work, was that the desired data was recorded in two different datasets. Data on same-sex couples and household income is in the household record while data on race and ethnicity is in the person record. We solved for this by doing a simple merge of the data and then subsetting to include only people who are primary householders (or, in PUMS' lingo, the "reference person"). So, theoretically, we should have a dataset of primary householders that includes: the primary householder's race/ethnicity, same-sex couples, and household income. 
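The merge-and-subset step described above can be sketched roughly as follows. This is a minimal Python illustration on made-up records, not the R script mentioned in this post. The field names SERIALNO, HINCP, WGTP, and RAC1P follow PUMS naming; the `same_sex` flag is a hypothetical placeholder, and the code `RELP == 0` for the reference person is an assumption that should be checked against the data dictionary for the year in use.

```python
# Toy household records keyed by SERIALNO (all values are made up).
# "same_sex" is a hypothetical flag standing in for the real household variable.
households = [
    {"SERIALNO": "001", "HINCP": 90000, "WGTP": 20, "same_sex": True},
    {"SERIALNO": "002", "HINCP": 60000, "WGTP": 35, "same_sex": False},
]

# Toy person records. RELP == 0 is ASSUMED to mark the reference person.
persons = [
    {"SERIALNO": "001", "RELP": 0, "RAC1P": 1},
    {"SERIALNO": "001", "RELP": 1, "RAC1P": 2},
    {"SERIALNO": "002", "RELP": 0, "RAC1P": 2},
    {"SERIALNO": "002", "RELP": 2, "RAC1P": 2},
]


def merge_reference_persons(households, persons, ref_code=0):
    """Return one merged row per household, built from the reference person's record."""
    by_serial = {h["SERIALNO"]: h for h in households}
    merged = []
    for p in persons:
        if p["RELP"] != ref_code:
            continue  # skip everyone who is not the reference person
        h = by_serial.get(p["SERIALNO"])
        if h is None:
            continue  # person record with no matching household record
        merged.append({**h, **p})
    return merged


merged = merge_reference_persons(households, persons)

# Sanity check: each household should appear exactly once after subsetting.
assert len(merged) == len({row["SERIALNO"] for row in merged})
```

The final check is there because a merge that leaves more than one row per household would count that household's income and weight more than once.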
 
The second challenge is around weighting. Because we merged the two datasets, I am a little unclear on how to appropriately use the PUMS weights. I've chosen to use the household weights (WGTP) because we are looking for household income. After adjusting the household income using ADJINC, I went ahead and calculated the mean income for each subset population (e.g. same-sex couples with a white primary householder) using the following process: I multiplied each household income by its weight (let's call this "whi") and summed all the weighted household incomes (let's call this "s.whi"). Then, I separately summed all the weights (let's call this "sw") to get what is essentially an estimate of the total number of households in the subset. Then, I divided s.whi by sw to get average income (avg income = s.whi/sw). 
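To make the arithmetic concrete, the calculation described above (avg income = s.whi/sw) can be sketched like this. It is a minimal Python illustration with made-up numbers, not the actual R script. It assumes one row per household and assumes ADJINC has already been converted to a plain decimal factor (for example 1.01); if the raw file stores it in some other form, it would need converting first.

```python
def weighted_mean_income(rows):
    """Weighted mean of adjusted household income: sum(w * income) / sum(w)."""
    s_whi = 0.0  # running sum of weighted, adjusted household incomes
    sw = 0.0     # running sum of household weights (estimated household count)
    for r in rows:
        # ASSUMPTION: ADJINC is a decimal multiplier, not a raw integer code.
        adjusted_income = r["HINCP"] * r["ADJINC"]
        s_whi += adjusted_income * r["WGTP"]
        sw += r["WGTP"]
    if sw == 0:
        raise ValueError("sum of weights is zero; the subset is empty")
    return s_whi / sw


# Made-up example rows, one per household.
rows = [
    {"HINCP": 50000, "ADJINC": 1.0, "WGTP": 10},
    {"HINCP": 100000, "ADJINC": 1.0, "WGTP": 30},
]

# (50000 * 10 + 100000 * 30) / (10 + 30) = 87500
avg_income = weighted_mean_income(rows)
```

With the weights 10 and 30, the higher-income household counts three times as much, so the result (87,500) sits closer to 100,000 than the unweighted mean of 75,000 would.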
 
When I finished the script, there was a glaring problem: the results were obviously incorrect. For example, the mean income for same-sex couples was about $157,000 and the mean income for different-sex couples was about $139,000. Based on research done by the Williams Institute, we would expect the mean income for same-sex couples in Massachusetts to be higher than that of different-sex couples, but neither of those mean incomes should be anywhere close to $139,000 or $157,000. 
 
I am happy to share the R script if anyone is interested! 
 
Any thoughts on why the figures are so out of whack? Am I weighting incorrectly? Is there some quirk of PUMS data I'm missing? 
 
Any insight you could give would be much appreciated!
Parents Reply
  • When I'm working with open data like the ACS PUMS, I create a GitHub repo and use that to share, since it provides a reproducible example. This site also has a messaging feature, but I have been unsuccessful in using it. Dropbox etc. work too.