Allocating median household income across Census boundaries

I am working on a project to create demographic data for a series of custom planning areas that do not match Census boundaries. In order to do this we are allocating ACS 5-year data across block group boundaries using point-level housing units as weights. Our testing showed that the method compares well to commercial products that provide demographic estimates for custom geographies.

Doing this for count data is straightforward, but we would also like to use median household income. Is that possible using simple proportional weighting? Is there a better way?

  • Hi, Mara -- taking a weighted average of block group medians can give you misleading results, because the median depends on the income distributions of the different block groups. If the block group pieces you're combining have roughly symmetrical distributions of income and contribute roughly the same number of households to each custom planning area, then taking the weighted average will be fine. But those conditions aren't likely to be satisfied. So the best way to do this would probably be to apportion the household income distributions themselves (table B19001), just like you're doing for other count data. Once you've got the estimated distribution for the custom planning area, you can estimate its median, as described here (pp. 3-5).

    One caution: the formula given there assumes a symmetrical distribution within the category containing the median. That assumption is probably not satisfied in many areas. If accuracy is very important, you could use other information about the income distribution (tables B19025, B19080, B19081, B19082) to refine those calculations. And if you wanted to refine the weighting itself, and you have sub-block group information on rents or home values, you could also see whether one portion of the block group is likely to be disproportionately higher-income or lower-income. I just wanted to mention this in case you need very refined estimates of median incomes; I'm not suggesting you should go to all this trouble for an uncertain payoff. (Speaking for myself, it would take a lot to get me to pursue this!)

    Regardless, though, estimating the median based on the income distribution you derive by apportioning block group counts will be better in most cases than taking a weighted average of block group medians. Good luck!
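A minimal sketch of the approach described above, assuming linear interpolation within the bracket that contains the median (which treats incomes as uniformly spread inside that bracket). The bracket bounds are those of table B19001; the function names, the example counts, and the weights are hypothetical, and the formula in the linked document may differ in detail.

```python
# Lower bounds of the 16 B19001 income brackets; the last bracket is open-ended.
BIN_LOWER = [0, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000,
             45_000, 50_000, 60_000, 75_000, 100_000, 125_000, 150_000, 200_000]


def apportion(block_groups, weights):
    """Sum bracket counts across block groups, scaling each by its weight.

    block_groups: one 16-element list of household counts per block group
    weights: share of each block group's housing units inside the custom area
    """
    combined = [0.0] * len(BIN_LOWER)
    for counts, w in zip(block_groups, weights):
        for i, c in enumerate(counts):
            combined[i] += w * c
    return combined


def interpolated_median(counts):
    """Estimate the median from bracket counts by linear interpolation."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("no households")
    half = total / 2
    cumulative = 0.0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= half:
            if i == len(counts) - 1:
                # No upper bound to interpolate toward in the top bracket.
                raise ValueError("median falls in the open-ended top bracket")
            lower, upper = BIN_LOWER[i], BIN_LOWER[i + 1]
            return lower + (half - cumulative) / c * (upper - lower)
        cumulative += c
    raise ValueError("could not locate the median bracket")


# Hypothetical example: two block groups, one fully inside the custom area
# and one with half of its housing units inside it.
bg_a = [0] * 16
bg_a[7] = 100   # 100 households in the $40,000-$44,999 bracket
bg_b = [0] * 16
bg_b[9] = 100   # 100 households in the $50,000-$59,999 bracket

combined = apportion([bg_a, bg_b], [1.0, 0.5])
print(interpolated_median(combined))  # 43750.0
```

The weights here play the same role as the housing-unit weights already used for the other count tables, so the income brackets can go through the same apportionment step before the median is estimated.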

  • After some more thought, I need to make a couple corrections to my post. The basic message is still the same: it's better to estimate the median from the income distribution of the custom planning area than to use a weighted average of block group medians. But I don't want to mislead anyone, so...

    "If the block group pieces you're combining have roughly symmetrical distributions of income and contribute roughly the same number of households to each custom planning area, then taking the weighted average will be fine."

    This is incorrect. For the weighted average of medians to match the actual median, having symmetrical income distributions for each of the block groups you're combining is neither necessary nor sufficient. Having a symmetrical income distribution for the combination of multiple block groups, though, is a sufficient but not necessary condition. Basically, it's difficult to predict how accurate a weighted average of medians will be, so it's better to go ahead and just use the combined distribution.
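A small illustration of why symmetry within each block group is not sufficient, using made-up household incomes (in thousands of dollars). Both hypothetical block groups are symmetric around their own medians, yet the household-weighted average of the two medians differs from the median of the pooled households.

```python
from statistics import median

# Hypothetical incomes in thousands of dollars; each list is symmetric
# around its own median.
block_group_a = [20, 30, 40]                  # median 30
block_group_b = [100, 110, 120, 130, 140]     # median 120

n_a, n_b = len(block_group_a), len(block_group_b)

# Household-weighted average of the two block group medians.
weighted_avg_of_medians = (
    n_a * median(block_group_a) + n_b * median(block_group_b)
) / (n_a + n_b)

# Median of the combined households.
pooled_median = median(block_group_a + block_group_b)

print(weighted_avg_of_medians)  # 86.25
print(pooled_median)            # 105.0
```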

    "One caution: the formula given there assumes a symmetrical distribution within the category containing the median."

    I was wrong here as well: the formula assumes a uniform distribution in the category containing the median, where no one income level is more prominent than others. (Think of a bell curve, and then flatten it into a rectangle.) If this assumption doesn't hold, the formula might overstate or understate the actual median. Those errors are likely to be larger when the range of the category containing the median is relatively wide: for example, if the category containing the median is $75,000-$99,999, there's more room for error than if it's $40,000-$44,999. That's why using other information about the component block groups' income distributions might be useful -- but again, I'd imagine the formula would be fine for most purposes. (I'd be interested in hearing from anyone who's looked into this in more detail.)
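One rough way to gauge the room for error is to look at which bracket holds the median: the interpolated estimate and the actual median both fall inside that bracket, so its width caps the interpolation error. The sketch below assumes the B19001 bracket bounds and takes the bracket counts at face value, ignoring sampling and apportionment error; the function name and example counts are hypothetical.

```python
# Lower bounds of the 16 B19001 income brackets; the last bracket is open-ended.
BIN_LOWER = [0, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000,
             45_000, 50_000, 60_000, 75_000, 100_000, 125_000, 150_000, 200_000]


def median_bracket(counts):
    """Return (lower, upper) for the bracket holding the median.

    upper is None when the median falls in the open-ended top bracket.
    """
    half = sum(counts) / 2
    cumulative = 0.0
    for i, c in enumerate(counts):
        cumulative += c
        if c > 0 and cumulative >= half:
            upper = BIN_LOWER[i + 1] if i + 1 < len(BIN_LOWER) else None
            return BIN_LOWER[i], upper
    raise ValueError("no households")


# Hypothetical distributions concentrated in a wide and a narrow bracket.
wide = [0] * 16
wide[11] = 100     # $75,000-$99,999
narrow = [0] * 16
narrow[7] = 100    # $40,000-$44,999

for counts in (wide, narrow):
    lower, upper = median_bracket(counts)
    print(lower, upper, upper - lower)
```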

    P.S. Apologies for what I think was a duplicate post from me this morning -- my browser automatically reloaded the tab and submitted it again.
