I hope everyone is healthy.
If we want to combine categories within a table, what’s the most appropriate way to calculate margin of error or coefficient of variation?
So for example, if we want to look at children in poverty, by race/ethnicity, by county, and we're using the B17020 series (for example, B17020B) and wanted to get “children” under 18, we’d have to combine three groups: “under 6”, “6 to 11” and “12 to 17”. What’s the best way to calculate margin of error or coefficient of variation for “children”?
A couple of presentations / documents seem to explain how to do it. Are these what we’d use? I just want to check these are the most recent and would still be used.
Calculating Margins of Error the ACS Way: Using Replicate Methodology to Calculate Uncertainty
Worked Examples for Approximating Margins of Error. Instructions for Applying Statistical Testing
Those are the right resources. You can use the Census Bureau's worked examples to approximate margins of error for your derived estimates. If you're using 5-year data you could also get more accurate MOEs for derived estimates by using the Variance Replicate Estimates (VRE) Tables, which use replicate weights. The VRE tables for the 2017-2021 ACS data have not been released yet but should be available on Jan. 23.
The Census presentation you cite does a nice job of explaining how to use the VRE tables if you decide to go that route.
We use the simple sum-of-squares method for the MOEs when adding up estimates.
So, using your example, if you're adding up vars B17020Bi003, B17020Bi004, and B17020Bi005, to get the combined MOE we do:
SQRT(B17020Bm003^2 + B17020Bm004^2 + B17020Bm005^2)
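As a rough sketch of that sum-of-squares step in Python: the function names and the sample numbers below are made up for illustration and are not real B17020B values. The coefficient of variation part assumes the MOEs are at the 90 percent confidence level, which is how ACS MOEs are published, so the standard error is MOE / 1.645.

```python
import math

Z_90 = 1.645  # multiplier for the 90 percent confidence level used by ACS MOEs


def moe_sum(moes):
    """Approximate MOE of a sum of estimates: square root of the summed squared MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


def cv_percent(estimate, moe):
    """Coefficient of variation in percent: standard error divided by the estimate."""
    standard_error = moe / Z_90
    return 100.0 * standard_error / estimate


# Hypothetical estimates and MOEs for "under 6", "6 to 11" and "12 to 17"
estimates = [120, 150, 130]
moes = [40, 50, 45]

children_est = sum(estimates)
children_moe = moe_sum(moes)
children_cv = cv_percent(children_est, children_moe)

print(children_est, round(children_moe, 1), round(children_cv, 1))
```

The combined estimate is just the sum of the three cells; only the MOE needs the square root step.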
It's more complicated for ratios:
SQRT(numerator_MOE^2 + ((numerator_EST/denominator_EST)^2) * denominator_MOE^2 ) / denominator_EST
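Here is the same ratio formula as a small Python function, in case that is easier to read than the spreadsheet version. The function name and the input values are hypothetical. Note that the Census worked examples also give a variant for proportions (where the numerator is a subset of the denominator) with a minus sign under the radical; this sketch only covers the ratio form quoted above.

```python
import math


def moe_ratio(num_est, num_moe, den_est, den_moe):
    """Approximate MOE of num_est / den_est using the ratio formula quoted above."""
    ratio = num_est / den_est
    return math.sqrt(num_moe ** 2 + (ratio ** 2) * den_moe ** 2) / den_est


# Hypothetical numerator and denominator estimates with their MOEs
ratio_est = 400 / 2000
ratio_moe = moe_ratio(400, 78, 2000, 150)

print(round(ratio_est, 3), round(ratio_moe, 4))
```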
According to the ACS documentation, the formula for calculating an MOE for a sum/aggregation is the same as the formula for calculating a difference – the square root of the ‘sum’ of squared MOEs. However, I’ve been unable to find an actual example of this and would like some confirmation. For example, rather than adding up 12 separate variables to acquire my desired result, I would like to subtract one variable from the total n. If I modify the formula to take the square root of the ‘difference’ of squared MOEs, I get better results than the ‘sum’ of squared MOEs.
Also, is it more appropriate/preferred to calculate the MOE on the aggregation of 12 variables or the difference between two? It seems more logical to use two variables rather than 12; however, I have never seen an example of this and am not sure it is appropriate.
Since the margin of error is unsigned, I would think that you should still use the sum-of-squares method when calculating aggregated MOEs. But I'm not a statistics expert.
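To make that concrete, here is a minimal sketch of the "subtract one variable from the total" approach, with invented numbers and a made-up function name. The estimate is a difference, but the squared MOEs are still added. Subtracting the squares would give a smaller number, but that is not the approximation the ACS documentation describes.

```python
import math


def moe_difference(moe_a, moe_b):
    """Approximate MOE of (a - b): the squared MOEs are added, not subtracted."""
    return math.sqrt(moe_a ** 2 + moe_b ** 2)


# Hypothetical table total and the single category being removed from it
total_est, total_moe = 5000, 200
excluded_est, excluded_moe = 600, 90

remainder_est = total_est - excluded_est
remainder_moe = moe_difference(total_moe, excluded_moe)

print(remainder_est, round(remainder_moe, 1))
```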
The formulas in the Census statistics methods handbook chapter 8 are based on a simple theorem on linear combinations of random variables. A linear combination is a sum where you multiply the components of the sum by different numbers, each with a sign. The mean and variance of a linear combination of independent random variables are given here: https://online.stat.psu.edu/stat414/lesson/24/24.3
Roughly, the mean of the sum is the sum of the means, and the variance of the sum is the sum of the variances. The MOE is related to the square root of the variance or, to put it another way, the variance is related to the square of the MOE.
So a difference is just a linear combination where one of the variables has a minus (-) sign. The formulas are only approximate because the counts in the cells aren't independent. The use of replicate weights helps to correct for this.
This is pretty close to magic, but in most cases the counts in the cells of a multiple cross tabulation (read: census table) are not independent; they are correlated (correlation being a measure of the lack of independence).
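To spell out the argument in symbols, here is a short derivation. The notation is mine rather than the handbook's, and 1.645 is the multiplier for the 90 percent confidence level at which ACS MOEs are published.

```latex
\begin{align*}
\operatorname{Var}(aX + bY) &= a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab \operatorname{Cov}(X, Y) \\
\operatorname{Var}(X - Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2 \operatorname{Cov}(X, Y) \qquad (a = 1,\ b = -1) \\
\mathrm{MOE}(X) &= 1.645 \sqrt{\operatorname{Var}(X)} \\
\mathrm{MOE}(X - Y) &\approx \sqrt{\mathrm{MOE}(X)^2 + \mathrm{MOE}(Y)^2} \qquad \text{if } \operatorname{Cov}(X, Y) \text{ is treated as } 0
\end{align*}
```

Because the coefficient is squared, the minus sign disappears from the variance terms, which is why the sum and the difference share one formula. The covariance term is the part the approximation drops and the part the replicate weights account for.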