I need some statistical advice. Paging Stas Kolenikov!
When calculating MOEs for derived estimates -- specifically, averages -- should I use the formula for ratio, or proportion?
The two formulas shown in the ACS handbook chapter 8 (pages 64 and 65) are nearly identical, except that the proportion formula uses a minus operator under the radical whereas the ratio formula uses a plus.
I'm finding that using the proportion formula sometimes results in an error from trying to take the square root of a negative number.
Here's an example from the ACS 2022 1-year data for the nation:
B25065_E001 (aggregate gross rent) = $63,086,890,700
B25065_M001 (MOE for above) = ±$234,476,359
B25063_E002 (count of cash renters) = 42,971,061
B25063_M002 (MOE for above) = ±162,515
I'm trying to derive average gross rent as (aggregate gross rent / count of cash renters), or about $1,468.13 for the USA. Seems about right. But plugging the numbers into the proportion formula leads to madness:
MOE(P-hat) = sqrt(B25065_M001**2 - ((B25065_E001 / B25063_E002)**2 * B25063_M002**2)) / B25063_E002
MOE(P-hat) = sqrt(5.50e+16 - (2,155,391 * 2.64e+10)) / 42,971,061
MOE(P-hat) = sqrt(-1.95e+15) / 42,971,061
I feel like I'm missing something. Is it because the source estimates and MOEs are counting different things? Is there a different formula for calculating MOEs for derived averages?
Thanks for any guidance.
My comments refer to this document
https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf
The 2 formulas are (6) and (7) in this document.
To possibly answer my own question and ask a new one: Could I use formula (9) to do this?
Calculating Measures of Error for the Product of Two Estimates
Since dividing x / y is the same as the product x * 1/y, could this work?
EDIT: trying this out....
MOE(X-hat * 1/Y-hat) = sqrt((X-hat**2 * (1/MOE[Y-hat])**2) + ((1/Y-hat)**2 * MOE[X-hat]**2))
= sqrt((B25065_E001**2 * (1/B25063_M002)**2) + ((1/B25063_E002)**2 * B25065_M001**2))
= sqrt((63086890700**2 * (1/162515)**2) + ((1/42971061)**2 * 234476359**2))
= sqrt(2.3886483 + 29.7746023)
= $5.67
...which seems low, but MOEs should be pretty low for the nation as a whole.
Better, worse, or just completely wrong? Thanks
Formula (6) has a - sign under the square root.
The caveat for that formula is that
"Users should note that if the value under the square root is negative, then substitute a “plus” for the “minus” sign under the square root in formula (6). This modified formula is the same as the formula for the MOE of a ratio, which will be discussed in the next section." This occurs when the proportion/ratio is large, or when the MoE of the denominator is large compared to the numerator MoE.
Formula (7) has a + sign.
The MoE given by (6) will be lower than the MoE given by (7).
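The substitution rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `moe_ratio` and `moe_proportion` are made up for this example and are not from the handbook or any library, and the inputs are the national 2022 1-year figures quoted at the top of the thread.

```python
import math

def moe_ratio(num, num_moe, den, den_moe):
    """Formula (7): MOE of num/den, with a plus sign under the square root."""
    r = num / den
    return math.sqrt(num_moe**2 + r**2 * den_moe**2) / den

def moe_proportion(num, num_moe, den, den_moe):
    """Formula (6): minus sign under the square root, falling back to
    formula (7) when the value under the square root is negative."""
    p = num / den
    radicand = num_moe**2 - p**2 * den_moe**2
    if radicand < 0:
        return moe_ratio(num, num_moe, den, den_moe)
    return math.sqrt(radicand) / den

# Figures quoted in the thread (ACS 2022 1-year, nation)
agg_rent, agg_rent_moe = 63_086_890_700, 234_476_359  # B25065_E001, B25065_M001
renters, renters_moe = 42_971_061, 162_515            # B25063_E002, B25063_M002

avg_rent = agg_rent / renters
radicand = agg_rent_moe**2 - avg_rent**2 * renters_moe**2
moe = moe_proportion(agg_rent, agg_rent_moe, renters, renters_moe)

print(f"average gross rent: {avg_rent:.2f}")
print(f"value under the square root in formula (6): {radicand:.3e}")
print(f"MOE after falling back to formula (7): {moe:.2f}")
```

With these inputs the value under the square root in formula (6) is negative, so the function returns the formula (7) result.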
Responding to my own second question: JUST COMPLETELY WRONG
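One way to see where the attempt goes wrong: formula (9) needs the MOE of the second factor, and the MOE of 1/Y is not 1/MOE(Y). As a sketch, assume the first-order (delta method) approximation MOE(1/Y) ≈ MOE(Y) / Y**2; that approximation is my assumption here, not something the handbook states. With it, the product formula gives the same answer as the ratio formula. The helper name `moe_product` is hypothetical.

```python
import math

def moe_product(a, a_moe, b, b_moe):
    """Formula (9) in the form written out in the attempt above: MOE of a * b."""
    return math.sqrt(a**2 * b_moe**2 + b**2 * a_moe**2)

x, x_moe = 63_086_890_700, 234_476_359  # aggregate gross rent and its MOE
y, y_moe = 42_971_061, 162_515          # cash renters and its MOE

inv_y = 1 / y
inv_y_moe = y_moe / y**2  # assumed delta-method approximation for MOE(1/Y)

via_product = moe_product(x, x_moe, inv_y, inv_y_moe)
via_ratio = math.sqrt(x_moe**2 + (x / y)**2 * y_moe**2) / y  # formula (7)

print(f"product formula with MOE(1/Y): {via_product:.2f}")
print(f"ratio formula (7):             {via_ratio:.2f}")
```

Both routes agree up to floating-point rounding, so going through the product formula buys nothing over using the ratio formula directly.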
Glenn Rice the average is the ratio: you have the total (across the U.S.) of cash payments in the numerator, and you have the total (# of households that are renters) in the denominator. So the calculations for the ratio give the MOE of $7.78 (which seems a bit too tight but what do I know).
Analysis of microdata on IPUMS (https://sda.usa.ipums.org/sdaweb/analysis/exec?formid=mnf&sdaprog=means&dataset=all_acs_samples&sec508=false&dep=rent&row=year&filters=rent%281-**%29&weightlist=hhwt&main=means&transform=none&percentileopt=none&confidence=on&cflevel=95&se=on&wncases=on&color=on&ch_type=bar&ch_color=yes&ch_width=600&ch_height=400&ch_orientation=vertical&ch_effects=use2D&decmeans=2&dectotals=0&decdiffs=1&decmedian=2&decse=1&decsd=1&decminmax=2&decwn=1&deczstats=2&csvformat=no&csvfilename=means.csv) gives this (mean of rent by year, filter rent(1-**) i.e. non-missing) -- the total N of 95M probably means it counts the individuals rather than the households. The MOE is even tighter at about $1.8.
The sampling error of $234M should give a heart attack to any reasonable economist... but fortunately economists don't know survey statistics :).
Thanks Stas!
Thanks David! I've looked at that page a hundred times and never saw that note. It would have saved me a world of trouble.
Just another note: formula (6), with the - sign, applies to the case where the data are counts and the numerator is a subset of the denominator. Formula (7) applies to a ratio of any 2 things. The 2 "things" can even come from different tables, as they do in your case.
If you use R, I have some code that computes ratios, products, and linear combinations (sums of variables with a fixed coefficient for each term), taking into account the MoEs of the terms. The ratio code uses a + sign under the square root: I use formula (7), which is conservative (larger MoE) compared to formula (6). All these formulas are approximate. They are based on the variance (or the standard deviation, which is the square root of the variance), and the rest comes from the "delta method": https://en.wikipedia.org/wiki/Delta_method. The multivariate delta method for x/y depends on the derivative with respect to x, which is 1/y, and the derivative with respect to y, which is -x/(y * y). With these facts about the derivatives you can kind of see where the formulas in chapter 8 come from.
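To spell out that last step, here is a sketch of the first-order expansion using the two derivatives above. The notation (hats for estimates, R for the ratio) is mine, and the covariance term is written out only to show what gets dropped.

```latex
\begin{align*}
R &= \frac{X}{Y}, \qquad
\frac{\partial R}{\partial X} = \frac{1}{Y}, \qquad
\frac{\partial R}{\partial Y} = -\frac{X}{Y^{2}} \\[4pt]
\operatorname{Var}(\hat R) &\approx
\frac{1}{Y^{2}}\operatorname{Var}(\hat X)
+ \frac{X^{2}}{Y^{4}}\operatorname{Var}(\hat Y)
- \frac{2X}{Y^{3}}\operatorname{Cov}(\hat X,\hat Y) \\[4pt]
&= \frac{1}{Y^{2}}\left[\operatorname{Var}(\hat X)
+ R^{2}\operatorname{Var}(\hat Y)
- 2R\operatorname{Cov}(\hat X,\hat Y)\right] \\[4pt]
\operatorname{MOE}(\hat R) &\approx
\frac{1}{Y}\sqrt{\operatorname{MOE}(\hat X)^{2}
+ R^{2}\operatorname{MOE}(\hat Y)^{2}}
\qquad \text{if } \operatorname{Cov}(\hat X,\hat Y) = 0
\end{align*}
```

The last line has the same shape as formula (7), because an MOE is a fixed multiple of the standard error and that multiple factors out of the square root. As an algebraic observation only, not a claim about how the handbook derives it: a covariance equal to R times Var(Y-hat) would turn the plus into the minus of formula (6).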