Different coefficients of variation with same information

Hi, all. We often use coefficients of variation (CV = standard error ÷ estimate) as a rough guideline for statistical reliability: CV < 0.15 is pretty trustworthy; CV > 0.30 is not to be trusted. But the CV value depends on how the estimate is expressed. Here's an example:

  • The small, affluent city of Deephaven (Minnesota) has a poverty rate of 3.5%, with a MOE of 2.1 percentage points. Dividing the MOE by 1.645 yields a standard error of about 1.3. So the CV is around 0.37 (1.3 ÷ 3.5), which would be considered unreliable.
  • If you flip that statistic around, you find that 96.5% of people in Deephaven are not in poverty. The standard error doesn’t change, so now the CV is about 0.013 (1.3 ÷ 96.5), which would be considered quite reliable.
  • So this is the exact same information (just expressed differently), but it leads to very different judgments about reliability (a quick calculation after this list makes the contrast explicit).
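
To make the contrast concrete, here is the arithmetic as a quick Python sketch. The only assumption beyond the numbers above is the standard 1.645 factor for converting a 90% MOE to a standard error; the small difference from 0.37 comes from not rounding the SE to 1.3 first:

    Z90 = 1.645  # ACS margins of error are published at the 90% confidence level

    def cv(estimate_pct, moe_pct):
        """Coefficient of variation from a percent estimate and its 90% MOE."""
        return (moe_pct / Z90) / estimate_pct

    print(cv(3.5, 2.1))    # ~0.36 for the poverty rate: "unreliable"
    print(cv(96.5, 2.1))   # ~0.013 for the complement: "quite reliable"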

This is an extreme example, but it does get across the general idea. I'm curious how others have dealt with this, and whether there's a better way to give our audiences a quick and simple way to assess reliability.

Thanks for any perspective you all can offer.

  • I don't have a direct answer, but I'd posit that the source of the problem is that the guidelines for ACS MOEs almost universally assume normal distributions, which small counts and proportions rarely follow. This is the same problem that leads to negative lower bounds on values that could never be negative (e.g., by subtracting a large MOE from a small count).

    In the case of proportions, analysts frequently use logit models, converting proportions or probabilities to odds and taking the log, which conveniently allows for normal distributions that never imply values beyond p=0 or p=1. (I may be getting the details wrong here, but I'm quite sure the basic principle is right!) I suspect there'd be a way to handle MOEs for proportions in a similar way, in which case the variability on the log odds would be identical for both p and 1-p (3.5% and 96.5% in your example). Maybe you could come up with a different rule of thumb for reliability based on this, but I'm not sure how I'd compute the MOEs on the log odds from ACS MOEs; a rough delta-method attempt is at the end of this reply.

    Anyway, I don't think a standard CV is a valid measure for proportions, given the hard lower *and upper* limits on their distributions. A CV makes more sense for distributions with a zero minimum and no upper bound.
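
    Following up on my own point: here is a minimal Python sketch of one way to move an ACS proportion and its MOE to the log-odds scale, using the delta method SE[logit(p)] ≈ SE[p] / (p(1-p)). The 1.645 conversion and the delta-method approximation are my assumptions, not anything official, so treat it as a rough illustration:

      Z90 = 1.645  # ACS MOEs correspond to 90% confidence intervals

      def logit_se(p, moe):
          """Approximate SE of logit(p) from a proportion and its MOE (delta method)."""
          se_p = moe / Z90
          return se_p / (p * (1.0 - p))

      # Deephaven example, expressed as proportions: 0.035 +/- 0.021
      print(logit_se(0.035, 0.021))  # ~0.38
      print(logit_se(0.965, 0.021))  # same ~0.38, identical for p and 1-p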

  • The MoE "concept" that the Census uses is just a way to compute a 90% confidence interval for a proportion or a count. See https://www.census.gov/content/dam/Census/library/publications/2018/acs/acs_general_handbook_2018_ch07.pdf

    The formulas in the handbook are approximate. For small counts or proportions near 0% or 100%, you need to work directly with the confidence interval. Converting between an MoE and a standard error uses the factor 1.645, which corresponds to a 90% confidence interval for the associated count or percentage. That calculation is approximate, and it does not work well for small percentages or percentages near 100%, where the confidence interval is not symmetric: you can't think in "+-" terms. To learn how to proceed in this case, you need methods from categorical data analysis.

    The bible for that is Agresti, Categorical Data Analysis. If you google the title, a PDF of the 2007 edition comes up; the current edition is 2012. The website hosting the 2007 version is in a foreign country, and I think that posting violates US copyright law, so I'm not going to give you a link. You can purchase the book here https://www.amazon.com/Categorical-Data-Analysis-Alan-Agresti/dp/0470463635/ for example, or from a bookseller of your choice, or look for it in your local library. When I give someone who is not a statistician a research paper with a lot of formulas, I tell them to "skip the formulas on the first read." Here is a link to some pretty good course notes from Penn State: https://online.stat.psu.edu/stat100/lesson/9/9.1
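
    To make the asymmetry concrete, here is a rough Python sketch that backs an effective sample size out of the published MOE and then applies a Wilson score interval. This recipe is my own approximation layered on top of the published numbers, not something from the handbook, so treat it as illustrative only:

      import math

      Z90 = 1.645  # 90% confidence level used for ACS MOEs

      def wilson_interval_from_acs(p, moe, z=Z90):
          """Asymmetric ~90% CI for a proportion, from an ACS estimate and its MOE."""
          se = moe / z
          n_eff = p * (1.0 - p) / se**2            # implied effective sample size
          center = (p + z**2 / (2 * n_eff)) / (1 + z**2 / n_eff)
          half = (z / (1 + z**2 / n_eff)) * math.sqrt(
              p * (1 - p) / n_eff + z**2 / (4 * n_eff**2)
          )
          return center - half, center + half

      # Deephaven: 3.5% +/- 2.1 points.  The symmetric interval is (1.4%, 5.6%);
      # this gives roughly (1.9%, 6.3%), shifted away from zero and asymmetric.
      print(wilson_interval_from_acs(0.035, 0.021))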

  • Agresti is where to start, but SAS incorporated most of what was in SUDAAN, and Stata, R, and even SPSS have also reduced this whole area to practice. The Census Bureau has not kept up, but the Random Replicates are one way to do this correctly. The problem is that below 30% or above 70% the usual method breaks down, so one should use a different method; in effect, the confidence interval becomes asymmetric. I used to joke that they apparently skipped the appropriate chapters or dropped out of the stat class before it was finished.

    This is why they added the Random Replicates, but they do not exist for many of the tables, and no one uses them anyway.
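
    For what it's worth, when these replicate estimates do exist, using them is mechanical. A minimal Python sketch, assuming the documented successive-differences formula (variance = 4/80 times the sum of squared differences between each of the 80 replicate estimates and the full estimate):

      def replicate_se(full_estimate, replicate_estimates):
          """SE from an ACS full estimate and its 80 variance replicate estimates."""
          if len(replicate_estimates) != 80:
              raise ValueError("expected 80 replicate estimates")
          variance = (4.0 / 80.0) * sum(
              (rep - full_estimate) ** 2 for rep in replicate_estimates
          )
          return variance ** 0.5

      # Multiplying the result by 1.645 recovers a 90% MOE without the handbook's
      # approximation formulas (up to rounding in the published tables).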