MOE > value

There are a number of 0 (and small) values where the margin of error is huge compared to the value, e.g. 0 +/- 15 Black / African-American residents of Burt, MI.

Similarly, the CI can be a big proportion of the value, e.g. the 2018 5-year estimate for the female 5-9 year old population of the subdivision of Clio City in Genesee County, MI is 64 +/- 52.

I've heard ad hoc advice to treat any value with a CI of 20% or more of the value as suspect.  Even if that's reasonable, what would you do with a 0 estimate? Any thoughts or suggestions would be appreciated!
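
To make the two examples above concrete, here is a minimal Python sketch of the arithmetic behind that kind of rule of thumb. It assumes the published ACS margin of error is at the 90% confidence level (so the standard error is MOE / 1.645); the function name `reliability` is only for illustration. It also shows why the rule breaks down for a 0 estimate: the ratio is undefined.

```python
Z90 = 1.645  # assumption: published ACS MOEs are at the 90% confidence level


def reliability(estimate, moe):
    """Return (MOE as a share of the estimate, coefficient of variation).

    Both are None when the estimate is 0, because the ratio is undefined.
    """
    if estimate == 0:
        return None, None
    standard_error = moe / Z90
    return moe / estimate, standard_error / estimate


# The two examples from this post: (label, estimate, margin of error)
examples = [
    ("Clio City, females 5-9", 64, 52),
    ("Burt, Black / African-American residents", 0, 15),
]

for label, estimate, moe in examples:
    share, cv = reliability(estimate, moe)
    if share is None:
        print(f"{label}: {estimate} +/- {moe} -> ratio undefined for a 0 estimate")
    else:
        print(f"{label}: {estimate} +/- {moe} -> MOE share {share:.1%}, CV {cv:.1%}")
```

For the Clio City figure the MOE is about 81% of the estimate, far above any 20% threshold, while the Burt figure cannot be scored this way at all.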

  • Since you use the term "CI," I assume that you mean "confidence interval" and have some understanding of the concept.  Often people write "+-" to indicate that the "underlying" population number/count falls within an interval. The "confidence interval" gives a range where the value would fall if you repeated the survey by drawing many samples from the population and tabulating the data again and again. For a 95% confidence interval, the sampled value would fall in the interval for 95 out of 100 sample "draws."  The interval that you get with this calculation need not be symmetric, even though the "+-" notation implies that it is. From the definition of a 95% confidence interval you can see that for any estimate, there are many possible intervals that provide 95% "coverage."  Thus when you give a confidence interval you need to specify what method you used to compute the interval.

    When the MoE > Estimate and the Estimate is > 0, the lower value of the "+-" confidence interval is less than 0, and this is impossible. Setting the lower limit of the confidence interval to zero does not work either, because you had a positive count for that particular ACS sample year (or years, for 5-year data). People usually think that the confidence interval indicates where the "true" value lies. The "true" value is the value that you would get if you did a census, i.e. you "surveyed" everybody.

    In any case these problems are all solved by using "replicate weights" for microdata or variance replicate estimate tables.  These are tables that include many estimates of the underlying population value. See this page to locate the tables: "Variance Replicate Estimate tables include estimates, margins of error, and 80 variance replicates for selected American Community Survey (ACS) 5-year Detailed Tables." If you want to make a calculation from the replicate table cell values, you repeat your calculation 80 times, taking a different "replicate" each time. You then apply a formula to calculate the interval for the underlying values that you computed. Any "function" or formula can be used for your calculation: sum, ratio, product, etc. Use your imagination.

    The census people "suppress" table cells so that people don't misinterpret the value in the published table. This drives statisticians / mathematicians crazy because there is always some information in the number that you get when you tabulate a survey. You just need to know how to interpret the result that you get. There is also a problem with a zero result when you take a survey. Are there no people in the underlying population in that table cell (a "structural" zero), or are you just getting a zero value for that sample? This is another topic worthy of discussion.

    The way that I handle these issues in a presentation or report is to report the number that you get and put in a footnote to explain the situation. Also when I report an estimate or confidence interval I put in at least 1 or 2 digits beyond the decimal point to indicate that the number is an estimate and not an actual census (count everybody) count.

    Hope this helps
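
The replicate calculation described in the reply above (repeat the calculation 80 times, then apply a formula) can be sketched in Python as follows. This is a minimal sketch, assuming the variance formula given in the Census Bureau's replicate documentation, Var = (4/80) * sum((replicate - estimate)^2), and a 90% margin of error of 1.645 * sqrt(Var). The function names and the replicate values are made up for illustration; real values would come from a Variance Replicate Estimate table.

```python
import math


def replicate_moe(estimate, replicates, z=1.645):
    """90% MOE from 80 variance replicates (formula is an assumption, see above)."""
    if len(replicates) != 80:
        raise ValueError("expected 80 variance replicates")
    variance = 4.0 / 80 * sum((r - estimate) ** 2 for r in replicates)
    return z * math.sqrt(variance)


def derived_estimate_moe(func, estimates, replicates_by_cell):
    """Apply func to the published estimates and to each of the 80 replicates."""
    point = func(*estimates)
    replicate_values = [func(*cells) for cells in zip(*replicates_by_cell)]
    return point, replicate_moe(point, replicate_values)


# Made-up replicate values for two table cells (NOT real ACS data).
cell_a_estimate = 64
cell_a_replicates = [64 + 4 * ((i % 5) - 2) for i in range(80)]
cell_b_estimate = 500
cell_b_replicates = [500 + 5 * ((i % 3) - 1) for i in range(80)]

# MOE for a single cell.
print(replicate_moe(cell_a_estimate, cell_a_replicates))

# MOE for a derived value, here the ratio of the two cells.
share, share_moe = derived_estimate_moe(
    lambda a, b: a / b,
    (cell_a_estimate, cell_b_estimate),
    (cell_a_replicates, cell_b_replicates),
)
print(share, share_moe)
```

Because `func` is applied to each replicate in turn, the same code works for a sum, a ratio, a product, or any other derived quantity.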


  • Dear All--

    The Census use of MOE in the ACS is very weak.  One big problem is that they use the normal approximation for the values they publish with the main release.  This produces absurd results such as 120 +/- 200.  Of course one cannot have a negative number for a count.  The normal approximation should generally only be used between 30 and 70 percent of the distribution.  For the ends of the distribution one should use other methods.  They know this and now also release Random Replicate values for some tables, but later.  They generally have smaller confidence intervals.  They also reproduce them for the PUMS data so you can run those yourself, if that works.

    I have memos and examples of this craziness if anyone wants them.  Also the CI for median income is vastly overstated, since they use the normal approximation and assume no skewness.  This results in a much wider CI than it should be.  The CI for a skewed variable approaches zero if it is quite skewed, as income is!
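
The 120 +/- 200 example above can be worked through with a short Python sketch of the symmetric normal-approximation interval. It assumes the published MOE is at the 90% level (standard error = MOE / 1.645); the function name `normal_interval` is only for illustration. Truncating the lower bound at zero is shown as the naive patch, not as a recommended method.

```python
from statistics import NormalDist


def normal_interval(estimate, moe90, level=0.90):
    """Symmetric normal-approximation interval from a 90% margin of error."""
    standard_error = moe90 / 1.645  # assumption: published MOE is at 90%
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


lower, upper = normal_interval(120, 200)
print(f"90% interval: {lower:.1f} to {upper:.1f}")  # lower bound is negative

# Naive patch: a count cannot be negative, so clip the lower bound at zero.
print(f"clipped:      {max(0.0, lower):.1f} to {upper:.1f}")

# The same standard error at the 95% level gives an even wider interval.
lower95, upper95 = normal_interval(120, 200, level=0.95)
print(f"95% interval: {lower95:.1f} to {upper95:.1f}")
```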


  • Thanks, Andy!  Aren't the replicate data only for a subset of tables?  In my admittedly limited explorations, I've found that for numbers close to 0 with small N's, the replicate intervals aren't much different from the published numbers. The median values for skewed variables like income or housing values are very troubling.

    My use of the ACS estimates is in a community dashboard. I don't see providing some numbers from the replicate tables and others from the base tables where replicate numbers aren't available - too hard for users to understand. And I'm not sure what to do about the issue with medians where incomes and housing values are seriously skewed.  It would be reasonable to exclude values from small populations, because outlying areas aren't as important as the city.

    If anyone has any thoughts about how to handle these issues in reporting, they would be appreciated!

  • If you want to read the whole sorry set of my correspondence with the Census Bureau it is here.  It is also in Scribd.  The replicate approach, and there are others if you use SAS or R or even Stata, all guard against going negative or over 1 at either tail of the distribution.  No, the replicates at the tails are very different from the normal approximation and much, much smaller.  Logically, if you find a zero count or a small count, by definition the estimate cannot go below zero, nor can a large count go above 100%.  (For those who only want the end of the story, their response: "Important issues, we are trying to solve them in a production environment.")   Meaning:  A bad margin of error is better than no margin of error.

    Beyond this frequentist approach, it is plain that they actually know plenty about any count, but they treat each survey completely de novo.  Groves and Rod Little tried to move some of the estimates to Bayesian approaches, and did do something with the Language data for voting rights, but Rod felt that even that did not go far enough.  Rod's back at Michigan, I think, and he famously said:  "My favorite kind of data is missing data."

    As to a community dashboard, I think the arcane nature of much of the Census lore (try to explain why one can have blocks underwater (aka Mermaid blocks), blocks with only those under 17 (aka Lord of the Flies blocks), or blocks with people and no housing units or vice versa) makes that particularly challenging.


  • Thank you very much, Andy!  Some weekend reading...