MOE > value

There are a number of 0 (and small) values where the margin of error is huge compared to the value, e.g., 0 +/- 15 Black/African-American residents of Burt, MI.

Similarly, the CI can be a large proportion of the value, e.g., the 2018 5-year estimate for the female 5-to-9-year-old population of the county subdivision of Clio City in Genesee County, MI is 64 +/- 52.

I've heard ad hoc advice to treat as suspect any value whose MOE is 20% or more of the value.  Even if that's reasonable, what would you do with a 0 estimate? Any thoughts or suggestions would be appreciated!
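
To make that screening rule concrete: ACS margins of error are published at the 90% confidence level, so the standard error is MOE / 1.645, and the "20%" rule is usually stated in terms of the coefficient of variation (CV = SE / estimate). A minimal sketch in Python, where the 0.20 cutoff is just the ad hoc threshold mentioned above, not an official Census one:

```python
def flag_unreliable(estimate, moe, cv_cutoff=0.20):
    """Flag an ACS estimate whose relative error exceeds an ad hoc cutoff."""
    se = moe / 1.645  # ACS MOEs are published at the 90% level
    if estimate == 0:
        # The CV is undefined at zero; a 0 +/- 15 estimate has to be
        # handled as a special case (e.g., suppressed or footnoted).
        return True
    return se / estimate > cv_cutoff

print(flag_unreliable(64, 52))  # the Clio City example above -> True
print(flag_unreliable(0, 15))   # the Burt example above -> True
```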

  • Dear All--

The Census use of MOE in the ACS is very weak.  One big problem is that they use the normal approximation for the values they publish with the main release.  This produces absurd results such as 120 +/- 200; of course one cannot have a negative count.  The normal approximation should generally be used only between the 30th and 70th percentiles of the distribution; at the tails one should use other methods.  They know this, and they now also release replicate-based estimates for some tables, though later.  Those generally have smaller confidence intervals.  They also provide replicate weights with the PUMS data, so you can run those yourself, if that works for you (a sketch of that calculation follows this post).

I have memos and examples of this craziness if anyone wants them.  Also, the CI for median income is vastly overstated, since they use the normal approximation and assume no skewness.  This results in a much higher CI than there should be.  The CI for a skewed variable approaches zero if it is quite skewed, as income is!  (A distribution-free median interval is also sketched below.)

    Andy
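
    For what it's worth, the published PUMS method is successive difference replication with 80 replicate weights (PWGTP1-PWGTP80 for person records), and the documented formula is SE = sqrt((4/80) * sum over r of (theta_r - theta_0)^2).  A minimal sketch, assuming you have already recomputed the estimate under each replicate weight:

    ```python
    import math

    def acs_replicate_se(theta0, theta_reps):
        """Successive difference replication SE for an ACS PUMS estimate.

        theta0: the full-sample estimate.
        theta_reps: the 80 estimates recomputed with each replicate weight.
        """
        assert len(theta_reps) == 80, "ACS PUMS ships 80 replicate weights"
        return math.sqrt((4 / 80) * sum((t - theta0) ** 2 for t in theta_reps))

    # The published MOE would then be 1.645 * SE (90% confidence level).
    ```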
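
    And on the median point: one standard distribution-free alternative to the normal approximation builds the interval from order statistics, so it makes no symmetry assumption about the underlying variable.  A minimal sketch that assumes simple random sampling (which the ACS is not), so it is illustrative only:

    ```python
    import math

    def median_ci(values, z=1.645):
        """Distribution-free CI for a median via order statistics.

        The normal approximation here is applied to the Binomial(n, 1/2)
        ranks, i.e., to counts of observations, not to the shape of the
        (possibly very skewed) variable itself.
        """
        xs = sorted(values)
        n = len(xs)
        half_width = z * math.sqrt(n) / 2
        lo = max(int(math.floor(n / 2 - half_width)), 0)
        hi = min(int(math.ceil(n / 2 + half_width)), n - 1)
        return xs[lo], xs[hi]
    ```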

  • Thanks, Andy!  Aren't the replicate data only available for a subset of tables?  In my admittedly limited explorations, I've found that for numbers close to 0 with small Ns, the replicate intervals aren't much different from the published ones.  The medians for skewed variables like income and housing value are very troubling.

My use of the ACS estimates is in a community dashboard.  I don't see providing some numbers from the replicate tables and others from the base tables where replicate numbers aren't available; that would be too hard for users to understand.  And I'm not sure what to do about the issue with medians, where incomes and housing values are seriously skewed.  It would be reasonable to exclude values for small populations, since the outlying areas aren't as important as the city (a suppression rule along those lines is sketched after this post).

    If anyone has any thoughts about how to handle these issues in reporting, they would be appreciated!
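
    One way to operationalize that for a dashboard is a single suppression rule combining a CV cutoff with the zero/small-population cases, so every cell is treated consistently.  A sketch using pandas; the third row is hypothetical, added only for contrast, and the 0.20 cutoff is the ad hoc threshold from the original question:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "geo":      ["Clio city", "Burt", "Hypothetical city"],
        "estimate": [64, 0, 3512],   # last row is made up for contrast
        "moe":      [52, 15, 210],
    })

    se = df["moe"] / 1.645                              # ACS MOEs are 90%-level
    cv = se / df["estimate"].where(df["estimate"] > 0)  # NaN where estimate is 0

    # Keep an estimate only if it is nonzero and its CV clears the cutoff;
    # everything else becomes NaN, to be rendered as a footnote marker.
    df["display"] = df["estimate"].where((cv <= 0.20) & (df["estimate"] > 0))
    print(df)
    ```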

  • If you want to read the whole sorry set of my correspondence with the Census Bureau, it is here: https://www.dropbox.com/s/lfjo11wm5ma1axx/Memo_Regarding_ACS-With_Response.pdf?dl=0 (it is also on Scribd).  The replicate approach, and there are others if you use SAS or R or even Stata, all guard against going negative or over 1 at either tail of the distribution.  No, the replicate intervals at the tails are very different from the normal approximation, and much, much smaller.  Logically, if you find a zero or small count, by definition the estimate cannot go below zero, nor can a large count go above 100%.  (For those who only want the end of the story, their response was: "Important issues; we are trying to solve them in a production environment.")  Meaning: a bad margin of error is better than no margin of error.  (A small illustration of the tail problem follows this post.)

Beyond this frequentist approach, it is plain that they actually know plenty about any count, but they treat each survey completely de novo.  Groves and Rod Little tried to move some of the estimates to Bayesian approaches, and did do something with the language data for voting rights, but Rod felt that even that did not go far enough.  Rod is back at Michigan, I think, and he famously said: "My favorite kind of data is missing data."

As to a community dashboard, I think the arcane nature of much Census lore makes that particularly challenging.  Try explaining why one can have blocks that are underwater (aka mermaid blocks), blocks with only people under 17 (aka Lord of the Flies blocks), or blocks with people but no housing units, or vice versa.

    Andy
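
    To illustrate the tail problem with the 120 +/- 200 example above: the symmetric normal-approximation interval dips below zero, while even a simple delta-method interval computed on the log scale respects the zero bound.  This is just one standard trick, not necessarily what the Bureau's replicate tables do:

    ```python
    import math

    def normal_ci(count, se, z=1.645):
        # Symmetric normal approximation: can go negative for small counts.
        return count - z * se, count + z * se

    def log_scale_ci(count, se, z=1.645):
        # Delta-method interval on the log scale: stays positive.
        # (Undefined at count == 0, which is exactly the hard case.)
        ratio = math.exp(z * se / count)
        return count / ratio, count * ratio

    se = 200 / 1.645                # back out the SE from 120 +/- 200
    print(normal_ci(120, se))       # (-80.0, 320.0): absurd lower bound
    print(log_scale_ci(120, se))    # about (22.7, 635.3): stays positive
    ```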

  • Thank you very much, Andy!  Some weekend reading...