Statistically comparing estimates

Hi All,

I've been doing some research on this and I can't seem to find a clear answer.

I want to use 2014-2019 ACS data to see if there is a statistically significant difference in poverty rates between Cook County and the state of Illinois.  So, I would be comparing Illinois as a whole to a smaller part of Illinois.  

I know how to use the Census Bureau's statistical testing tool, but what is the correct method of calculating statistically significant differences?  Is it appropriate to statistically compare two geographic units that overlap a bit?  Or would a better alternative be comparing Cook County to a statewide poverty estimate from which Cook County has been removed?

Any help would be greatly appreciated!

  • Hi Bill,

    This is a good question and I don't have a good answer. I checked with my colleagues at PRB and we are not aware of any Census Bureau guidance on this issue. However, at PRB we often make these kinds of comparisons--not just across geographic areas but also comparing estimates for population subgroups with that of a total, which seems like a similar exercise. 

  • I think generally the guidance is just to ignore this covariance, but this could be problematic if the overlap is large.

    That said, you could use the Variance Replicate Tables (https://www.census.gov/programs-surveys/acs/data/variance-tables.2019.html) using the B17001 table (poverty table) for State (040) and Counties (050) to create 1+80 poverty rates for Illinois, 1+80 for Cook County, and 1+80 differences. Now you can find the SE of the difference using the successive difference formula.

    If you want to do this for each year (not available in replicate table form), you could use variance replicate tables to produce a value for rho:

    rho = ( SE(a)^2 + SE(b)^2  - SE(a-b)^2 ) / ( 2 * SE(a) * SE(b) )

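    and then calculate the SE of the difference between the one-year estimates c and d, converting the MOEs in the published tables to SEs first (ACS MOEs are at the 90 percent level, so SE = MOE / 1.645):

    SE(c-d) = sqrt( SE(c)^2 + SE(d)^2 - 2 * rho * SE(c) * SE(d) )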

    The assumption here is that the sampling correlation in the one-year estimates is the same as the sampling correlation in the 5-year estimates.
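
A rough sketch of the steps above in Python, for anyone who wants to try it. Every number below is a made-up placeholder, not a real ACS value, and the function names are my own. It assumes you have already turned the B17001 replicate counts into 1+80 poverty rates for each geography (replicate numerator divided by replicate denominator). It uses the standard ACS successive difference replication formula, variance = (4/80) * sum of squared deviations of the 80 replicates from the estimate.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def sdr_se(estimate, replicates):
    """Standard error from 80 successive-difference replicate estimates."""
    if len(replicates) != 80:
        raise ValueError("expected 80 replicate estimates")
    return math.sqrt(4.0 / 80.0 * sum((r - estimate) ** 2 for r in replicates))


def implied_rho(se_a, se_b, se_diff):
    """Sampling correlation implied by SE(a), SE(b) and SE(a-b)."""
    return (se_a ** 2 + se_b ** 2 - se_diff ** 2) / (2 * se_a * se_b)


def se_of_difference(se_c, se_d, rho):
    """SE of c-d, allowing for correlation rho between c and d."""
    return math.sqrt(se_c ** 2 + se_d ** 2 - 2 * rho * se_c * se_d)


# Placeholder 5-year poverty rates (percent) and synthetic replicates.
state_rate, county_rate = 12.5, 14.0
state_reps = [state_rate + 0.05 * math.sin(i) for i in range(1, 81)]
county_reps = [county_rate + 0.10 * math.sin(i) + 0.04 * math.cos(2 * i)
               for i in range(1, 81)]

# The difference is computed for the estimate and for each replicate.
diff = county_rate - state_rate
diff_reps = [c - s for c, s in zip(county_reps, state_reps)]

se_state = sdr_se(state_rate, state_reps)
se_county = sdr_se(county_rate, county_reps)
se_diff = sdr_se(diff, diff_reps)
rho = implied_rho(se_county, se_state, se_diff)

# Placeholder one-year MOEs, converted to SEs, reusing the 5-year rho.
se_c = 0.6 / Z90  # county, one-year
se_d = 0.3 / Z90  # state, one-year
se_1yr_diff = se_of_difference(se_c, se_d, rho)

print(f"rho = {rho:.3f}, one-year SE of difference = {se_1yr_diff:.3f}")
```

Multiplying the resulting SE by 1.645 gives an MOE for the difference that can be compared with the difference itself.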

  • I agree with Mark. It's simple hypothesis testing: "Is subset A significantly different from the total population?" (I think I saw your question the other day about ZCTAs, to which the same thing applies.) However, if your subset comprises a very large portion of the population, you might get non-intuitive results. Take comparing males to the total population: since the only other group is females, you may want to split the data into two independent samples, male and female, and compare those instead. Although there might be statistical vagaries, as a practitioner I would compare two geographic samples that are not independent, so long as the "population" sample is much larger.

  • I appreciate your responses!  I also sent this question to the help desk at the Census Bureau.  If and when they get back to me, I will post here.  Thanks again!

  • Here is the response I got from the Census Bureau directly.  Just some food for thought!

    "You are correct that the poverty rate for Cook county is not independent from Illinois.  What they are using the comparison for will determine whether they should take the covariance into account in some manner.  It is reasonable to compare counties to the state estimate, even though they are not independent, in order to examine how the poverty rates may vary at a finer geographic level.  

    To get into the details, there are two ways that I can think of to adjust the MOE to take the covariance into account.  One method is to use the Variance Replicate Estimate (VRE) tables (view documentation).  The VRE tables are the same as published detailed tables with the added benefit that they include replicate estimates to calculate the variance (as well as margin of error, or MOE).  This is the method that the ACS uses to calculate the MOE published on data.census.gov.  The documentation provides examples on how to calculate the MOE using the replicate estimates for a ratio.  As an interesting aside, if you subtracted the Cook county estimate from the Illinois estimate (and did the same for the 80 replicate estimates to calculate the MOE), you would effectively be calculating the MOE for all counties in Illinois except for Cook county.  This is because the estimate for Illinois is made up of the sum of the weights for Cook county plus the sum of the weights for all other counties in Illinois.  So, subtracting out Cook county leaves you with the weighted estimate of all other counties.  The same logic applies to the replicate estimates, which are used to calculate the MOE.

    Another method you could use to take the covariance into account would be to calculate it in a more classical sense.  It has been a while for me, but this method uses the correlation as well as the variance of the individual estimates.  For this approach, you would need the microdata.  While this is not available, there is the Public Use Microdata Sample (PUMS).  PUMS is a subsample of the full ACS microdata.  In addition, the only geography available below the state level on the PUMS files is the Public Use Microdata Area, or PUMA.  PUMAs are completely contained within a state and are designed to contain roughly 100,000 people.  Using Tigerweb, it appears that the boundaries for Cook county line up with the boundaries of several PUMAs.  So, it appears that you could aggregate several PUMAs together to obtain an estimate for Cook county.  Then, you can calculate the covariance.

    In practice, both methods would require some technical expertise as well as a decent amount of work.  From experience and anecdotal evidence, the approximate MOE that does not take the covariance into account between two estimates tends to be fairly close to the MOE that does incorporate the covariance.  I took a look at the 2015-2019 ACS 5-year poverty rates for Cook county and Illinois (as well as the US).  According to data.census.gov, Cook county has an MOE of +/-0.2, while Illinois has an MOE of +/-0.1.  If you approximate the MOE, you get MOE_approx = sqrt( 0.1^2 + 0.2^2) or about 0.22.  It should also be noted that for very small, but not zero, MOEs, the ACS rounds to the lowest accuracy.  So, if the MOE for Illinois were much smaller than 0.1, it would still be rounded to 0.1.  That is to say, the MOE for Illinois may be conservative to begin with.

    So, if you want to calculate the covariance, it requires some additional work.  If you are producing this estimate for a research project (or some other reason) and need something more exact, then putting in the extra work might be worth it.  Otherwise, for a simple comparison, the amount of work to calculate the covariance vs. the fact that the covariance may not have any effect on the result would, from a practical standpoint, argue for not taking it into account.

    As a final note, it is worth mentioning that if you are examining poverty rate by other demographics, the MOE can be much higher.  For example, in Cook county, for Native Hawaiian and Other Pacific Islanders alone (NHOPI), the poverty rate is 15.5% +/-8.2%.  The total NHOPI estimate (both in and not in poverty) in Cook county is about 1,900, and about 4,300 in Illinois total.  In this instance, due to the small sample size (as well as the large percent of NHOPI in Cook county compared to the rest of Illinois), the covariance could play a much larger role."
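
For reference, here is a small Python sketch of the two calculations described in the Census Bureau response: the approximate MOE of a difference that ignores the covariance, and the "Illinois minus Cook county" rate with an MOE from the replicate estimates. The MOEs of 0.2 and 0.1 come from the response above. Everything else (the poverty rates, the counts, the synthetic replicates, and the function names) is a hypothetical placeholder for illustration, not real ACS data.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def approx_moe_of_difference(moe_a, moe_b):
    """MOE of a-b when the covariance between a and b is ignored."""
    return math.sqrt(moe_a ** 2 + moe_b ** 2)


def differs_significantly(est_a, moe_a, est_b, moe_b):
    """True if |a-b| exceeds the approximate 90 percent MOE of the difference."""
    return abs(est_a - est_b) > approx_moe_of_difference(moe_a, moe_b)


def sdr_moe(estimate, replicates):
    """90 percent MOE from 80 successive-difference replicate estimates."""
    variance = 4.0 / 80.0 * sum((r - estimate) ** 2 for r in replicates)
    return Z90 * math.sqrt(variance)


def rest_of_state_rate(state_poor, state_univ, county_poor, county_univ):
    """Poverty rate (percent) for the state with one county subtracted out."""
    return 100.0 * (state_poor - county_poor) / (state_univ - county_univ)


# MOEs quoted in the response: Cook county +/-0.2, Illinois +/-0.1.
moe_diff = approx_moe_of_difference(0.2, 0.1)
print(round(moe_diff, 2))  # 0.22

# Hypothetical poverty rates, only to show how the comparison is made.
print(differs_significantly(14.0, 0.2, 12.5, 0.1))

# Hypothetical counts (below poverty, poverty universe) and synthetic replicates.
state_poor, state_univ = 1_500_000, 12_400_000
county_poor, county_univ = 700_000, 5_100_000
idx = range(1, 81)
state_poor_reps = [state_poor + 6000 * math.sin(i) for i in idx]
state_univ_reps = [state_univ + 2000 * math.cos(i) for i in idx]
county_poor_reps = [county_poor + 4000 * math.sin(i + 1) for i in idx]
county_univ_reps = [county_univ + 1500 * math.cos(i + 1) for i in idx]

# Subtract the county from the state for the estimate and every replicate.
rest_rate = rest_of_state_rate(state_poor, state_univ, county_poor, county_univ)
rest_reps = [
    rest_of_state_rate(sp, su, cp, cu)
    for sp, su, cp, cu in zip(state_poor_reps, state_univ_reps,
                              county_poor_reps, county_univ_reps)
]
rest_moe = sdr_moe(rest_rate, rest_reps)
print(f"rest of state: {rest_rate:.2f}% +/- {rest_moe:.2f}")
```

With real data, the replicate counts would come from the Variance Replicate Tables rather than from the synthetic lists used here.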