Rationale for making comparisons using non-overlapping periods

Hi ACS friends, 

I'm trying to explain to some brilliant but non-ACS-knowledgeable coworkers and others why the recommended approach is to use *non-overlapping* periods when making comparisons over time.  I've looked at the Census doc (https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html) and all I can find is the guidance that says Do use non-overlapping datasets, Do not use overlapping datasets.  I'm having a hard time finding the why.  All I can come up with is this:

Four-fifths of the sample is the same in adjacent datasets, e.g., 2014-2018 vs. 2015-2019. You can think of Census removing respondents from 2014 and adding respondents from 2019, while the respondents from 2015, 2016, 2017, and 2018 stay the same. A clean comparison needs two completely different samples, which means two non-overlapping periods.

I'm curious how others answer this question as well!  -Diana

  • The following statements appear in this document: https://www.census.gov/content/dam/Census/library/publications/2018/acs/acs_general_handbook_2018.pdf

    "TIP: As shown in Figure 3.1, consecutive 5-year estimates
    contain 4 years of overlapping coverage (for
    example, the 2010–2014 ACS 5-year estimates share
    sample data from 2011 through 2014 with the 2011–
    2015 ACS 5-year estimates). Because of this overlap,
    users should use extreme caution in making comparisons
    with consecutive years of multiyear estimates."

    "TIP: In general, ACS 1-year data are more likely to
    show year-to-year fluctuations, while consecutive
    5-year estimates are more likely to show a smooth
    trend, because 4 of the 5 years in the series overlap
    from one year to the next."

    "When using ACS 1-year data, these comparisons are
    generally straightforward. Using multiyear estimates to
    look at trends for small populations can be challenging
    because they rely on pooled data for 5 years. For
    example, comparisons of 5-year estimates from 2010
    to 2014 and 2011 to 2015 are unlikely to show much difference
    because four of the years overlap; both sets of
    estimates include the same data collected from 2011
    through 2014. The Census Bureau suggests comparing 5-year estimates that do not overlap—for example, comparing
    2006–2010 ACS 5-year estimates with 2011–2015
    ACS 5-year estimates."

  • Diana and Todd Z--

    If you are comparing two separate time periods (two data points at different times), you need to account for the margin of error around each of the two estimates. There's a formula for this.
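
    A minimal sketch of that formula for two non-overlapping periods, assuming the published margins of error are at the ACS-standard 90% confidence level (so SE = MOE / 1.645). The estimates and MOEs below are made-up numbers for illustration, and the function names are hypothetical:

```python
import math

Z_90 = 1.645  # critical value behind the published 90% margins of error


def moe_to_se(moe: float) -> float:
    """Convert a published 90% margin of error to a standard error."""
    return moe / Z_90


def z_score_of_difference(est1: float, moe1: float, est2: float, moe2: float) -> float:
    """Z score for the difference of two independent (non-overlapping) estimates."""
    se_diff = math.sqrt(moe_to_se(moe1) ** 2 + moe_to_se(moe2) ** 2)
    return (est2 - est1) / se_diff


# Hypothetical estimates for one tract in two non-overlapping 5-year periods.
z = z_score_of_difference(est1=1200, moe1=150, est2=1500, moe2=160)
significant = abs(z) > Z_90  # significant at the 90% level if |z| exceeds 1.645
print(round(z, 2), significant)
```

    This only holds when the two samples are independent, which is exactly what overlapping periods violate.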

    See the "Determining Statistical Significance" section (p. 55) in the "Understanding and Using ACS Data" handbook:
    https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf

    But if the two time-periods are overlapping, this is the statisticians' equivalent of "double-dipping your chip" in the serving bowl. Some people will say "no big deal."  Other people will condemn it as unmannered barbarism.  Not so much for germophobe reasons, but because the "double-dip" (the same survey cases being included in both the 2010-14 and 2014-18 estimates) makes the standard significance test invalid.

    Bottom line: Do you want to be able to say something about statistical significance of the differences over time? (And does your audience include statisticians?)

    --TG

  • The general ACS handbook does say that "these comparisons can be made with caution" (footnote 19 on p. 17 of https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf). I vaguely recall someone mentioning once that what you're basically getting is the difference between the last year of the more recent dataset and the first year of the earlier dataset (so 2014 vs 2019, in your example).
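
    One way to see that recollection in numbers is the rough sketch below. It assumes, as a simplification, that a 5-year estimate behaves like a plain average of five annual values. Real ACS 5-year estimates are weighted differently, and the annual values here are invented:

```python
# Hypothetical annual values (e.g., a percentage) for one small area.
annual = {2014: 10.0, 2015: 11.0, 2016: 12.5, 2017: 12.0, 2018: 13.0, 2019: 15.0}


def five_year_average(values: dict, end_year: int) -> float:
    """Simple average of the five years ending in end_year (a simplification)."""
    years = range(end_year - 4, end_year + 1)
    return sum(values[y] for y in years) / 5


earlier = five_year_average(annual, 2018)  # stands in for 2014-2018
later = five_year_average(annual, 2019)    # stands in for 2015-2019

# The shared years 2015-2018 cancel out of the difference, so only the
# two end years contribute, each scaled down by a factor of five.
change = later - earlier
endpoint_gap = (annual[2019] - annual[2014]) / 5
print(change, endpoint_gap)
```

    Under that simplification, the change between the two overlapping averages is just one fifth of the 2019 vs. 2014 gap, and nothing that happened in 2015-2018 shows up in it.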

    As Todd G noted, the usual significance tests don't work, but the Census Bureau used to provide a modification: multiply the standard error of the difference by a factor that depends on the proportion of years that overlap. (Specifically, you subtract that proportion, 4/5 = 0.8 in your example, from 1 and take the square root, which gives a factor of about 0.45.) I don't know if the Census Bureau is still publishing that guidance, but it does appear in the 2013-vintage ACS documentation (https://www2.census.gov/programs-surveys/acs/tech_docs/statistical_testing/2013StatisticalTesting3and5.pdf).
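
    Here is a hedged sketch of that modification as I read it: the usual standard error of the difference is multiplied by sqrt(1 - C), where C is the share of overlapping years. The function name and the example numbers are hypothetical, and the 1.645 divisor assumes 90% margins of error:

```python
import math

Z_90 = 1.645  # critical value behind the published 90% margins of error


def overlap_adjusted_z(est1, moe1, est2, moe2, overlap_years, period_years=5):
    """Z score for a difference, with the SE scaled by sqrt(1 - C)."""
    if not 0 <= overlap_years < period_years:
        raise ValueError("overlap_years must be in [0, period_years)")
    c = overlap_years / period_years              # e.g., 4/5 = 0.8
    se_diff = math.sqrt((moe1 / Z_90) ** 2 + (moe2 / Z_90) ** 2)
    adjusted_se = math.sqrt(1 - c) * se_diff      # overlap correction
    return (est2 - est1) / adjusted_se


# Hypothetical 2014-2018 vs. 2015-2019 estimates (four shared years).
z_naive = overlap_adjusted_z(1200, 150, 1260, 160, overlap_years=0)
z_adjusted = overlap_adjusted_z(1200, 150, 1260, 160, overlap_years=4)
print(round(z_naive, 2), round(z_adjusted, 2))
```

    With four of five years shared, the adjusted standard error is a bit under half the naive one, so the Z score roughly doubles.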

    So it's doable, but I don't know whether it's desirable. You could easily imagine this being used as a "magic" way to make single-year comparisons for geographies without one-year estimates, but keep in mind:

    • That modification to the usual significance tests is an approximation of an approximation (as I understand it). So there's more room for things to go awry.
    • On average, the ACS samples about 2% of households each year. In a census tract with 2,000 total households, that's only ~40 households. So: even if a change over time were statistically significant, I would hesitate to draw any firm conclusions about change between two years, especially if I'm looking at subgroups (for example, renter households, or households with Black householders).

    I'm looking forward to seeing other responses as well!

    --Matt

  • Here's how I'd put it to a non-expert: 5-year estimates exist because that's the time required to collect a sufficient number of survey responses (particularly in small areas like census tracts, ZCTAs, or small towns). The 5-year estimates in small areas generally already have high margins of error. Given how few responses there are in five years, looking at only one year's worth of responses would give you very unreliable data. And (as you articulated), when comparing overlapping estimates, you're effectively comparing only the non-overlapping years.

    For larger areas (states, CBSAs, big cities, big counties), you could just use the 1-year estimates; the reason you'd use the 5-year estimates is for their lower margins of error. By comparing overlapping years, you lose that advantage.

    tl;dr: There's a reason they're 5-year estimates!

  • A colleague suggested looking at this paper (https://www.census.gov/library/working-papers/2012/adrm/rrs2012-03.html).  There's a short discussion about non-overlapping periods in section 4.4 on page 9.