Hi ACS friends,
I'm trying to explain to some brilliant but non-ACS-knowledgeable coworkers and others why the recommended approach is to use *non-overlapping* periods when making comparisons over time. I've looked at the Census doc (https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html) and all I can find is the guidance that says Do use non-overlapping datasets, Do not use overlapping datasets. I'm having a hard time finding the why. All I can come up with is this:
Four-fifths of the sample is the same in adjacent datasets, e.g., 2014-2018 vs. 2015-2019. You can think of Census removing the respondents from 2014 and adding the respondents from 2019, while the respondents from 2015, 2016, 2017, and 2018 stay the same. To compare two completely different samples, we need two non-overlapping periods.
I'm curious how others answer this question as well! -Diana
There are the following statements in the document https://www.census.gov/content/dam/Census/library/publications/2018/acs/acs_general_handbook_2018.pdf:
"TIP: As shown in Figure 3.1, consecutive 5-year estimates contain 4 years of overlapping coverage (for example, the 2010–2014 ACS 5-year estimates share sample data from 2011 through 2014 with the 2011–2015 ACS 5-year estimates). Because of this overlap, users should use extreme caution in making comparisons with consecutive years of multiyear estimates."
"TIP: In general, ACS 1-year data are more likely to show year-to-year fluctuations, while consecutive 5-year estimates are more likely to show a smooth trend, because 4 of the 5 years in the series overlap from one year to the next."
"When using ACS 1-year data, these comparisons are generally straightforward. Using multiyear estimates to look at trends for small populations can be challenging because they rely on pooled data for 5 years. For example, comparisons of 5-year estimates from 2010 to 2014 and 2011 to 2015 are unlikely to show much difference because four of the years overlap; both sets of estimates include the same data collected from 2011 through 2014. The Census Bureau suggests comparing 5-year estimates that do not overlap—for example, comparing 2006–2010 ACS 5-year estimates with 2011–2015 ACS 5-year estimates."
Diana and Todd Z--
The general ACS handbook does say that "these comparisons can be made with caution" (footnote 19 on p. 17 of https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf). I vaguely recall someone mentioning once that what you're basically getting is the difference between the last year of the more recent dataset and the first year of the earlier dataset (so 2014 vs 2019, in your example).
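To make that recollection concrete, here is a minimal arithmetic sketch. It assumes a 5-year estimate behaves like a simple average of five single-year values, which is a simplification of how the pooled estimates are actually weighted, and the yearly numbers are made up for illustration (they are not ACS data). Under that assumption, the four shared years cancel and the difference between adjacent 5-year periods is one fifth of the gap between the year added and the year dropped.

```python
# Hypothetical single-year values for one area (made-up numbers, not ACS data).
yearly = {2014: 10.0, 2015: 11.0, 2016: 12.0, 2017: 13.0, 2018: 14.0, 2019: 20.0}


def five_year_mean(values, start):
    """Simple average of five consecutive years starting at `start`.

    Assumption: this stands in for a 5-year estimate; real ACS pooling
    and weighting are more involved.
    """
    return sum(values[year] for year in range(start, start + 5)) / 5


earlier = five_year_mean(yearly, 2014)  # 2014-2018
later = five_year_mean(yearly, 2015)    # 2015-2019

# The shared years (2015-2018) cancel, leaving only the two endpoint years.
diff = later - earlier
endpoint_gap = (yearly[2019] - yearly[2014]) / 5

print(diff, endpoint_gap)  # both are 2.0 for these made-up numbers
```

So the overlapping comparison is driven entirely by 2019 versus 2014, diluted by a factor of five.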
As Todd G noted, the usual significance tests don't work, but the Census Bureau used to provide a modification, which is multiplying the standard error of the difference by a factor that depends on the proportion of years that overlap. (Specifically, you subtract that proportion -- 4/5 = 0.8 in your example -- from 1 and take the square root.) I don't know if the Census Bureau is still publishing that guidance, but it does appear in the 2013-vintage ACS documentation (https://www2.census.gov/programs-surveys/acs/tech_docs/statistical_testing/2013StatisticalTesting3and5.pdf).
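Here is a rough sketch of that adjustment as I understand it, not an official implementation. The function name, arguments, and example figures are hypothetical. It assumes the published margins of error are at the 90 percent confidence level (so dividing by 1.645 gives a standard error) and that the usual standard error of a difference is the square root of the sum of the squared standard errors.

```python
import math

Z_90 = 1.645  # assumption: MOEs are published at the 90% confidence level


def overlap_adjusted_z(est1, moe1, est2, moe2, overlap_years, period_years=5):
    """Z statistic for est2 - est1 when the two periods share some years.

    The standard error of the difference is multiplied by sqrt(1 - C),
    where C is the fraction of years that overlap (4/5 for adjacent
    5-year periods).
    """
    if not 0 <= overlap_years < period_years:
        raise ValueError("overlap_years must be >= 0 and < period_years")
    se1 = moe1 / Z_90
    se2 = moe2 / Z_90
    overlap_fraction = overlap_years / period_years
    se_diff = math.sqrt(1 - overlap_fraction) * math.sqrt(se1**2 + se2**2)
    return (est2 - est1) / se_diff


# Made-up estimates and MOEs for 2014-2018 vs. 2015-2019 (4 shared years).
z_adjusted = overlap_adjusted_z(1000, 164.5, 1100, 164.5, overlap_years=4)
z_unadjusted = overlap_adjusted_z(1000, 164.5, 1100, 164.5, overlap_years=0)
significant_at_90 = abs(z_adjusted) > Z_90
```

Because the factor is less than 1, the adjusted standard error is smaller than the unadjusted one, so the same difference produces a larger test statistic than the usual formula would give.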
So it's doable, but I don't know whether it's desirable. You could easily imagine this being used as a "magic" way to make single-year comparisons for geographies without one-year estimates, but keep in mind:
I'm looking forward to seeing other responses as well!
--Matt
Here's how I'd put it to a non-expert: 5-year estimates exist because that's the time required to collect a sufficient number of survey responses (particularly in small areas like census tracts, ZCTAs, or small towns). The 5-year estimates in small areas generally already have high margins of error. Given how few responses there are in five years, looking at only one year's worth of responses would give you very unreliable data. And (as you articulated), when comparing overlapping estimates, you're effectively comparing only the non-overlapping years.
For larger areas (states, CBSAs, big cities, big counties), you could just use the 1-year estimates; the reason you'd use the 5-year estimates is for their lower margins of error. By comparing overlapping years, you lose that advantage.
tl;dr: There's a reason they're 5-year estimates!
A colleague suggested looking at this paper (https://www.census.gov/library/working-papers/2012/adrm/rrs2012-03.html). There's a short discussion about non-overlapping periods in section 4.4 on page 9.