How useful are the 5-year Estimates, really?

Hey everyone!

So I've become quite familiar with ACS data over the past few years, but recently I've turned into more of a data skeptic and wonder how I can reasonably use a dataset that took up to 5 years to collect. On the surface I was thinking "sure, if the data were collected in 2013, and it takes 5 years to give us a better account of the data, then while late, it's still a bit useful"; but from there I questioned why the data are being represented as "2017" estimates.

It's a bit confusing, isn't it? At first I would have simply accepted it, but after thinking about it, I wondered if that should really be called 2013 data. That would be the case if, as I mentioned before, the 5-year collection were a 5-year effort to collect data that belongs to 2013, but that's probably not the case. The other conclusion would be that the data were collected from 2013 through 2017 and treated as a single period.
If I consider the 2017 5-year estimates as period data that started in 2013, shouldn't they in some way overlap with the 2016 5-year estimates, and the 2015 estimates before that, etc.? Not to mention, there's likely a lot of variation within those 5 years that isn't being considered. When considering geography, which I tend to do, it's possible we're looking at geographies whose data are more relevant to 2013 but are used to reflect 2017. That raises the question of which data are reasonably relevant to the question you're working with.
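To make the overlap concrete, here's a quick sketch in Python (the function name and values are mine, just for illustration) of how consecutive 5-year releases share collection years:

```python
# Each ACS 5-year release is labeled by its end year but pools five
# collection years. Consecutive releases therefore share four years.

def collection_years(end_year):
    """Return the set of calendar years pooled into one 5-year release."""
    return set(range(end_year - 4, end_year + 1))

y2017 = collection_years(2017)  # {2013, 2014, 2015, 2016, 2017}
y2016 = collection_years(2016)  # {2012, 2013, 2014, 2015, 2016}

print(sorted(y2017 & y2016))  # [2013, 2014, 2015, 2016]
```

So the "2017" and "2016" releases share four of their five collection years, which is exactly why consecutive releases shouldn't be read as independent snapshots.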

It matters because these estimates are used quite a lot, so just how useful are they, really? What can one reasonably assume when using 5-year estimates? While I can see a few mentions here and there of using the data for data visualization and descriptive statistics, there seem to be some major limitations that aren't talked about enough.
  • Frankly it seems like you just need to read the basic documentation explaining ACS period estimates.

    This comment for instance indicates that you'd greatly benefit from reading more about the basic definitions of the 5-year estimates:

    >"This would likely be the case if the 5 year collection was, as I mentioned before, a 5-year collection of data that belongs to 2013, but its probably not the case. The other conclusion would be that the collected data starting in 2013 through 2017 as a single period."

    Here are a couple of good resources explaining what the ACS 5-year estimates are and are not.

    In short, ACS period estimates represent the entire period for which data were collected; the 2013-2017 5-year estimates are meant to cover all five of those years, not just 2013 or 2017. This is done so that estimates can be published on several subjects for many small geographies, without compromising respondents' confidentiality or publishing estimates which have so much variance as to be unreliable.

  • In reply to bschneidr:

    Thank you for responding!

    I used second link as a guide before I posted my question, but I never came across the Appendix pdf, so thanks for this.

    On the page I used, it says the 5-year estimates are best used when "[p]recision is more important than currency", when "[a]nalyzing very small populations", and when "[e]xamining tracts and other smaller geographies because 1-year estimates are not available". I find that pretty discouraging as someone who does analyse smaller geographies, because I was hoping to work with data from a more recent time period. There's a lot of variation in 5 years. They could at least provide the date (i.e., month/year) for each data entry, since they're already going through the trouble of providing the estimates. That way we'd know which entries are more or less recent.

    I think use of the 5-year estimates becomes problematic when they're re-packaged in other projects as single-year estimates. The County Health Rankings program comes to mind; one example is their 2019 Income Inequality estimates, which were measured using 2013-2017 data. Another is the datasets they put together, which are a mix of single-year and multi-year estimates, packaged as single-year estimates. I just don't see how this can be okay, at least without an explicit warning about how the data can be used. The CHR is just an example, but surely others have dipped into these estimates and used the data for more than what they're probably intended for. The folks at the Bureau even appear to acknowledge the potential error in the 4th Appendix (p. A-19):

    "Users are advised against comparing single-year estimates with multiyear estimates (e.g., comparing 2006 with 2007–2009) and against comparing multiyear estimates of differing lengths (e.g., comparing 2006–2008 with 2009–2014), as they are measuring the characteristics of the population in two different ways, so differences between such estimates are difficult to interpret."
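For what it's worth, the Bureau's documentation does describe a way to test whether two *non-overlapping* estimates differ significantly, using their published margins of error. A rough sketch (the function name and the example figures are mine; it assumes the ACS default 90%-confidence MOEs, where SE = MOE / 1.645):

```python
import math

def significantly_different(est1, moe1, est2, moe2, z_crit=1.645):
    """Test whether two non-overlapping ACS estimates differ at the
    90% confidence level, using SE = MOE / 1.645 for each estimate."""
    se1 = moe1 / 1.645
    se2 = moe2 / 1.645
    z = abs(est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z > z_crit

# Hypothetical figures: a tract's poverty rate in 2008-2012 vs 2013-2017
print(significantly_different(18.2, 3.1, 12.5, 2.8))  # True
```

Note this is only appropriate for non-overlapping periods like 2008-2012 vs 2013-2017; overlapping releases share most of their sample, so the test doesn't apply.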

  • Sorry for coming late to this discussion - I was out of the country last week.

    When I present 5 year data, I use the full 5 year label; e.g. 2013-17. I never present overlapping data sets. I might be using 2008-12 and 2013-17, but then would not use, e.g. 2012-16.

    I smooth the data by presenting only whole percentages, no decimals, and by rounding median income numbers to '00s. I think it's very important to avoid false precision. I never mix 1-year and 5-year data in one chart; if I have multiple geographies where some have 1-year data and some do not, I use 5-year data for all of them. I do not present MOEs to users, but I pay attention to them myself.
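Those display rules are simple enough to sketch in code (the function names and sample values are mine, just illustrating the conventions above):

```python
def display_percent(proportion):
    """Format a proportion as a whole percentage -- no decimals,
    to avoid implying more precision than the MOEs support."""
    return f"{round(proportion * 100)}%"

def display_income(dollars):
    """Round a median income to the nearest $100 for display."""
    return f"${round(dollars, -2):,}"

print(display_percent(0.6342))  # 63%
print(display_income(51237))    # $51,200
```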

    This year I've done a lot of work presenting data for small MCDs in the Detroit area, cities or townships of 10 to 30 thousand population. Much of it is comparative, looking at how community A is similar to communities B, C, and D, while different from communities E, F, and G. Even with problematic MOEs, the data work for these purposes. If median household income is $50K in one community and $125K in another, MOEs are not a concern.

    Hope this helps.