How useful are the 5-year Estimates, really?

Joshua Ellis over 5 years ago

Hey everyone!

I've been working with ACS data for the past few years, but recently I've become more of a data skeptic, and I wonder how I can reasonably use a dataset that took up to 5 years to collect. On the surface I was thinking "sure, if the data was collected in 2013, and it takes 5 years to give us a better account of the data, then while late, it's still a bit useful"; but from there I questioned why the data is being presented as "2017" estimates.

It's a bit confusing, isn't it? At first I would have simply accepted it, but after thinking about it, I wondered if that should really be 2013 data. This would likely be the case if the 5-year collection was, as I mentioned before, a 5-year collection of data that belongs to 2013, but it's probably not the case. The other conclusion would be that the data were collected from 2013 through 2017 as a single period.

If I treat the 2017 5-year estimates as period data that started in 2013, shouldn't they overlap in some way with the 2016 5-year estimates, the 2015 estimates before that, and so on? Not to mention, there's likely a lot of variation within those 5 years that isn't being considered. When considering geography, which I tend to do, it's possible we're looking at various geographies whose data are more relevant to 2013 but are used to reflect 2017. That raises the question of which data are reasonably relevant to the question you're working on.

It matters because these estimates are used quite a lot, so just how useful are they, really? What can one safely assume when using 5-year estimates? While I can see a few uses here and there for data visualization and descriptive statistics, there seem to be some major limitations that aren't being talked about enough.

bschneidr over 5 years ago

Frankly it seems like you just need to read the basic documentation explaining ACS period estimates.

This comment for instance indicates that you'd greatly benefit from reading more about the basic definitions of the 5-year estimates:

>"This would likely be the case if the 5 year collection was, as I mentioned before, a 5-year collection of data that belongs to 2013, but its probably not the case. The other conclusion would be that the collected data starting in 2013 through 2017 as a single period."

Here are a couple of good resources explaining what the ACS 5-year estimates are and are not.
www.psc.isr.umich.edu/.../Compass_Appendix.pdf
www.census.gov/.../estimates.html

In short, ACS period estimates represent the entire period for which data were collected; the 2013-2017 5-year estimates are meant to cover all five of those years, not just 2013 or 2017. This is done so that estimates can be published on several subjects for many small geographies, without compromising respondents' confidentiality or publishing estimates which have so much variance as to be unreliable.
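
To make the period idea concrete, here is a minimal Python sketch. It is only an illustration: the yearly values are made-up numbers (not ACS data), the helper names are hypothetical, and the unweighted mean is a simplifying assumption rather than the Census Bureau's actual estimation method. It shows which years a 5-year period covers, how much two consecutive releases overlap, and how pooling smooths a change that happens in a single year.

```python
def period_years(end_year, length=5):
    """Calendar years covered by a period estimate that ends in end_year."""
    return list(range(end_year - length + 1, end_year + 1))


def shared_years(end_a, end_b, length=5):
    """Years that two period estimates of the same length have in common."""
    return sorted(set(period_years(end_a, length)) & set(period_years(end_b, length)))


# Hypothetical yearly values for one small area (illustrative only).
yearly = {2012: 10.0, 2013: 12.0, 2014: 11.0, 2015: 13.0, 2016: 14.0, 2017: 20.0}


def pooled_mean(end_year, length=5):
    """Unweighted mean over the period; a simplification, not the ACS methodology."""
    years = period_years(end_year, length)
    return sum(yearly[y] for y in years) / length


print(period_years(2017))        # [2013, 2014, 2015, 2016, 2017]
print(shared_years(2017, 2016))  # [2013, 2014, 2015, 2016]
print(pooled_mean(2016), pooled_mean(2017))  # 12.0 14.0
```

In this toy example the single-year value jumps from 14 to 20 between 2016 and 2017, but the two pooled values only move from 12.0 to 14.0, because four of the five years are shared between the 2012-2016 and 2013-2017 periods.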

Joshua Ellis over 5 years ago in reply to bschneidr

Thank you for responding!

I used the second link as a guide before I posted my question, but I never came across the Appendix PDF, so thanks for this.

On the page I used, it says the 5-year estimates are best used when "[p]recision is more important than currency", when "[a]nalyzing very small populations", and when "[e]xamining tracts and other smaller geographies because 1-year estimates are not available". I find that pretty discouraging as someone who does analyse smaller geographies, because I was hoping to work with data from a more recent time period. There's a lot of variation in 5 years. They could at least provide the date (i.e. month/year) for each data entry, since they're already going to the trouble of providing the estimates. That way we'd know which entries are more or less recent.

I think usage of the 5-year estimates can become problematic when they're re-packaged in other projects as single-year estimates. The County Health Rankings program comes to mind; one example is their 2019 Income Inequality estimates, which were measured using 2013-2017 data. Another is the datasets they put together, which are a mix of single-year and multi-year estimates packaged as single-year estimates. I just don't see how this can be okay, at least without an explicit warning about how the data can be used. The CHR is just an example, but surely others have dipped into these estimates and likely used the data for more than it was probably intended for. The folks at the Bureau even appear to acknowledge the potential error in the 4th Appendix (p. A-19):

"Users are advised against comparing single-year estimates with multiyear estimates (e.g., comparing 2006 with 2007–2009) and against comparing multiyear estimates of differing lengths (e.g., comparing 2006–2008 with 2009–2014), as they are measuring the characteristics of the population in two different ways, so differences between such estimates are difficult to interpret."
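
As a rough sketch of how that advice could be checked before mixing estimates in one dataset, the Python snippet below flags pairs of periods with differing lengths. The `Period` type and `same_length` helper are hypothetical names made up for illustration, and the check covers only the length rule stated in the quote, nothing more.

```python
from typing import NamedTuple


class Period(NamedTuple):
    """A data collection period, e.g. Period(2013, 2017) for a 5-year estimate."""
    start: int
    end: int

    @property
    def length(self) -> int:
        # Number of calendar years covered, inclusive of both endpoints.
        return self.end - self.start + 1


def same_length(a: Period, b: Period) -> bool:
    """True only when both estimates cover the same number of years."""
    return a.length == b.length


# The two comparisons the quoted passage advises against:
print(same_length(Period(2006, 2006), Period(2007, 2009)))  # False
print(same_length(Period(2006, 2008), Period(2009, 2014)))  # False
```

A dataset that mixes single-year and multi-year estimates under one "year" label would fail this kind of check for at least some pairs of columns.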