# Best approach for minimizing error for a given year when using overlapping 5-year data?

Hi Folks,

This is my first post and first day on this forum. I couldn't find the answer to this question, but if a good thread already exists, please point me in that direction.

I am conducting an analysis at the county level, so I would like to use the 5-year data.

However, I need to run a calculation for a single year. For example, I need to run a calculation for Suffolk County, NY for 2010. There are several 5-year samples (2006-2010, 2007-2011, ..., 2010-2014) which overlap and include 2010.

What is the optimal approach, in terms of minimizing error, for dealing with this? How do I combine, if at all, the different multi-year datasets to find the best figure to use?

thanks,
Eric

• The short answer is, you can't. The ACS sample just doesn't have enough cases to provide single-year data for small areas, which the Census Bureau has defined as those with a population under 65,000.

I would use 2008-12 and label the data presentation as such.
• In reply to Patty Becker:

PS - Suffolk County is certainly big enough to use the 1-year data, so just use the 2010 1-year ACS. But you may have other counties in your analysis which are under 65,000. Don't combine 1-year and 5-year data in one presentation or analysis.
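To make that rule concrete, here is a minimal Python sketch, not an official Census Bureau procedure. The function name `choose_acs_product` and the county names and populations are hypothetical, made up for illustration; only the 65,000 threshold comes from the reply above.

```python
# Threshold quoted in the thread: areas under 65,000 population
# do not get 1-year ACS estimates.
ONE_YEAR_MIN_POPULATION = 65_000


def choose_acs_product(populations):
    """Pick ONE ACS product for the whole analysis.

    `populations` maps a county name to its population (hypothetical input).
    Returns "1-year" only if every county meets the threshold; otherwise
    "5-year" for all counties, so the two products are never mixed.
    """
    if all(pop >= ONE_YEAR_MIN_POPULATION for pop in populations.values()):
        return "1-year"
    return "5-year"


# Made-up figures, for illustration only.
large_only = {"County A": 900_000, "County B": 120_000}
mixed = {"County A": 900_000, "County C": 12_000}

print(choose_acs_product(large_only))  # 1-year
print(choose_acs_product(mixed))       # 5-year
```

For an analysis of all counties nationwide, some will fall under the threshold, so a rule like this would land on the 5-year data throughout.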
• In reply to Patty Becker:

Hi Patty,

Thank you for your response. I'm looking at all counties across the country; I just picked Suffolk at random.

So your recommended approach is to use the 5-year data where the year of interest is the midpoint?

Why is this a better approach (statistically) than averaging the 2006-2010 set and the 2010-2014 set?

thanks,
Eric
• In reply to ericlaufer:

If you average the two five-year data sets, you're looking at nearly 10 years of data (2006 through 2014), instead of 5. A spread that wide really isn't very useful. No one loves the 5-year spread, either, but it's the best we've got for areas of under 65K population.
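For anyone who wants to see the windows laid out, here is a small Python sketch. The helper names `windows_containing` and `centered_window` are made up for illustration and are not part of any Census tool. It lists the 5-year periods that include a target year, picks the one centered on it, and shows the range covered when the earliest and latest periods are combined.

```python
def windows_containing(year, width=5):
    """Return every (start, end) period of `width` years that includes `year`."""
    return [(start, start + width - 1)
            for start in range(year - width + 1, year + 1)]


def centered_window(year, width=5):
    """Return the period with `year` at its midpoint (assumes an odd width)."""
    half = width // 2
    return (year - half, year + half)


windows = windows_containing(2010)
print(windows)
# [(2006, 2010), (2007, 2011), (2008, 2012), (2009, 2013), (2010, 2014)]

print(centered_window(2010))
# (2008, 2012)

# Range covered when the earliest and latest periods are averaged together.
combined = (windows[0][0], windows[-1][1])
print(combined)
# (2006, 2014)
```

The centered period keeps the target year in the middle of a 5-year span, while combining the two end periods stretches the estimate across the whole 2006 to 2014 range.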