best practices for media use of local trends

My work as an urban transportation reporter for a couple websites involves close scrutiny of the ACS commuting data on bicycling, public transit, etc. Because this behavior has been changing fast in Portland and other cities, I often use several consecutive one-year estimates to draw rough conclusions, despite differences that are within the margins of error. For example:

And here's a comment that, understandably, takes me to task for doing so:

Here's what I wrote in response:

Yes, I agree with both of you that five-year rolling estimates would be more accurate/precise than these year-on-year estimates, and I'm glad that at Alex's urging I've been including more caveats about margins of error when I write about these numbers. In this case, unfortunately, the five-year rolling data just isn't very meaningful, because since the ACS ramped up its data collection we've only got two cycles of it (soon three): 2006-2010 and 2007-2011.

In lieu of that, what I'm doing is tracking one-year estimates over time and focusing only on the ones that show what seems to be a clear trend year after year. The lack of precision here is also the reason the chart above has wavy lines rather than bars or sharp angles, for what it's worth.

Please, take this data with a fat grain of salt. But year-on-year ACS figures are widely used by media organizations, governments, advocacy groups, etc. -- the city actually issued a press release yesterday about these same numbers, though this report wasn't based on theirs. I disagree that widely used and respected data should be off-limits for reporting when the reporting is careful about its claims. In my mind, it'd be worse for our city to use the increasingly remote possibility that this is a four-time statistical anomaly to disregard a situation that's staring it in the face.

I'm not trained as an analyst; I'm just an English major who did enough social science in college to have a basic understanding of statistics, and then poked around Factfinder 1 until he figured out what it offered. I do try to take care, in my reporting, to limit myself to covering trends that seem consistent over time or in multiple geographic areas, to prominently include caveats, etc. But am I off base to be even working with this data? How, if at all, could I strengthen my practices?
  • I have a cardinal rule in presenting ACS data for public consumption. I round all whole #s to hundreds, and don't use any tenths of percentages. I also force round when necessary to make a tabulation add up to 100% - e.g. if the #s are 62.4, 22.3, and 15.3, I'll publish the numbers as 63%, 22% and 15%. It's critical that we not give the data false precision.

    I'd have to see what you're reporting to have an opinion about whether or not you're off base. If you see "bike to work" in Portland increasing by 1% a year, and the MOE is +/- 5%, yes, you're off base. I don't offhand know the population of Portland (and can't look it up right now on AFF!), so I don't have a sense of how the numbers might run.
  • Hello Michael,

I have worked on several articles for my local paper with ACS data, and they of course want to report year-on-year change, particularly with things like poverty or health care insurance. What I tell them, and anyone who wants to use ACS data at all, is that each figure is an estimate with a margin of error, and unless you run a statistical test you don't know whether there is really a statistically significant difference year to year.

Also, as I look at data sets I calculate the coefficient of variation, the ratio of the margin of error to the estimate, and if it is higher than 30% or so I am very careful in how I use the data.

This all needs to be tempered by the fact that much of the data in the ACS is only found in the ACS. But that doesn't mean you should use it as your only source; it should be put in context.

The Census Bureau could do a much better job of describing these issues to the public, and you need to be judicious in how you use ACS data because of how volatile the estimates are, especially at smaller geographic areas.
• Hello again. I mistyped in the above response: I meant to say standard error instead of margin of error when calculating the coefficient of variation (CV). The CV is:

    The ratio of the standard error (square root of the variance) to the value being estimated, usually expressed in terms of a percentage (also known as the relative standard deviation). The lower the CV, the higher the relative reliability of the estimate.
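The forced-rounding rule described in the first comment (whole percentages that are nudged so a tabulation still sums to exactly 100%) can be sketched with the largest-remainder method. This is one reasonable way to implement it, not necessarily how that commenter does it by hand; the function name is my own.

```python
def round_to_100(values):
    """Round a list of percentages to whole numbers summing to exactly 100,
    using the largest-remainder method (an assumption about the rule the
    commenter applies by eye)."""
    floors = [int(v) for v in values]
    remainders = [v - f for v, f in zip(values, floors)]
    shortfall = 100 - sum(floors)
    # Hand the leftover percentage points to the entries whose fractional
    # parts were largest, so the distortion from rounding is smallest.
    order = sorted(range(len(values)), key=lambda i: remainders[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

# The commenter's own example: 62.4, 22.3, 15.3 -> 63%, 22%, 15%
print(round_to_100([62.4, 22.3, 15.3]))  # [63, 22, 15]
```

Note that plain independent rounding would give 62 + 22 + 15 = 99 here; the largest-remainder step is what pushes 62.4 up to 63 so the published figures total 100.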
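Putting the last two comments together, here is a minimal sketch of the two checks they recommend before reporting a year-on-year change: a significance test between two estimates, and the CV as corrected above. It assumes the standard Census Bureau convention that published ACS margins of error are at the 90% confidence level (so SE = MOE / 1.645); the bike-commute numbers in the example are hypothetical, not actual Portland figures.

```python
Z90 = 1.645  # ACS margins of error are published at the 90% confidence level

def standard_error(moe):
    """Convert a published 90%-level MOE to a standard error."""
    return moe / Z90

def significant_difference(est1, moe1, est2, moe2, z_crit=Z90):
    """True if two ACS estimates differ at the 90% confidence level."""
    se_diff = (standard_error(moe1) ** 2 + standard_error(moe2) ** 2) ** 0.5
    return abs(est1 - est2) > z_crit * se_diff

def coefficient_of_variation(est, moe):
    """CV = standard error / estimate, as a percentage (lower = more reliable)."""
    return 100 * standard_error(moe) / est

# Hypothetical bike-commute shares (percent) with their MOEs:
print(significant_difference(6.0, 0.8, 6.3, 0.9))  # False: 0.3 points is within the noise
print(coefficient_of_variation(6.0, 0.8))          # roughly 8%, well under the 30% caution line
```

A single non-significant year-on-year change like the one above is exactly the case the commenters warn about; the original poster's approach of looking for the same direction of change across several consecutive years is a way of compensating when each individual comparison is too noisy on its own.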