Hello, I'm new to the forum.
I’m exploring the ACS 5-year survey data. I thought I’d look at the census tract level, but I notice that in many tables the estimates have large margins of error, with coefficients of variation of 50% or more (screenshot).
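If you want to check reliability yourself: ACS margins of error are published at the 90% confidence level, so the standard error is MOE / 1.645, and the CV is that standard error divided by the estimate. A minimal Python sketch (the function name and the example numbers are made up, not taken from the screenshot):

```python
# ACS MOEs are published at the 90% confidence level,
# so the standard error is MOE / 1.645.
Z_90 = 1.645


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Return the CV as a percentage: (MOE / 1.645) / estimate * 100."""
    if estimate == 0:
        return float("inf")  # CV is undefined for a zero estimate
    standard_error = moe / Z_90
    return standard_error / estimate * 100


# Hypothetical tract-level cell: estimate 120, MOE +/-110
print(round(coefficient_of_variation(120, 110), 1))  # 55.7
```

A CV that high means the standard error is more than half the estimate itself.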
When looking at ACS 5-year data, what’s the smallest geographic level you use? It seems like the ZIP code level (technically ZIP Code Tabulation Areas, or ZCTAs) is where margins of error come within a tolerable range? Of course, it would depend on the specific table. Tables that slice demographics along several dimensions probably still have high MOEs at the ZIP code level.
Thanks,
Welcome, Eric.
There is so much in this question, and I'm actually working on a blog post about this very topic! You're right that the more dimensions of demographic slicing you have, the coarser the geography you should look at. For basics like total population, total housing units, etc., you might be fine with block groups or tracts, depending on your purpose.
I want to mention another option though: aggregation.
Aggregate by geography:
Instead of being limited to census-defined geography, you can aggregate multiple block groups or multiple tracts together until you have a margin of error or reliability score that's acceptable to you.
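A minimal Python sketch of that idea, assuming you already have each tract's estimate and MOE for one variable. It uses the standard approximation for the MOE of a sum (square root of the sum of squared MOEs) and computes the CV with the 90% confidence factor 1.645. The function names, the 30% CV cutoff, and the tract values are all hypothetical, and in practice you'd feed in neighboring tracts rather than an arbitrary list:

```python
import math

Z_90 = 1.645  # ACS MOEs are published at the 90% confidence level


def cv_percent(estimate, moe):
    """CV in percent; infinite when the estimate is zero."""
    return (moe / Z_90) / estimate * 100 if estimate else float("inf")


def aggregate_until_reliable(tracts, max_cv=30.0):
    """Add tracts in the order given until the combined CV <= max_cv.

    `tracts` is a list of (tract_id, estimate, moe) tuples. Returns the ids
    used, the summed estimate, and the approximate combined MOE. If the
    threshold is never reached, all tracts end up in the group.
    """
    ids, total, sum_sq = [], 0.0, 0.0
    for tract_id, est, moe in tracts:
        ids.append(tract_id)
        total += est
        sum_sq += moe ** 2  # MOE of a sum: sqrt of summed squared MOEs
        if cv_percent(total, math.sqrt(sum_sq)) <= max_cv:
            break
    return ids, total, math.sqrt(sum_sq)


# Hypothetical tracts: (id, estimate, MOE)
tracts = [("A", 80, 60), ("B", 95, 70), ("C", 110, 65)]
ids, total, moe = aggregate_until_reliable(tracts, max_cv=30.0)
print(ids, total, round(moe, 1))  # ['A', 'B', 'C'] 285.0 112.8
```

The square-root-of-summed-squares formula ignores covariance between the pieces, so treat the combined MOE as an approximation.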
Aggregate by rows/categories:
For example, with the age and sex breakdowns in your screenshot, could you live with 10-year age breakdowns instead of 5-year ones? Do you really just need everyone age 65+? Do you need the breakdowns by sex, or are you really just interested in age groups? Combining these groups will also help to get the reliability up.
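Collapsing categories works the same way as combining geographies: add the estimates, and take the square root of the sum of the squared MOEs. Here's a rough Python sketch; the labels mimic ACS age rows, but the mapping, the function name, and the numbers are made up for illustration:

```python
import math
from collections import defaultdict

# Hypothetical mapping from 5-year age rows to coarser groups
COLLAPSE = {
    "40 to 44 years": "40 to 49 years",
    "45 to 49 years": "40 to 49 years",
    "65 to 69 years": "65 years and over",
    "70 to 74 years": "65 years and over",
    "75 to 79 years": "65 years and over",
    "80 to 84 years": "65 years and over",
    "85 years and over": "65 years and over",
}


def collapse_rows(rows, mapping):
    """rows: list of (label, estimate, moe). Returns {group: (estimate, moe)}."""
    totals = defaultdict(float)
    sum_sq = defaultdict(float)
    for label, est, moe in rows:
        group = mapping.get(label, label)  # unmapped rows pass through as-is
        totals[group] += est
        sum_sq[group] += moe ** 2
    return {g: (totals[g], math.sqrt(sum_sq[g])) for g in totals}


# Hypothetical rows: (label, estimate, MOE)
rows = [
    ("40 to 44 years", 300, 120),
    ("45 to 49 years", 250, 90),
    ("65 to 69 years", 200, 80),
    ("70 to 74 years", 150, 60),
]
print(collapse_rows(rows, COLLAPSE))
# {'40 to 49 years': (550.0, 150.0), '65 years and over': (350.0, 100.0)}
```

Note that the combined MOE (150) is smaller relative to its estimate (550) than either input MOE is to its own estimate, which is exactly why collapsing groups improves reliability.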
Hope this helps,
Diana
Hi Diana,
thanks for your notes here! Would be curious to read your blog post :)
Aggregating by geography makes sense.
Regarding aggregating by rows/categories, you said:
> For example, with the age and sex breakdowns in your screenshot, could you live with 10-year age breakdowns instead of 5-year ones? Do you really just need everyone age 65+? Do you need the breakdowns by sex, or are you really just interested in age groups? Combining these groups will also help to get the reliability up.
How do you aggregate by rows/categories in practice? I don't see a way to do this in the table exploration UI that data.census.gov offers.
I guess I could aggregate by rows with code.
The "Total Population Estimate" for 40-49 years would be 598. In "Understanding and Using the U.S. Census Bureau’s American Community Survey" I see:
> Margin of Error for Aggregated Count Data
> The ACS allows the use of unique estimates called derived estimates. These are generated by aggregating reported estimates across geographic areas or population sub groups. Margin of error is not provided for aggregated estimates and therefore needs to be calculated. This is calculated by square root of the sum of squared margin of errors. The letter ‘c’ in the equation below represents each estimate that will be included in the aggregation.
> $$MOE_{agg} = \pm\sqrt{\sum_{c} MOE_{c}^{2}}$$
I could select the rows for, say, the 10-year age breakdowns you suggested:
So presumably the margin of error would be the square root of (164^2 + 114^2), which is about 200 (199.7). Aggregating the Percent Population Estimate would give 20.4 with a margin of error of 6.1.
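Checking that arithmetic in Python: the combined MOE is the square root, not the square, of the summed squared MOEs. The estimate 598 and the MOEs 164 and 114 come from the post above; the CV line assumes the usual 90% confidence factor of 1.645, and the helper name is just for illustration:

```python
import math

Z_90 = 1.645  # ACS MOEs are published at the 90% confidence level


def moe_sum(moes):
    """Approximate MOE of a sum of estimates: sqrt of the sum of squared MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


combined_moe = moe_sum([164, 114])
print(round(combined_moe, 1))  # 199.7

# CV of the combined 40-49 estimate of 598, in percent
cv = combined_moe / Z_90 / 598 * 100
print(round(cv, 1))  # 20.3
```

So the 10-year group lands at a CV of roughly 20%, well under the 50%+ seen for the finer cells.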
Thanks for pointing me in the right direction here! Aggregating by row seems like a really valuable method.
FYI, the (approximate) calculations for various ways of aggregating data (combining categories or geographies) are in chapter 8 of the ACS General Handbook here:
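For instance, the chapter's approximation for a derived proportion (a subgroup X divided by a total Y that contains it) can be sketched in Python as below. The function name and the numbers are hypothetical; when the term under the square root comes out negative, the sketch falls back to the ratio formula (plus sign instead of minus), which is how I understand the handbook's guidance:

```python
import math


def moe_proportion(x, y, moe_x, moe_y):
    """Approximate MOE of P = x / y, where x is a subset of y.

    MOE(P) = sqrt(MOE_x^2 - P^2 * MOE_y^2) / y; if the term under the root
    is negative, use the ratio form sqrt(MOE_x^2 + P^2 * MOE_y^2) / y.
    """
    p = x / y
    under_root = moe_x ** 2 - p ** 2 * moe_y ** 2
    if under_root < 0:
        under_root = moe_x ** 2 + p ** 2 * moe_y ** 2  # ratio-formula fallback
    return p, math.sqrt(under_root) / y


# Hypothetical: 300 of 1,000 people, MOEs of 60 and 100
p, moe = moe_proportion(300, 1000, 60, 100)
print(round(p * 100, 1), round(moe * 100, 2))  # 30.0 5.2 (as percentages)
```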
https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf
As a general matter, unless you are doing only a small number of calculations, you will need a statistical package to do them. R (https://cran.r-project.org/) is a free, open-source statistics package. The free add-on package "tidycensus" can handle both downloading and calculations. walker-data.com/.../
Here is a link to the full ACS handbook for data users: https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf
Understanding and Using American Community Survey Data: What All Data Users Need to Know
Thanks Eric and David. I agree with both of you.
Here is the blog post I mentioned; it's now live. It talks specifically about aggregating tracts together: https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/analytics/acs-summarization-app/
(Aggregating groups, as you seem to be doing with age groups, isn't covered here, but it's under discussion for a future blog post. It's the best way to go for people who are committed to finer geography levels but are okay with combining groups.)
As someone who's new and not into stats, I pulled some key measures from ACS1 (the 1-year data) by county and found that most had data (not -99999999). However, some key race measures (Black and Hispanic, for example) had a lot of -999999 values. I think there was actually a state missing one or more races in ACS1. S2701_C01_017E and S2701_C01_023E stood out the most. I assume that ACS5 will resolve most of these issues at the county and lower levels.

I'm going to compare ACS1 and ACS5 for the 900 counties in ACS1, using around 70 measures, to see what the differences are. When I'm done, I'll paste the 70 measures, ACS1 vs. ACS5, in a discussion here (just for reference, in case anyone wants to see it). I could also do something with margins of error, but public health websites and others just take the published values without regard to margin of error, so I think I will too. No heavy lifting, just the published numbers. Not that I could do the heavy lifting if I tried; I'm not in your league, but I'm OK with that.
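The "percent of counties with data" count can be sketched in a few lines of Python. This assumes you've already loaded each measure's county values into a list, and it treats None or any negative value as a missing-value placeholder; that's a simplification (the Census API uses several distinct negative codes), so adjust it if a measure can legitimately be negative. The values below are invented for illustration; only the variable IDs come from the post:

```python
def percent_with_data(values):
    """Percent of counties whose value is usable rather than a placeholder.

    Assumption: None or any negative value (e.g. -999999999) counts as missing.
    """
    if not values:
        return 0.0
    usable = [v for v in values if v is not None and v >= 0]
    return 100 * len(usable) / len(values)


# Hypothetical values for two measures across five counties
measures = {
    "S2701_C01_017E": [1200, -999999999, 830, -999999999, 410],
    "S2701_C01_023E": [5000, 4200, -999999999, 3100, 2900],
}
for var, vals in measures.items():
    print(var, percent_with_data(vals))
# S2701_C01_017E 60.0
# S2701_C01_023E 80.0
```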
These are a few of the measures and the percentage of the 900 counties that have data in ACS1 (not -9999999). Later I may add four columns: percent with data and the actual values, each for ACS1 and ACS5, to show the differences. These are top-level measures only, with no mix-and-match breakdowns (by age by race, by poverty).
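One way to build that side-by-side comparison for a single measure, sketched in Python: restrict to the counties published in ACS1, report the percent with data in each vintage, and average the ACS5 minus ACS1 difference over counties that have usable values in both. The function names, county labels, and values are hypothetical, and it again assumes None or negative values are placeholders. Keep in mind the two products cover different periods (one year vs. five), so differences aren't purely noise:

```python
def is_missing(v):
    # Assumption: None or any negative placeholder code counts as missing.
    return v is None or v < 0


def compare_vintages(acs1, acs5):
    """acs1/acs5: {county_id: value} for one measure.

    Uses the counties published in ACS1 as the base. Returns percent with
    data in each vintage and the mean ACS5 - ACS1 difference over counties
    with usable values in both.
    """
    counties = list(acs1)
    if not counties:
        return {"pct_acs1": None, "pct_acs5": None, "mean_diff": None}

    def pct(d):
        have = sum(not is_missing(d.get(c)) for c in counties)
        return 100 * have / len(counties)

    diffs = [acs5[c] - acs1[c] for c in counties
             if not is_missing(acs1.get(c)) and not is_missing(acs5.get(c))]
    mean_diff = sum(diffs) / len(diffs) if diffs else None
    return {"pct_acs1": pct(acs1), "pct_acs5": pct(acs5), "mean_diff": mean_diff}


# Hypothetical values for three counties
acs1 = {"A": 100, "B": -999999999, "C": 50}
acs5 = {"A": 110, "B": 70, "C": 56}
print(compare_vintages(acs1, acs5))
```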