Hello, new to the forum
I’m exploring data in the ACS 5-year survey data. I thought I’d look at the census tract-level, but notice that in many tables the data has large margins of error, with a coefficient of variation 50% or more (screenshot).
When looking at ACS 5-year data, what’s the smallest geographic slice you look at? It seems like the zip code level is where the margin of error comes within a tolerable range? Of course, that would depend on the specific table. Tables that include multi-dimensional demographic slicing probably still have a high MOE at the zip code level.
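For reference, here is a minimal Python sketch of how a coefficient of variation can be derived from a published estimate and its margin of error. It assumes the MOE is at the 90 percent confidence level (the ACS default), so the standard error is the MOE divided by 1.645. The function name and the sample numbers are hypothetical, not values from the screenshot.

```python
# Z-score for the 90% confidence level used by published ACS margins of error.
Z_90 = 1.645


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Return the CV, as a percentage, for an estimate and its 90% MOE."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    standard_error = moe / Z_90
    return 100.0 * standard_error / estimate


# Hypothetical tract-level cell: an estimate of 120 people with an MOE of +/-100.
cv = coefficient_of_variation(120, 100)
print(f"CV = {cv:.1f}%")
```

With these made-up numbers the CV lands just above 50 percent, which is the kind of cell the question is describing.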
There is so much in this question, and I'm actually working on a blog post about this very topic! You're right that the more dimensions of demographic slicing a table has, the coarser the level of geography you should look at. For basics like total pop, total housing units, etc. you might be fine with block groups or tracts, depending on your purpose.
I want to mention another option though: aggregation.
Aggregate by geography:
Instead of being limited by census-defined geography, you can aggregate multiple block groups or multiple tracts together until you have a margin of error/reliability score that's acceptable to you.
Aggregate by rows/categories:
For example, with the age and sex breakdowns in your screenshot, could you live with 10-year age breakdowns instead of 5-year ones? Do you really just need everyone age 65+? Do you need the breakdowns by sex, or are you really just interested in age groups? Combining these groups will also help to get the reliability up.
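To illustrate why aggregating helps, here is a rough Python sketch that adds adjacent tracts one at a time and reports how the coefficient of variation changes. It assumes the approximation from the ACS handbook (the MOE of a sum is the square root of the sum of squared MOEs) and 90 percent MOEs. The tract values and the helper names `combine` and `cv_percent` are made up for illustration.

```python
import math

Z_90 = 1.645  # published ACS MOEs are at the 90% confidence level


def combine(cells):
    """Sum the estimates; combine MOEs as the root of the sum of squares."""
    estimate = sum(est for est, _ in cells)
    moe = math.sqrt(sum(m ** 2 for _, m in cells))
    return estimate, moe


def cv_percent(estimate, moe):
    """Coefficient of variation as a percentage."""
    return 100.0 * (moe / Z_90) / estimate


# Hypothetical (estimate, MOE) pairs for the same table cell in four adjacent tracts.
tracts = [(120, 100), (95, 80), (150, 110), (135, 90)]

for n in range(1, len(tracts) + 1):
    est, moe = combine(tracts[:n])
    print(f"{n} tract(s): estimate={est}, MOE={moe:.0f}, CV={cv_percent(est, moe):.1f}%")
```

The same `combine` step applies to rows within one geography (for example, two 5-year age groups), since both cases are sums of published estimates.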
Hope this helps,
thanks for your notes here! Would be curious to read your blog post :)
Aggregating by geography makes sense.
Regarding aggregating by rows/categories, you said
> For example, with the age and sex breakdowns in your screenshot, could you live with 10-year age breakdowns instead of 5-year ones? Do you really just need everyone age 65+? Do you need the breakdowns by sex, or are you really just interested in age groups? Combining these groups will also help to get the reliability up.
How do you aggregate by rows/categories in practice? I don't see a way to do this in the table exploration UI that data.census.gov offers.
I guess I could aggregate by rows with code.
The "Total Population Estimate" for 40-49 years would be 598. In "Understanding and Using the U.S. Census Bureau’s American Community Survey" I see:
> Margin of Error for Aggregated Count Data
>
> The ACS allows the use of unique estimates called derived estimates. These are generated by aggregating reported estimates across geographic areas or population sub groups. Margin of error is not provided for aggregated estimates and therefore needs to be calculated. This is calculated by square root of the sum of squared margin of errors. The letter ‘c’ in the equation below represents each estimate that will be included in the aggregation.
I could select the rows, for example as you suggested say 10-year age breakdowns:
So presumably the margin of error would be the square root of (164^2 + 114^2), which is about 200 (199.7 before rounding). Aggregating the Percent Population Estimate would give 20.4 with a margin of error of 6.1.
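Written out, the count calculation looks like this. The formula is the root-sum-of-squares rule described in the quoted passage. The coefficient of variation line is an extra step that assumes the published MOEs are at the 90 percent confidence level, so the standard error is the MOE divided by 1.645.

```latex
\mathrm{MOE}_{\text{agg}} = \pm\sqrt{\sum_{c} \mathrm{MOE}_c^{2}}
  = \pm\sqrt{164^{2} + 114^{2}}
  = \pm\sqrt{26896 + 12996}
  = \pm\sqrt{39892}
  \approx \pm 199.7

\mathrm{CV} = \frac{\mathrm{MOE}_{\text{agg}} / 1.645}{\text{estimate}}
  = \frac{199.7 / 1.645}{598}
  \approx \frac{121.4}{598}
  \approx 0.203 \quad (\text{about } 20\%)
```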
Thanks for pointing me in the right direction here! Aggregating by row seems like a really valuable method.
As an FYI, the (approximate) calculations for the various ways of aggregating data (combining categories or geographies) are in chapter 8 of the ACS General Handbook here
As a general matter, unless you are doing only a small number of calculations, you will need a statistical package to do them. R (https://cran.r-project.org/) is a free, open-source statistics package. The free add-on package "tidycensus" can handle downloading and calculations. walker-data.com/.../
Here is a link to the full ACS handbook for data users, "Understanding and Using American Community Survey Data: What All Data Users Need to Know": https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf