Workflows for combining ACS geographies

When using ACS data, we often want to find a statistic, say the percent and number of people who are not “white alone, non-Hispanic” (or some other interesting characteristic) for a collection of municipalities or census tracts representing a large region or a study area. Our practice is to essentially:

  1. calculate the percentage of the characteristic of interest using ACS data by dividing the ACS value by the appropriate total ACS population
  2. multiply the resulting percentage by the corresponding population universe using Census populations
  3. sum the resulting value across all geographies to find the number and percentage of people in our group of interest for the aggregated area, then divide that total by the total Census population (as used in step 2) to find the percent of that demographic in the aggregated area
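
The three steps above can be sketched in a few lines of Python. This is only an illustration: the tract records and every number in them are made up, and the field names (`acs_group`, `acs_total`, `census_total`) are hypothetical rather than real ACS or Census column names. A direct sum of the ACS estimates is included alongside for comparison.

```python
# Hypothetical tract-level inputs (made-up numbers, not from any real table).
# acs_group:    ACS estimate for the group of interest
# acs_total:    ACS estimate of the matching population universe
# census_total: decennial Census population for the same universe
tracts = [
    {"acs_group": 1200, "acs_total": 4000, "census_total": 4200},
    {"acs_group": 300, "acs_total": 2500, "census_total": 2400},
    {"acs_group": 900, "acs_total": 3000, "census_total": 3100},
]


def controlled_aggregate(rows):
    """Steps 1-3: ACS share per geography, scaled to the Census population, then summed."""
    group_count = 0.0
    census_pop = 0
    for row in rows:
        share = row["acs_group"] / row["acs_total"]   # step 1: ACS percentage
        group_count += share * row["census_total"]    # step 2: apply to Census population
        census_pop += row["census_total"]             # step 3: denominator for the region
    return group_count, group_count / census_pop


def direct_aggregate(rows):
    """Alternative: add the ACS estimates as published, with no manual control."""
    group = sum(r["acs_group"] for r in rows)
    total = sum(r["acs_total"] for r in rows)
    return group, group / total


controlled_count, controlled_share = controlled_aggregate(tracts)
direct_count, direct_share = direct_aggregate(tracts)
print(f"controlled: {controlled_count:.1f} people, {controlled_share:.1%}")
print(f"direct:     {direct_count:.1f} people, {direct_share:.1%}")
```

The two approaches give different counts whenever the ACS and Census totals differ for a tract, which is the behavior the question is about.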

This practice seems to stem from text in the user guides, but I can’t find any documentation suggesting the intended way to apply this, if there even is a right way.

“TIP: The ACS was designed to provide estimates of the characteristics of the population, not to provide counts of the population in different geographic areas or population subgroups. For basic counts of the U.S. population by age, sex, race, and Hispanic origin, visit the Census Bureau’s Population and Housing Unit Estimates Web page.”

and

“TIP: ACS data for small statistical areas (such as census tracts) have no control totals, which may lead to errors in the population and housing unit estimates. In such cases, data users are encouraged to rely more upon noncount statistics, such as percent distributions or averages.” from https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch01.pdf

I don’t think I’ve seen this workflow noted in others’ analyses, and as far as I’ve read, the documentation doesn’t include an example of it, but it may just be that I’m not looking in the right places.

Do others “control” the ACS data manually when aggregating the data? Do you just use the ACS as-is without tying data back to the decennial? I’d love to hear about your workflows for aggregating ACS data!

  • A couple of issues. The ACS data at the county level (I believe I have this correct) are "controlled" to decennial census populations as adjusted by the Population Estimates Program (PEP), which adjusts population counts between decennial censuses.

    https://www.census.gov/programs-surveys/acs/technical-documentation/user-notes/2022-10.html

    So your adjustment procedure is not needed. The intercensal ACS numbers are already adjusted. If you want a population estimate of some characteristic for a collection of census tracts (non-overlapping geographies), just add the estimates. To get the margin of error (MoE), use the formulas in chapter 8 of the handbook. https://www.census.gov/programs-surveys/acs/library/handbooks/researchers.html  https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_researchers_handbook_2020_ch03.pdf

    For calculating the MoE (see paragraph for Aggregating across geographic areas). https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf
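
    As a rough sketch of that approximation: the estimates are added, and the MoE of the sum is the square root of the sum of the squared MoEs. The function name and the numbers below are made up for illustration. The formula ignores any covariance between the component estimates, so the result is an approximation.

```python
import math


def aggregate_estimate(estimates, moes):
    """Sum estimates over non-overlapping geographies and approximate the MoE.

    Uses the root-sum-of-squares approximation: sqrt(MoE_1^2 + MoE_2^2 + ...).
    Covariance between the components is ignored.
    """
    total = sum(estimates)
    moe = math.sqrt(sum(m * m for m in moes))
    return total, moe


# Hypothetical tract estimates and their published MoEs.
total, moe = aggregate_estimate([1200, 300, 900], [150, 80, 120])
print(f"{total} +/- {moe:.1f}")
```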

    Always use the appropriate geography. For example, to compute counts for a county, DO NOT add up the totals for all the tracts in the county and apply the formula for the MoE. Use the county table.

    To get some combination of "cells" in a single ACS table (a sum or a difference) for a single geography, add or subtract the appropriate cells. In general you can take a linear combination (google for explanation). You can then aggregate across non-overlapping geographies and compute the MoE.

    From Wikipedia, the free encyclopedia

    In mathematics, a linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results (e.g. a linear combination of x and y would be any expression of the form ax + by, where a and b are constants).
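
    A minimal sketch of a linear combination of table cells with an approximate MoE. It assumes the cells are uncorrelated (zero covariance), so each MoE is scaled by its constant and the results are combined as a root sum of squares. The function name and the numbers are hypothetical.

```python
import math


def linear_combination(coeffs, estimates, moes):
    """Return a1*x1 + a2*x2 + ... and an approximate MoE for the result.

    Assumes zero covariance between cells: MoE = sqrt(sum((a_i * MoE_i)^2)).
    """
    estimate = sum(a * x for a, x in zip(coeffs, estimates))
    moe = math.sqrt(sum((a * m) ** 2 for a, m in zip(coeffs, moes)))
    return estimate, moe


# Difference of two hypothetical cells: x - y, so the constants are 1 and -1.
diff, diff_moe = linear_combination([1, -1], [5000, 3200], [300, 400])
print(f"{diff} +/- {diff_moe:.1f}")
```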

    Dave

    PS it is useful to add some information to your bio/profile  so I can figure out the correct level to pitch my answer.

  • Thanks for this, and for the note to update my bio! Seems like we're both in the same region.

    I've seen the note about values being controlled at the county level, and also this: "Starting with the 2009 survey, ACS estimates of the total population of incorporated places (self-governing cities, towns, or villages) and minor civil divisions (such as county precincts) are also adjusted so that they are consistent with official population estimates." https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf

    but I'm not sure how that trickles down to tracts--a level below the control. I can see how simply aggregating two counties together or three county subdivisions (perhaps?) after 2009 would work--add them up because they have the 'right' populations and households. But tracts--I'm less certain. If a focused part of a county experiences high growth, does that growth get smoothed out over the constituent tracts during the control step such that the total county has the right pop/households but individual tracts might not reflect their high growth?

  • Dear Steve,

    Don't forget that the ACS is a survey and the numbers in the tables are survey/sample estimates. A census is a count of "everybody" and there is no "margin of error".

    As a general matter, you can look at tract-level estimates and make any computation you want; just make sure to carry the MoE along so you can report the MoE for your estimate or combination. You will get an estimate and, say, a 95% confidence interval.

    To do this correctly and get "exact" estimates for MoE you need to use Variance Replicate Estimate tables.

    "The tables are intended for advanced users who are adding ACS data within a table or between geographies. Users can calculate margins of error for aggregated data by using the variance replicates. Unlike available approximation formulas, this method results in an exact margin of error by using the covariance term."

    For details look here: https://www2.census.gov/programs-surveys/acs/replicate_estimates/2021/documentation/5-year/2017-2021_Variance_Replicate_Table_Documentation.pdf

    A good book on Survey methodology which discusses the various methods used for ACS estimates:

    www.amazon.com/.../

    The Variance Replicate Estimate tables are at https://www.census.gov/programs-surveys/acs/data/variance-tables.html and are available for a subset of ACS 5-year "B" tables.

    The reason the ACS puts in the "Tips" is so that less sophisticated users don't "fall into any traps" when they compute and interpret ACS data.

    With the Variance Replicate Estimate tables you can compute the MoE for any function of the table cells: a percentage, a square root, a cosine, an exponential, anything you want!
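
    A sketch of the replicate calculation, assuming the standard ACS setup of 80 replicate estimates per cell: the variance is 4/80 times the sum of squared differences between each replicate and the full-sample estimate, and the 90% MoE is 1.645 times the square root of that variance. The function names and the replicate values below are synthetic; real values would come from the Variance Replicate Estimate tables.

```python
import math


def replicate_moe(estimate, replicates, z=1.645):
    """MoE from 80 replicate estimates: z * sqrt((4/80) * sum((rep - est)^2))."""
    if len(replicates) != 80:
        raise ValueError("expected 80 replicate estimates")
    variance = (4 / 80) * sum((r - estimate) ** 2 for r in replicates)
    return z * math.sqrt(variance)


def derived_moe(func, cells, replicate_cells, z=1.645):
    """MoE for any function of table cells.

    func is applied to the full-sample cells and to each replicate's cells,
    so the covariance between cells is carried through automatically.
    """
    estimate = func(*cells)
    replicates = [func(*rep) for rep in replicate_cells]
    return estimate, replicate_moe(estimate, replicates, z)


def percent(part, whole):
    return 100 * part / whole


# Synthetic data: one numerator cell and one denominator cell, 80 replicates.
cells = (50, 200)
replicate_cells = [(51, 200)] * 40 + [(49, 200)] * 40
pct, pct_moe = derived_moe(percent, cells, replicate_cells)
print(f"{pct:.1f}% +/- {pct_moe:.2f}")
```

    The same `derived_moe` call works for a sum across geographies: apply the sum to the full-sample estimates and to each replicate in turn.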

    Hope that this gives you some insight into the "statistical technology" used with the ACS.

    The take-home is: when you put an ACS estimate in a report, put in a confidence interval, Xx (low, high), or use a +- after your estimate or number. I have a trick: I always include one digit after the decimal point to indicate that the number is an estimate, not a count of people. You can't have 1.3 of a person.
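
    A small sketch of that reporting habit. Published ACS MoEs are at the 90% confidence level (z = 1.645), so for a 95% interval the MoE is rescaled by 1.960/1.645 first. The helper name and the numbers are made up.

```python
Z_SCORES = {90: 1.645, 95: 1.960, 99: 2.576}


def format_estimate(estimate, moe90, level=95):
    """Format an estimate with one decimal place and a (low, high) interval.

    moe90 is the published 90% MoE; it is rescaled to the requested level.
    """
    moe = moe90 * Z_SCORES[level] / Z_SCORES[90]
    low, high = estimate - moe, estimate + moe
    return f"{estimate:.1f} ({low:.1f}, {high:.1f})"


# Hypothetical aggregated estimate with a 90% MoE of 164.5.
print(format_estimate(2400, 164.5))
print(format_estimate(2400, 164.5, level=90))
```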

  • Dear Steve,

    To get "down in the weeds" a little, here are the methods for the county estimates between decennial censuses. These numbers are used to adjust the ACS survey county estimates. For the complete calculation for the nation, each county and the states see:

    https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2020-2022/methods-statement-v2022.pdf

    Some of the details (for 2020 to 2022) included adjusting the 2020 census counts (yes counts not survey estimates) using birth and death certificate data. The death/birth data comes from the National Center for Health Statistics (you can go to their website for details). As an aside, if you have a need to know, go through several levels of review, and are willing to be sworn to secrecy you can get the data to write a research paper.  After you are done you have to destroy all the raw data that you received.

    Another adjustment is for migration.  They use IRS, Social Security and Medicare address data for that.

    They use an adjustment process called raking to make things add up. This is basically a statistical model, and it yields estimates that are not counts but rather numbers with digits after the decimal point. They fudge things using an algorithm to make the county counts come out as integers and still add up to an integer total. In this process they also account for age categories, sex, race, and Hispanic ethnicity.
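
    To give a feel for raking, here is a generic textbook version (iterative proportional fitting) on a small two-way table. It is not the Census Bureau's actual procedure, which works with more dimensions and adds the integer-rounding step. The table and the target totals are made up.

```python
def rake(table, row_targets, col_targets, iterations=50):
    """Scale a two-way table until its row and column sums match the targets.

    Alternates between rescaling each row to its target and each column to
    its target. The targets must share the same grand total.
    """
    cells = [[float(v) for v in row] for row in table]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            if row_sum:
                cells[i] = [v * target / row_sum for v in cells[i]]
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum:
                for row in cells:
                    row[j] *= target / col_sum
    return cells


# Hypothetical survey table with control totals for rows and columns.
raked = rake([[40, 60], [30, 70]], row_targets=[110, 90], col_targets=[80, 120])
for row in raked:
    print([round(v, 2) for v in row])
```

    The raked cells are generally not whole numbers, which is the point made above about digits after the decimal point.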

    The short of it is that the numbers you see in the ACS tables are model-based and really have digits after the decimal place. Things are "fudged" to make them come out as counts. I assume that this is done to keep people from asking questions. I've been doing stats for over 40 years and have needed to describe this process many times when writing papers. The audience is typically physicians, and you can usually explain it so that they get an idea of what is going on and accept it. The reason the ACS documentation says to use rates or percentages rather than report the "counts" is that a percent is just a fraction with 2 decimal places, and people seem to understand what a "rate" is. In any case, converting everything to a percent, making a calculation, and then multiplying the percent by a number to get a count is just taking you "around the barn" for no good reason. The data in the ACS tables are fine as is. Just realize that each number is an estimate with digits after the decimal point, dressed up to look like a count.

    PS

    I just skimmed through the thread, and the census tract totals (B01003) add up to the county total. This is the only "control" applied (I think; they may also control age and sex totals). See the note above about counties, where the totals are controlled for age x sex x race x ethnicity.
