I'm using population estimates for race and ethnicity at the census tract and census block group levels. Why don't the aggregate totals match between the two levels, even though they come from the same tables? I have tried this with ACS 2015-2019 as well as the latest 2020 data (taken from NHGIS). Basically, I'm extracting the same table at different geographic levels, and I'm curious what could cause different totals *within* the same dataset.
Can you give an example of the table IDs and the rows in the tables that don't add up?
The table numbers are B01003 (Total Population) and B03002 (Hispanic or Latino Origin by Race), at the census tract and census block group levels. The NHGIS (IPUMS) codes are AMP3E001 and AMP3E012 respectively. Their sums for two datasets -- ACS 2015-2019 and Census 2020 -- don't match up as per the sums above (all in millions).
I was hoping to make an example out of what you discovered, but here are 3 reasons why "things don't add up" in tables:
1) You added an extra variable to the tabulation. In your case, the total US population is a "0-way" table: a total only. The other table is a "2-way" table that crosses Hispanic ethnicity with race. There may be missing values on the census forms for the Hispanic origin question that feeds the "2-way" table. If there are some missing values, then when you add up across the Hispanic categories in B03002 you will get a different value than the total in the table. The Census tries to make the totals and "marginals" as consistent as possible within and across all the tables, but this can't always be done. In general, you should use the table with as few variables as possible when you look up a statistic.
2) The ACS is a survey, so the counts are measured with sampling error. When you add counts together you need to "carry along" the Margin of Error (or MoE). Here is a webinar with some slides about how to do this for sums and ratios (percents): https://www.census.gov/programs-surveys/acs/guidance/training-presentations/acs-moe.html Two combinations (typically sums) of table cells may differ and still be consistent with each other once you take the margin of error into account.
3) "Disclosure Avoidance". This applies to the ACS and the Decennial census as well. For 2020, only some of the tables have been released, mostly the 2020 Census Redistricting Data (P.L. 94-171) Summary File, which is used for making up congressional districts. Basically, the Census doesn't report the actual counts. They change them by adding a "random" number of counts. This prevents someone from using a collection of tables and arithmetic to figure out how a particular individual answered a census question. In terms of a percentage change in the value, this is most evident at the "Block Group" geography because the Block Group counts are the smallest. This is a complicated subject. For technical details see: https://www.census.gov/library/publications/2021/decennial/2020-census-disclosure-avoidance-handbook.html. I can't find an introductory web page that explains this, but try a Google search and see if you can find something that provides an introduction.
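To make point 2 concrete, here is a minimal Python sketch of "carrying along" the MoE when summing block group estimates and comparing the sum to a tract estimate. It uses the usual root-sum-of-squares approximation for the MoE of a sum. The function names and all the numbers are made up for illustration, and the approximation treats the estimates as independent, which is only roughly true for tract and block group figures drawn from the same sample.

```python
import math

def moe_of_sum(moes):
    """Approximate MoE of a sum of estimates: square root of the sum of squared MoEs."""
    return math.sqrt(sum(m * m for m in moes))

def sums_differ(est_a, moe_a, est_b, moe_b):
    """True when the gap between two estimates exceeds the MoE of their difference."""
    return abs(est_a - est_b) > math.sqrt(moe_a ** 2 + moe_b ** 2)

# Hypothetical block group estimates and MoEs (not real ACS values).
bg_estimates = [1200, 800, 950]
bg_moes = [150, 120, 130]

# Hypothetical tract-level estimate and MoE for the same area.
tract_estimate, tract_moe = 3000, 200

bg_total = sum(bg_estimates)
bg_total_moe = moe_of_sum(bg_moes)

print(f"block group sum: {bg_total} +/- {bg_total_moe:.0f}")
print(f"tract estimate:  {tract_estimate} +/- {tract_moe}")
print("beyond margin of error:",
      sums_differ(bg_total, bg_total_moe, tract_estimate, tract_moe))
```

With these made-up inputs the two totals differ by 50, which is well inside the combined margin of error, so the mismatch alone would not indicate a problem.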
Hope this helps,
Thanks Dave for the helpful breakdown. I discovered that the error is probably in the geographic selection I made in NHGIS. When I select all of the United States, my totals add up. But when I select only the lower 48 states in the dataset, I get different totals.
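A quick way to confirm this kind of selection problem is to total each extract by state and list the states where the tract and block group totals disagree. Below is a minimal sketch in Python. The `state` key, the function names, and the sample rows are assumptions for illustration only; a real NHGIS extract would need to be loaded from its CSV and its own state column used instead.

```python
from collections import defaultdict

def totals_by_state(rows, value_key):
    """Sum one variable per state; each row is a dict with a 'state' key (assumed)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row[value_key]
    return dict(totals)

def compare_levels(tract_rows, bg_rows, value_key):
    """Return {state: (tract_total, bg_total)} for states whose totals differ.

    A state missing from one extract shows up with None on that side.
    """
    tract_totals = totals_by_state(tract_rows, value_key)
    bg_totals = totals_by_state(bg_rows, value_key)
    report = {}
    for state in sorted(set(tract_totals) | set(bg_totals)):
        t = tract_totals.get(state)
        b = bg_totals.get(state)
        if t != b:
            report[state] = (t, b)
    return report

# Hypothetical rows: state "02" was left out of the block group extract.
tract_rows = [
    {"state": "01", "AMP3E001": 300},
    {"state": "01", "AMP3E001": 200},
    {"state": "02", "AMP3E001": 500},
]
bg_rows = [
    {"state": "01", "AMP3E001": 120},
    {"state": "01", "AMP3E001": 180},
    {"state": "01", "AMP3E001": 200},
]

mismatches = compare_levels(tract_rows, bg_rows, "AMP3E001")
print(mismatches)
```

States that appear with `None` on one side point to a difference in which geographies were selected, rather than to a difference in the underlying table.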
On MOE: I agree, but I think the difference in the Latino totals at the census tract level in 2015-2019 is beyond the margin of error.
On differential privacy, I agree. In a side quest, I'm trying to play around with CBG data from 2010 demonstration data products. Will keep you updated.