Hi all, I am working with the 2009-2013 ACS 5-year estimates detailed tables, and have a general question about comparing ACS data at different geographic levels. I have downloaded ACS data at both the census tract and the ZCTA levels. Does anyone know whether the same raw data is used for all of ACS's geographic levels (i.e. whether I should expect the ZCTA and census tract ACS data to be "equivalent"), or if the data from these two geographic levels draw from different samples? I am primarily asking because I am planning to crosswalk both of these datasets to the 5-digit ZIP code level, and would like to know what to expect when comparing the resulting numbers to each other.
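For the crosswalk step mentioned above, here is a minimal sketch of a weighted allocation from tracts to ZIP codes. It assumes a crosswalk table that gives, for each tract, the share of that tract falling in each ZIP code. The function name, tract IDs, ZIP codes, and weights are all made up for illustration, and a real crosswalk file will have its own layout.

```python
from collections import defaultdict

def allocate_to_zip(tract_estimates, crosswalk):
    """Distribute tract-level counts to ZIP codes using crosswalk weights.

    tract_estimates: {tract_geoid: estimate}
    crosswalk: iterable of (tract_geoid, zip_code, weight); the weights for
               a given tract are assumed to sum to 1.
    """
    zip_totals = defaultdict(float)
    for tract, zip_code, weight in crosswalk:
        zip_totals[zip_code] += tract_estimates.get(tract, 0) * weight
    return dict(zip_totals)

# Hypothetical inputs, for illustration only.
tract_estimates = {"29019000100": 4000, "29019000200": 2500}
crosswalk = [
    ("29019000100", "65201", 0.75),
    ("29019000100", "65203", 0.25),
    ("29019000200", "65203", 1.0),
]

zip_totals = allocate_to_zip(tract_estimates, crosswalk)
print(zip_totals)
```

This kind of allocation only makes sense for counts (people, households); medians and rates cannot be apportioned this way. The same function can be applied to ZCTA estimates with a ZCTA-to-ZIP crosswalk, so the two results can be compared ZIP by ZIP.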
Yes, the tract-level estimates and ZCTA-level estimates are derived from the same underlying sample.
Thank you, Matthew!
I have a related question... we've noticed that total estimated population is about 20K lower for ZCTAs than it is for counties and tracts in the 2020 ACS 5-year. What could explain this "gap"? Is this a relic of sampling methods? Perhaps respondents didn't enter in a ZIP that has a counterpart in the ZCTA file?
I don't believe ZCTAs cover the entire country. It's definitely not based on the respondents' ZIP; ZCTAs are explicitly different from mailing ZIP codes, and are based on coordinate location, not address.
Bernie said: I don't believe ZCTAs cover the entire country.
That's right, they don't.
Margins of error might also account for some of the difference.
For 2017-2021, I'm counting a total tract (and county) population of 333,036,755, and a total ZCTA population of 333,032,934, or a difference of 3,821.
For 2016-2020, the difference is 21,905.
For 2015-2019, the difference is 17,490.
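To judge whether margins of error could plausibly cover a gap like these, one rough check is to approximate the margin of error of the summed ZCTA total and compare it with the difference. The sketch below uses the Census Bureau's published approximation for aggregated estimates (the square root of the sum of squared MOEs). All figures in it are made up, and the approximation ignores covariance between areas, so treat it as a rough guide only.

```python
import math

def moe_of_sum(moes):
    """Approximate MOE of a sum of estimates: sqrt of the sum of squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))

# Hypothetical (estimate, MOE) pairs for the ZCTAs being summed.
zcta_rows = [(5000, 300), (3000, 400)]
# Hypothetical total for the same area built from counties or tracts.
county_total = 8450

zcta_total = sum(est for est, _ in zcta_rows)
zcta_moe = moe_of_sum(moe for _, moe in zcta_rows)
gap = county_total - zcta_total

# True when the gap is no larger than the approximate MOE of the ZCTA total.
within_moe = abs(gap) <= zcta_moe
print(gap, zcta_moe, within_moe)
```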
The ACS "raw" data, i.e. the survey form answers, is based on something called the Master Address File (MAF), which contains geocoded data for each structure in the US. You can look up an address in the MAF using the Census geocoder, https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress?form. Every physical "address" has an assigned value for every level of geography: block group, tract, ZCTA (ZIP Code Tabulation Area), county, state, PUMA, etc. Note that ZCTAs are not the same as postal ZIP codes, which change all the time. ACS population totals and some "marginals" (Hispanic and race, for example) are controlled at the county level. If you take the ACS values for total population (B01003) and add them up across all the tracts in a county, you will get the total population for the county. Tracts in a ZCTA do not necessarily add up to the total population of the ZCTA. Counties add up to states. Some other characteristics are "controlled." For example, the number of females in all the tracts in a county adds up to the number of females in the county (B01001), I think. In general, though, "things" don't "add up." The ACS is a survey, and the numbers on data.census.gov are estimates with sampling error.
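The tract-to-county check described above can be sketched as follows. It relies on the fact that an 11-digit tract GEOID begins with the 2-digit state FIPS code and the 3-digit county FIPS code; the population figures and the function name are made up for illustration, standing in for B01003 values you would load yourself.

```python
from collections import defaultdict

def tract_sums_by_county(tract_pop):
    """Sum tract estimates by county, keyed on the first 5 digits of the
    tract GEOID (state FIPS + county FIPS)."""
    sums = defaultdict(int)
    for geoid, pop in tract_pop.items():
        sums[geoid[:5]] += pop
    return dict(sums)

# Hypothetical B01003-style totals, for illustration only.
tract_pop = {"29019000100": 4000, "29019000200": 2500, "29051010100": 3100}
county_pop = {"29019": 6500, "29051": 3100}

sums = tract_sums_by_county(tract_pop)

# Counties whose tract sum differs from the published county total.
mismatches = {
    county: (sums.get(county, 0), total)
    for county, total in county_pop.items()
    if sums.get(county, 0) != total
}
print(mismatches)
```

No equivalent prefix trick exists for ZCTAs, since a ZCTA code does not encode the tract or county it falls in.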
www.census.gov/.../2021-01.html
Thanks, that does make sense. And as I re-read the documentation, I see it says ZIPs are coming from MAF/TIGER. Something that is confusing me, though, is that it says "all inhabited areas have 2020 ZCTA coverage". Shouldn't this imply that all population is accounted for? Maybe they're referring to inhabited areas according to the decennial count?
Yes, it does appear that the margins of error are on average higher for ZCTAs than for county or tract. My math for 2016-2020:

| Geography | SUM E_TOTPOP | MAX E_TOTPOP | AVG E_TOTPOP | STDEV E_TOTPOP | MAX M_TOTPOP* | AVG M_TOTPOP* | STDEV M_TOTPOP* |
|---|---|---|---|---|---|---|---|
| County | 326,569,308 | 10,040,682 | 103,903 | 332,097 | 1,564 | 7 | 51.5 |
| Tract | | 39,373 | 3,882 | 1,657 | 5,676 | 555 | 295.95 |
| ZCTA | 326,549,615 | 126,310 | 9,898 | 14,762 | 6,567 | 598 | 660.2 |

*nulls excluded
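For anyone who wants to reproduce this kind of summary, here is a minimal sketch of the MOE columns with nulls excluded. The input list and function name are made up, and whether the figures above used a sample or a population standard deviation is not stated; the sketch assumes the sample version.

```python
import statistics

def moe_summary(moes):
    """Return count, max, mean, and sample standard deviation of the
    non-null MOE values (None is treated as null and skipped)."""
    values = [m for m in moes if m is not None]
    return {
        "n": len(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (assumption)
    }

# Hypothetical M_TOTPOP values for a handful of ZCTAs; None marks a null MOE.
zcta_moes = [100, None, 200, 300, None]

summary = moe_summary(zcta_moes)
print(summary)
```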
Hmm, I did a quick check and the differences between ZCTA totals and county totals by state don't seem to be correlated with whether a state has lots of ZCTA "holes".
Note that ZCTAs are not assigned to states by the Bureau. (We do that ourselves at MCDC.)
The thing that's most beguiling about this isn't the differences, but that some states have exactly matching numbers. If margins of error were the culprit, I wouldn't really expect any states to have exactly matching numbers. Note that there are a very small number of zip codes that cross state boundaries, but I don't think that's what's going on here.
Right?! It's baffling.
Differential privacy? Surely not at the level of counties and ZCTAs?
Somewhere there is a detailed methods white paper about how disclosure avoidance methods are applied to the ACS. There is quite a lot of data massaging, so there is more than just sampling error in the data on data.census.gov. I can't seem to find the documentation.