Comparing ACS data from different geographic levels (ZCTA vs. census tracts)

Hi all, I am working with the 2009-2013 ACS 5-year estimates detailed tables, and have a general question about comparing ACS data at different geographic levels. I have downloaded ACS data at both the census tract and the ZCTA levels. Does anyone know whether the same raw data is used for all of ACS's geographic levels (i.e. whether I should expect the ZCTA and census tract ACS data to be "equivalent"), or if the data from these two geographic levels draw from different samples? I am primarily asking because I am planning to crosswalk both of these datasets to the 5-digit ZIP code level, and would like to know what to expect when comparing the resulting numbers to each other.
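
For what it's worth, a crosswalk like the one described above typically comes down to a weighted allocation: each source area's estimate is split across ZIP codes according to a share. Below is a minimal Python sketch, assuming a table of shares is already in hand. The function name `crosswalk_to_zip`, the IDs, and every number are hypothetical and do not come from any Census or HUD file.

```python
from collections import defaultdict


def crosswalk_to_zip(estimates, shares):
    """Allocate source-geography estimates to ZIP codes.

    estimates: dict mapping a source ID (tract or ZCTA) to its estimate.
    shares: list of (source_id, zip_code, share) tuples; the shares for
            one source ID are assumed to sum to about 1.
    """
    zip_totals = defaultdict(float)
    for source_id, zip_code, share in shares:
        # A source ID missing from `estimates` contributes nothing.
        zip_totals[zip_code] += estimates.get(source_id, 0.0) * share
    return dict(zip_totals)


# Hypothetical tract estimates and tract-to-ZIP shares (not real data).
tract_pop = {"T1": 4000, "T2": 2500}
tract_to_zip = [
    ("T1", "00001", 0.75),
    ("T1", "00002", 0.25),
    ("T2", "00002", 1.0),
]

zip_from_tracts = crosswalk_to_zip(tract_pop, tract_to_zip)
print(zip_from_tracts)
```

Running the same function over ZCTA estimates with ZCTA-to-ZIP shares would give a second set of ZIP totals to compare. Any differences then reflect both the source estimates and the shares used.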

  • I have a related question... we've noticed that total estimated population is about 20K lower for ZCTAs than it is for counties and tracts in the 2020 ACS 5-year. What could explain this "gap"?

    Is this a relic of sampling methods? Perhaps respondents didn't enter in a ZIP that has a counterpart in the ZCTA file?

  • I don't believe ZCTAs cover the entire country.
    It's definitely not based on the respondents' ZIP; ZCTAs are explicitly different from mailing ZIP codes, and are assigned by coordinate location, not address.

  • I don't believe ZCTAs cover the entire country.

    That's right, they don't.

    Margins of error might also account for some of the difference.

    For 2017-2021, I'm counting a total tract (and county) population of 333,036,755, and a total ZCTA population of 333,032,934, or a difference of 3,821.

    For 2016-2020, the difference is 21,905.

    For 2015-2019, the difference is 17,490.

  • The ACS "raw" data, i.e. the survey form answers, is based on the Master Address File (MAF), which contains geocoded data for each structure in the US. You can look up an address using the Census geocoder, https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress?form. Every physical "address" has an assigned value for every level of geography: block group, tract, ZCTA (ZIP Code Tabulation Area), county, state, PUMA, etc. Note that ZCTAs are not the same as postal ZIP codes, which change all the time. ACS population totals and some "marginals" (Hispanic and race, for example) are controlled at the county level. If you take the ACS values for total population (B01003) and add them up across all the tracts in a county, you will get the total population for the county. Tracts in a ZCTA do not necessarily add up to the total population of the ZCTA. Counties add up to states. Some other characteristics are "controlled." For example, the number of females in all the tracts in a county adds up to the number of females in the county (B01001) -- I think. In general, "things" don't "add up." The ACS is a survey, and the numbers on data.census.gov are estimates with sampling error.

    www.census.gov/.../2021-01.html
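
The point above about tract totals for B01003 summing to the county total can be checked mechanically. Here is a rough Python sketch, assuming tract GEOIDs in the standard 11-digit form (2-digit state + 3-digit county + 6-digit tract), so the first five characters identify the county. The function names and all population values are hypothetical.

```python
from collections import defaultdict


def tract_sums_by_county(tract_rows):
    """Sum tract estimates by county, keyed on the first 5 GEOID characters."""
    sums = defaultdict(int)
    for geoid, estimate in tract_rows:
        sums[geoid[:5]] += estimate
    return dict(sums)


def county_mismatches(tract_rows, county_totals):
    """Return {county: tract_sum - county_total} for counties that disagree."""
    sums = tract_sums_by_county(tract_rows)
    return {
        county: sums.get(county, 0) - total
        for county, total in county_totals.items()
        if sums.get(county, 0) != total
    }


# Hypothetical B01003-style estimates (not real data).
tract_rows = [
    ("01001020100", 1900),
    ("01001020200", 2100),
    ("01003010100", 3000),
]
county_totals = {"01001": 4000, "01003": 3000}

print(county_mismatches(tract_rows, county_totals))
```

An empty result means every county total is reproduced by its tracts. The same check cannot be keyed on ZCTAs this way, because tracts do not nest within ZCTAs.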

  • Thanks, that does make sense. And as I re-read the documentation, I see it says ZIPs are coming from MAF/TIGER.

    Something that is confusing me, though, is that it says "all inhabited areas have 2020 ZCTA coverage". Shouldn't this imply that all population is accounted for? Maybe they're referring to inhabited areas according to the decennial count?

  • Yes, it does appear that the margins of error are on average higher for ZCTAs than for county or tract.

    My math for 2016-2020:

    Geography  SUM E_TOTPOP  MAX E_TOTPOP  AVG E_TOTPOP  STDEV E_TOTPOP  MAX M_TOTPOP*  AVG M_TOTPOP*  STDEV M_TOTPOP*
    County      326,569,308    10,040,682       103,903         332,097          1,564              7             51.5
    Tract       326,569,308        39,373         3,882           1,657          5,676            555           295.95
    ZCTA        326,549,615       126,310         9,898          14,762          6,567            598            660.2

    *nulls excluded
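
One way to judge whether margins of error alone could explain a gap: the Census Bureau's published approximation for the MOE of a sum of estimates is the square root of the sum of the squared MOEs, and the same form applies to the difference of two estimates. A minimal Python sketch under that approximation follows. It ignores covariance between estimates, and the function names and numbers are hypothetical, not taken from the figures above.

```python
import math


def moe_of_sum(moes):
    """Approximate MOE of a sum of estimates: sqrt of the sum of squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))


def gap_within_moe(total_a, moes_a, total_b, moes_b):
    """True if |total_a - total_b| is no larger than the approximate MOE of the difference."""
    moe_diff = math.sqrt(moe_of_sum(moes_a) ** 2 + moe_of_sum(moes_b) ** 2)
    return abs(total_a - total_b) <= moe_diff


# Hypothetical component MOEs (not real data).
print(moe_of_sum([300, 400]))                         # 500.0
print(gap_within_moe(10000, [300, 400], 9600, [0]))   # gap of 400 vs. MOE of 500
```

Null MOEs (the rows excluded above) would have to be dropped or handled separately before calling `moe_of_sum`.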

  • Hmm, I did a quick check and the differences between ZCTA totals and county totals by state don't seem to be correlated with whether a state has lots of ZCTA "holes".

    state  Stab    ZCTA_Pop  County_Pop  difference
    1      AL     4,892,929   4,893,186        -257
    2      AK       736,341     736,990        -649
    4      AZ     7,176,644   7,174,064       2,580
    5      AR     3,012,005   3,011,873         132
    6      CA    39,338,103  39,346,023      -7,920
    8      CO     5,685,154   5,684,926         228
    9      CT     3,570,549   3,570,549           0
    10     DE       967,679     967,679           0
    11     DC       701,974     701,974           0
    12     FL    21,214,455  21,216,924      -2,469
    13     GA    10,516,211  10,516,579        -368
    15     HI     1,420,074   1,420,074           0
    16     ID     1,754,012   1,754,367        -355
    17     IL    12,716,106  12,716,164         -58
    18     IN     6,696,688   6,696,893        -205
    19     IA     3,150,430   3,150,011         419
    20     KS     2,912,557   2,912,619         -62
    21     KY     4,468,786   4,461,952       6,834
    22     LA     4,663,920   4,664,616        -696
    23     ME     1,340,763   1,340,825         -62
    24     MD     6,037,624   6,037,624           0
    25     MA     6,873,004   6,873,003           1
    26     MI     9,973,857   9,973,907         -50
    27     MN     5,600,601   5,600,166         435
    28     MS     2,981,835   2,981,835           0
    29     MO     6,124,441   6,124,160         281
    30     MT     1,062,386   1,061,705         681
    31     NE     1,924,371   1,923,826         545
    32     NV     3,030,453   3,030,281         172
    33     NH     1,355,284   1,355,244          40
    34     NJ     8,885,418   8,885,418           0
    35     NM     2,094,668   2,097,021      -2,353
    36     NY    19,514,719  19,514,849        -130
    37     NC    10,386,119  10,386,227        -108
    38     ND       759,391     760,394      -1,003
    39     OH    11,675,275  11,675,275           0
    40     OK     3,949,760   3,949,342         418
    41     OR     4,176,208   4,176,346        -138
    42     PA    12,794,885  12,794,885           0
    44     RI     1,057,798   1,057,798           0
    45     SC     5,091,517   5,091,517           0
    46     SD       878,569     879,336        -767
    47     TN     6,765,264   6,772,268      -7,004
    48     TX    28,633,873  28,635,442      -1,569
    49     UT     3,144,764   3,151,239      -6,475
    50     VT       624,308     624,340         -32
    51     VA     8,509,607   8,509,358         249
    53     WA     7,512,546   7,512,465          81
    54     WV     1,807,137   1,807,426        -289
    55     WI     5,807,020   5,806,975          45
    56     WY       581,533     581,348         185

    Note that ZCTAs are not assigned to states by the Bureau. (We do that ourselves at MCDC.)
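
For anyone who wants to redo this comparison, here is a short Python sketch of the per-state arithmetic: parse rows of ZCTA and county totals, compute ZCTA minus county, and list the states that match exactly. The five rows are copied from the table above; the variable and function names are just illustrative, and the ZCTA-to-state assignment is assumed to have been done already.

```python
# Columns: state FIPS, abbreviation, ZCTA_Pop, County_Pop (subset of the table above).
ROWS = """\
6 CA 39,338,103 39,346,023
9 CT 3,570,549 3,570,549
10 DE 967,679 967,679
21 KY 4,468,786 4,461,952
47 TN 6,765,264 6,772,268
"""


def parse_rows(text):
    """Return {abbreviation: (zcta_pop, county_pop)} from whitespace-separated rows."""
    table = {}
    for line in text.splitlines():
        _fips, stab, zcta, county = line.split()
        table[stab] = (int(zcta.replace(",", "")), int(county.replace(",", "")))
    return table


table = parse_rows(ROWS)

# difference = ZCTA total minus county total, as in the table above
diffs = {stab: zcta - county for stab, (zcta, county) in table.items()}
exact_matches = sorted(stab for stab, d in diffs.items() if d == 0)
largest_gap = max(diffs, key=lambda stab: abs(diffs[stab]))

print(diffs)
print(exact_matches, largest_gap)
```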

  • The thing that's most beguiling about this isn't the differences, but that some states have exactly matching numbers. If margins of error were the culprit, I wouldn't really expect any states to have exactly matching numbers. Note that there are a very small number of zip codes that cross state boundaries, but I don't think that's what's going on here.

  • Right?! It's baffling. 

    Differential privacy? Surely not at the level of counties and ZCTAs?

  • Somewhere there is a detailed methods white paper about how disclosure avoidance methods are applied to the ACS. There is quite a lot of data massaging, so there is more than just sampling error in the data on data.census.gov. I can't seem to find the documentation.