How often does ZIP not equal ZCTA?

The Census Bureau states that "In most instances the ZCTA code is the same as the ZIP Code for an area." Is there any quantification of "most instances"? I am looking for an overall estimate of the percentage of addresses whose ZIP Code is not the same as their ZCTA.

Parents
  • Since mailing addresses are nearly infinite, I'm not sure how you'd answer a question like this. Maybe you mean buildings, but even there, coming up with a list to compute an answer to the question as you stated it would be pretty complicated.

    One thing you could do would be to compare a list of ZCTAs (say from the Gazetteer, or the ZCTA shapefile) with a list of ZIP codes (for example, from GeoNames), to get a sense of how many ZIP codes don't have a ZCTA. But how to quantify the number of addresses that are "behind" those ZIP codes... ?
  • Hi Joe, thanks for responding. Nearly infinite they may be, but the Census Bureau knows who they are and what their ZIPs and ZCTAs are. Presumably this knowledge is behind their "most instances" statement.

    I have a large dataset of people with their ZIP codes but not their ZCTAs (because who knows their ZCTA?) that I would like to match to socioeconomic status (SES) data. If I assume their ZIP codes are the same as their ZCTAs and match to ACS data, there will be two kinds of issues:
    1) ZIPs that don't match to any ZCTA (because the ZIP is a minority in every census block it occurs in, or because it is some entity like a PO box that the Census Bureau does not assign ZCTAs to). This is the issue you mentioned. I would lose these people in my analysis, but it does give me an estimate (assuming my dataset is large enough and random enough) of that no-match rate.
    2) ZIPs that don't match their ZCTA. Say census block A is all ZIP code 99001 so their ZCTA is also 99001, and census block B is mostly ZIP code 99002 with a little 99001 so their ZCTA is 99002. Fred, in my dataset, lives in census block B with ZIP=99001. He is in ZCTA 99002 but my matching will place him in ZCTA 99001 because of his ZIP. It's this misclassification rate I would like to get some idea of.

    I don't need to derive it -- I just want to know if anyone (besides the Census Bureau) knows what it is.

    Anne
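The two issues described above (no match, and a match to the wrong ZCTA) could be measured with something like the following minimal sketch. It assumes a sample is available in which each person has both a mailing ZIP and a true ZCTA of residence, for example from geocoded addresses; the function name `zip_zcta_error_rates` and the toy records are hypothetical and imply nothing about real rates.

```python
def zip_zcta_error_rates(records, zcta_codes):
    """Estimate the two error rates from using ZIP as if it were ZCTA.

    records: iterable of (zip_code, true_zcta) string pairs, one per person.
    zcta_codes: set of all valid ZCTA codes (e.g. read from a ZCTA list).
    Returns (no_match_rate, misclassification_rate).
    """
    records = list(records)
    if not records:
        raise ValueError("records must not be empty")
    # Issue 1: the ZIP is not a ZCTA at all, so the person drops out of the join.
    no_match = sum(1 for z, _ in records if z not in zcta_codes)
    # Issue 2: the ZIP is a valid ZCTA, but not the one the person lives in.
    misclassified = sum(1 for z, t in records if z in zcta_codes and z != t)
    n = len(records)
    return no_match / n, misclassified / n


# Toy data only (hypothetical): four people, two valid ZCTAs.
zctas = {"99001", "99002"}
sample = [
    ("99001", "99001"),  # ZIP and ZCTA agree
    ("99001", "99002"),  # "Fred": ZIP 99001, but lives in ZCTA 99002
    ("99002", "99002"),  # ZIP and ZCTA agree
    ("99099", "99002"),  # ZIP with no ZCTA, e.g. a PO box ZIP
]
# ZIP codes in the sample that have no ZCTA of the same number.
unmatched_zips = {z for z, _ in sample} - zctas
rates = zip_zcta_error_rates(sample, zctas)
print(unmatched_zips, rates)
```

The first rate can be computed from ZIP codes alone; the second needs the true ZCTA for at least a subsample, which is why it is the harder number to find.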
Children
  • www.census.gov/.../zctas.html

    ZCTAs are created based on the Census data, so the existing ZCTAs are almost 9 years old. ZIP codes are born (and killed) at USPS discretion, reflecting the population dynamics. An additional complication is military ZIP codes, which probably differ quite a bit from the surrounding civilian/residential ZIP codes.

    If your goal is to attach ZIP-code-level demographics, which is what I assume you mean by "match to SES data", I would do the following. Most ZIP codes are effectively 3+2: the first three digits identify a major sorting facility (990, say), and the last two digits are nested within it. So for any real ZIP code I could not match, I would truncate it to three digits and use the demographics of the aggregated area, averaging over all of the 990** codes with population-size weights.
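The three-digit fallback described above can be sketched as follows. This is only an illustration under stated assumptions: `zcta_stats` is a hypothetical table of (population, value) per ZCTA, where `value` stands in for whatever ACS measure is being attached, and the function names are made up for the example. A population-weighted mean is exact for per-capita quantities but only approximate for measures such as medians.

```python
from collections import defaultdict


def prefix_weighted_means(zcta_stats):
    """Population-weighted mean of a measure over each 3-digit prefix.

    zcta_stats: dict mapping 5-digit ZCTA -> (population, value).
    Returns dict mapping 3-digit prefix -> weighted mean value.
    """
    pop_sum = defaultdict(float)
    weighted_sum = defaultdict(float)
    for zcta, (pop, value) in zcta_stats.items():
        prefix = zcta[:3]
        pop_sum[prefix] += pop
        weighted_sum[prefix] += pop * value
    # Skip prefixes with zero total population to avoid dividing by zero.
    return {p: weighted_sum[p] / pop_sum[p] for p in pop_sum if pop_sum[p] > 0}


def lookup(zip_code, zcta_stats, prefix_means):
    """Exact ZCTA value when the ZIP is a ZCTA, else the prefix mean, else None."""
    if zip_code in zcta_stats:
        return zcta_stats[zip_code][1]
    return prefix_means.get(zip_code[:3])


# Toy numbers only (hypothetical): two ZCTAs under the 990 prefix.
toy = {"99001": (1000, 50000.0), "99002": (3000, 70000.0)}
means = prefix_weighted_means(toy)
exact = lookup("99001", toy, means)      # matched directly
fallback = lookup("99099", toy, means)   # unmatched ZIP, falls back to 990**
missing = lookup("12345", toy, means)    # no ZCTA shares this prefix
print(exact, fallback, missing)
```

Returning `None` when even the prefix is unknown keeps the truly unmatched records visible, so they can be counted and not silently dropped.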