ACS vs Census populations at the ZCTA level

Hi all

Just fyi, I downloaded
B01001, SEX BY AGE, Universe: Total population, 2008-2012 American Community Survey 5-Year Estimates
and compared it to
PCT3. SEX BY AGE. Universe: Total population. 2010 Census Summary File 2
both at the ZCTA level

Here is a comparison of the totals

Group ...............ACS ..............2010 Census
State, total ........19,397,882 .......19,375,158
State, under 5 ......1,159,733 ........1,155,708

Pretty close, you say? Well... it is, at this level, but let's compare data from ZCTAs, looking at the under-5 category (I use this age group a lot).
ZCTA .........ACS ..........2010 Census
10001........686.............624
10002........3271...........3620
10598........1260...........1481
11713........671.............702
11941........52...............98
12428........425............437
12429........0................11
12430........51..............67
13203........938...........1177
13208........2079.........1912
14456........952...........1042
14462.......18..............35
14464.......409............417

Well, in this non-random sample, the numbers are at least generally in the same range. So if I'm just looking at population at the ZCTA level, I'm not sure there is much advantage to using the ACS. At the state level, the ACS estimate is slightly larger than the 2010 Census count. But in the ZCTAs I chose (haphazardly, not truly at random), the 2010 Census count is larger than the ACS estimate in 11 of 13 cases.
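For anyone who wants to tabulate the gaps rather than eyeball them, here is a minimal Python sketch that uses only the thirteen rows from the table above. The names `under5` and `compare` are made up for illustration, and the percent difference is taken relative to the Census count, which is just one reasonable choice.

```python
# Under-5 counts copied from the ZCTA table above: (ACS 2008-2012, 2010 Census)
under5 = {
    "10001": (686, 624),
    "10002": (3271, 3620),
    "10598": (1260, 1481),
    "11713": (671, 702),
    "11941": (52, 98),
    "12428": (425, 437),
    "12429": (0, 11),
    "12430": (51, 67),
    "13203": (938, 1177),
    "13208": (2079, 1912),
    "14456": (952, 1042),
    "14462": (18, 35),
    "14464": (409, 417),
}


def compare(counts):
    """Return rows of (zcta, acs, census, diff, pct_diff), with diff = ACS - Census."""
    rows = []
    for zcta, (acs, census) in counts.items():
        diff = acs - census
        pct_diff = 100.0 * diff / census  # relative to the Census count
        rows.append((zcta, acs, census, diff, pct_diff))
    return rows


rows = compare(under5)
census_larger = sum(1 for row in rows if row[3] < 0)

for zcta, acs, census, diff, pct_diff in rows:
    print(f"{zcta}  ACS={acs:5d}  Census={census:5d}  diff={diff:+5d}  ({pct_diff:+.1f}%)")
print(f"Census larger in {census_larger} of {len(rows)} ZCTAs")
```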

I guess the question is: should I expect close correspondence between the ACS and Census 2010, and if not, why not?

Thanks

Gene
(sorry for the dots. I can't seem to get data tables lined up any other way)
  • Gene,

    both sets of numbers are subject to various errors, and even the Census figure is not something written in stone. See the famous diagram from Groves (2004), poq.oxfordjournals.org/.../F3.large.jpg.

    The Census data are subject to undercoverage and non-response error on the representation side, and to all the typical sorts of measurement error on the measurement side. People may have reported a different zip code because they pick up mail at a mail box and have only a vague idea of their home zip code; the zip code may have been missing from their Census form and been imputed from the zip code the form was mailed from -- but this could be on the respondent's way to work rather than their home zip; a person's age may be based on the year of birth or on an exact calculation as of a given date; etc. Besides, the Census is collected in several modes -- mail, phone, in-person enumeration -- and each mode may have its own measurement errors (although I am sure that the budget of the Census program to control these mode effects is larger than the budget of most academic surveys or polls). I don't know the magnitudes of these errors -- the Census Bureau people would -- but I imagine that the non-response rate is in single-digit percentages, undercoverage is single digit if not less, and the various editing and imputation rates are in single digits, too.

    ACS, on top of that, is subject to sampling error. It would have been SO-O-O much easier to discuss this, Gene, if you had supplied the standard errors from ACS along with your tabs. ACS has a slightly higher non-response rate of maybe 3-5% on the representation side, and is also collected in multiple modes, which may add to the measurement error.

    Norm mentioned that the ACS totals are controlled, and you asked what that means. It means that the ACS totals are made to agree with externally obtained figures for certain geographies and certain demographic groups. The ACS 2010 may have been made to agree with Census 2010; I would imagine that ACS 2010 was released before the Census data were, but let's just conceptually think that this was the case. For ACS 2009 or ACS 2011, you don't have the exact Census data to match ACS to, so you have to rely on demographic models of how the population has grown since the last census. I know neither the details of how exactly ACS is weighted nor how these demographic models work (although I probably should, and this might be an interesting topic for a webinar). I imagine that the demographic models work at a relatively high level of aggregation, and that ACS has hard controls (i.e., controls that have to be satisfied exactly) at the national level for all the demographic targets, hard controls for the main demographic targets (like age, gender and race) at the state level, softer controls (which allow for some degree of error when the ACS totals are being aligned with the external totals) for auxiliary demographic targets (like education, or interactions of the main demographic variables) at the state level, and mostly soft controls at the county level (except for large counties like LA, which in itself is bigger than many states in the Northeast or in the mountains). You definitely cannot calibrate the weights at the ZCTA level, as this would just blow the weights up; you cannot produce stable weights unless you have counts of a hundred or so in a given adjustment cell.

    So as a result, what you see at the ZCTA level is a reflection of the adjustment processes that occur at higher levels (think shadows in Plato's cave), and it hardly accounts for whatever may be happening at a very local level (e.g., a substantial new development that could shake the zip code boundaries). Thus, on top of the representation error (sampling + non-response) and the standard measurement errors, you additionally have the demographic model error as a source of the total error in ACS (which would be the editing error in Groves' diagram). While this error is actually made to be anti-correlated with the sampling error, reducing the total error, it is still there. You can sort of gauge the magnitude of the demographic model error by tracking the history of the population projections. E.g., look at the population projection for 2010 that was made in, say, 2008, and compare that to the actual population count from the Census. I would not expect them to be hugely off, but I would not be surprised if they were, say, about 0.something% off nationwide, and maybe 1-3% off for states.
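
To make the point about small adjustment cells concrete, here is a toy post-stratification sketch in Python. It is a generic textbook adjustment, not the Census Bureau's actual ACS weighting procedure (which I said I don't know), and the function name `poststratify`, the cell labels and all the numbers are hypothetical.

```python
def poststratify(base_weights, cells, control_totals):
    """Scale weights so each cell's weighted total equals its control total.

    base_weights   -- initial sampling weights, one per respondent
    cells          -- adjustment-cell label for each respondent
    control_totals -- dict mapping cell label -> external population total
    """
    weighted = {}
    for w, c in zip(base_weights, cells):
        weighted[c] = weighted.get(c, 0.0) + w
    return [w * control_totals[c] / weighted[c] for w, c in zip(base_weights, cells)]


# Hypothetical setup: base weight 20 for everyone (a 5% sample).
# "large" cell: 100 respondents, control total 2100.
# "tiny" cell:    2 respondents, control total 98.
base = [20.0] * 102
cells = ["large"] * 100 + ["tiny"] * 2
controls = {"large": 2100.0, "tiny": 98.0}

adjusted = poststratify(base, cells, controls)
print("large-cell weight:", adjusted[0])
print("tiny-cell weight: ", adjusted[-1])
```

In the large cell the weights barely move, while in the two-respondent cell each weight more than doubles, and a single extra or missing respondent would change it drastically. That instability is what calibrating at the ZCTA level would invite.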

    I don't really have much to say about the systematic errors, but the sampling error can be discussed a little bit. As a very crude measure, let's think of ACS as a 5% sample of the population. Then a larger domain count of 2000 is based on a sample of about 100, or rather a binomial random variable with n=2000 and p=0.05, which has a mean of 100 and a standard deviation of about 10 (the square root of 2000 x 0.05 x 0.95 is about 9.7), or 10% of the figure itself. Thus the count of 2000 is associated with a standard error of 10%, i.e., 200 on the count scale, or a margin of error of +/- 400 (two standard errors). Viewed from that angle, the numbers between ACS and Census are in SPECTACULAR agreement. In truth, calibration of ACS weights to the Census target controls removes a lot of that sampling variability, so for the count of 2000 you may probably see an ACS standard error of 3-4%, or 60-80 in absolute units, rather than 200. Allow 1% or so for unknown systematic biases, and that's still a remarkable agreement.
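
The back-of-the-envelope arithmetic above can be sketched in Python as follows. The 5% sampling fraction is the same crude assumption as in the text, the function name `crude_se` is made up, and the Census count is treated as the fixed "true" size of the domain. The three ZCTA rows are copied from Gene's table.

```python
import math


def crude_se(count, f=0.05):
    """Crude standard error of an estimated count under binomial sampling.

    Assumes each of `count` people is sampled independently with probability
    `f`; ignores weighting, clustering, non-response and calibration.
    Returns (se_on_count_scale, relative_se).
    """
    if count <= 0:
        raise ValueError("count must be positive")
    sample_mean = count * f                        # expected sample size
    sample_sd = math.sqrt(count * f * (1 - f))     # binomial standard deviation
    rel_se = sample_sd / sample_mean               # SE as a share of the figure
    return rel_se * count, rel_se


se_2000, rel_2000 = crude_se(2000)
print(f"count 2000: SE ~ {se_2000:.0f} ({100 * rel_2000:.1f}%), 2*SE ~ {2 * se_2000:.0f}")

# (ACS, Census) under-5 counts for a few of the larger ZCTAs in the table
pairs = {"10002": (3271, 3620), "13203": (938, 1177), "13208": (2079, 1912)}
for zcta, (acs, census) in pairs.items():
    se, _ = crude_se(census)
    gap = acs - census
    print(f"{zcta}: gap {gap:+d}, crude SE {se:.0f}, gap/SE {gap / se:+.2f}")
```

This is only the uncalibrated upper-end picture. If the published ACS margins of error are at hand, they are reported at the 90% confidence level, so dividing a published margin of error by 1.645 gives a standard error that would be a better input than this approximation.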