ACS vs Census populations at the ZCTA level

Hi all

Just FYI, I downloaded B01001 (SEX BY AGE; Universe: Total population; 2008-2012 American Community Survey 5-Year Estimates) and compared it to PCT3 (SEX BY AGE; Universe: Total population; 2010 Census Summary File 2), both at the ZCTA level.

Here is a comparison of the state totals:

                        ACS   2010 Census
State, total     19,397,882    19,375,158
State, under 5    1,159,733     1,155,708

Pretty close, you say? Well, it is, at this level, but let's compare ZCTA-level data for the under-5 category (I use this age group a lot):

ZCTA     ACS   2010 Census
10001    686           624
10002   3271          3620
10598   1260          1481
11713    671           702
11941     52            98
12428    425           437
12429      0            11
12430     51            67
13203    938          1177
13208   2079          1912
14456    952          1042
14462     18            35
14464    409           417

In this nonrandom sample, the numbers are at least generally in the same range, so if I'm just looking at population at the ZCTA level, I'm not sure there is much advantage to using the ACS. At the state level, the ACS estimate is slightly larger than the 2010 Census count, but in 11 of the 13 ZCTAs I picked (more or less at random, at least to me), the 2010 Census count is larger than the ACS estimate.

I guess the question is: should I expect close correspondence between the ACS and Census 2010, and if not, why not?

Thanks

Gene
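A quick way to put numbers on Gene's comparison is to express each ACS-minus-Census difference as a percent of the Census count. The minimal Python sketch below simply reuses the figures from the two tables in the post; the dictionary names and the choice of the Census count as the denominator are my own, not anything prescribed by the Census Bureau.

```python
# Figures copied from the tables in the post above:
# ACS 2008-2012 5-year (B01001) vs. 2010 Census SF2 (PCT3).
state = {"Total": (19_397_882, 19_375_158), "Under 5": (1_159_733, 1_155_708)}

# Under-5 population by ZCTA: (ACS estimate, 2010 Census count)
under5 = {
    "10001": (686, 624),
    "10002": (3271, 3620),
    "10598": (1260, 1481),
    "11713": (671, 702),
    "11941": (52, 98),
    "12428": (425, 437),
    "12429": (0, 11),
    "12430": (51, 67),
    "13203": (938, 1177),
    "13208": (2079, 1912),
    "14456": (952, 1042),
    "14462": (18, 35),
    "14464": (409, 417),
}


def pct_diff(acs: int, census: int) -> float:
    """ACS minus Census, as a percent of the Census count."""
    return 100.0 * (acs - census) / census


def summarize(rows: dict) -> None:
    """Print each pair with its absolute and percent difference."""
    for key, (acs, census) in rows.items():
        print(f"{key:>8}  ACS={acs:>10,}  Census={census:>10,}  "
              f"diff={acs - census:>+7,}  ({pct_diff(acs, census):+6.1f}%)")


summarize(state)
summarize(under5)

census_larger = sum(1 for acs, census in under5.values() if census > acs)
print(f"Census count larger than ACS estimate in {census_larger} of {len(under5)} ZCTAs")
```

Run as-is, it shows the state totals agreeing to within about 0.1% (about 0.35% for under 5), while the ZCTA differences range from about 2% (14464) to 100% (12429, where the ACS estimate is 0).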
  • Remember that the 2008-2012 data include data for 2008-2010 that were based on population estimates made before the actual 2010 Census counts were available. When those get averaged in with more recent estimates, there are likely to be significant differences from the 2010 counts.

    As the ACS website notes, the population figures in the ACS data are generated from the Population Estimates Program, which generally works at the county and higher geographies. We are encouraged to use ACS data for characteristics, not for population counts. While those two are linked, it is a good caveat to keep in mind.
  • You must keep in mind that the data are collected continuously: about 1/5 of the sample was collected in 2008, 1/5 in 2009, and so on. The Bureau stresses that the 5-year estimate is not an average; it has a 5-year sampling frame. Even over the 5 years, the sample is smaller than the one used for Summary File 3 back in 2000, and with the smaller sample size the margins of error go up. When you start slicing it into age groups for a ZCTA, it gets much dicier.
  • It seems to me that the close correspondence at the state level between the total population in the 2008-2012 5-year ACS estimates and the 2010 decennial count supports the point made in the recent webinar on using multi-year estimates: if the decennial year is the middle year of the multi-year ACS series, the ACS data can be taken as a close approximation of what the decennial long-form data would have shown. Of course, this works best for total population at the national, state, and county levels, where the ACS totals are controlled. It certainly does not work for population by race for the American Indian and Alaska Native (AI/AN) alone population, where the ACS numbers are a clear and significant undercount.
  • The Census counts people as of April 1, whereas the ACS collects data continuously and the midpoint for the yearly ACS estimates is July 1, so in general the numbers won't match exactly.
  • I know the ACS and the 2010 Census are different, but I'm wondering how much difference I should expect. They use different methods, different time periods, etc. Should they be in the same ballpark? And obviously, at the ZCTA level there is more uncertainty in the ACS.

    Also, someone else mentioned the phrase "where the ACS totals are controlled." What does that mean?
  • I don't think there is a good answer to "how much difference should I expect." I would expect the numbers to be close, but I don't have a formula to say what close is. If the ACS figure is very different from the Census, though, I would check its margin of error. Also, consider the area: is it changing quickly? Demographic swings usually are not that dramatic, but they should be considered. Consider something more volatile, like persons in poverty. If you looked at the 2010 ACS 5-year estimate, three-fifths of the sample would be from before the Great Recession, so you would know, intellectually, that the number in poverty in the 2010 ACS 5-year estimate is lower than the actual number of persons in poverty in 2010. This is a long-winded way of saying that the 5-year ACS is what it is. Try not to anchor it to, or compare it too closely with, other 1-year data sets; you'll just get a headache.
  • In the context of the population age 0 to 4, it is important to note that the 2010 Census undercounted this age group by nearly 5%, and that the undercount rate varies among states and counties.
  • This last point is an interesting one I have not heard before. Where can I get more information about the 2010 undercount?
  • Gene,

    Both sets of numbers are subject to various errors, and even the Census figure is not written in stone. See the famous diagram from Groves (2004), poq.oxfordjournals.org/.../F3.large.jpg.

    The Census data are subject to undercoverage and nonresponse error on the representation side, and to all the typical sorts of measurement error on the measurement side. People may have reported a different ZIP code because they pick up their mail at the mailbox and have only a vague idea of their ZIP code; the ZIP code may have been missing from their Census form and imputed from the ZIP code the form was mailed from, which could be on the respondent's way to work rather than their home ZIP code; a person's age may be based on the year of birth or on an exact calculation as of a given date; and so on. Besides, the Census is collected in several modes (mail, phone, in-person enumeration), and each mode may have its own unique measurement errors (although I am sure that the Census Bureau's budget for controlling these mode effects is larger than the entire budget of most academic surveys or polls). I don't know the magnitudes of these errors (the Census Bureau people would), but I imagine that the nonresponse rate is in the single digits, undercoverage is a single-digit percentage if not less, and the various editing and imputation rates are in the single digits, too.

    The ACS, on top of that, is subject to sampling error. It would have been SO-O-O much easier to discuss this, Gene, if you had supplied the ACS standard errors along with your tabs. The ACS has a slightly higher nonresponse rate, maybe 3-5%, on the representation side, and is also collected in multiple modes, which may add to the measurement error.

    Norm mentioned that the ACS totals are controlled, and you asked what that means. It means that the ACS totals are made to agree with externally obtained figures for certain geographies and certain demographic groups. The ACS 2010 may have been made to agree with Census 2010 (although I would imagine that ACS 2010 was released before the Census data were), but let's just conceptually think that this was the case. For ACS 2009 or ACS 2011, you don't have exact Census data to match the ACS to, so you have to rely on demographic models of how the population grew since the last census. I know neither the details of how exactly the ACS is weighted nor how these demographic models work (although I probably should, and this might be an interesting topic for a webinar). I imagine that the demographic models work at a relatively high level of aggregation, and that the ACS has hard controls at the national level for all the demographic targets (i.e., controls that have to be satisfied exactly), hard controls for the main demographic targets (like age, gender, and race) at the state level, softer controls (which allow some degree of error when the ACS totals are aligned with the external totals) for auxiliary demographic targets (like education, or interactions of the main demographic variables) at the state level, and mostly soft controls at the county level (except for large counties like LA, which by itself is bigger than many states in the Northeast or the mountain states). You definitely cannot calibrate the weights at the ZCTA level, as this would just blow the weights up; you cannot produce stable weights unless you have counts of a hundred or so in a given adjustment cell.

    As a result, what you see at the ZCTA level is a reflection of the adjustment processes that occur at higher levels (think shadows in Plato's cave), and it hardly accounts for whatever may be happening at a very local level (e.g., a substantial new development that could shake up the ZIP code boundaries). Thus, on top of the representation error (sampling + nonresponse) and the standard measurement errors, you additionally have the demographic model error as a source of the total error in the ACS (which would be the editing error in Groves' diagram). While this error is actually made to be anti-correlated with sampling error, reducing the total error, it is still there. You can get a rough sense of the magnitude of the demographic model error by tracking the history of the population projections: e.g., look at the population projection for 2010 that was made in, say, 2008, and compare it to the actual population count from the Census. I would not expect them to be hugely off, but I would not be surprised if they were off by a fraction of a percent nationwide, and maybe 1-3% for states.

    I don't really have much to say about the systematic errors, but the sampling error can be discussed a little. As a very crude approximation, let's think of the ACS as a 5% sample of the population. Then a larger domain count of 2000 is based on a sample of about 100, or rather a binomial random variable with n = 2000 and p = 0.05, which has a mean of np = 100 and a standard deviation of sqrt(np(1-p)) = sqrt(95) ≈ 10, or 10% of the figure itself. Thus the count of 2000 is associated with a standard error of about 10%, i.e., 200 on the count scale, or a margin of error of about +/- 400 (two standard errors; the MOEs published with ACS tables are at the 90% level, 1.645 standard errors, which would give about +/- 330). Viewed from that angle, the numbers between ACS and Census are in SPECTACULAR agreement. In truth, calibration of the ACS weights to the Census control totals removes a lot of that sampling variability, so for a count of 2000 you may see an ACS standard error of 3-4%, or 60-80 in absolute units, rather than 200. Allow 1% or so for unknown systematic biases, and that's still remarkable agreement.
  • Perhaps one could apply a difference-of-means test to see if there is a significant difference between Census- and ACS-based estimates of a given variable.
  • Gene: Your question was whether you should expect close correspondence between the 2008-2012 ACS 5-year estimates and the decennial census. Several previous respondents have pointed out the difference between a point value (April 1, 2010) and an interval average (5 years: 2008 through 2012), and that in general we are advised to use decennial census data or Census population estimates for population counts. Even without looking deeply, I was more struck by the similarity than by the discrepancy. When I looked up the MOEs, I calculated the MOE for Male and Female for ZCTA 10002 as about 300 (picked largest tract). The difference between the two figures (which really shouldn't be compared) is about 350, and that suggests to me that, with all the other factors previous respondents have mentioned, there was more agreement than disagreement among the figures. Frankly, I thought it was interesting that the two numbers for most ZCTAs you tabled were so close.

    Stan
  • Regarding Cliff's question about where to find more information on the undercount of children, here are two useful primers on the subject:
    www.aecf.org/.../final census undercount paper.pdf
    and
    www.copafs.org/.../june_2013.aspx (last presentation link on that page)
  • These two documents are quite helpful for understanding the issue. Thanks.
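On the timing point raised in the replies (Census Day is April 1, 2010, while the 2008-2012 ACS is collected over five calendar years), a minimal Python calculation shows where the middle of that collection window falls. It assumes, purely for illustration, that the ACS period runs evenly from January 1, 2008 to December 31, 2012; it says nothing about how the Bureau actually weights the individual years.

```python
from datetime import date

CENSUS_DAY = date(2010, 4, 1)  # decennial census reference date
ACS_START = date(2008, 1, 1)   # assumed start of the 2008-2012 window
ACS_END = date(2012, 12, 31)   # assumed end of the 2008-2012 window

# Halfway point of the window (timedelta / 2 gives a timedelta)
midpoint = ACS_START + (ACS_END - ACS_START) / 2
gap_days = (midpoint - CENSUS_DAY).days

print(f"ACS window midpoint: {midpoint.isoformat()}, {gap_days} days after Census Day")
```

Under that simplification, the midpoint lands in early July 2010, about three months after Census Day, which lines up with the April 1 versus July 1 point made above.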
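To make "the ACS totals are controlled" a bit more concrete: one generic way to force survey weights to agree with external totals is raking (iterative proportional fitting), which repeatedly rescales the weights so that each weighted margin matches its control total. The Python sketch below is a generic raking example, not the Census Bureau's actual ACS weighting procedure (which the thread explicitly does not describe); the six respondents, their base weights of 20 (the inverse of an assumed 5% sampling rate), and the control totals are all made up for illustration.

```python
# Hypothetical toy sample: NOT ACS data and NOT the ACS weighting method.
respondents = [
    {"weight": 20.0, "age": "0-4", "sex": "F"},
    {"weight": 20.0, "age": "0-4", "sex": "M"},
    {"weight": 20.0, "age": "5+", "sex": "F"},
    {"weight": 20.0, "age": "5+", "sex": "F"},
    {"weight": 20.0, "age": "5+", "sex": "M"},
    {"weight": 20.0, "age": "5+", "sex": "M"},
]

# Made-up external control totals; both margins must sum to the same total (200).
controls = {
    "age": {"0-4": 70.0, "5+": 130.0},
    "sex": {"F": 95.0, "M": 105.0},
}


def weighted_totals(records, var):
    """Sum of weights for each category of `var`."""
    totals = {}
    for r in records:
        totals[r[var]] = totals.get(r[var], 0.0) + r["weight"]
    return totals


def rake(records, controls, max_iter=100, tol=1e-10):
    """Scale weights margin by margin until every weighted margin matches its controls."""
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, targets in controls.items():
            totals = weighted_totals(records, var)
            for r in records:
                factor = targets[r[var]] / totals[r[var]]
                r["weight"] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:  # weights have stopped changing
            break
    return records


rake(respondents, controls)
for var in controls:
    print(var, weighted_totals(respondents, var))
```

Every control category needs respondents in it for this to work, and cells with only a handful of cases produce large, unstable adjustment factors, which is the "blow the weights up" problem described above for ZCTA-level calibration.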
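The crude sampling-error argument in the thread (treat the ACS as a 5% sample, so a count is roughly a binomial draw scaled back up) can be reproduced in a few lines of Python. This is only that back-of-the-envelope model: the 5% rate is the poster's stated simplification, and the model ignores weighting and calibration, which, as noted, shrink the real standard errors considerably. The 1.645 multiplier is the one used for the 90% MOEs published with ACS tables.

```python
import math

Z90 = 1.645  # multiplier for the 90% margins of error published with ACS tables


def crude_count_se(count, sampling_rate):
    """Crude standard error of an estimated count.

    Treats the number sampled as Binomial(count, sampling_rate) and the
    estimate as (number sampled) / sampling_rate; ignores weighting entirely.
    """
    sd_sampled = math.sqrt(count * sampling_rate * (1.0 - sampling_rate))
    return sd_sampled / sampling_rate


se = crude_count_se(2000, 0.05)
print(f"count 2000: SE ~ {se:.0f} ({se / 2000:.1%}), "
      f"90% MOE ~ +/-{Z90 * se:.0f}, 2-SE band ~ +/-{2 * se:.0f}")

# Same crude model applied to a few of Gene's under-5 pairs (ACS, 2010 Census),
# using the Census count as the "true" count.
for zcta, (acs, census) in {"10002": (3271, 3620), "13203": (938, 1177),
                            "13208": (2079, 1912)}.items():
    moe90 = Z90 * crude_count_se(census, 0.05)
    print(f"{zcta}: ACS - Census = {acs - census:+d}, crude 90% MOE ~ +/-{moe90:.0f}")
```

For the count of 2000 this gives a standard error of about 195, close to the rounded 200 used in the post.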
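Following up on the difference-of-means suggestion and Stan's MOE check: the usual approach in the Census Bureau's ACS guidance is to convert a published 90% MOE to a standard error (MOE / 1.645) and compute a z statistic for the difference. The Python sketch below treats the 2010 Census count as having no sampling error, which is a simplification, since the Census has its own coverage and measurement errors as discussed above. The MOE of 300 is only an assumption borrowed from Stan's rough figure; the under-5 total in B01001 is a sum of male and female cells, so its actual MOE would have to be looked up or approximated.

```python
import math

Z90 = 1.645  # ACS margins of error are published at the 90% confidence level


def se_from_moe(moe):
    """Convert a published ACS 90% margin of error to a standard error."""
    return moe / Z90


def z_statistic(est1, se1, est2, se2=0.0):
    """z statistic for the difference between two independent estimates.

    se2 defaults to 0, i.e. treating the second figure (here a 2010 Census
    count) as having no sampling error.
    """
    denom = math.sqrt(se1 ** 2 + se2 ** 2)
    if denom == 0:
        raise ValueError("at least one standard error must be positive")
    return (est1 - est2) / denom


# ZCTA 10002, under 5: ACS 3271 vs. Census 3620, with an ASSUMED MOE of 300.
z = z_statistic(3271, se_from_moe(300), 3620)
print(f"z = {z:.2f}; differs at the 90% level: {abs(z) > Z90}")
```

Because the test treats the Census count as exact and ignores the nonsampling errors described in the thread, a "significant" result here is a prompt to look more closely rather than proof that either figure is wrong.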