I'm trying to answer a very simple question: by how much has the under-18 population in San Francisco changed since the pandemic?
The 2020 census redistricting data gives the population of San Francisco at 873,965, of which 760,738 were 18 years and over. That implies the U18 population was 113,227.
Table P12 in the Demographic and Housing Characteristics gives the same total and breaks it down by age and sex. So far, so good.
My understanding is that the Population Estimates Program is the place to go for intercensal estimates of the population count.
I downloaded the latest vintage age/sex breakdown by county. Here's the San Francisco data for select age buckets.
The total population figure in the base year is very close to the census figure, which is what I'd expect. Notice, however, that the 0-17 population in the base year is estimated at 118,989, whereas the census gave a figure of 113,227. That's a 5% difference, which is way beyond any margin of error.
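For anyone who wants to check the arithmetic, here is a minimal sketch in Python using only the figures quoted above. The variable names are made up for illustration; nothing here comes from a Census API.

```python
# Figures quoted in this post (San Francisco, 2020).
census_total = 873_965    # 2020 census redistricting data, all ages
census_adults = 760_738   # 2020 census, 18 years and over
pep_base_u18 = 118_989    # PEP estimates base, ages 0-17

# Under-18 population implied by the census figures.
census_u18 = census_total - census_adults

# Absolute and relative gap between the PEP base and the census.
gap = pep_base_u18 - census_u18
gap_pct = 100 * gap / census_u18

print(f"Census U18:   {census_u18:,}")
print(f"PEP base U18: {pep_base_u18:,}")
print(f"Gap:          {gap:,} ({gap_pct:.1f}% of the census figure)")
```

The gap is expressed relative to the census figure; dividing by the PEP base instead would give a slightly smaller percentage.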
Meanwhile, the ACS 1-year estimates are consistent with the population estimates:
Questions:
1) Why is the base year U18 population estimate so different from the census figure (118,989 vs 113,227)? Isn't the former supposed to be based on the latter?
2) Which number should I use? I want to use the 2020 census figure because then I've got a consistent series going back decades but comparing the census figure with the 2023 estimates implies that the U18 population fell by only 3,000 which is not credible for two reasons.
3) Even if we forget about the census completely, I struggle to believe the age breakdown of the population change implied by the ACS surveys and population estimates. Are we really to believe that a pandemic that killed mainly old people and led to 70,000 people leaving the city somehow caused the population 65 and over to increase by 8,500 and the population aged 75-79 to increase by nearly 25% in three years? If true, I would have expected this influx of retirees to be covered in the local media.
I neglected to add that the apparent increase in the old age population is not confined to San Francisco. According to the population estimates, across all of California the 75-79 population increased 19% between 2020 and 2023. The 80-84 population apparently increased 12%. Can these figures really be right?
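To put the three-year changes on the same footing as annual growth rates, they can be converted to a compound annual rate. This is only a rough sketch: it assumes steady compound growth over the three years, which the estimates themselves do not claim, and the function name is made up for illustration.

```python
def annualized_rate(total_pct: float, years: float) -> float:
    """Compound annual growth rate (in percent) equivalent to a
    cumulative change of `total_pct` percent over `years` years."""
    return ((1 + total_pct / 100) ** (1 / years) - 1) * 100

# Statewide California changes quoted above, 2020 to 2023.
for label, total_pct in (("75-79", 19), ("80-84", 12)):
    rate = annualized_rate(total_pct, 3)
    print(f"{label}: {total_pct}% over 3 years is about {rate:.1f}% per year")
```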
hi Paul--
I'll try to answer your questions. Yes, there are multiple annual population estimates datasets.
I will list four products. But one of these is not yet available -- 'for reasons'. And so, Postcensal Estimates from Census PEP is probably the best option.
Four products:
1. Postcensal Estimates from Census PEP. These are built up from the previous decennial census by annual demographic progression, i.e. 2010 is the 'base year' for the 2011-19 estimates; 2020 is the base year for 2021-29. Because of that, there can be a time-series break, or 'cliff', at the decennial year. So, interpret cautiously! I am simplifying in this description -- there's far more that could be said. See https://www.census.gov/programs-surveys/popest/guidance.html
2. Intercensal Estimates from Census PEP. This is (or will be) a subsequent product in which Census PEP tries to reconcile and smooth out the annual time-series. Unfortunately, Census PEP has not released this product yet. And... does anyone know the release date? (Census website says "Fall 2024"...)
3. Population data in ACS. This should differ only minimally from the Postcensal Estimates already mentioned, because Postcensal Estimates provide ACS's population totals. But, unlike product 1 above, Census Bureau does *not* revise the ACS stats once they're published. Postcensal Estimates from PEP can be revised, and are revised as data sources improve and trends become clearer. Again, I am simplifying in this description.
4. Annual Population Estimates (typically postcensal) from your State Demographer or State Population Analyst. Many states want to be more in control of the population data they use, so they have their own population estimates products -- 'for reasons'. The bad news is: it's only half the states that have the state products; and they're mostly concerned with total population, households and housing totals. Very few (there are some) are parsing the estimates into age groups or other demographic categories.
You asked some questions: Which of these to use? The answer is: it depends. If your inquiry includes change over time -- and extending across the decennial census timepoint, then I feel like you want the product with the least amount of time-series disruption. You'll need to figure that out for your case.
You also asked if the speed of the increase in senior citizens is 'believable'? In my view, yes -- what's not to believe? There’s a substantial aging wave underway, mostly during 2010 thru 2030. In the metro where I am, we project ~10% increases annually. This is not due to migration here – this is due to Baby Boomers (and subsequent generations) aging into the 65+ and 75+ brackets. It's due to the massive size of the Baby Boom generation (and subsequent generations), and the fact that the Baby Boom is the first cohort to enjoy the long life expectancies that came out of 20th century public health and medical advances.
That's a long answer -- hope it helps.
--todd graham principal demographer, Metropolitan Council of the Twin Cities (METC)
1. The estimates base is based on the 2020 count, but a few things can contribute to differences. Programs like Count Question Resolution (CQR) and Post Census Group Quarters Review (PCGQR) create data that is incorporated in the Population Estimates Program (PEP) 2020 base. Unfortunately, the CQR and PCGQR results are not incorporated in the redistricting data or the DHC data. Separately, each time the PEP issues a new vintage of estimates, the PEP must inject Differential Privacy (DP) noise in the base, so each vintage will have a different base, a different launching point for the estimates. Even assuming severe issues with CQR, PCGQR and DP noise, the differences generally should not be of the magnitudes you describe, but outliers should be expected.
2. When comparing population estimates to any other population figure, it's best to use the estimates base because that's what the estimates are based on. In some sense, the estimates base is based on the redistricting data/DHC data, but to the degree that the estimates base differs from redistricting and DHC, the PEP estimates base is the one to use. A whole separate issue is that the vintage 2019 PEP data was based on the 2010 Census, so the vintage 2019 PEP data is expected to be badly mismatched with the vintage 2023 data. If you only needed total population (no age detail), then you'd do well to wait until the November 7 intercensal estimates of total population (basically, 2010-2019 estimates will be revised). Intercensal age detail should be in the works, but it's not clear when or whether the sex, race, Hispanic detail will slow down the release of the age detail. (Some users rely on age detail much more than sex or race or Hispanic details.) It's not hard to imagine the number of children enrolled decreases much more than the number of under-18 residents decreases. The pandemic made school feel faraway and optional.
3. Regarding the 75-79 and 80-84 groups, something seems off. The magnitudes you describe are larger (and over a shorter period of time) than would be expected as a result of aging baby boomers. The oldest (born in 1946) would turn 77 in 2023, so some increase in the 75-79 population might be expected, but 19% in 3 years seems a bit much, and baby boomers are not yet turning 80.
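The cohort reasoning in point 3 can be sketched in a few lines of Python. Two assumptions are baked in: the Baby Boom is taken as the conventional 1946-1964 birth years, and age is approximated as calendar year minus birth year, ignoring month of birth. The function names are made up for illustration.

```python
# Assumption: conventional Baby Boom birth years.
BOOM_FIRST, BOOM_LAST = 1946, 1964

def birth_years_in_bracket(year, min_age, max_age):
    """Birth years whose members are roughly min_age..max_age in `year`.

    Approximation: age = year - birth year (month of birth ignored).
    """
    return list(range(year - max_age, year - min_age + 1))

def boomer_years(birth_years):
    """Subset of birth years that fall inside the Baby Boom."""
    return [y for y in birth_years if BOOM_FIRST <= y <= BOOM_LAST]

for year in (2020, 2023):
    for min_age, max_age in ((75, 79), (80, 84)):
        years = birth_years_in_bracket(year, min_age, max_age)
        print(f"{year}, ages {min_age}-{max_age}: born {years[0]}-{years[-1]}, "
              f"boomer birth years: {boomer_years(years)}")
```

Under these assumptions, only the 75-79 bracket in 2023 picks up any boomer birth years, and the 80-84 bracket picks up none in either year, which is the pattern point 3 describes.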
In addition to the information Todd shared, I'll add two points that I think may help.
First, you are right that typically the decennial census forms the basis for the population estimates. But you'll recall that 2020 was not "typical" in any way. In particular, there were challenges with using 2020 Census as a base for the population estimates program that included, but were not limited to... the undercount of young children, the timing of data availability, the implementation of new disclosure avoidance procedures, and more.
So the population estimates incorporate *some* 2020 Census data, but not all, into a "blended base" that draws on the strengths of several data sources. By all measures I've seen, the estimates are more accurate for age structure. For more info about the "Blended Base", these resources may be helpful:
https://www.census.gov/library/visualizations/2022/comm/creating-the-vintage-2021-blended-base.html
https://www2.census.gov/about/partners/cac/nac/meetings/2022-05/presentation-blended-base-for-population-estimates.pdf
https://www.ctdata.org/blog/pop-est-blended-base
Second... For aging population in California, we are seeing rapid aging across the state. Because there's no population register, it can be difficult to confirm a change for a specific demographic group mid-decade, but the population age 75+ rose by more than 35% statewide between 2010 and 2020 and that was BEFORE the Baby Boom reached age 75. There is no "influx" of retirees. People are just getting older.
If what you really want is information about what's going on with the child population mid-decade, birth data can offer clues. The number of births in San Francisco dropped rapidly (8,691 in 2018, falling to 6,750 in 2023). Again, it's difficult to verify the specifics of a change for a group mid-decade, but a smaller child population is consistent with what we'd expect to see given demographic trends.
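As a quick illustration of the size of that drop, here is a small Python sketch using only the two birth counts quoted above. The variable names are made up, and this says nothing about the years in between.

```python
# Birth counts quoted above for San Francisco.
births_2018 = 8_691
births_2023 = 6_750

# Absolute and relative decline over the five-year span.
decline = births_2018 - births_2023
decline_pct = 100 * decline / births_2018

print(f"Births fell by {decline:,} ({decline_pct:.1f}%) between 2018 and 2023")
```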
So, as with Todd, I cannot say which dataset you "should" use. But I will offer that using the blended base estimate does not necessarily "break" your time series.