I'm trying to answer a very simple question: by how much has the under-18 population in San Francisco changed since the pandemic?
The 2020 census redistricting data gives the population of San Francisco at 873,965, of which 760,738 were 18 years and over. That implies the U18 population was 113,227.
Table P12 in the Demographic and Housing Characteristics gives the same total and breaks it down by age and sex. So far, so good.
My understanding is that the Population Estimates Program is the place to go for intercensal estimates of the population count.
I downloaded the latest vintage age/sex breakdown by county. Here's the San Francisco data for select age buckets.
The total population figure in the base year is very close to the census figure, which is what I'd expect. Notice that the 0-17 population in the base year is estimated at 118,989, whereas the census gave a figure of 113,227. That's a 5% difference, which is way beyond any margin of error.
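For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the figures quoted above. The variable names are my own, not Census Bureau field names.

```python
# Figures quoted in the post (2020 census redistricting data and the
# population estimates base year); variable names are illustrative only.
census_total = 873_965
census_18_plus = 760_738
estimates_base_u18 = 118_989

# Under-18 population implied by the census counts
census_u18 = census_total - census_18_plus

# Gap between the estimates base and the census-implied figure
gap = estimates_base_u18 - census_u18
gap_pct = 100 * gap / census_u18

print(f"Census U18: {census_u18:,}")
print(f"Gap: {gap:,} ({gap_pct:.1f}% of the census figure)")
```

The gap is measured relative to the census figure; measuring it against the estimates base instead would give a slightly smaller percentage.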
Meanwhile, the ACS 1-year estimates are consistent with the population estimates:
Questions:
1) Why is the base year U18 population estimate so different from the census figure (118,989 vs 113,227)? Isn't the former supposed to be based on the latter?
2) Which number should I use? I want to use the 2020 census figure because then I've got a consistent series going back decades, but comparing the census figure with the 2023 estimates implies that the U18 population fell by only 3,000, which is not credible for two reasons.
3) Even if we forget about the census completely, I struggle to believe the age breakdown of the population change implied by the ACS surveys and population estimates. Are we really to believe that a pandemic that killed mainly old people and led to 70,000 people leaving the city somehow caused the population 65 and over to increase by 8,500 and the population aged 75-79 to increase by nearly 25% in three years? If true, I would have expected this influx of retirees to be covered in the local media.
In addition to the information Todd shared, I'll add two points that I think may help.
First, you are right that the decennial census typically forms the basis for the population estimates. But you'll recall that 2020 was not "typical" in any way. In particular, there were challenges with using the 2020 Census as a base for the Population Estimates Program that included, but were not limited to, the undercount of young children, the timing of data availability, and the implementation of new disclosure avoidance procedures.
So the population estimates incorporate *some* 2020 Census data, but not all, into a "blended base" that draws on the strengths of several data sources. By all measures I've seen, the estimates are more accurate for age structure. For more info about the "Blended Base", these resources may be helpful:
https://www.census.gov/library/visualizations/2022/comm/creating-the-vintage-2021-blended-base.html
https://www2.census.gov/about/partners/cac/nac/meetings/2022-05/presentation-blended-base-for-population-estimates.pdf
https://www.ctdata.org/blog/pop-est-blended-base
Second, on the aging population: we are seeing rapid aging across California. Because there's no population register, it can be difficult to confirm a change for a specific demographic group mid-decade, but the population age 75+ rose by more than 35% statewide between 2010 and 2020, and that was BEFORE the Baby Boom reached age 75. There is no "influx" of retirees. People are just getting older.
If what you really want is information about what's going on with the child population mid-decade, birth data can offer clues. The number of births in San Francisco dropped rapidly (8,691 in 2018, falling to 6,750 in 2023). Again, it's difficult to verify the specifics of a change for a group mid-decade, but a smaller child population is consistent with what we'd expect to see given demographic trends.
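To put the birth figures in proportion, here is a rough Python sketch of the decline using only the two numbers quoted above. The variable names are illustrative. Note that this measures the change in annual births, not the change in the under-18 population itself.

```python
# Annual births in San Francisco as quoted above; names are illustrative.
births_2018 = 8_691
births_2023 = 6_750

# Absolute and relative decline in annual births between the two years
decline = births_2018 - births_2023
decline_pct = 100 * decline / births_2018

print(f"Births fell by {decline:,} ({decline_pct:.1f}%) from 2018 to 2023")
```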
So, as with Todd, I cannot say which dataset you "should" use. But I will offer that using the blended base estimate does not necessarily "break" your time series.