Forecasting race/ethnicity proportions at city scale

Hello, first time poster here, interested in forecasting.

I am working with a city-sized population (~150k residents, ~30 tracts) in the San Francisco Bay Area, for which I can retrieve ACS population estimates by race/ethnicity (from B01003) for a couple of overlapping intervals (2011-2015, …, 2014-2018).

The proportions are fairly stable, but I’m more interested in the future — specifically, in 2021-2025 — than the past.

As an R user, I could try to adapt some generic forecasting tools, like those in Rob Hyndman's {fable} package, to generate estimates for the near future. But should I be guessing at which tools and approaches to use, or has someone already done a pretty robust job somewhere else that could be adapted or applied? (Note: not asking for R code, just recommendations for an approach.)

Methods that would generate predictive intervals, or at least some metric of uncertainty/confidence, would be ideal. All suggestions would be welcome! I'm hoping to be able to apply the same technique(s) to other city-sized regions in the future, even after Decennial counts become available.

Thank you in advance for your time and attention.

Very best,

  • Hey David:

    Interesting question! There are a few different techniques you could try. One would be a linear trend model to forecast total population (from population estimates) and ACS-derived proportions, using single-year data for the city at large. The best reference manual for possible approaches is Smith, Tayman, and Swanson's State and Local Population Projections: Methodology and Analysis.
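
As a rough illustration of the linear-trend idea above, here is a minimal Python sketch (the same arithmetic ports directly to R). Everything in it is an assumption made for illustration: the input numbers are made up, `linear_forecast` is a hypothetical helper, and 2023 is used only because it is the midpoint of the 2021-2025 window. It gives point forecasts only, with no uncertainty estimate.

```python
import numpy as np

def linear_forecast(years, values, target_year):
    """Fit an ordinary least-squares line to (years, values) and extrapolate."""
    years = np.asarray(years, dtype=float)
    center = years.mean()  # centering the years keeps the fit well conditioned
    slope, intercept = np.polyfit(years - center, values, deg=1)
    return slope * (target_year - center) + intercept

# Hypothetical single-year inputs (illustrative numbers, not real data)
years = [2014, 2015, 2016, 2017, 2018]
total_pop = [148_000, 149_000, 150_000, 151_000, 152_000]
share_group_a = [0.40, 0.39, 0.38, 0.37, 0.36]

target = 2023  # midpoint of the 2021-2025 ACS window
pop_2023 = linear_forecast(years, total_pop, target)
share_2023 = linear_forecast(years, share_group_a, target)
share_2023 = min(max(share_2023, 0.0), 1.0)  # a straight line can leave [0, 1]

# Combine the two forecasts into a projected count for the group
count_2023 = pop_2023 * share_2023
print(round(pop_2023), round(share_2023, 3), round(count_2023))
```

With more than two groups, independently extrapolated shares will not generally sum to 1, so they would need to be rescaled (or modeled jointly) before being applied to the total.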

    For predicting demographic composition in small areas, I have a strong preference for some variant of the Hamilton-Perry method (see this recent article by Matt Hauer for more detail on Hamilton-Perry and some modifications to adjust for certain challenges). In my own work, I generally apply the predicted composition to a separately projected population total.
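
For readers new to Hamilton-Perry, the sketch below shows one projection step using cohort change ratios, followed by the "apply the composition to a separate total" step described above. It is a simplified illustration under stated assumptions, not the variant from the cited article: the age groups, counts, and the `separate_total` value are hypothetical, and the youngest group uses a child-to-adult ratio in place of the usual child-woman ratio because the sketch carries no sex detail.

```python
import numpy as np

def hamilton_perry_step(pop_prev, pop_curr, parent_idx):
    """Project one step ahead; the step length equals the age-group width.

    pop_prev, pop_curr: counts by age group (youngest first, last group
    open-ended) at two dates one step apart.
    parent_idx: indices of the groups treated as potential parents
    (must not include group 0).
    """
    pop_prev = np.asarray(pop_prev, dtype=float)
    pop_curr = np.asarray(pop_curr, dtype=float)
    n = len(pop_curr)
    proj = np.zeros(n)

    # Closed groups: cohort change ratio = P[x+1, t] / P[x, t-1]
    ccr = pop_curr[1:n - 1] / pop_prev[0:n - 2]
    proj[1:n - 1] = ccr * pop_curr[0:n - 2]

    # Open-ended last group: the two oldest groups feed into it
    ccr_open = pop_curr[-1] / (pop_prev[-2] + pop_prev[-1])
    proj[-1] = ccr_open * (pop_curr[-2] + pop_curr[-1])

    # Youngest group: hold the current child-to-parent ratio constant
    child_ratio = pop_curr[0] / pop_curr[parent_idx].sum()
    proj[0] = child_ratio * proj[parent_idx].sum()
    return proj

# Hypothetical counts for two groups across four age groups
proj_a = hamilton_perry_step([100, 100, 100, 100], [100, 100, 100, 100], [1, 2])
proj_b = hamilton_perry_step([50, 50, 50, 50], [60, 55, 55, 55], [1, 2])

# Convert projected counts to a composition, then apply it to a total
# that was projected separately (hypothetical value)
group_totals = np.array([proj_a.sum(), proj_b.sum()])
shares = group_totals / group_totals.sum()
separate_total = 700.0
controlled_counts = shares * separate_total
print(shares, controlled_counts)
```

Controlling to an independent total keeps the composition from Hamilton-Perry while letting the overall size come from whichever total projection is considered most defensible.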



  • Welcome to the group David! 

    I would start by developing a conceptual model of what is driving demographic changes in this community. Is it the birth rate of different race/ethnic groups? Death rate? Immigration and emigration rates? General economic growth? Housing availability? Transportation? You could use historical data to calculate trends for the important components, then model various scenarios for how policy changes might alter those trends.
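
One way to make the scenario idea above concrete is a small component-style projection in which each group grows by its own birth, death, and net migration rates, and a scenario is simply a different set of rates. The Python sketch below is only a toy under stated assumptions: the group names, base populations, rates, and the "displacement" scenario are all hypothetical, and rates are held constant over the horizon.

```python
def project_shares(base_pop, rates, years):
    """Grow each group by constant annual component rates; return final shares."""
    pops = dict(base_pop)
    for _ in range(years):
        for group, r in rates.items():
            growth = r["birth"] - r["death"] + r["net_migration"]
            pops[group] *= 1.0 + growth
    total = sum(pops.values())
    return {group: pop / total for group, pop in pops.items()}

# Hypothetical base-year populations and annual rates (not real data)
base_pop = {"group_a": 60_000, "group_b": 90_000}
baseline = {
    "group_a": {"birth": 0.014, "death": 0.007, "net_migration": 0.000},
    "group_b": {"birth": 0.012, "death": 0.008, "net_migration": 0.005},
}

# Scenario: net out-migration of group_a, e.g. from economic displacement
displacement = {
    **baseline,
    "group_a": {**baseline["group_a"], "net_migration": -0.010},
}

baseline_shares = project_shares(base_pop, baseline, years=5)
displaced_shares = project_shares(base_pop, displacement, years=5)
print(baseline_shares)
print(displaced_shares)
```

Comparing the two sets of shares gives a rough sense of how sensitive the projected composition is to each component, which can help decide where better data or local knowledge matters most.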

  • Hi David -

    I second all of the recommendations above, but also wonder: Have you talked with ABAG about their projections? Do they not have what you need?

    I do some California-specific forecasting work, so I'm happy to discuss short-range projections data sources in more detail if you'd like. Email bjarosz (at) prb (dot) org


  • Hi, that is great advice. After you mentioned ABAG, I dug up some copies of tract-level ABAG forecasts. They’re vintage 2013 and aren't split by race/ethnicity. But I will definitely check totals against them, or (maybe?) better yet, use them as input to a model. Following up with other folks in the Bay Area Regional Collaborative (BARC) is on my to-do list! The agency I work for (BAAQMD) is a member, and we had all been working in the same building pre-pandemic. I'd spoken a year or more ago with a helpful demographer at MTC and was going to reach out to him too.

  • Hi all. Thanks for the helpful tips! Just adding a couple more thoughts and background here.

    The mention of a conceptual model reminded me that, in the face of an exogenous shock like the current pandemic, any near-term forecasting I do is probably going to be a bit too “business as usual” no matter what. Given that, maybe today's best bet is a really simple and transparent estimate, coupled with strong caveats about the model/reality mismatch? Either way, it's a good reminder that I really should be learning more about what forecasters used to think would be driving change in this city over the next decade or so. The city's 2030 general plan is short on technical details, but I'm continuing to poke around.

    My main concern, pre-pandemic, was in properly reflecting foreseeable gentrification. I’m not a domain expert, and have no sound idea of what the driving factors are, or have been, in the particular region in question (Richmond/San Pablo). Economically driven displacement, I would guess, is increasingly salient? 

    For what it's worth, my team has also been using Woods & Poole forecasts as implemented by EPA’s PopGrid tool. The base year for those forecasts is currently 2010, though (decennial block counts). They match up fairly well at county scale vs ACS blockgroup-level estimates for 2013-2017, but I don’t know how reliable they are vs alternatives at city scale. PopGrid is part of the BenMAP package promulgated by EPA, so it's very defensible by citation, which, for a gov't agency, is meaningful.

    Thank you all for the dense material to chew on! The Nature article is a bit over my head at the moment, but I'm happy to try and learn new terms of art.