Large Difference in Ward Population with ACS v. Census

kaireky over 2 years ago

This is my first time trying to apply ACS data to a separate geography. I'm trying to get various ACS data (home ownership, incomes, etc.) for Chicago's new wards. I'm using R and set up a population-weighted interpolation using 2021 ACS tract-level data, weighted by the 2020 block population. I ran it with just the total population (DP05_0001) and found the ward populations were substantially off from those presented with the redistricting (wards varied by approx. -8% to 15%).

To make sure I wasn't doing something wrong, I ran the same tract-level interpolation with tract-level population data from the 2020 Census. The numbers are almost identical to those presented with the redistricting. So at least I know it's the data and not a coding error on my part. The city-wide 2021 ACS population is about 1.5% lower than the 2020 Census count.

My question is this: is a difference this large normal, and (if knowable) which would likely be more accurate? I know differences in who is counted as a resident, methods, etc. will cause the Census and ACS data to not match, but I was surprised at just how far off it was. It makes me question whether the analysis I am doing is really valid or, alternatively, whether the entire redistricting process was unreliable.

Tim Henderson over 2 years ago

I would use 2020 blocks from PL94 to assemble the wards. With tracts that overlap ward borders you’re leaving too much to chance: you’re hoping population is distributed evenly within each tract, and it isn’t always. It sounds like you have shapefiles of the wards, so you can just overlay them on blocks to get the right block lists. - Tim Henderson
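
The block overlay described above could be sketched roughly like this. This is only an illustration in plain Python, not what any GIS package does internally: it assumes each block is reduced to a single centroid point with a population, and that each ward is a simple polygon given as a list of vertices. The function names, coordinates, and populations are all made up for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def ward_populations(blocks, wards):
    """Sum block populations by the ward that contains each block centroid.

    blocks: list of (centroid_x, centroid_y, population)
    wards:  dict of ward_id -> list of (x, y) polygon vertices
    """
    totals = {ward_id: 0 for ward_id in wards}
    for cx, cy, pop in blocks:
        for ward_id, polygon in wards.items():
            if point_in_polygon(cx, cy, polygon):
                totals[ward_id] += pop
                break  # each block is assigned to at most one ward
    return totals


# Hypothetical toy data: two square wards side by side, four blocks.
wards = {
    "W1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "W2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
blocks = [(1, 1, 120), (3, 1, 80), (1.5, 0.5, 40), (3.5, 1.5, 60)]
totals = ward_populations(blocks, wards)
```

Real ward and block boundaries would need a proper spatial library and a shared coordinate reference system; the point here is only that block counts are summed directly, with no assumption about how people are spread within a tract.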

kaireky over 2 years ago in reply to Tim Henderson

I'm not just trying to get population; I was using it initially to make sure I was coding things right. If I want to use ACS data, I can't build up from individual blocks. I understand the issues inherent in using tract data when population isn't spread evenly. I'm using `interpolate_pw` from the `tidycensus` package to mitigate that somewhat (knowing, of course, that nothing is perfect). If I run that with the PL94 tract population, I get almost exactly the same populations as if I had built it up from blocks.
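
For anyone following along, the general idea behind population-weighted interpolation of a count can be sketched as below. This is a simplified illustration in plain Python, not the `tidycensus` implementation: it assumes every block has already been matched to exactly one tract and one ward, and it only handles count-type variables. The function name and all numbers are hypothetical.

```python
from collections import defaultdict


def pw_interpolate_counts(tract_values, blocks):
    """Allocate tract-level counts to wards using block populations as weights.

    tract_values: dict of tract_id -> tract-level count (e.g. total population)
    blocks:       list of (tract_id, ward_id, block_population)
    """
    tract_pop = defaultdict(float)    # total block population per tract
    overlap_pop = defaultdict(float)  # block population per (tract, ward) pair
    for tract, ward, pop in blocks:
        tract_pop[tract] += pop
        overlap_pop[(tract, ward)] += pop

    ward_values = defaultdict(float)
    for (tract, ward), pop in overlap_pop.items():
        if tract_pop[tract] > 0:
            # Each ward receives the tract's count in proportion to the
            # share of the tract's block population that falls in that ward.
            weight = pop / tract_pop[tract]
            ward_values[ward] += tract_values[tract] * weight
    return dict(ward_values)


# Hypothetical toy data: tract A is split 75/25 between two wards,
# tract B lies entirely inside ward W2.
tract_values = {"A": 1000, "B": 500}
blocks = [("A", "W1", 300), ("A", "W2", 100), ("B", "W2", 50)]
ward_estimates = pw_interpolate_counts(tract_values, blocks)
```

Note that the weights come from the 2020 block counts while the values being allocated come from the ACS, so any disagreement between the two sources at the tract level carries straight through to the ward totals. Medians and rates can't be allocated this way and need separate handling.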

Matt Herman over 2 years ago in reply to kaireky

If you’re using tract-level data, these will be 5-year estimates rather than 1-year. So it’s possible some of the differences you’re seeing compared to the 2020 decennial are due to the 5-year period covered by the ACS (2017-2021 for the 2021 release).

Tim Henderson over 2 years ago in reply to kaireky

I can't think of a way to shoehorn tract data into shapes that don't line up with tract boundaries. You can get a lot of information from 2020 blocks, including detailed race/ethnicity and 18-and-over age breakouts. Maybe you could take the tracts that make up the bulk of the city, label them by area (like "far northwest" or "central"), and look for trends that will likely affect the wards in that vicinity. I'm sure you could get some insights that way!

kaireky over 2 years ago in reply to Matt Herman

I suppose that could explain some of it, but it would require a lot of population shuffling in a small amount of time. The population differences in the underlying tracts vary more than I would expect: only 65% of tracts are within plus or minus 10%.
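
A check like the one above (the share of tracts whose ACS population is within 10% of the census count) might look roughly like this. It's a sketch in plain Python with a hypothetical function name and made-up numbers, and it assumes the census count is the baseline for the percent difference.

```python
def share_within_tolerance(census, acs, tolerance=0.10):
    """Share of tracts whose ACS estimate is within +/- tolerance of the census count.

    census, acs: dicts of tract_id -> population.
    Tracts with a zero census count, or missing from acs, are skipped.
    """
    diffs = []
    for tract, base in census.items():
        if base > 0 and tract in acs:
            diffs.append((acs[tract] - base) / base)
    if not diffs:
        return None  # nothing comparable
    within = sum(1 for d in diffs if abs(d) <= tolerance)
    return within / len(diffs)


# Hypothetical toy data: relative differences are +5%, -15%, +12%;
# tract D is skipped because its census count is zero.
census = {"A": 1000, "B": 2000, "C": 500, "D": 0}
acs = {"A": 1050, "B": 1700, "C": 560, "D": 10}
share = share_within_tolerance(census, acs)
```

With these toy numbers only one of the three comparable tracts falls inside the band.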

Tim Henderson over 2 years ago in reply to kaireky

There’s no population shuffling. There’s just the block-based 100% count and then the 2021 and 2022 estimates by county. There’s no new measurement for blocks. If guesstimating population changes from tracts isn’t working for you, then other tract data will present the same issues.
