I'm a graduate student working with ACS data for the first time, so I apologize in advance for any basic questions!
I'm working with tract-level median household income data from the 2021 ACS 5 year estimates for the city of San Francisco. 7 of my 240 focal census tracts are missing estimates (from what I can tell from the documentation, likely because there were too few survey respondents), and I want to fill in the missing data for the purpose of my analysis.
My original thought was that I would fill in from past years' 5-year estimates for the same tracts, applying a correction to account for inflation; i.e., I would fill in from the 2020 estimates and, if those were missing, from the 2019 estimates, etc. However, I was warned by a colleague that I should be thinking about spatial correlation and not just temporal correlation.
Is there a best practice method for filling in missing ACS tract level data? Any advice would be much appreciated!
Hi Kelley--
One of my favorite expressions is "The perfect is the enemy of the good." It holds especially true when dealing with estimates.
Your intuition is right. Median HH income might be missing for one of two reasons: (1) Census redacted it because of an excessive margin of error (MOE), or (2) the tract has no households.
In the first situation, I like your solution: fill in a value from a prior year. The tract data was collected during a 5-year survey window, and consecutive windows share four of their five years, so the 2016-20 stats and the 2017-21 stats should be in the same ballpark. (Of course, the 2016-20 data may have been redacted for the same reason: excessive MOE.)
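A rough sketch of that fallback in Python, assuming you have already pulled each tract's published median for several 5-year vintages. The function name `fill_from_prior` and both dictionaries are hypothetical, and the adjustment factors in the usage example are made-up placeholders, not real inflation figures; substitute factors from whatever price index you trust.

```python
def fill_from_prior(estimates_by_year, target_year, adjust):
    """Return (source_year, adjusted_value) from the most recent vintage
    that has a published estimate, or None if every vintage is missing.

    estimates_by_year: dict of end year -> median income (None if missing)
    adjust: dict of end year -> factor converting that year's dollars
            into target-year dollars (ASSUMED; supplied by the user)
    """
    # Walk backwards from the target year toward older vintages.
    for year in sorted(estimates_by_year, reverse=True):
        if year > target_year:
            continue
        value = estimates_by_year[year]
        if value is not None:
            return year, value * adjust[year]
    return None


# Usage with invented numbers: 2021 and 2020 are missing, 2019 is published.
example = fill_from_prior(
    {2021: None, 2020: None, 2019: 80_000},
    target_year=2021,
    adjust={2021: 1.0, 2020: 1.05, 2019: 1.07},  # placeholder factors
)
```

Returning the source year alongside the value makes it easy to flag which tracts were backfilled and from how far back.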
So, I'll offer my approach to creating placeholder values -- as ugly as it is. (Dear readers, do not @ me complaints.) For placeholder values, analyze table B19001 (number of households in sixteen income levels) to find the category that contains the 50th percentile of estimated households. Example: Tract 27145011600 does not have a published median HH income, but table B19001 shows the median is in the range 75,000-99,999, so I will use 87,500 as the placeholder. (Or you can imagine a fancier approach.)
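For anyone following along, here is a minimal Python sketch of the bracket approach described above. It assumes you have the sixteen B19001 household counts for a tract in bracket order (lowest income first); the function names are hypothetical, and the counts in the usage example are invented rather than taken from any real tract. For the open-ended top bracket there is no midpoint, so this sketch simply returns the lower bound, which is an arbitrary choice.

```python
# Lower bounds (dollars) of the sixteen B19001 income brackets.
# The last bracket ("$200,000 or more") has no upper bound.
BRACKET_LOWER = [
    0, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000,
    45_000, 50_000, 60_000, 75_000, 100_000, 125_000, 150_000, 200_000,
]


def median_bracket(counts):
    """Return (lower, upper) for the bracket holding the 50th-percentile
    household; upper is None for the open top bracket.
    Returns None when the tract has no households."""
    if len(counts) != len(BRACKET_LOWER):
        raise ValueError("expected sixteen bracket counts")
    total = sum(counts)
    if total <= 0:
        return None
    half = total / 2
    running = 0
    for i, n in enumerate(counts):
        running += n
        if running >= half:
            is_top = i + 1 == len(BRACKET_LOWER)
            upper = None if is_top else BRACKET_LOWER[i + 1]
            return BRACKET_LOWER[i], upper


def placeholder_median(counts):
    """Midpoint of the median bracket, used as a rough placeholder."""
    bracket = median_bracket(counts)
    if bracket is None:
        return None
    lower, upper = bracket
    if upper is None:
        return float(lower)  # open-ended bracket: fall back to its lower bound
    return (lower + upper) / 2


# Invented counts whose median household falls in the 75,000-99,999 bracket.
example_counts = [10] * 11 + [100] + [10] * 4
example_value = placeholder_median(example_counts)  # 87500.0
```

The "fancier approach" could be linear interpolation within the median bracket instead of taking the midpoint; that assumes households are spread evenly across the bracket.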
It's an ugly approach I'm suggesting. But if you really! need placeholders for your 7 missing tracts, consider it.
--Todd Graham
Hi Todd,
Thank you for your response and for your advice! I checked and one of my tracts does indeed have no households, so I'm not worried about filling it in, but the remaining 6 all have over 500, so I think they're probably case 1. I'll check out your approach to filling them in!