I'm a graduate student working with ACS data for the first time, so I apologize in advance for any basic questions!
I'm working with tract-level median household income data from the 2021 ACS 5-year estimates for the city of San Francisco. 7 of my 240 focal census tracts are missing estimates (from what I can tell from the documentation, likely because there were too few survey respondents), and I want to fill in the missing values for my analysis.
My original thought was to fill in from previous years' 5-year estimates for the same tracts, applying a correction for inflation, i.e., fill in from the 2020 estimates, and if those were missing, from the 2019 estimates, and so on. However, a colleague warned me that I should be thinking about spatial correlation and not just temporal correlation.
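A minimal sketch of the temporal fallback described above. It assumes each 5-year vintage's medians are already loaded into nested dicts keyed by end year and tract GEOID, with suppressed values stored as None, and that you supply a price-index level for each year. ACS 5-year dollar figures are expressed in the final year's dollars (the 2016-2020 medians are in 2020 dollars), so each fallback value is rescaled by the ratio of index levels. The function name, data layout, tract ID, and index numbers below are hypothetical. This covers only the temporal part of the question, not the colleague's spatial point.

```python
from typing import Dict, Optional, Tuple


def fill_from_prior_vintages(
    tract: str,
    target_year: int,
    medians: Dict[int, Dict[str, Optional[float]]],
    price_index: Dict[int, float],
    max_lookback: int = 4,
) -> Optional[Tuple[float, int]]:
    """Hypothetical helper: fill a tract's median from the newest available vintage.

    medians[year][tract] holds the median HH income from the 5-year
    estimates ending in `year`, in `year` dollars (None if suppressed).
    price_index[year] is an annual price-index level supplied by the caller.
    Returns (value in target_year dollars, vintage used), or None.
    """
    # Walk backward: target_year, target_year - 1, ..., target_year - max_lookback.
    for year in range(target_year, target_year - max_lookback - 1, -1):
        value = medians.get(year, {}).get(tract)
        if value is not None:
            # Rescale from `year` dollars to `target_year` dollars.
            adjusted = value * price_index[target_year] / price_index[year]
            return adjusted, year
    return None


# Usage with made-up numbers (not real ACS or price-index values).
medians = {
    2021: {"06075000100": None},
    2020: {"06075000100": None},
    2019: {"06075000100": 100000.0},
}
price_index = {2021: 110.0, 2020: 105.0, 2019: 100.0}
print(fill_from_prior_vintages("06075000100", 2021, medians, price_index))
# (110000.0, 2019)
```

Returning the vintage that was used makes it easy to report which tracts were filled and from which year; the four-year lookback limit is an arbitrary choice.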
Is there a best practice method for filling in missing ACS tract level data? Any advice would be much appreciated!
One of my favorite expressions is "The perfect is the enemy of the good." It holds especially true when dealing with estimates.
Your intuition is right. Median HH income might be missing for one of two reasons: (1) the Census Bureau redacted it because the margin of error (MOE) was excessive, or (2) the tract has no households.
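One way to tell the two cases apart is to pull each tract's household total alongside the median (for example B19001_001E, the total line of table B19001, next to B19013_001E). A minimal sketch, with a hypothetical helper name; it also treats any negative median as missing, because ACS API downloads can return a large negative annotation code (for example -666666666) in place of a suppressed median, and an income median cannot legitimately be negative.

```python
from typing import Optional


def missing_reason(median_income: Optional[float], total_households: float) -> str:
    """Hypothetical helper: guess why a tract's median HH income is missing.

    median_income: the published median (e.g. B19013_001E); None or a
        negative annotation code is treated as missing.
    total_households: estimated household total (e.g. B19001_001E).
    """
    if median_income is not None and median_income >= 0:
        return "published"
    if total_households == 0:
        return "no households"  # reason (2)
    # Households exist but no median was published: reason (1).
    return "suppressed"


print(missing_reason(None, 0))            # no households
print(missing_reason(-666666666, 850))    # suppressed
```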
In the first situation, I like your solution: fill in a value from a prior year. Each tract's data was collected over a 5-year survey window, and the 2016-20 and 2017-21 windows share four of their five years, so the two sets of stats should be in the same ballpark. (Of course, the 2016-20 data may have been redacted for the same reason: excessive MOE.)
So I'll offer my approach to creating placeholder values, as ugly as it is. (Dear readers, do not @ me with complaints.) For placeholder values, analyze table B19001: number of households in sixteen income brackets. Find the bracket that contains the 50th percentile of estimated households. Example: Tract 27145011600 does not have a published median HH income, but table B19001 shows the median is in the range 75,000-99,999, so I will use 87,500 (the bracket midpoint) as a placeholder. (Or you can imagine a fancier approach.)
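The bracket lookup above can be sketched as follows. The function name is hypothetical, and the input is assumed to be the 16 bracket counts of B19001 (lines 002 through 017) in table order. The default returns the bracket midpoint, as in the 87,500 example; `interpolate=True` is one possible "fancier approach", linear interpolation within the bracket, which assumes households are spread evenly across it. Neither is presented as the Census Bureau's own median method. The top bracket ($200,000 or more) is open-ended, so no midpoint is returned there.

```python
from typing import List, Optional

# Lower bounds (dollars) of the 16 B19001 brackets; the last is open-ended.
BRACKET_LOWER = [0, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000,
                 45_000, 50_000, 60_000, 75_000, 100_000, 125_000, 150_000,
                 200_000]


def median_placeholder(counts: List[float], interpolate: bool = False) -> Optional[float]:
    """Hypothetical helper: placeholder median from B19001 bracket counts."""
    if len(counts) != len(BRACKET_LOWER):
        raise ValueError("expected 16 bracket counts")
    total = sum(counts)
    if total <= 0:
        return None  # no households, nothing to place
    half = total / 2
    cumulative = 0.0
    for i, n in enumerate(counts):
        # First bracket where the running total reaches the 50th percentile.
        if cumulative + n >= half:
            if i == len(BRACKET_LOWER) - 1:
                return None  # median in the open-ended top bracket
            lo, hi = BRACKET_LOWER[i], BRACKET_LOWER[i + 1]
            if not interpolate:
                return (lo + hi) / 2  # e.g. 75,000-99,999 -> 87,500
            # Linear interpolation: assume an even spread within the bracket.
            return lo + (half - cumulative) * (hi - lo) / n
        cumulative += n
    return None


# Made-up counts; the median falls in the 75,000-99,999 bracket.
counts = [12] + [0] * 10 + [10, 4, 0, 0, 4]
print(median_placeholder(counts))                    # 87500.0
print(median_placeholder(counts, interpolate=True))  # 82500.0
```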
It's an ugly approach I'm suggesting. But if you really! need placeholders for your 7 missing tracts, consider it.
Can we belabor the missing tracts for a moment? About reason 2: when you say no HHs in the tract, do you mean no sampled HHs? For special tabs, and in some of the early ACS disclosure literature, the Bureau uses the phrasing "3 or more cases are required to publish a cell in a tab" for medians. I call it the 'rule of three' and understand it to mean that at least 3 HHs had to respond to the ACS, or else the data is suppressed. If that is the case, I would be a little concerned about what the suppressed tracts look like. With only 7, I might take a drive around to get a better feel for what may be missing while contemplating my adjustment approach. Is my assumption right that fewer than 3 unweighted records is a suppression trigger, in addition to the MOE?
Hi Ed, some states have tracts with 0 households; usually these tracts are open water, and the tract ID includes the string "99".
For example, three of our Minnesota counties have 1 tract each for the Minnesota part of Lake Superior. It's all water (ice at this time of year); no households.
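For a statewide file, a quick screen for the open-water case is a string check on the GEOID, which is 2 state digits, 3 county digits, and a 6-digit tract code (27145011600 is state 27, county 145, tract 011600). The sketch below assumes, as a heuristic rather than an official rule, that water-only tracts have a tract code beginning with "99"; the function name and the sample ID are hypothetical. A zero household total is still the more direct test.

```python
def looks_like_water_tract(geoid: str) -> bool:
    """Hypothetical heuristic: tract code (last 6 digits of the GEOID) starts with '99'."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
    tract_code = geoid[5:]  # skip 2-digit state + 3-digit county
    return tract_code.startswith("99")


print(looks_like_water_tract("27145011600"))  # False
print(looks_like_water_tract("27075990000"))  # True (hypothetical ID)
```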