I am running a basic multiple regression model with the unemployment proportion at the block group level (logit transformed), taken from ACS 5-year data, as the response variable. In each of the ACS 5-year releases from the past seven years (2013 to 2019), the unemployment count is estimated to be 0 in more than a few block groups (giving those block groups an unemployment proportion of 0, for which the logit is undefined), while a few other block groups have relatively high unemployment counts. Because of this, my model fit is quite ugly, and I am trying to brainstorm ways to address this issue. I would really appreciate some references if anyone is familiar with this kind of situation. Thank you for your time.
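For context, here is a minimal sketch (with made-up counts) of where the transform breaks down: any block group whose estimated unemployment count is 0 produces a proportion of 0, and the logit of 0 is negative infinity.

```python
import numpy as np

# Made-up block-group estimates: unemployed count and labor force size.
unemployed = np.array([0, 3, 12, 0, 45])
labor_force = np.array([210, 180, 520, 95, 400])

p = unemployed / labor_force      # unemployment proportion per block group
logit = np.log(p / (1 - p))       # logit transform of the proportion

print(logit)  # block groups with 0 unemployed give -inf, which breaks the linear model
```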
Aggregating the data to larger geographic areas is one potential solution if you are encountering large margins of error or zero values. Seth Spielman also described some potential strategies/pitfalls related to data aggregation in a 2015 paper:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344219/
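To make the aggregation idea concrete, here is a rough pandas sketch (with made-up counts and placeholder column names, not actual ACS variable codes) that rolls block groups up to their tracts, recomputes the proportion from the summed counts, and combines the count MOEs with the Census Bureau's approximate root-sum-of-squares rule.

```python
import numpy as np
import pandas as pd

# Hypothetical block-group table; column names are assumptions for illustration.
bg = pd.DataFrame({
    "tract_id":    ["A", "A", "A", "B", "B"],
    "unemployed":  [0, 3, 12, 0, 45],
    "labor_force": [210, 180, 520, 95, 400],
    "unemp_moe":   [12, 9, 15, 10, 20],
})

# Sum the counts up to the tract level and recompute the proportion there.
tract = bg.groupby("tract_id").agg(
    unemployed=("unemployed", "sum"),
    labor_force=("labor_force", "sum"),
    # Approximate MOE for an aggregated count: root sum of the squared MOEs.
    unemp_moe=("unemp_moe", lambda m: np.sqrt(np.sum(m**2))),
)
tract["unemp_prop"] = tract["unemployed"] / tract["labor_force"]
print(tract)
```

At the tract level the zeros usually disappear and the relative MOEs shrink, at the cost of the finer spatial resolution Spielman's paper discusses.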
Thank you so much! Is it also safe to assume that block-group-level estimates from the ACS 5-year summary files are rarely used in modeling because of their high MOEs?
In general you should use Poisson regression for count data with small counts rather than logistic regression. What is the scientific question that you are trying to answer? What are the covariates in your regression? You might also consider loglinear models, which are related to "raking" and the iterative proportional fitting algorithm.
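To illustrate the Poisson suggestion, here is a minimal statsmodels sketch using made-up data and placeholder covariate names: the unemployment count is the response, and the log of the labor force enters as an offset so the coefficients describe the unemployment rate. Zero counts cause no trouble in this formulation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical block-group data; covariates are stand-ins for whatever you actually use.
df = pd.DataFrame({
    "unemployed":    [0, 3, 12, 0, 45],
    "labor_force":   [210, 180, 520, 95, 400],
    "pct_bachelors": [0.41, 0.22, 0.18, 0.35, 0.12],
    "median_age":    [38.0, 41.5, 29.3, 45.1, 33.7],
})

X = sm.add_constant(df[["pct_bachelors", "median_age"]])

# Model the count directly; log(labor force) as an offset turns this into a rate model.
model = sm.GLM(
    df["unemployed"],
    X,
    family=sm.families.Poisson(),
    offset=np.log(df["labor_force"]),
)
result = model.fit()
print(result.summary())
```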