Multiple regression model with unemployment proportion as response

I am running a basic multiple regression model with the unemployment proportion (logit-transformed) at the block group level (from ACS 5-year data) as the response variable. In each ACS 5-year release of the past 7 years (2013 to 2019), unemployment counts are estimated to be 0 in more than a few block groups (giving unemployment proportions of 0 for those block groups), and a few block groups have relatively high unemployment counts. Because of this, my model fit is quite ugly, and I am trying to brainstorm ways to solve this issue. I would really appreciate some references if anyone is familiar with this kind of situation. Thank you for your time.

  • Aggregating the data to larger geographic areas is one potential solution if you are encountering large margins of error or zero values. Seth Spielman also described some potential strategies/pitfalls related to data aggregation in a 2015 paper:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344219/ 
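To make the aggregation idea concrete, here is a minimal sketch of how counts and margins of error might be combined when block groups are merged into a larger area. It uses the Census Bureau's published approximations for derived estimates (root sum of squares for a sum, and the derived-proportion formula), which ignore covariance between estimates and any special handling of zero estimates. The function names and all numbers are hypothetical and made up for illustration only.

```python
import math

def aggregate_estimate(estimates, moes):
    """Sum count estimates and approximate the MOE of the sum.

    Approximation: MOE of a sum is the square root of the sum of squared MOEs.
    """
    total = sum(estimates)
    moe = math.sqrt(sum(m * m for m in moes))
    return total, moe

def proportion_with_moe(num, num_moe, den, den_moe):
    """Approximate MOE of a proportion num/den (num is a subset of den)."""
    p = num / den
    radicand = num_moe ** 2 - (p ** 2) * den_moe ** 2
    if radicand < 0:
        # Fall back to the ratio formula when the radicand is negative.
        radicand = num_moe ** 2 + (p ** 2) * den_moe ** 2
    return p, math.sqrt(radicand) / den

# Hypothetical block groups: unemployed counts, labor force counts, and MOEs.
unemployed, unemployed_moe = aggregate_estimate([0, 3, 12], [3, 4, 12])
labor_force, labor_force_moe = aggregate_estimate([150, 400, 600], [40, 60, 80])

p, p_moe = proportion_with_moe(unemployed, unemployed_moe,
                               labor_force, labor_force_moe)
print(unemployed, unemployed_moe, round(p, 4), round(p_moe, 4))
```

The merged area has a nonzero unemployment count even though one of its block groups reported 0, so the logit of the aggregated proportion is defined.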

  • Thank you so much! Is it also safe to assume that block-group-level estimates from the ACS 5-year summary files are rarely used in modeling because of their high MOEs?

  • In general, you should use Poisson regression for count data with small counts rather than logistic regression. What is the scientific question that you are trying to answer? What are the covariates in your regression? You might also consider log-linear models, which are related to "raking" and the iterative proportional fitting algorithm.
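As a rough illustration of the Poisson suggestion, the sketch below models the unemployed count directly, with the log of the labor force as an offset, so block groups with a count of 0 need no transformation. It assumes a single covariate and fits the model by Newton-Raphson using only the standard library. The data, the covariate, and the function name are hypothetical. In practice a GLM routine from a statistics package would normally be used instead of a hand-written fit.

```python
import math

def fit_poisson_offset(counts, exposure, x, n_iter=50, tol=1e-10):
    """Fit log(E[count]) = log(exposure) + b0 + b1 * x by Newton-Raphson."""
    # Start from the pooled log rate with no covariate effect.
    b0 = math.log(sum(counts) / sum(exposure))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposure, x)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical block groups (made-up numbers, including zero counts).
unemployed = [0, 3, 0, 12, 5, 0, 8, 2]
labor_force = [150, 400, 90, 600, 500, 120, 450, 300]
covariate = [0.05, 0.12, 0.03, 0.30, 0.15, 0.08, 0.25, 0.10]

b0, b1 = fit_poisson_offset(unemployed, labor_force, covariate)
fitted = [e * math.exp(b0 + b1 * xi) for e, xi in zip(labor_force, covariate)]
print(round(b0, 3), round(b1, 3))
```

The offset keeps the model on the rate scale: exp(b0 + b1 * x) is the expected unemployment rate for a block group with covariate value x, and the fitted counts are always positive even where the observed count is 0.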