Hi all, I'm a graduate student researcher in computer science. I am part of a project exploring a collaborative framework for building a predictive model on ACS data: we are trying to collaboratively predict personal income from non-income responses to the ACS 1-year survey (focusing on just the PUMS data from Massachusetts). I think some of you may find this project interesting, and I would love for people to check it out and contribute! So far we have received contributions from 11 people around the world. We are also running a research study to evaluate how the framework supports collaborative modeling, which you are invited to join if you are interested. The full description is below.
Do you want to use your data science skills for good? Collaborative, open-source projects that create a machine learning model could have a significant impact in civic technology, social sciences, public health, and more.
I'd like to invite you to join a collaborative data science project using an experimental software framework called Ballet.
Your task will be to write a feature definition that can be used to predict personal income from raw survey responses to the US Census American Community Survey. The model built from features submitted by the community can then be used to optimize administration of the survey, direct public policy interventions, assist empirical researchers, and more.
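For a rough sense of what a feature definition might look like, here is a minimal sketch in plain Python. It does not use the Ballet API: `FeatureSketch` and `full_time_indicator` are hypothetical names made up for illustration, and the 35-hour threshold and the handling of missing values are assumptions. `WKHP` (usual hours worked per week) and `AGEP` (age) are PUMS variable names; the two example rows are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# One respondent's raw survey answers, keyed by PUMS variable name.
Row = Dict[str, Optional[float]]


@dataclass
class FeatureSketch:
    """Hypothetical stand-in for a feature definition:
    the input columns it reads plus a function that transforms them."""
    inputs: Sequence[str]
    transform: Callable[..., float]

    def apply(self, rows: List[Row]) -> List[float]:
        # Pull the input columns out of each row and transform them.
        return [self.transform(*(row.get(col) for col in self.inputs))
                for row in rows]


def full_time_indicator(hours_per_week: Optional[float]) -> float:
    # Assumption: a missing value means the respondent is not working.
    if hours_per_week is None:
        return 0.0
    # Assumption: 35 or more usual hours per week counts as full time.
    return 1.0 if hours_per_week >= 35 else 0.0


full_time = FeatureSketch(inputs=["WKHP"], transform=full_time_indicator)

# Invented example rows, not real survey responses.
rows = [
    {"AGEP": 34, "WKHP": 40},
    {"AGEP": 61, "WKHP": 20},
    {"AGEP": 17, "WKHP": None},
]
values = full_time.apply(rows)
```

The idea is that each contributor submits one small, self-contained unit like this, and the values from many such features are combined into the inputs of the income model.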
This task is expected to take from 30 minutes to 2 hours. You will also be asked to complete a short survey about your experiences and may additionally be contacted to have a short interview with researchers. If you complete the study you will be entered in a raffle for a small Amazon gift card as a token of thanks for your time and effort.
To participate, please go to https://dai.lids.mit.edu/ballet-study and follow the instructions there by October 3. Ideally, you should have basic experience with Python programming and data science development.
You can also check out the project without signing up for the study here.
This would be a very difficult time to predict income based on ACS or any pre-pandemic data. You might look at the CPS basic monthly survey, which is now available through August. It's the source of monthly employment data. It has some clues, but it's very hard to read income right now. Would love to hear any other ideas.
CPS monthly basic microdata:
employment situation from BLS based on CPS
Micah, keep an eye on the student paper competition run by the Survey Research Methods Section of the American Statistical Association for the 2021 Joint Statistical Meetings. This would make a strong contender. See e.g. this year's announcements: http://www.asasrms.org/travelapp_2020.pdf and www.amstat.org/.../Student-Paper-Competition-Travel-Award-to-Attend-the-Joint-Statistical-Meetings.aspx
Hi Tim Henderson, that is a good point about the difficulty of predicting income in data from March onward. Thanks for the pointers! For the exercise we are fixing the development data set to a previous year's data (2018), but a full evaluation of our model would be to see how it performs on unseen data such as the ACS responses from pandemic months. Yes, one way of improving the model would be to merge in these other data sources; this would be a great feature for someone to contribute.
Thank you Stas Kolenikov, we will definitely look into this!
Since you are focusing on Massachusetts, you might want to send information about your project to Tim Reardon and Jesse Partridge at the Boston Metropolitan Area Planning Council (MAPC). I think they might be interested in your effort.
Hi Cliff Cook, thanks for the pointer. I was able to find their contact info on the MAPC website. Is it okay to say that you sent me in their direction?
If you have questions about the data, you can contact us at the Massachusetts State Data Center. Mike, the SDC analyst, is out this week, but we are usually able to provide prompt replies to questions.
Thank you C. Bernstein! And please feel free to pass the information about the study to your colleagues as well.