I would like to use the 2017-2021 5-year ACS PUMS to analyze low-income households in Virginia at the PUMA level. I can use the federal poverty measure to identify low-income households, but a major drawback is that the federal poverty guideline doesn't account for the wide variation in living costs across different areas in Virginia. It also underestimates the real cost of living for most households.
I would like to use the ALICE (Asset Limited, Income Constrained, Employed) standards developed by the United Way, which vary by county FIPS code. (For more information on ALICE, see file:///C:/Users/ebeecroft/Downloads/19UW_ALICE_Project_Methodology_2019_06_17.pdf)
The challenge is creating ALICE values for PUMAs that include more than one county FIPS code. My first thought is to combine the ALICE values for each FIPS code, using the population in each as weights. For example, if PUMA X includes two FIPS codes, one of which has twice as many people as the other, then the ALICE value for PUMA X = (2/3) * ALICE value for FIPS 1 + (1/3) * ALICE value for FIPS 2, where FIPS 1 is the more populous one.
Has anyone tried to do this? Even if you haven't, do you have any suggestions for a better approach?
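As a minimal sketch of the population-weighted combination described above (not an established ALICE method), the calculation might look like the following in Python. The function name, the dollar thresholds, and the population counts are all hypothetical, made up for illustration. The populations are assumed to be the number of people from each county who live inside the PUMA, which matters when a county is split across PUMAs.

```python
def puma_alice_threshold(county_thresholds, county_pop_in_puma):
    """Population-weighted average of county ALICE thresholds for one PUMA.

    county_thresholds:  {county FIPS: ALICE threshold in dollars}
    county_pop_in_puma: {county FIPS: population of that county inside the PUMA}
    Both inputs are hypothetical structures, not an official ALICE format.
    """
    total_pop = sum(county_pop_in_puma.values())
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    # Each county's weight is its share of the PUMA's population.
    return sum(
        county_thresholds[fips] * pop / total_pop
        for fips, pop in county_pop_in_puma.items()
    )


# Made-up numbers: the first county has twice the population of the second,
# so the weights are 2/3 and 1/3.
thresholds = {"51001": 60000.0, "51003": 75000.0}
populations = {"51001": 20000, "51003": 10000}
puma_value = puma_alice_threshold(thresholds, populations)
print(puma_value)
```

Household counts could be swapped in for population as the weights if the threshold is meant to apply to households rather than people; that choice is an assumption either way.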
I'm sorry for the bad link to ALICE. See https://www.unitedforalice.org/methodology
Do you want to know how to create weighted statistics from tract-level data? I use PUMS data to fit a loglinear model to a multivariate table based on PUMS variables at the PUMA level. I then adjust the model to match various marginal tables from the tract-level ACS tables (a synthetic estimate). For a reference, see Discrete Multivariate Analysis: Theory and Practice by Bishop, Fienberg, and Holland, and look for a reference on small area estimation that covers synthetic estimates. This would allow you to get tract-level statistics for the ALICE estimate. You can then combine tracts to get county-level estimates using the ALICE formula. I would need to hear more about how the ALICE estimate is constructed. This analysis can be done in R using the survey and mipfp packages.
All of this lets you get estimates for counties where the PUMA "components" span multiple counties. It requires a fair amount of code, and the estimates need to go through a "quality check" process to make sure that you haven't made an error in your program.
Your suggestion to take a population-weighted linear combination is a "quick and dirty" way to do this. However, it assumes that the parts of the PUMAs that cross a county boundary are relatively homogeneous on both sides of that boundary. The only way to test this assumption is to get tract-level estimates as above. The estimates above adjust for covariates from the marginal ACS tables.
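To give a feel for the "adjust to marginal tables" step, here is a minimal sketch of iterative proportional fitting (IPF) on a two-way table, written in plain Python rather than R. It is not the mipfp API and not a full synthetic-estimate workflow: the function name, the seed table, and the tract margins are all hypothetical numbers. The seed stands in for a PUMA-level cross-tabulation from PUMS, and the row and column targets stand in for two tract-level ACS marginal tables.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its row and column sums match the targets.

    seed:        list of rows (e.g. PUMA-level counts from PUMS)
    row_targets: desired row totals (e.g. a tract-level ACS margin)
    col_targets: desired column totals (a second tract-level margin)
    The two target lists must add up to the same grand total.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Step 2: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows also match.
        gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Hypothetical PUMA-level seed and hypothetical tract margins (both sum to 100).
seed = [[40, 10], [30, 20]]
fitted = ipf_2d(seed, row_targets=[60, 40], col_targets=[70, 30])
for row in fitted:
    print([round(v, 3) for v in row])
```

The fitted table keeps the association structure (odds ratios) of the PUMA-level seed while matching the tract margins, which is the basic idea behind the synthetic estimate. A real application would use more dimensions and survey weights.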
PS if someone has an easier way to do this -- give us a shout out.
I just looked at this reference: file:///home/dorer/Downloads/2020ALICE_Methodology_FINAL.pdf. It only mentions ACS 1-year tables:
B18101 SEX BY AGE BY DISABILITY STATUS
B18106 SEX BY AGE BY SELF-CARE DIFFICULTY
in the references.
They also mention using Medicare, BLS, IRS, and other data, but these data don't go down to the tract level.
At a minimum, I adjust for the 3-way marginal age x sex x race. It looks like they add disability status and self-care difficulty but not race (a big problem when you are looking at poverty statistics).
In any case the "methodology" has no statistical references and does not give the model that they are using.
I checked out some of the PhD members on the Advisory Committee and the PhDs on the author list, and there are no statisticians. I would email them and ask for their statistical methods.
Thank you David. Your thoughts on a possible approach are very helpful.
I am new to ACS PUMS data, but I think I understand the gist of the approach you laid out. I didn't realize that it is possible to create synthetic tract-level data from ACS tables. I need to read more about this, and preferably find a worked example.
The CDC has a program (the PLACES database) that produces tract-level data from the BRFSS county-level health data. You can do the same thing starting from a PUMA, then combine the tract-level data to get the county-level results. https://www.cdc.gov/brfss/data_documentation/index.htm
There is a paper that gives the model details. I can't seem to locate it, but look in the paper (last one listed above) and it should be in the references. People have posted R code online for problems like yours. I asked the CDC for their SAS code for the PLACES database, but they wouldn't share it, so you have to program from the methods paper. It is in:
Preventing Chronic Disease: Public Health Research, Practice, and Policy, a CDC journal
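The last step mentioned above, combining tract-level estimates into county-level results, could be sketched as follows in Python. This assumes you already have tract-level counts (for example, households below and above the threshold) and that each tract is identified by its 11-digit GEOID, whose first five digits are the state and county FIPS codes. The function name, the GEOIDs, and the counts are hypothetical.

```python
from collections import defaultdict


def county_rates_from_tracts(tract_counts):
    """Sum tract-level counts to counties and return the county-level share.

    tract_counts: {11-digit tract GEOID: (households_below, households_total)}
    Returns {5-digit county FIPS: share of households below the threshold}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for geoid, (below, total) in tract_counts.items():
        county = geoid[:5]  # state (2 digits) + county (3 digits)
        totals[county][0] += below
        totals[county][1] += total
    return {c: b / t for c, (b, t) in totals.items() if t > 0}


# Made-up tract estimates for two counties.
tracts = {
    "51001090100": (120, 400),
    "51001090200": (80, 600),
    "51003010100": (50, 500),
}
rates = county_rates_from_tracts(tracts)
print(rates)
```

Summing counts first and dividing afterwards, rather than averaging tract rates, keeps larger tracts from being underweighted.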
PS: Is this you? https://www.linkedin.com/in/erik-beecroft-612a0111/
My profile is https://www.linkedin.com/in/david-j-dorer-220061139/
If you want to share emails, email firstname.lastname@example.org and I'll reply with my email address.
My degrees are in Mathematics. My post doc is applied math.
Thank you again David.
Yes, that's my LinkedIn page. My email is email@example.com
I will look at the links you sent, and see if I can make sense of it. You have been extremely helpful.