Does anyone know if data is available for county-level ratio of income to poverty by race? I did find Table S1703, but it only goes up to 125% FPL and I need 200% FPL.
The only way I've been able to do this is to build a model of age x sex x race x poverty level (above/below/undefined) and poverty200 (above/below 200% FPL) at the PUMA level (a larger area), using PUMS data for the PUMA that contains the small-area geography (in my case a tract). The PUMS variables are POVPIP and RAC1P (and possibly others such as SEX and AGEP). I then use the B01001* tables at the tract (for you, county) level to create a three-way marginal tabulation of age x sex x race at that level. Last but not least, I adjust the PUMA model to the age x sex x race marginal and the B17001 poverty marginal at the tract (county) level. I have R code that does this. If anyone can think of an easier way to do this, join in! (See the thread on poverty level via B17001 and PUMS for more details.)
Another thought: PUMAs may or may not cross county lines. For example, Suffolk County, MA comprises exactly 5 PUMAs, but Norfolk County, MA contains tracts whose PUMA crosses county lines. If the counties you are looking at are like Suffolk County, you can just pool the PUMS data for those PUMAs and do your calculation with the PUMS variables in the pooled file: cut the POVPIP variable at 200% and include race in your two-way tabulation. If the relevant PUMAs cross county lines, you have to pool the relevant PUMS files and apply the "Small Area Estimation" technique described above. See the thread referenced above for the details of using POVPIP (in combination with other variables) to calculate poverty the same way as S1703 or B17001.
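For the simple case where a county is exactly a set of whole PUMAs, the two-way tabulation is just a weighted count. A minimal Python sketch, assuming the pooled PUMS person records are dicts already restricted to the poverty universe, carrying POVPIP, RAC1P and the person weight PWGTP; the function name is hypothetical and this is not the R code mentioned in the thread.

```python
from collections import defaultdict


def share_below_200_by_race(records, cutoff=200):
    """Weighted share of people with POVPIP below `cutoff`, by race.

    Assumes `records` are PUMS person rows (dicts) pooled from the PUMAs that
    make up the county, already limited to the poverty universe, with
    POVPIP (percent of poverty level), RAC1P (race) and PWGTP (person weight).
    Rows whose POVPIP is missing (None) are skipped.
    """
    below = defaultdict(float)
    total = defaultdict(float)
    for r in records:
        if r["POVPIP"] is None:
            continue
        total[r["RAC1P"]] += r["PWGTP"]
        if r["POVPIP"] < cutoff:
            below[r["RAC1P"]] += r["PWGTP"]
    return {race: below[race] / total[race] for race in total}
```

This gives point estimates only; for margins of error you would repeat the calculation with the replicate weights PWGTP1 to PWGTP80.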
Thank you so much, David! I'm going to dig into this this week. Would you be able to share your R code?
Dear Derek, the brief outline of the R code is to use the PUMS variable POVPIP (percent of poverty level) to cut the data at 200% of the poverty level. If you do only this, you get no missing values, which is not correct: the universe for poverty level is "those for whom poverty is determined". See https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html. Here is some "pseudo code". The basic part is:
Poverty status cannot be determined for people in institutional group quarters (such as prisons or nursing homes), college dormitories, military barracks, or living situations without conventional housing (and who are not in shelters).
Also, children under 15 who are unrelated to the householder are not included. In the Census Bureau's words: poverty status cannot be determined for unrelated individuals under age 15 (such as foster children) because income questions are asked of people age 15 and older and, if someone is under age 15 and not living with a family member, we do not know their income. Since we cannot determine their poverty status, they are excluded from the “poverty universe” (table totals).
At the PUMA level, using the PUMS data, I was able to get close to the B17001 table (above poverty level) with POVPIP >= 100 by excluding some records:
Relationship to householder (RELSHIPP):
Exclude (not related): 34 roommate/housemate, 36 other nonrelative, 37 institutional GQ, 38 noninstitutional GQ
Exclude institutional group quarters: TYPEHUGQ == 3
Exclude unrelated children under 15 (AGEP < 15), where "related" is the complement of "not related" as above.
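A minimal Python sketch of these exclusions, reading them as: drop records in an excluded group quarters type, and drop children under 15 whose RELSHIPP code is in the "not related" set. The function name and dict record layout are hypothetical. Two assumptions to check against the PUMS data dictionary for your year: the dictionaries I know code TYPEHUGQ as 1 = housing unit, 2 = institutional GQ, 3 = noninstitutional GQ, so the sketch defaults to excluding 2 (change it if you really meant 3); and RELSHIPP 35 is foster child, which you may want in the "not related" set given the Census note about foster children.

```python
# "Not related" RELSHIPP codes from the list above. Check your year's PUMS
# data dictionary: 35 (foster child) is a candidate to add here.
NOT_RELATED = frozenset({34, 36, 37, 38})

# TYPEHUGQ per the PUMS data dictionary: 1 = housing unit,
# 2 = institutional GQ, 3 = noninstitutional GQ. Assumption: exclude 2.
EXCLUDED_GQ = frozenset({2})


def in_poverty_universe(rec, not_related=NOT_RELATED, excluded_gq=EXCLUDED_GQ):
    """Rough B17001-style universe test for one PUMS person record.

    rec is assumed to be a dict with integer RELSHIPP, TYPEHUGQ and AGEP.
    """
    if rec["TYPEHUGQ"] in excluded_gq:
        return False  # excluded group quarters type
    if rec["AGEP"] < 15 and rec["RELSHIPP"] in not_related:
        return False  # unrelated child under 15: income is not asked
    return True
```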
Note that college dormitories and military barracks are noninstitutional group quarters, but TYPEHUGQ does not identify them separately. The detailed group quarters type only appears in table PCT39, which comes out with the decennial census every 10 years (the 2020 table is not out yet) but goes down to the census tract level. When I dropped noninstitutional group quarters, I couldn't get the PUMS calculation to come close to the B17001 numbers for the PUMA I was looking at (see another thread). For a census tract with a large college dormitory, the PUMS-based model built at the PUMA level will not be a good fit at the tract (or CSD) level, and the ACS data give you no way to test for this; you could look at the corresponding PCT39 table to tell whether there is likely to be a problem.
In any case, I constructed a variable "poverty3" with three levels: below poverty, above poverty, and undefined. For the B17001 table, I created the undefined category by subtracting the B17001 total line from the B01001 total line for the same geography.
For 200% FPL you would use below 200% / above 200% / undefined instead.
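A minimal sketch of the three-level variable at the 200% cut, with hypothetical function names: PUMS records outside the poverty universe (or with no POVPIP) go to "undefined", and for the geography-level margin the undefined count is the B01001 total minus the B17001 total. Setting cutoff=100 gives the below/above poverty version described above.

```python
def poverty3(povpip, in_universe, cutoff=200):
    """Three-level poverty variable for one PUMS person record.

    povpip is POVPIP (percent of poverty level) or None if missing;
    in_universe says whether the record is in the poverty universe.
    """
    if not in_universe or povpip is None:
        return "undefined"
    return "below" if povpip < cutoff else "above"


def undefined_count(b01001_total, b17001_total):
    """People outside the poverty universe for one geography (county or tract)."""
    return b01001_total - b17001_total
```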
At the PUMS/PUMA level I used the R survey package and built a log-linear model (svyloglin) with the product term age*sex*race*poverty3 (plus any other variable you might like to include in the model; employment status might be a good one).
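Since age*sex*race*poverty3 is the full product term, the log-linear model is saturated, so its fitted cell proportions should just reproduce the survey-weighted cross-tabulation. A minimal Python stand-in for the seed table (not the survey package's svyloglin) is therefore the PWGTP-weighted count in every cell; the function name and the derived column names (age_group, race, poverty3) are hypothetical.

```python
from collections import Counter


def weighted_seed(records, keys=("age_group", "sex", "race", "poverty3"), weight="PWGTP"):
    """Sum person weights over every observed combination of `keys`.

    records are assumed to be dicts that already carry the derived columns
    named in `keys` plus the PUMS person weight.
    """
    seed = Counter()
    for r in records:
        seed[tuple(r[k] for k in keys)] += r[weight]
    return dict(seed)
```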
At the tract level (you would be using the county level), I stacked the B01001* tables for the different race categories to get a three-way marginal of age x sex x race (plus ACS detail tables for any other variables in the model, such as employment status). I used the PUMA-level model as the "seed" for the Ipfp function from the mipfp package (require("mipfp")), setting the target marginal tables to those obtained from the ACS "B" tables. Ipfp adjusts the seed table so that its marginals match the target marginal tables.
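A minimal pure-Python sketch of iterative proportional fitting, to show what the Ipfp step does; it is not the mipfp API, and the function name and dict-based table layout are hypothetical. Each margin is given as the positions of the cell-tuple dimensions it covers plus its target counts; for the setup above that would be (0, 1, 2) for age x sex x race and (3,) for poverty3. The targets should all sum to the same total, or the fit cannot satisfy every margin.

```python
from collections import defaultdict


def ipf(seed, margins, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of a dict-keyed contingency table.

    seed    : {cell_tuple: count}, e.g. cells keyed by (age, sex, race, poverty3)
    margins : list of (dims, target) pairs, where dims is a tuple of positions
              in the cell tuple and target maps sub-tuples to target counts
    """
    table = {cell: float(v) for cell, v in seed.items()}
    for _ in range(max_iter):
        max_change = 0.0
        for dims, target in margins:
            # Current totals of the table over this margin's dimensions.
            current = defaultdict(float)
            for cell, value in table.items():
                current[tuple(cell[d] for d in dims)] += value
            # Rescale each cell so this margin matches its target.
            for cell in table:
                key = tuple(cell[d] for d in dims)
                if current[key] > 0:  # zero seed cells stay zero
                    new = table[cell] * target.get(key, 0.0) / current[key]
                    max_change = max(max_change, abs(new - table[cell]))
                    table[cell] = new
        if max_change < tol:
            break
    return table
```

Cells that are zero in the seed stay zero after fitting, because IPF only rescales existing cells.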
There is a lot of material in the Small Area Estimation literature. This is the simplest technique, "Synthetic Estimation", which economists use a lot; see https://www.adb.org/sites/default/files/publication/609476/small-area-estimation-guide-nsos.pdf, page 51. That document has sample R code. You can use generalized linear models ("glm") instead of the log-linear IPF technique; they are equivalent. I grew up on log-linear models and find them more intuitive than generalized linear models, so that is what I like to use.
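The basic synthetic estimator applies large-area rates within groups to small-area group counts: the estimated county count below 200% is the sum over groups g of (county population in g) x (PUMA share below 200% in g). A minimal Python sketch with hypothetical names; the groups could be race alone or age x sex x race cells.

```python
def synthetic_estimate(area_rates, county_pop):
    """Basic synthetic estimator.

    area_rates : {group: share below 200% FPL in the larger area (PUMA)}
    county_pop : {group: county population in that group}
    Returns per-group estimates and their total. Raises KeyError if a county
    group has no rate in the larger area.
    """
    by_group = {g: n * area_rates[g] for g, n in county_pop.items()}
    return by_group, sum(by_group.values())
```

Running the IPF step with only the age x sex x race margin gives this same estimate, since rescaling each age x sex x race group leaves its poverty split unchanged; adding the poverty margin is what moves the result away from the pure synthetic estimate.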
This is related to "indirect standardization" in epidemiology: http://www.medicine.mcgill.ca/epidemiology/hanley/c609/material/BreslowDayStandardizationRegrn.pdf. I googled a bit and apparently geographers also write papers on the technique.
One other hint: the ACS B-table marginals that you use won't necessarily be consistent with one another, so you have to scale the marginals from the individual tables so that they yield a common total N before you run them through the Ipfp function. The ACS table totals are adjusted so that things "work out" at either the state or county level; for example, if you add up the B01001 totals across the tracts in a county, they agree with the B01001 total for the county. Or something like that. If anyone reading this knows a reference for what adjustment is made, please add a comment.
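A minimal sketch of that scaling step, with hypothetical names: each target margin is rescaled proportionally so its grand total matches a common N. Which total to use is a judgment call the post leaves open; the sketch defaults to the first margin's total (for example, the age x sex x race margin from the B01001* tables).

```python
def rescale_margins(margins, common_total=None):
    """Scale each target margin ({cell: count}) to share one grand total.

    If common_total is None, the first margin's total is used.
    """
    totals = [sum(m.values()) for m in margins]
    if any(t <= 0 for t in totals):
        raise ValueError("every margin needs a positive total")
    if common_total is None:
        common_total = totals[0]
    return [
        {cell: count * common_total / tot for cell, count in m.items()}
        for m, tot in zip(margins, totals)
    ]
```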
If you want to take this discussion "offline", send an email to firstname.lastname@example.org with your contact info and we can talk by email, phone or Zoom.