Hello all, I'm trying to put together some state-level estimates of household eligibility for the Affordable Connectivity Program. The eligibility requirements (sheet 2) are essentially the same as for Lifeline, but the income-to-poverty ratio is 200% rather than 135%. I'm running this analysis for all states, but am just focusing on Alabama in this example. The estimate of eligible households produced (AL.pums) is much lower than expected compared to the 2021 eligibility estimates in this dataset from USAC (sheet 1) (USAC manages Lifeline). I realize this is a really specific issue, but since this is my first time working with the PUMS data I have a feeling I may have made an error along the way. Thanks in advance for any help with this!
library(tidyverse)
library(tidycensus)
census_api_key(Sys.getenv("CENSUS_API_KEY"))
pums_vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")
all_pums <- get_pums(variables = pums_vars, state = "AL", year = 2020, survey = "acs5")
AL.pums <- all_pums %>%
  # filter records that meet eligibility requirements
  subset(HINS4 == 1 | FS == 1 | PAP == 1:30000 | SSIP == 1:30000 | POVPIP == 0:200) %>%
  subset(SPORDER == 1) %>% # retain just one record per household
  summarize(hh_eligible = sum(WGTP)) # calculate total number of households eligible
Dear Christine
Just as a "heads up" on using the POVPIP variable: it is defined for every record, whereas ACS detail table B17001 only defines poverty for the universe "Population for whom poverty status is determined". Hence the PUMS file should have some records where POVPIP is undefined. [NOTE 2-18-2023: The PUMS files do have records where POVPIP is undefined. They have a blank field in the PUMS csv files. However, if you use the API as I did, -1 will be used as the missing value. I didn't realize this initially.] There is no way to determine these records from the PUMS data. See the link below for details. I've cut and pasted from an earlier thread, including a message from the people at the Census.
I've just been through a round with the people at the Census on reconciling "below poverty" from the PUMS with "below poverty" for table B17001. Poverty is not defined for households, only for families, which include only the people in the household who are related to the "head of household." For example, a renter not related to the family is not included in the calculation and is not part of the "universe." For the calculation with partial details see: https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html. You add up the income for all family members and then apply the formula, which is based on total family income, number of family members and number of children. Here is the email that I just received from the people at the Census (regarding matching numbers from B17001 and the calculation using the PUMS variables POVPIP, RELSHIPP, TYPEHUGQ and AGEP):
Hi David,
Thanks for bringing up this point. I am aware that some populations are excluded when I use POVPIP, but I'm trying to recreate Lifeline eligibility estimates based on methodology provided by USAC and they specify that variable. So with that in mind, I need to use these specific variables. I'm wondering if the populations with "0" household weight were somehow accounted for in the eligibility estimate.
In their description it states, "Households are determined to be eligible for Lifeline if the householder reported any of the following in their response to ACS". This leads me to wonder if they filtered on PERNUM (SPORDER) = 1 and applied person weights? I'm not sure if that makes sense, but I'll try it out and report back.
Dear Christine,
I have to do some additional research on this, but for the benefit of others who use poverty variables and who are reading this here is some additional information. Different Federal (and state) programs use slightly different definitions for "poverty." I did a small google search and found this HHS web page: https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines This webpage uses "family" and "household" interchangeably whereas the Census (ACS) considers them different "concepts."
For the ACS, I believe that a "household" (in contrast to a "group quarters") has a head of household. I think that this is "person 1" on the ACS survey form (please correct me if I have this wrong).
To go back to the Federal Register notice, www.federalregister.gov/.../annual-update-of-the-hhs-poverty-guidelines
"The poverty guidelines are used as an eligibility criterion by Medicaid and a number of other federal programs. The poverty guidelines issued here are a simplified version of the poverty thresholds that the Census Bureau uses to prepare its estimates of the number of individuals and families in poverty. " To continue:
"This notice does not provide definitions of such terms as ‘‘income’’ or ‘‘family’’ as there is considerable variation of these terms among programs that use the poverty guidelines. The legislation or regulations governing each program define these terms and determine how the program applies the poverty guidelines. In cases where legislation or regulations do not establish these definitions, the entity that administers or funds the program is responsible to define such terms as ‘‘income’’ and ‘‘family.’’ Therefore questions such as net or gross income, counted or excluded income, or household size should be directed to the entity that administers or funds the program. Dated: January 12, 2023."
One point that I would like to make is that the POVPIP variable in the PUMS file does not correspond to any of these definitions and should be used with care and modified as needed.
In the past I have communicated with the people at the census and have asked for the SAS code that is used to process the ACS survey form responses to produce the POVPIP variable in the PUMS file. Someone sent an outdated "Annotated Case Report Form" (in the FDA new drug application sense). They are reluctant to share the information presumably because it changes all the time and they don't want it "out there." You could probably make a Freedom of Information Act request (FOIA) to get the code used to process the ACS 1 year 2021 PUMS POVPIP variable. Maybe I'll give it a try.
Best,
Dave
If you want to know what an FDA Annotated case report form looks like, here are the guidelines:
wiki.cdisc.org/.../aCRF_Guideline_v1-0_20201120_publish.pdf
(sorry about the bad link in an earlier version of this post)
If you submit data to the FDA you need to provide an Annotated Case Report Form for every data item you submit. The guidelines give you an idea of the required detail (46 pages) !
Dear Christine, to get back to your question about 0 household weights (WGTP): records with 0 weights are group quarters, e.g. college dorm, nursing home, prison, military barracks, etc. The group quarters codes are here:
https://www2.census.gov/programs-surveys/acs/tech_docs/code_lists/2021_ACS_Code_Lists.pdf
see pdf page 31 for detailed group quarters codes.
Note due to disclosure avoidance only the institutional/non-institutional breakdown is in the PUMS file variable TYPEHUGQ
Here is the cross tabulation for one PUMA from the PUMS file. Since you use R, here is the R code.
State 25 Massachusetts PUMA 03304 (Part of Boston?? -- quite a few college dormitories)
TYPEHUGQ Character 1 Type of unit
1 .Housing unit
2 .Institutional group quarters
3 .Noninstitutional group quarters
WGTP Housing unit weight, 0 == group quarters
z is a data.frame with PUMS person-level data with housing data merged back in on SERIALNO
povn<-as.numeric(z$POVPIP);
poverty_NA<-povn<0;
htype<-factor(z$TYPEHUGQ,levels=c(1,2,3),labels=c("house","inst","non-inst"))
table(poverty_NA,htype)
          htype
poverty_NA house inst non-inst
     FALSE  1129    0       44
     TRUE      3   22       45
house.wt0<-z$WGTP==0;
table(house.wt0,htype);
         htype
house.wt0 house inst non-inst
    FALSE  1132    0        0
    TRUE      0   22       89
My comment in an earlier post below is incorrect. Some POVPIP records have POVPIP undefined/missing.
Note PUMS variables that are positive usually have missing defined as lowest defined value -1 (comment on this forum from person at Census)
Note from above: 3 individual records from households have POVPIP "undefined", and 44 individual records from non-institutional group quarters have POVPIP defined. So some group quarters (non-institutional, e.g. maybe adult treatment centers, what else?) have POVPIP defined. What I said below in a previous post, that all group quarters have POVPIP missing, is incorrect. Compare this to:
https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html
This web page says college dorms and military barracks do not have POVPIP defined
The type of group quarters with a non-missing POVPIP values cannot be determined from the PUMS data alone because the detailed group quarters type is not in the PUMS data. See ACS code list document above for the list of detailed group quarters codes.
Here are some of the non-institutional codes from the codebook
OTHER NONINSTITUTIONAL FACILITIES
EMERGENCY AND TRANSITIONAL SHELTERS (WITH SLEEPING FACILITIES) FOR PEOPLE EXPERIENCING HOMELESSNESS (701)
GROUP HOMES INTENDED FOR ADULTS (801)
RESIDENTIAL TREATMENT CENTERS FOR ADULTS (802)
WORKERS’ GROUP LIVING QUARTERS AND JOB CORPS CENTERS (901)
OTHER NONINSTITUTIONAL GROUP QUARTERS (904)
If I understand the USAC Lifeline program (Google search), it provides phone and internet service for lower-income people.
I guess a question is: are people in group quarters, for example a nursing home, eligible for the USAC program? I was thinking that some assisted living situations might be group quarters, but I think that if you have your own room this would be the same as an apartment building. There is quite a bit of documentation about determining whether a living situation is a group quarters or a household.
Hope this helps !
Hey Dave,
I believe I understand the point you're making, but USAC explicitly states that they use POVPIP in their eligibility calculations. I'm comparing my calculated estimates to USAC's and mine are incorrect, so I'm trying to understand how they arrive at their estimates. I'm not comparing my estimates to other aggregate tables. As far as I know, their estimates do not include folks in other living situations. These are the specific qualifications listed in their methods.
I think that I understand what numbers you are trying to reproduce. I can't do it either.
Here is what I get for AL (FIPS code "01"), accessing the state csv file at
https://www2.census.gov/programs-surveys/acs/data/pums/2021/1-Year/ (2021 1-year); I also tried the 2021 5-year data.
Including variable POVPIP: 773318 (AL 2021 1-year PUMS), 732903 (AL 2021 5-year PUMS)
Excluding variable POVPIP: 566248 (AL 2021 1-year PUMS), 539818 (AL 2021 5-year PUMS)
A couple of issues.
POVPIP is defined for individuals but is based on the family. I think that the reason for this is that if you have an "unrelated" person in the household, their poverty level is undefined. The poverty variable is defined for members of the family and is computed from the family income, the number of members in the family and the number of children in the family. Presumably non-family members of the household are not eligible for the USAC program.
FS is a household variable. Presumably it means that someone in the HOUSEHOLD receives food stamps, but the PUMS codebook is not totally clear. Perhaps it means that someone in the FAMILY receives SNAP.
SSIP is a person variable. I added up the SSIP values for members. 30,000 is the maximum value for a person in the PUMS file, but there could be two or more people receiving assistance. Should those values be added, or should you take the maximum? I tried it both ways.
HINS4 is a "person" variable. Presumably the household is eligible if any member of the household has HINS4==1.
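The sum-versus-maximum question for SSIP can be tried on a toy table before running it on a full state file. This is only a sketch: the data frame `toy` and all of its values are made up for illustration, and only the column names SERIALNO, SSIP and HINS4 come from the PUMS.

```r
# Hypothetical person records for two households (values invented)
toy <- data.frame(
  SERIALNO = c("A", "A", "B", "B", "B"),
  SSIP     = c(9000, 4000, 0, 0, 0),  # SSI income reported per person
  HINS4    = c(2, 2, 2, 1, 2)         # 1 = Medicaid etc., 2 = no
)

# Household-level SSIP two ways: summed over members, or largest single value
ssip_sum <- tapply(toy$SSIP, toy$SERIALNO, sum)
ssip_max <- tapply(toy$SSIP, toy$SERIALNO, max)

# Household is flagged if ANY member reports HINS4 == 1
hins_any <- tapply(toy$HINS4 == 1, toy$SERIALNO, any)

print(data.frame(
  SERIALNO = names(ssip_sum),
  ssip_sum = as.vector(ssip_sum),
  ssip_max = as.vector(ssip_max),
  hins_any = as.vector(hins_any)
))
```

As long as SSIP is never negative, the sum and the maximum are positive for exactly the same households, so the choice should only matter if an upper bound such as 30,000 is applied to the household value.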
So much for the details. I call it data archaeology. You might try emailing them and asking their data person for their computer code. SAS perhaps.
Since I use PUMS data and the variables that are included in the Lifeline eligibility criteria quite a bit, I went ahead and wrote an R function to do the "dirty work" of making the calculation. For AL 2021 1-year PUMS data this code produces 774,694 eligible households. The Lifeline spreadsheet Lifeline-Participation-Rate.xlsx shows a value of 626,087. Hence the calculations differ by almost 150,000. Since this is off by such a large amount, let me know about any errors or suspicious code. [NOTE 2-18-2023: I modified the code so I can get a value for Lifeline eligibility for individuals in group quarters. The WGTP housing weight variable is 0 for these housing records. For the people in group quarters you need to sum the person weights PWGTP across the people in each group quarters. Only one variable in the eligibility criteria is defined at the housing record level: FS (Food Stamps/SNAP). If you ignore the value of this variable in the Lifeline calculation, you get weights (the sum of person weights in the group quarters). The net result is a value for eligible people (not households) who reside in group quarters (TYPEHUGQ values 2 and 3). For my calculation this only increases the divergence between my result and the spreadsheet on the USAC website, but it might help Christine's calculation. The data handling rules in the USAC/Lifeline spreadsheet are not clear on this point, and I don't know if individuals who reside in group quarters are eligible for the Lifeline program.]
The calculation that I did does not use "tidycensus" so the code is somewhat more primitive than your code. I use the Census FTP site zipped csv files. You need two files for each state, the person file and the housing file. These files need to be merged on the SERIALNO variable to "pull across" the household/group quarters variables into the person level dataset. I believe that tidycensus does this by default but I'm not enough of an expert on tidycensus to be sure.
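For anyone following along, the merge step can be sketched with base R's `merge()` on two miniature tables. The tables `house` and `person` below are invented stand-ins for the real csv files, and the checks afterwards are just one way to catch a bad join.

```r
# Hypothetical miniature housing and person files (values invented)
house <- data.frame(
  SERIALNO = c("H1", "H2", "G1"),
  TYPEHUGQ = c(1, 1, 3),
  WGTP     = c(50, 80, 0),   # group quarters carry WGTP == 0
  FS       = c(1, 2, NA)
)
person <- data.frame(
  SERIALNO = c("H1", "H1", "H2", "G1"),
  SPORDER  = c(1, 2, 1, 1),
  HINS4    = c(2, 1, 2, 2)
)

# Left join: keep every person record and pull the housing columns across
dat <- merge(person, house, by = "SERIALNO", all.x = TRUE)

# Defensive checks: no person rows lost or duplicated, every person matched
stopifnot(nrow(dat) == nrow(person))
stopifnot(!anyNA(dat$TYPEHUGQ))
print(dat)
```

Note that FS comes through as NA for the group quarters record here, which is the kind of missing value that has to be handled explicitly before applying the eligibility logic.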
Based on my reading of the posted application form https://www.usac.org/wp-content/uploads/lifeline/documents/forms/LI_Application_NVstates.pdf, I believe that the logical OR "|" is across all variables (suitably normalized) and all persons in the household. For example, if one person in the household is eligible for or receives Medicaid, then the household is eligible for the service. I use a loop on the unique SERIALNOs in the merged file. I then | (logical or) across all records where the unique SERIALNO matches the SERIALNO in the merged person/housing data.frame, i.e. across all members of the household. Errors often occur when you do a merge, so there could be an error in that step. I used some "defensive" coding to handle missing values. This is often a source of errors and differences in results.
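The OR across household members can also be written without an explicit loop. The sketch below uses `tapply()` on a made-up merged table; the column `life` stands for a person-level eligibility flag and all values are hypothetical. It also shows, for comparison, the count you get when only the SPORDER == 1 record is consulted.

```r
# Hypothetical merged person/housing records (values invented)
dat <- data.frame(
  SERIALNO = c("H1", "H1", "H2", "H3", "H3"),
  SPORDER  = c(1, 2, 1, 1, 2),
  WGTP     = c(50, 50, 80, 30, 30),
  life     = c(FALSE, TRUE, FALSE, TRUE, FALSE)  # person-level eligibility flag
)

# OR across all members of each household
hh_any <- tapply(dat$life, dat$SERIALNO, any)

# One housing weight per household (WGTP repeats on every person row)
hh_wt <- tapply(dat$WGTP, dat$SERIALNO, function(w) w[1])

# Weighted count: household eligible if any member is eligible
eligible_any <- sum(hh_wt[hh_any])

# Weighted count: household eligible only if the householder record is eligible
eligible_householder <- sum(dat$WGTP[dat$SPORDER == 1 & dat$life])

cat("any member:", eligible_any, " householder only:", eligible_householder, "\n")
```

In this toy table household H1 is eligible only through its second person, so the householder-only count misses it.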
I have many years of experience in submitting code and datasets to the FDA. The regulations require a lot of quality checks before you can submit to the FDA. Usually you do "double programming" for these checks. Often it requires over a day of work by two programmers before the code and dataset pass the required checks. Most data people in the nonprofit world do not use this level of rigor. In any case, when you use different programs you usually get different answers and it takes a lot of work to reconcile the results of different calculations.
If you send an email to info@dorerfoundation.org, I can email the code as an attachment.
# USAC lifeline_pums_analysis_021623.R
#
# R function to compute number of eligible Lifeline Households from ACS PUMS data
# www.usac.org/.../LI_Application_NVstates.pdf
# By David J Dorer, Ph.D.
# copyright 2023 by Dorer Community Service Foundation Inc of Massachusetts
# info@dorerfoundation.org
# v1.0 djd 16 Feb 2023 12:40
#
# z<-usac()
#
usac<-function(vintage=2021,state="AL",poverty.level=135,fips="01",period=1) {
#
# ARGUMENTS:
#
# vintage: PUMS vintage
# period: PUMS period
# state: state abbreviation
# fips: state FIPS Code
# poverty.level: poverty threshold to use in calculation
#
# VALUE:
#
# list with
#
# households: number of eligible households in state
# vintage: argument
# period: argument
# url.person: Census FTP site url for PUMS Person records
# url.house: Census FTP site url for PUMS House records
# state: argument
# fips: argument
# data: Derived data.frame used to make calculation.
#
  state<-tolower(state);

  error<-1;
  if((vintage==2021) & (period==1)) {
    note<-paste("vintage: ",vintage,"period: ",period," year PUMS");
    urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
    urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
    error<-0;
  }; # if

  if((vintage==2021)&(period==5)) {
    urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
    urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
    note<-"2021 5-year PUMS";
    error<-0;
  }; # if
  if(error) {
    state<-toupper(state);
    msg<- paste("usac: ERROR bad arguments: state=",state,"FIPS=",fips,"vintage=",vintage,"period=",period);
    cat(msg,"\n");
    return(list(error=error,state=state,fips=fips,vintage=vintage,period=period,message=msg));
  }; # if error

  # note this code downloads zip file to your "working directory" use getwd() to see the directory/folder
  download.file(urlh,"pumsh.zip"); # housing records from FTP site
  download.file(urlp,"pumsp.zip"); # person records from FTP site

  house<-read.csv(unz("pumsh.zip",paste0("psam_h",fips,".csv"))); # load housing records directly from zip file
  person<-read.csv(unz("pumsp.zip",paste0("psam_p",fips,".csv"))); # load person records directly from zip file

  # trim down house and person to the variables of interest
  hvar0<-c("SERIALNO","TYPEHUGQ","WGTP","FS");
  hvar<-c("TYPEHUGQ","WGTP","FS");
  pvar<-c("SERIALNO","HINS4","PAP","SSIP","POVPIP");
  house<-house[,hvar0];
  person<-person[,pvar];

  # merge person and housing records
  m<-match(person$SERIALNO,house$SERIALNO);

  # data.frame like the one returned by tidycensus function get_pums() ??
  dat<-data.frame(person,house[m,c("TYPEHUGQ","WGTP","FS")]);

  # create logical variables in the merged "dat" data.frame
  pov<-dat$POVPIP;
  pov[pov<0]<-NA;
  pov[pov<=poverty.level]<-1; # <= poverty.level argument poverty level clause
  pov[pov>poverty.level]<-0;
  pov[is.na(pov)]<-0; # at this point 0 is above poverty.level or dat$POVPIP is missing

  ssip<-dat$SSIP; ssip[is.na(ssip)]<-0; # Receive Supplemental Social Security Income SSI >= 1 otherwise 0
  pap<-dat$PAP; pap[is.na(pap)]<-0; # Public Assistance Income missing value set to zero

  # HINS4 Health Insurance Medicaid etc ==1 TRUE otherwise FALSE no missing
  hins<-dat$HINS4; hins[is.na(hins)]<-0; # result 1==receives Medicaid etc. 0 or missing does not receive Medicaid, etc.

  # FS Receives Food Stamps/SNAP == 1 (yes) == 2 (no) no missing values
  # some defensive coding
  fs<-dat$FS==1; fs[is.na(fs)]<-0; fs[fs==2]<-0; # recode 1=receives SNAP yes, 0 or missing no

  # life<-(fs==1)|(pov>0)|((pap<=30000)&(pap>0))|((ssip<=30000)&(ssip>0))|(hins==1); # logic at household person record level
  life<-(fs==1)|(pov>0)|(pap>0)|(ssip>0)|(hins==1); # logic at household person record level

  # no missing values for all derived variables including "life"
  dfs<-data.frame(dat,pov,pap,hins,fs,ssip,lifeline=life); # data.frame to make sure everything aligns and check logic

  cat("\nusac: Summary of derived per person (for households) derived data frame:\n");
  print(summary(dfs[is.na(dfs$POVPIP),])); # check logic on pov variable POVPIP missing pov set to 0 (no)

  serialH<-sort(unique(dat$SERIALNO)); # unique household SERIALNO in dat data.frame

  mhp<-match(house$SERIALNO,serialH); # look for records in house data.frame that don't match records in dat data.frame
  multi.person<-house[is.na(mhp),];
  cat("\nusac: PUMS Housing Records that match multiple SERIALNO id's from Person Records\n");
  cat(" should be all Household records)\n"); # all records have TYPEHUGQ set to 1 i.e. household records.
  print(summary(multi.person));
  #
  # now apply | (or) logic across values of life for all persons in a household
  #
  # application form that lifeline applicants fill out
  # www.usac.org/.../LI_Application_NVstates.pdf
  #
  N<-length(serialH); # loop over serialH (unique dat$SERIALNO values)
  lifeline<-rep(0,N);

  for(i in seq(1,N)) {
    if(!(i%%2000)) cat("i=",i,"\n");
    housei<-serialH[i]==dat$SERIALNO;
    lifeline[i]<-any(life[housei])
  }; # for i
  weight<-dat$WGTP[match(serialH,dat$SERIALNO)];
  datH<-data.frame(serialH,weight,lifeline);

  # number of qualified households
  qhouse<-sum(datH$weight[datH$lifeline==1],na.rm=TRUE);
  cat("\nusac: Number of households qualified for Lifeline Households=",qhouse,note,"\n");
  rtn<-list(households=qhouse,poverty.level=poverty.level,state=toupper(state),fips=fips
    ,url.house=urlh,url.person=urlp,vintage=vintage,period=period,data=dfs);
  invisible(rtn);
}; # usac
## END ##
MAKING PROGRESS
I was able to reproduce the AL row in the Lifeline spreadsheet. The value is 626087. I used the 2021 1-year PUMS and added SPORDER==1 to the code posted earlier. The code here computes the result both with only the SPORDER==1 person records and with all person records. If you can tell me how to paste in the code so it looks OK like yours, then I'll post it the same way. If you send an email to info@dorerfoundation.org I'll email the code as an attachment.
I don't use the %>% format in my code, but I believe if you add & SPORDER == 1 to the first subset line and eliminate the second subset line you will get the right thing. I'm a long-time FORTRAN programmer, so I still use loops with an integer index. The other issue is that you used 2020 5-year data and the spreadsheet uses 2021 1-year PUMS data. You might also ask why the USAC Lifeline people didn't "|" across all members of the household. The application seems to say that if someone in the household is eligible then the entire household is eligible. So if anyone receives Medicaid, then the household is eligible. The ACS has the concept of a single householder (person 1 on the form). If there is a husband and wife, for example, and the wife fills out the form as "person 1" and the husband is on Medicaid, then the ACS-based calculation with SPORDER==1 is wrong and will count the household as NOT eligible. This makes the participation rate computed in the spreadsheet too high. They should be using the denominator of 774694 (for AL 2021 1-year data), which looks at all persons in the household.
Another thing is that the csv and API downloaded data have different rules for missing values. The csv files have a ",," blank field for missing. The API cannot handle a "blank" data field, so the API puts in a value. The rule that a Census person posted on the forum is "lowest possible value minus 1" for missing, with the caveat of "usually." This detail does not appear in the PUMS pdf codebook. Your code with POVPIP==0:200 should take care of this in the API case, but I'm not sure. Also remember to use a 135 limit for POVPIP to match the Lifeline spreadsheet.
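On the range test itself, one thing worth checking in R is that `x == 0:200` compares element by element (recycling the shorter vector) rather than asking whether each value falls in the range. A small sketch with made-up POVPIP values, assuming the API uses -1 as the missing code as described above:

```r
# == against a vector compares position by position, not membership
elementwise <- c(2, 2, 2) == 0:2      # FALSE FALSE TRUE
membership  <- c(2, 2, 2) %in% 0:2    # TRUE TRUE TRUE

# Hypothetical POVPIP values as the API might return them (-1 = missing)
povpip <- c(-1, 0, 57, 135, 136, 200, 501)

# Recode the -1 sentinel to NA so it cannot slip through a numeric test
povpip[povpip < 0] <- NA

# Explicit range tests; missing values are treated as not eligible
below_135 <- !is.na(povpip) & povpip >= 0 & povpip <= 135
below_200 <- !is.na(povpip) & povpip >= 0 & povpip <= 200

print(data.frame(povpip, below_135, below_200))
```

With csv downloads the missing values are already NA after `read.csv()`, so the recode line does nothing there, but the `!is.na()` guard is still needed.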
Reference - Lifeline Federal Regulations giving qualification for Lifeline: (uses 135% of poverty GUIDELINE)
https://www.law.cornell.edu/cfr/text/47/54.409
"The consumer, one or more of the consumer's dependents, or the consumer's household must receive benefits from one of the following federal assistance programs: Medicaid; Supplemental Nutrition Assistance Program; Supplemental Security Income; Federal Public Housing Assistance; or Veterans and Survivors Pension Benefit."
ACP (Affordable Connectivity Program) uses 200% of the poverty GUIDELINE (not the poverty threshold, as the ACS does):
https://www.law.cornell.edu/cfr/text/47/54.1805
Both Lifeline and ACP use "Family"
Household and income are defined here:
https://www.law.cornell.edu/cfr/text/47/54.400#f
## USAC lifeline_pums_analysis_021823.R ## R function to compute number of eligible Lifeline Households from ACS PUMS data## www.usac.org/.../LI_Application_NVstates.pdf## By David J Dorer, Ph.D.## copyright 2023 by Dorer Community Service Foundation Inc of Massachusetts## info@dorerfoundation.org## v1.0 djd 16 Feb 2023 12:40 # v1.1 djd 18 Feb 2023 13:38 added group quarters eligible persons calculation# v1.2 djd 19 Feb 2023 18:43 added calculation using only person records with SPORDER==1### z1<-usac(vintage=2021,period=1)# z2<-usac(vintage=2020,period=1);# z3<-usac(vintage=2020,period=5);# z4<-usac(vintage=2021,period=5);# z5<-usac(vintage=2021,period=1,state="MA",fips="25");
usac<-function(vintage=2021,state="AL",poverty.level=135,fips="01",period=1) {## ARGUMENTS:## vintage: PUMS vintage# period: PUMS period# state: state abbreviation# fips: state FIPS Code# poverty.level: poverty threshold to use in calculation## VALUE:# # list with## households: number of eligible households in state
# householders: number of eligible householders (SPORDER==1) records only# vintage: argument# period: argument# url.person: Census FTP site url for PUMS Person records# url.house: Census FTP site url for PUMS House records# state: argument# fips argument# data: Derived data.frame used to make calculation.#
error<-0; note<-paste0(vintage," ",period,"- PUMS"); msg<-paste("State=",state,"fips=",fips,"PUMS vintage=",vintage,"period=",period);
note<-paste("vintage: ",vintage,"period: ",period," year PUMS"); urlh<-paste0(""/",period,"-Year/csv_h",state,".zip");">www2.census.gov/.../csv_h",state,".zip"); urlp<-paste0(""/",period,"-Year/csv_p",state,".zip");">www2.census.gov/.../csv_p",state,".zip");
if((vintage==2020)&(period==1)) { # because of covid 2020 1 year PUMS was released with experimental weights urlh<-paste0("".zip");">www2.census.gov/.../csv_h",state,".zip"); urlp<-paste0("".zip");">www2.census.gov/.../csv_p",state,".zip"); }; # if
# note this code downloads zip file to your "working directory" use getwd() to see the directory/folder vh<-try(download.file(urlh,"pumsh.zip")); # housing records from FTP site vp<-try(download.file(urlp,"pumsp.zip")); # person records from FTP site
if(class(vh)[1]=="try-error") { msg<-c(msg,paste0("ERROR downloading housing file url=",urlh,"vintage=",vintage,"period=",period)); error<-error+1; };
if(class(vp)[1]=="try-error") { msg<-c(msg,paste0("ERROR downloading person file url=",urlp)); error<-error+1; }; if(error>0) {cat("usac: ERRORS\n");return(msg); };
hfile<-paste0("psam_h",fips,".csv"); pfile<-paste0("psam_p",fips,".csv");
house<-try(read.csv(unz("pumsh.zip",hfile))); # load housing records directly from zip file person<-try(read.csv(unz("pumsp.zip",pfile))); # load person records directly from zip file
if(class(house)[1]=="try-error") { msg<-c(msg,paste0("ERROR reading zip archive: pumsh.zip file=",hfile)) error<-error+1; };
if(class(person)[1]=="try-error") { msg<-c(msg,paste0("ERROR reading zip archive: pumsp.zip file=",pfile)) error<-error+1; };
if(error>0) {cat("usac: ERRORS\n");return(msg); };
# trim down house and person to the variables of interest hvar0<-c("SERIALNO","TYPEHUGQ","WGTP","FS"); hvar<-c("TYPEHUGQ","WGTP","FS"); pvar<-c("SERIALNO","SPORDER","HINS4","PAP","SSIP","POVPIP","PWGTP"); house<-house[,hvar0]; cat("\nsummary house data.frame=\n");print(summary(house)); person<-person[,pvar];
# merge person and housing records m<-match(as.character(person$SERIALNO),as.character(house$SERIALNO));
# data.frame like the on returned by tidycensus function get_pums() ??# note tidycensus get_pums() appears to use the API. # The API cannot handle blank fields, missing is lowest value - 1 (typically)
dat<-data.frame(person,house[m,c("TYPEHUGQ","WGTP","FS")]); cat("\nWGTP==0 by TYPEHUGQ\n");print(table(wgtp0=addNA(house$WGTP==0),typehq=addNA(as.factor(house$TYPEHUGQ))));
cat("\nusac: summary dat merged person/house data.frame:\n");print(summary(dat)); cat("\nusac: summary dat merged person/house data.frame with house weight (WGTP)==0:\n");print(summary(dat[dat$WGTP==0,]));
# create logical variables in the merged "dat" data.frame pov<-dat$POVPIP;pov[pov<0]<-NA; pov[pov<=poverty.level]<-1; # <= poverty.level argument poverty level clause pov3<-rep(NA,length(pov)); pov3[pov<0]<-3;pov3[is.na(pov)]<-3; flag<-(pov>=0)|(pov<=poverty.level);flag[is.na(flag)]<-FALSE;pov3[flag]<-1; flag<-pov>poverty.level;flag[is.na(flag)]<-FALSE;pov3[flag]<-2; pov3f<-factor(pov3,levels=c(1,2,3),c("below","above","undefined")); pov[pov>poverty.level]<-0; sporder<-dat$SPORDER;
# HINS4 Health Insurance Medicade etc ==1 TRUE other wise no missing hins<-dat$HINS4; hins[is.na(hins)]<-0; # result 1==receives Medicaid etc. 0==missing 2==no # FS Receives Food Stamps/SNAP == 1 (yes) == 2 (no) no missing values cat("summary of PAP>0\n"); print(summary(dat$PAP[dat$PAP>0])); cat("summary of pap>0\n"); print(summary(pap[pap>0])); cat("summary PAP==0/pap==0\n");print(table(PAP0=addNA(dat$PAP==0),pap0=addNA(pap==0))); cat("summary of SSIP>0\n"); print(summary(dat$SSIP[dat$SSIP>0])); cat("summary of ssip>0\n"); print(summary(ssip[ssip>0])); cat("summary SSIP==0/ssip==0\n");print(table(SSIP00=addNA(dat$SSIP==0),ssip0=addNA(ssip==0)));
# some defensive coding fs<-dat$FS==1; fs[is.na(fs)]<-0; fs[fs==2]<-0; # recode 1=receives SNAP yes, 0 missing or no cat("summary of FS/fs\n"); print(table(addNA(as.factor(fs)),addNA(as.factor(dat$FS))));
life<-(fs==1)|(pov3f=="below")|(pap>0)|(ssip>0)|(hins==1); # logic at household person record level
# no missing values for all derived variables including "life" dfs<-data.frame(dat,pov3f,pap,hins,fs,ssip,lifeline=life); # data.frame to make sure everything aligns and check logic
cat("\nusac: Summary of derived per person data frame (dat with derived variables)\n");print(summary(dfs));
cat("\nusac: Summary of derived per person (for households) data frame with POVPIP missing.:\n"); print(summary(dfs[is.na(dfs$POVPIP),])); # check logic on pov variable POVPIP missing pov set to 0 (no)
serialH<-sort(unique(as.character(dat$SERIALNO))); # unique household SERIALNO in dat data.frame
# now apply | (or) logic across values of life for all persons in a household## application form that lifeline applicants fill out# www.usac.org/.../LI_Application_NVstates.pdf# This application seems to indicate that the Household is elgible in anyone individual personin the househould# is eligible not just the ACS SPORDER==1 (Head of Household -- ACS Questionnaire Person 1)# so the calculation should OR across all person records in the Household# N<-length(serialH); # loop over serialH (unique dat$SERIALNO values) lifeline1<-lifeline<-rep(0,N); pweight<-rep(NA,N);
for(i in seq(1,N)) { if(!(i%%2000)) cat("i=",i,"\n"); housei<-serialH[i]==dat$SERIALNO; lifeline[i]<-any(life[housei]); lifeline1[i]<-any(life[housei&(sporder==1)]); pweight[i]<-sum(person$PWGTP[serialH[i]==person$SERIALNO]); # sum person weights }; # for i weight<-dat$WGTP[match(serialH,dat$SERIALNO)]; weight2<-house$WGTP[match(serialH,as.character(house$SERIALNO))];
dataH<-data.frame(SERIALNO=serialH,weight,pweight,lifeline,lifeline1,house[match(serialH,house$SERIALNO),]); cat("\nusac: compare weights: weight-weight2\n");print(summary(weight-weight2)); cat("\nsummary dataH: \n");print(summary(dataH)); cat("\nsummary 0 weights \n");print(summary(dataH[dataH$weight==0,]));
# number of qualified households cat("\nsummary dataH: for weight==0\n");print(summary(dataH[dataH$weight==0,])); qhouse<-sum(dataH$weight[dataH$lifeline==1],na.rm=TRUE); # any person in Household qualifed make Household qualified qhouse1<-sum(dataH$weight[dataH$lifeline1==1],na.rm=TRUE); # qualified Household based on person SPORDER==1 qperson<-sum(dataH$pweight[dataH$TYPEHUGQ!=1]); # qualified Group Quarters persons (sum person weights) qhouse2<-sum(dataH$weight[(dataH$lifeline==1)&(dataH$TYPEHUGQ==1)],na.rm=TRUE);
cat("\nLifeline/TYPEHUGQ=\n");print(table(H.type=addNA(as.factor(dataH$TYPEHUGQ)),lifeline=addNA(as.factor(dataH$lifeline)))); cat("summary(dataH: for Group Quarters=\n");print(summary(dataH[dataH$TYPEHUGQ!=1,])); cat("\nsummary qualified households\n");print(summary(dataH[dataH$lifeline==1,]));
cat("\nusac: Results: ",date(),note,"\n"); cat("usac: Number of households/group.quarters qualified for Lifeline Households=",qhouse,"\n"); cat("usac: Number of households qualified for Lifeline Households=",qhouse2,"\n"); cat("usac: Number of households qualified for Lifeline based on Householder=",qhouse1,"\n"); cat("usac: Number of Group.Quarters persons qualified for Lifeline Households=",qperson,"\n");
rtn<-list(households=qhouse2,householders=qhouse1,personsGQ=qperson,includeGQ=qhouse
  ,poverty.level=poverty.level,state=toupper(state),fips=fips,note=note,error=error
  ,url.house=urlh,url.person=urlp,vintage=vintage,period=period,data.person=dfs,data.house=dataH);
invisible(rtn);
}; # usac
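As an aside, the per-household OR in the loop above can also be written without an explicit loop. This is only a minimal sketch on made-up toy data: the `person` and `house` data frames and every value in them are hypothetical, and `life` is assumed to be the per-person eligibility flag. It shows how the "any person qualifies" count can differ from the "householder qualifies" count.

```r
# Toy person-level records (hypothetical values, not real PUMS data)
person <- data.frame(
  SERIALNO = c("A", "A", "B", "B", "C"),
  SPORDER  = c(1, 2, 1, 2, 1),
  life     = c(FALSE, TRUE, FALSE, FALSE, TRUE)  # assumed per-person eligibility flag
)
# Toy household-level records with household weights (hypothetical)
house <- data.frame(SERIALNO = c("A", "B", "C"), WGTP = c(10, 20, 30))

# Household qualifies if ANY person record in it qualifies
any_person <- tapply(person$life, person$SERIALNO, any)

# Household qualifies only if the householder (SPORDER == 1) qualifies
hh <- person[person$SPORDER == 1, ]
householder <- setNames(hh$life, hh$SERIALNO)

# Weighted household counts under each rule
q_any         <- sum(house$WGTP[as.vector(any_person[house$SERIALNO])])
q_householder <- sum(house$WGTP[as.vector(householder[house$SERIALNO])])

print(c(any.person = q_any, householder = q_householder))
```

Household A is counted only under the "any person" rule, because its second person qualifies but its householder does not.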
I really appreciate all your effort on this! After trying out several versions of the same datasets (via the tidycensus API, downloaded from the FTP site, and downloaded from IPUMS), I finally got this figured out. I also realized that SPORDER == 1 would limit the count to householder information only. I also noticed the TYPEHUGQ field in your code. Once I included that field AND properly dealt with the NA values that arose when I joined the person & household records...VOILA! I did keep the POVPIP range at 0:200, because that is the range specified for the Affordable Connectivity Program (very similar to Lifeline but focused on Internet access; eligibility-wise, POVPIP is the only difference that I'm aware of).
I keep the code for this on github here if you're interested: https://github.com/ILSR-GIS-DATA/Affordable-Connectivity-Program-Analysis.
For future reference, if you want to add code to a post, see the dropdown options in the pic below. Thanks again for brainstorming with me!
I just fixed your tidyverse code. Here is the corrected code:
lifeline.eligible<-function(state="AL",vintage=2021,period=1) {
library(tidyverse)
library(tidycensus)
census_api_key(Sys.getenv("CENSUS_API_KEY"))
pums.vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")
#
# note all.pums data.frame has no missing (NA) values.
# -1 is used as a missing value for PAP SSIP and POVPIP
#
all.pums <- get_pums(variables = pums.vars, state = state, year = vintage, survey = paste0("acs",period));
#
# the following logic drops records with missing (-1) values
# selects SPORDER==1 records
# note this logic is incorrect but matches that used for the Lifeline eligibility spreadsheet
AL.pums <- all.pums %>%
  # filter records that meet eligibility requirements with SPORDER==1
  filter((SPORDER==1)&(HINS4 == 1 | FS == 1 | between(PAP,1,30000) | between(SSIP,1,30000)| between(POVPIP,0,135))) %>%
  summarize(hh.eligible = sum(WGTP)) # calculate total number of households eligible
AL.pums;
};
ADDED 3:05 PM EST
Working on this has gotten me interested in the ACP and Lifeline. I have a friend in DC who worked with the OMB and now works with the CBO. Apparently he has worked on the program and is familiar with it, presumably on the Congressional budget side. In any case, working with 501(c)(3) organizations is part of the mission/purpose of the Dorer Community Service Foundation. So if you need any more help, we can create a "Project" description and formalize a relationship between DCSF and the ILSR for pro bono consulting services. Email info@dorerfoundation.org if you want to proceed along this line.
The main mistakes in your original code were using "subset" instead of "filter" and using the sequence operator 1:30000. An expression like PAP == 1:30000 compares the two vectors element by element (recycling the sequence) rather than testing whether PAP falls within the range. You need a logical AND of >= and <=, which is what between() does. I've learned a lot about tidyverse and tidycensus. I tend to like to use the R "base" package as much as possible. You might also connect with the people at USAC and point out that their spreadsheet overestimates the participation rate by quite a bit.
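To make the difference concrete, here is a small base R sketch on made-up values (the `pap` vector is hypothetical, not PUMS data). It compares the element-wise `==` against a sequence with an explicit range test; `dplyr::between(pap, 1, 30000)` should give the same result as the range test.

```r
# Hypothetical PAP-like values, including 0 and the -1 placeholder
pap <- c(3, 2, 1, 0, 250, -1)

# Element-wise comparison: 1:3 is recycled to 1,2,3,1,2,3 and compared
# position by position, so a match happens only by coincidence
seq_test <- pap == 1:3

# Range test: TRUE whenever the value lies in [1, 30000]
range_test <- pap >= 1 & pap <= 30000

print(seq_test)    # FALSE  TRUE FALSE FALSE FALSE FALSE
print(range_test)  #  TRUE  TRUE  TRUE FALSE  TRUE FALSE
```

Only the second value passes the sequence comparison, even though four of the six values are inside the range.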
Note: get_pums uses the API to download the data, so fields that are blank (missing) in the FTP files have a numeric value, in this case -1, as a placeholder for NA.
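A short base R sketch of what that placeholder means for the eligibility logic, on made-up values (the vectors below are hypothetical, and -1 is assumed to be the missing code as described in this thread). With the placeholder left in, a range test simply returns FALSE; once -1 is recoded to NA, the OR can return NA and the weighted sum needs explicit handling.

```r
# Hypothetical records: POVPIP with -1 as the missing placeholder, FS with 1 = yes
povpip <- c(50, -1, -1, 501)
fs     <- c(2, 1, 2, 2)
wgtp   <- c(10, 20, 30, 40)

# Placeholder left as -1: out of range, so the range test is FALSE
elig_placeholder <- fs == 1 | (povpip >= 0 & povpip <= 200)

# Placeholder recoded to NA: TRUE | NA is TRUE, but FALSE | NA is NA
povpip_na <- replace(povpip, povpip == -1, NA)
elig_na   <- fs == 1 | (povpip_na >= 0 & povpip_na <= 200)

print(elig_placeholder)           # TRUE TRUE FALSE FALSE
print(elig_na)                    # TRUE TRUE    NA FALSE
print(sum(wgtp[elig_na]))         # NA, because indexing by NA yields NA
print(sum(wgtp[which(elig_na)]))  # 30, which() drops the NA entries
```

Whether a record with missing POVPIP should count as ineligible is a judgment call; the sketch only shows that the two encodings behave differently unless the NA case is handled on purpose.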
UPDATE: the email is info@dorerfoundation.org. I left off the .org in the earlier post.
Code pasted as plain text
lifeline.eligible<-function(state="AL",vintage=2021,period=1) {
# Compute Lifeline eligible households from ACS PUMS data
#
# state: abbreviation for state
# vintage: year in get_pums
# period: either 1 (1-year PUMS) or 5 (5-year PUMS)
library(tidyverse)
library(tidycensus)
census_api_key(Sys.getenv("CENSUS_API_KEY"))
pums.vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")
# Note: get_pums uses the API to download PUMS data
# with the API there are no missing values. Numeric values are used for missing
# lowest non-missing value -1 usually
#
# all.pums data.frame has no missing (NA) values.
# -1 is used as a missing value for PAP SSIP and POVPIP
#
all.pums <- get_pums(variables = pums.vars, state = state, year = vintage, survey = paste0("acs",period));
#
# selects SPORDER==1 records
# note this logic is incorrect but matches that used for the Lifeline eligibility spreadsheet
# www.usac.org/.../Lifeline-Participation-Rate.xlsx
#
# -1 values in | (Logical OR) are out of range so a missing value in PAP SSIP or POVPIP is
# counted as FALSE in logic calculation
#