Calculating ACP/Lifeline Eligibility Using Tidycensus

Hello all,
I'm trying to put together some state-level estimates of household eligibility for the Affordable Connectivity Program (ACP). The eligibility requirements (sheet 2) are essentially the same as for Lifeline, but the income-to-poverty ratio threshold is 200% rather than 135%. I'm running this analysis for all states, but am focusing on Alabama in this example. The estimate of eligible households produced (AL.pums) is much lower than expected compared to the 2021 eligibility estimates in this dataset from USAC (sheet 1) (USAC manages Lifeline). I realize this is a really specific issue, but since this is my first time working with the PUMS data, I have a feeling I may have made an error along the way. Thanks in advance for any help with this!

library(tidyverse)
library(tidycensus)
census_api_key(Sys.getenv("CENSUS_API_KEY"))

pums_vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")

all_pums <- get_pums(variables = pums_vars, state = "AL", year = 2020,
survey = "acs5")

AL.pums <- all_pums %>%
#filter records that meet eligibility requirements
subset(HINS4 == 1 | FS == 1 | PAP == 1:30000 | SSIP == 1:30000 | POVPIP == 0:200) %>%
subset(SPORDER == 1) %>% #retain just one record per household
summarize(hh_eligible = sum(WGTP)) #calculate total number of households eligible
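
One thing that may be worth checking in the filter above (a suggestion, not a confirmed cause of the gap): in R, a comparison such as `PAP == 1:30000` works element by element and recycles the shorter vector, so it does not test whether each value lies in the range. A minimal sketch with made-up values (not PUMS data) contrasting that with a membership test via `%in%` and an explicit range test:

```r
# Toy values, not PUMS data: three income amounts.
pap <- c(500, 0, 2)

# Element-wise comparison: 500 vs 1, 0 vs 2, 2 vs 3.
# (The range is given length 3 here so no recycling warning is raised.)
elementwise <- pap == 1:3

# Membership and range tests: is each value somewhere in 1..30000?
membership <- pap %in% 1:30000
in_range   <- pap >= 1 & pap <= 30000

print(elementwise)
print(membership)
print(in_range)
```

The same consideration would apply to `SSIP == 1:30000` and `POVPIP == 0:200`.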
  • Dear Christine, to get back to your question about household weights (WGTP) of 0: records with a weight of 0 are group quarters, e.g. college dorms, nursing homes, prisons, military barracks, etc. The group quarters codes are here:

    https://www2.census.gov/programs-surveys/acs/tech_docs/code_lists/2021_ACS_Code_Lists.pdf

    See PDF page 31 for the detailed group quarters codes.

    Note that, due to disclosure avoidance, only the institutional/noninstitutional breakdown is in the PUMS file, in the variable TYPEHUGQ.

    Here is a cross-tabulation for one PUMA from the PUMS file. Since you use R, here is the R code.

    State 25 (Massachusetts), PUMA 03304 (part of Boston? It has quite a few college dormitories)

    TYPEHUGQ (character, length 1): type of unit
        1 = Housing unit
        2 = Institutional group quarters
        3 = Noninstitutional group quarters

    WGTP: housing unit weight (0 = group quarters)

    z: data.frame with PUMS person-level data, with the housing data merged back in on SERIALNO

    povn<-as.numeric(z$POVPIP);

    poverty_NA<-povn<0;

    htype<-factor(z$TYPEHUGQ,levels=c(1,2,3),labels=c("house","inst","non-inst"))

    table(poverty_NA,htype)

               htype
    poverty_NA house inst non-inst
      FALSE     1129    0       44
      TRUE         3   22       45

    house.wt0<-z$WGTP==0;
    table(house.wt0,htype);

              htype
    house.wt0 house inst non-inst
      FALSE    1132    0        0
      TRUE        0   22       89

    My comment in an earlier post below is incorrect: only some records have POVPIP undefined/missing, not all group quarters records.

    Note that PUMS variables that are positive usually have missing defined as the lowest defined value -1 (per a comment on this forum from a person at Census).

    From the table above, 3 person records from households have POVPIP "undefined", while 44 person records from noninstitutional group quarters have POVPIP defined. So some group quarters (noninstitutional, e.g. maybe adult treatment centers; what else?) have POVPIP defined. What I said in a previous post below, that all group quarters have POVPIP missing, is incorrect. Compare this to:

    https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html

    This web page says that college dorms and military barracks do not have POVPIP defined.

    The type of group quarters with non-missing POVPIP values cannot be determined from the PUMS data alone, because the detailed group quarters type is not in the PUMS data. See the ACS code list document above for the list of detailed group quarters codes.

    Here are some of the noninstitutional codes from the codebook:

    OTHER NONINSTITUTIONAL FACILITIES

    EMERGENCY AND TRANSITIONAL SHELTERS (WITH SLEEPING FACILITIES) FOR PEOPLE EXPERIENCING HOMELESSNESS (701)
    GROUP HOMES INTENDED FOR ADULTS (801)
    RESIDENTIAL TREATMENT CENTERS FOR ADULTS (802)
    WORKERS' GROUP LIVING QUARTERS AND JOB CORPS CENTERS (901)
    OTHER NONINSTITUTIONAL GROUP QUARTERS (904)

    If I understand the USAC Lifeline program correctly (from a Google search), it provides phone and internet service for lower-income people.

    I guess a question is: are people in group quarters, for example a nursing home, eligible for the USAC program? I was thinking that some assisted living situations might be group quarters, but I think that if you have your own room this would be treated the same as an apartment building. There is quite a bit of documentation about determining whether a living situation is a group quarters or a household.

    Hope this helps !

    Dave

  • Hey Dave,

    I believe I understand the point you're making, but USAC explicitly states that they use POVPIP in their eligibility calculations. I'm comparing my calculated estimates to USAC's and mine are incorrect, so I'm trying to understand how they arrive at their estimates. I'm not comparing my estimates to other aggregate tables. As far as I know, their estimates do not include folks in other living situations. These are the specific qualifications listed in their methods.

    Qualification | PUMS variable | Qualifying values
    Medicaid, Medical Assistance, or any kind of government-assistance plan for those with low incomes or a disability | HINS4 | 1 -- Yes
    Yearly food stamp/Supplemental Nutrition Assistance Program (SNAP) recipiency | FS | 1 -- Yes
    Public assistance income over past 12 months (any amount) | PAP | 1 to 30000 -- $1 to $30000 (Rounded)
    Supplemental Security Income over past 12 months (any amount) | SSIP | 1 to 30000 -- $1 to $30000 (Rounded)
    Poverty status recode indicating household income below the 135% poverty threshold | POVPIP | 0:135 (inclusive)
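
    The criteria listed above can be written as a single person-level flag. This is only a sketch: `lifeline_flag` is a hypothetical helper (not USAC's code), the toy records are invented, and treating missing values as "does not qualify" is an assumption.

```r
# Hypothetical helper (not USAC's code): person-level eligibility flag.
# Assumption: a missing value (NA) never qualifies on its own.
lifeline_flag <- function(HINS4, FS, PAP, SSIP, POVPIP, poverty.level = 135) {
  yes <- function(x) !is.na(x) & x          # NA comparisons become FALSE
  yes(HINS4 == 1) |
    yes(FS == 1) |
    yes(PAP >= 1 & PAP <= 30000) |
    yes(SSIP >= 1 & SSIP <= 30000) |
    yes(POVPIP >= 0 & POVPIP <= poverty.level)
}

# Toy records, not PUMS data.
toy <- data.frame(
  HINS4  = c(2, 1, 2, 2),
  FS     = c(2, 2, 2, NA),
  PAP    = c(0, 0, 0, 0),
  SSIP   = c(0, 0, 0, 0),
  POVPIP = c(501, 300, 120, NA)
)
toy$eligible <- with(toy, lifeline_flag(HINS4, FS, PAP, SSIP, POVPIP))
print(toy)
```

    Setting `poverty.level = 200` would give the ACP variant of the same rule.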
  • Dear Christine,

    I think that I understand what numbers you are trying to reproduce. I can't do it either.

    Here is what I get for AL (FIPS code "01"), accessing the state csv file at

     https://www2.census.gov/programs-surveys/acs/data/pums/2021/1-Year/  (2021 1-year); I also tried the 2021 5-year data.

    AL (FIPS 01)                 2021 1-year PUMS   2021 5-year PUMS
    Including variable POVPIP              773318             732903
    Excluding variable POVPIP              566248             539818

    A couple of issues.

    POVPIP is defined for individuals but is based on the family. I think the reason for this is that if you have an "unrelated" person in the household, their poverty level is undefined. The poverty variable is defined for members of the family and is computed from the family income, the number of members in the family and the number of children in the family. Presumably non-family members of the household are not eligible for the USAC program.

    FS is a household variable. Presumably it means that someone in the HOUSEHOLD receives food stamps, but the PUMS codebook is not totally clear. Perhaps it means that someone in the FAMILY receives SNAP.

    SSIP is a person variable. I added up the SSIP values for the household members. 30,000 is the maximum value for a person in the PUMS file, but there could be two or more people receiving assistance. Should those values be added, or should you take the maximum? I tried it both ways.

    HINS4 is a "person" variable. Presumably the household is eligible if any member of the household has HINS4==1.
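
    To make the "add or take the maximum" question for SSIP concrete, here is a small sketch on invented records (not PUMS data). It assumes the "1 to 30000" range is applied to the household-level value, which is itself one possible reading of the criteria.

```r
# Toy person records, not PUMS data: two households.
toy <- data.frame(
  SERIALNO = c("A", "A", "B"),
  SSIP     = c(20000, 15000, 0)
)

# Household SSIP under the two readings.
ssip_sum <- tapply(toy$SSIP, toy$SERIALNO, sum)
ssip_max <- tapply(toy$SSIP, toy$SERIALNO, max)

# Assumed rule: the household value must fall in 1..30000.
in_range <- function(x) x >= 1 & x <= 30000

result <- data.frame(
  household     = names(ssip_sum),
  sum           = as.vector(ssip_sum),
  qualifies_sum = as.vector(in_range(ssip_sum)),
  max           = as.vector(ssip_max),
  qualifies_max = as.vector(in_range(ssip_max))
)
print(result)
```

    In this toy case household A passes under the maximum but not under the sum, so the choice can change the count when an upper bound is enforced.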

    So much for the details.  I call it data archaeology.   You might try emailing them and asking their data person for their computer code.  SAS perhaps.

    Best,

    Dave

  • Dear Christine,

    Since I use PUMS data and the variables included in the Lifeline eligibility criteria quite a bit, I went ahead and wrote an R function to do the "dirty work" of making the calculation. For AL 2021 1-year PUMS data this code produces 774,694 eligible households. The Lifeline spreadsheet Lifeline-Participation-Rate.xlsx shows a value of 626,087. Hence the calculations differ by almost 150,000. Since this is off by such a large amount, let me know about any errors or suspicious code.

    [NOTE 2-18-2023: I modified the code so I can get a value for Lifeline eligibility for individuals in group quarters. The WGTP housing weight variable is 0 for these housing records. For the people in group quarters you need to sum the person weights PWGTP across the people in each group quarters. Only one variable in the eligibility criteria is defined at the housing record level: FS (Food Stamps/SNAP). If you ignore the value of this variable in the Lifeline calculation, you get weights (the sum of person weights in the group quarters). The net result is a value for eligible people (not households) who reside in group quarters (TYPEHUGQ values 2 and 3). For my calculation this only increases the divergence between my calculation and the spreadsheet on the USAC website, but it might help Christine's calculation. The data handling rules in the USAC/Lifeline spreadsheet are not clear on this point. I don't know if individuals who reside in group quarters are eligible for the Lifeline program.]
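
    The group quarters step in the note can be sketched as follows. The records are invented, and `eligible` stands in for a person-level flag computed without FS; neither comes from the posted function.

```r
# Toy merged person/housing records, not PUMS data.
# TYPEHUGQ: 1 = housing unit, 2 = institutional GQ, 3 = noninstitutional GQ.
toy <- data.frame(
  SERIALNO = c("H1", "H1", "G1", "G2", "G3"),
  TYPEHUGQ = c(1, 1, 2, 3, 3),
  WGTP     = c(40, 40, 0, 0, 0),     # housing weight is 0 for group quarters
  PWGTP    = c(38, 45, 12, 20, 25),  # person weights
  eligible = c(TRUE, FALSE, TRUE, TRUE, FALSE)  # assumed flag, ignoring FS
)

# Rows for people living in group quarters.
gq <- toy$TYPEHUGQ %in% c(2, 3)

# Eligible PEOPLE (not households) in group quarters: sum of person weights.
gq_people <- sum(toy$PWGTP[gq & toy$eligible])
print(gq_people)
```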

    The calculation that I did does not use "tidycensus" so the code is somewhat more primitive than your code.  I use the Census FTP site zipped csv files. You need two files for each state, the person file and the housing file. These files need to be merged on the SERIALNO variable to "pull across" the household/group quarters variables into the person level dataset. I believe that tidycensus does this by default but I'm not enough of an expert on tidycensus to be sure.
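
    As a sketch of that merge step (with invented stand-ins for the two files, not real PUMS records), base R's `merge()` with `all.x = TRUE` keeps every person row and attaches the matching housing variables. The two checks afterwards are the kind of thing worth running after any merge.

```r
# Toy stand-ins for the person and housing files (not real PUMS records).
person <- data.frame(
  SERIALNO = c("H1", "H1", "G1"),
  HINS4    = c(1, 2, 2),
  stringsAsFactors = FALSE
)
house <- data.frame(
  SERIALNO = c("H1", "G1", "V1"),   # V1: a housing record with no persons
  TYPEHUGQ = c(1, 3, 1),
  WGTP     = c(40, 0, 35),
  FS       = c(1, NA, NA),
  stringsAsFactors = FALSE
)

# Left join: keep every person row, attach that person's housing variables.
dat <- merge(person, house, by = "SERIALNO", all.x = TRUE)

# Post-merge checks.
stopifnot(nrow(dat) == nrow(person))   # no rows gained or lost
stopifnot(!anyNA(dat$TYPEHUGQ))        # every person found a housing record
print(dat)
```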

    Based on my reading of the posted application form https://www.usac.org/wp-content/uploads/lifeline/documents/forms/LI_Application_NVstates.pdf, I believe that the logical OR "|" is across all variables (suitably normalized) and all persons in the household. For example, if one person in the household is eligible for/receives Medicaid, then the household is eligible for the service. I use a loop on the unique SERIALNOs in the merged file. I then | (logical OR) across all records where the unique SERIALNO matches the SERIALNO in the merged person/housing data.frame, i.e. across all members of the household. Errors often occur when you do a merge, so there could be an error in that step. I used some "defensive" coding to handle missing values. This is often a source of errors/differences in results.

    I have many years of experience in submitting code and datasets to the FDA.  The regulations require a lot of quality checks before you can submit to the FDA. Usually you do "double programming" for these checks.  Often it requires over a day of work by two programmers before the code and dataset pass the required checks. Most data people in the nonprofit world do not use this level of rigor. In any case, when you use different programs you usually get different answers and it takes a lot of work to reconcile the results of different calculations.

    If you send an email to info@dorerfoundation.org, I can email the code as an attachment.

    #
    # USAC lifeline_pums_analysis_021623.R
    #
    # R function to compute number of eligible Lifeline Households from ACS PUMS data
    #
    # www.usac.org/.../LI_Application_NVstates.pdf
    #
    # By David J Dorer, Ph.D.
    #
    # copyright 2023 by Dorer Community Service Foundation Inc of Massachusetts
    #
    # info@dorerfoundation.org
    #
    # v1.0 djd 16 Feb 2023 12:40
    #
    #
    # z<-usac()
    #
    usac<-function(vintage=2021,state="AL",poverty.level=135,fips="01",period=1) {
    #
    # ARGUMENTS:
    #
    # vintage: PUMS vintage
    # period: PUMS period
    # state: state abbreviation
    # fips: state FIPS Code
    # poverty.level: poverty threshold to use in calculation
    #
    # VALUE:
    #
    # list with
    #
    # households: number of eligible households in state
    # vintage: argument
    # period: argument
    # url.person: Census FTP site url for PUMS Person records
    # url.house: Census FTP site url for PUMS House records
    # state: argument
    # fips: argument
    # data: Derived data.frame used to make calculation.
    #

    state<-tolower(state);

    error<-1;
    # the "..." in the URLs below stands for the directory path on the Census FTP site
    if((vintage==2021) & (period==1)) {
    note<-paste("vintage: ",vintage,"period: ",period," year PUMS");
    urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
    urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
    error<-0;
    }; # if

    if((vintage==2021)&(period==5)) {
    note<-"2021 5-year PUMS";
    urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
    urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
    error<-0;
    }; # if
    if(error) {
    state<-toupper(state);
    msg<- paste("usac: ERROR bad arguments: state=",state,"FIPS=",fips,"vintage=",vintage,"period=",period);
    cat(msg,"\n");
    return(list(error=error,state=state,fips=fips,vintage=vintage,period=period,message=msg));
    }; # if error

    # note this code downloads zip file to your "working directory" use getwd() to see the directory/folder
    download.file(urlh,"pumsh.zip"); # housing records from FTP site
    download.file(urlp,"pumsp.zip"); # person records from FTP site

    house<-read.csv(unz("pumsh.zip",paste0("psam_h",fips,".csv"))); # load housing records directly from zip file
    person<-read.csv(unz("pumsp.zip",paste0("psam_p",fips,".csv"))); # load person records directly from zip file

    # trim down house and person to the variables of interest
    hvar0<-c("SERIALNO","TYPEHUGQ","WGTP","FS"); hvar<-c("TYPEHUGQ","WGTP","FS");
    pvar<-c("SERIALNO","HINS4","PAP","SSIP","POVPIP");
    house<-house[,hvar0];
    person<-person[,pvar];

    # merge person and housing records
    m<-match(person$SERIALNO,house$SERIALNO);

    # data.frame like the one returned by tidycensus function get_pums() ??
    dat<-data.frame(person,house[m,c("TYPEHUGQ","WGTP","FS")]);

    # create logical variables in the merged "dat" data.frame
    pov<-dat$POVPIP;pov[pov<0]<-NA; pov[pov<=poverty.level]<-1; # <= poverty.level argument poverty level clause

    pov[pov>poverty.level]<-0; pov[is.na(pov)]<-0; # at this point 0 is above poverty.level or dat$POVPIP is missing

    ssip<-dat$SSIP; ssip[is.na(ssip)]<-0; # Receive Supplemental Social Security Income SSI >= 1 otherwise 0

    pap<-dat$PAP; pap[is.na(pap)]<-0; # Public Assistance Income missing value set to zero

    # HINS4 Health Insurance Medicaid etc ==1 TRUE otherwise FALSE no missing
    hins<-dat$HINS4; hins[is.na(hins)]<-0; # result 1==receives Medicaid etc. 0 or missing does not receive Medicaid, etc.

    # FS Receives Food Stamps/SNAP == 1 (yes) == 2 (no) no missing values
    # some defensive coding
    fs<-dat$FS==1; fs[is.na(fs)]<-0; fs[fs==2]<-0; # recode 1=receives SNAP yes, 0 or missing no

    # life<-(fs==1)|(pov>0)|((pap<=30000)&(pap>0))|((ssip<=30000)&(ssip>0))|(hins==1); # logic at household person record level
    life<-(fs==1)|(pov>0)|(pap>0)|(ssip>0)|(hins==1); # logic at household person record level

    # no missing values for all derived variables including "life"
    dfs<-data.frame(dat,pov,pap,hins,fs,ssip,lifeline=life); # data.frame to make sure everything aligns and check logic

    cat("\nusac: Summary of derived per person (for households) derived data frame:\n");
    print(summary(dfs[is.na(dfs$POVPIP),])); # check logic on pov variable POVPIP missing pov set to 0 (no)

    serialH<-sort(unique(dat$SERIALNO)); # unique household SERIALNO in dat data.frame

    mhp<-match(house$SERIALNO,serialH); # look for records in house data.frame that don't match records in dat data.frame
    multi.person<-house[is.na(mhp),]; # housing records whose SERIALNO has no person record
    cat("\nusac: PUMS Housing Records with no matching SERIALNO in Person Records\n");
    cat(" (should be all Household records)\n");
    # all records have TYPEHUGQ set to 1 i.e. household records.
    print(summary(multi.person));

    #
    # now apply | (or) logic across values of life for all persons in a household
    #
    # application form that lifeline applicants fill out
    # www.usac.org/.../LI_Application_NVstates.pdf
    #
    N<-length(serialH); # loop over serialH (unique dat$SERIALNO values)
    lifeline<-rep(0,N);

    for(i in seq(1,N)) {
    if(!(i%%2000)) cat("i=",i,"\n");
    housei<-serialH[i]==dat$SERIALNO;
    lifeline[i]<-any(life[housei])
    }; # for i
    weight<-dat$WGTP[match(serialH,dat$SERIALNO)];
    datH<-data.frame(serialH,weight,lifeline);

    # number of qualified households

    qhouse<-sum(datH$weight[datH$lifeline==1],na.rm=TRUE);
    cat("\nusac: Number of households qualified for Lifeline Households=",qhouse,note,"\n");
    rtn<-list(households=qhouse,poverty.level=poverty.level,state=toupper(state),fips=fips
    ,url.house=urlh,url.person=urlp,vintage=vintage,period=period,data=dfs);
    invisible(rtn);
    }; # usac

    ## END ##
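
    The per-household OR in the loop above scans all person rows once per SERIALNO, which can be slow on a full state file. A possible grouped alternative using `tapply()` is sketched below on invented stand-ins for `dat` and `life` (not real PUMS records); it is offered as a cross-check, not as a replacement for the posted function.

```r
# Toy stand-ins for objects built inside usac() (not real PUMS records).
dat  <- data.frame(SERIALNO = c("H1", "H1", "H2", "H3", "H3"),
                   WGTP     = c(40, 40, 25, 30, 30))
life <- c(FALSE, TRUE, FALSE, FALSE, FALSE)   # person-level eligibility

# Loop version, following the logic of the function above.
serialH  <- sort(unique(dat$SERIALNO))
lifeline <- rep(0, length(serialH))
for (i in seq_along(serialH)) {
  lifeline[i] <- any(life[serialH[i] == dat$SERIALNO])
}
weight     <- dat$WGTP[match(serialH, dat$SERIALNO)]
loop_total <- sum(weight[lifeline == 1], na.rm = TRUE)

# Grouped version: one pass over the rows instead of one scan per household.
any_by_hh     <- tapply(life, dat$SERIALNO, any)
wt_by_hh      <- dat$WGTP[match(names(any_by_hh), dat$SERIALNO)]
grouped_total <- sum(wt_by_hh[as.vector(any_by_hh)], na.rm = TRUE)

# Both approaches should agree.
stopifnot(loop_total == grouped_total)
print(grouped_total)
```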