Calculating ACP/Lifeline Eligibility Using Tidycensus

Hello all,
I'm trying to put together some state-level estimates of household eligibility for the Affordable Connectivity Program (ACP). The eligibility requirements (sheet 2) are essentially the same as for Lifeline, but the income-to-poverty ratio is 200% rather than 135%. I'm running this analysis for all states, but am just focusing on Alabama in this example. The estimate of eligible households produced (AL.pums) is much lower than expected compared to the 2021 eligibility estimates in this dataset from USAC (sheet 1) (USAC manages Lifeline). I realize this is a really specific issue, but since this is my first time working with the PUMS data I have a feeling I may have made an error along the way. Thanks in advance for any help with this!

library(tidyverse)
library(tidycensus)
census_api_key(Sys.getenv("CENSUS_API_KEY"))

pums_vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")

all_pums <- get_pums(variables = pums_vars, state = "AL", year = 2020, 
 survey = "acs5")

AL.pums <- all_pums %>% 
 #filter records that meet eligibility requirements
 subset(HINS4 == 1 | FS == 1 | PAP == 1:30000 | SSIP == 1:30000 | POVPIP == 0:200) %>% 
 subset(SPORDER == 1) %>% #retain just one record per household
summarize(hh_eligible = sum(WGTP)) #calculate total number of households eligible

David Dorer over 2 years ago

Dear Christine

Just as a "heads up" on using the POVPIP variable. It is defined for every record whereas ACS detail table B17001 only defines poverty for the Universe: "Population for whom poverty status is determined". Hence the PUMS file should have some records where POVPIP is undefined. [NOTE 2-18-2023: The PUMS files do have records where POVPIP is undefined. They have a blank field in the PUMS csv files. However, if you use the API as I did, -1 will be used as the missing value. I didn't realize this initially.] There is no way to determine these records from the PUMS data. See the link below for details. I've cut and pasted from an earlier thread, including a message from the people at the Census.

I've just been through a round with the people at the Census on reconciling "below poverty" from the PUMS with "below poverty" for table B17001. Poverty is not defined for households, only for families, which include only people in the household who are related to the "head of household." For example, a renter not related to the family is not included in the calculation and is not part of the "universe." For the calculation with partial details see: https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html. You add up the income for all family members and then apply the formula, which is based on total family income, number of family members and number of children. Here is the email that I just received from the people at the Census (regarding matching numbers from B17001 and the calculation using the PUMS variables POVPIP, RELSHIPP, TYPEHUGQ and AGEP).

Hi David,

According to the poverty subject matter experts, "It is possible that they [PUMS and B17001] are not matching because the poverty universe excludes a few more types of populations 1. children under the age of 15 who are not related to the householder 2. people living in institutional group quarters (nursing homes and correctional facilities) 3. people living in college dormitories 4. people living in military barracks. Perhaps by excluding all not in the universe he will be closer, that being said we have confidence in the estimates he mentions from 2020. The estimate 120,888 is within the moe of the PUMA that can be seen S1701 (120,938) on data.census.gov, so we have confidence in it."

Let me know if this helps or not David.

Vicki

christinemparker over 2 years ago in reply to David Dorer

Hi David,

Thanks for bringing up this point. I am aware that some populations are excluded when I use POVPIP, but I'm trying to recreate Lifeline eligibility estimates based on methodology provided by USAC and they specify that variable. So with that in mind, I need to use these specific variables. I'm wondering if the populations with "0" household weight were somehow accounted for in the eligibility estimate.

In their description it states, "Households are determined to be eligible for Lifeline if the householder reported any of the following in their response to ACS". This leads me to wonder if they filtered on PERNUM (SPORDER) = 1 and applied person weights. I'm not sure if that makes sense, but I'll try it out and report back.
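
A minimal sketch of that check, using made-up toy records rather than real PUMS data (the data frame `pums` and all of its values are hypothetical). It restricts the records to the householder and then totals both kinds of weight, so the two candidate methods can be compared side by side:

```r
# Toy person-level records with the housing weight already attached.
# All values are invented for illustration only.
pums <- data.frame(
  SERIALNO = c("A", "A", "B", "C", "C", "C"),
  SPORDER  = c(1, 2, 1, 1, 2, 3),
  WGTP     = c(10, 10, 20, 15, 15, 15),  # housing-unit weight, repeated per person
  PWGTP    = c(11, 9, 22, 14, 16, 12),   # person weight
  HINS4    = c(2, 1, 1, 2, 2, 2)         # 1 = Medicaid etc., 2 = no
)

# Keep only householders (SPORDER == 1) who meet the criterion themselves
householders <- subset(pums, SPORDER == 1 & HINS4 == 1)

hh_by_housing_weight <- sum(householders$WGTP)   # household count
hh_by_person_weight  <- sum(householders$PWGTP)  # householder (person) count
```

Whether the two totals differ on a real file depends on how the weights were constructed, so it is worth running both on the actual data before drawing a conclusion.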

David Dorer over 2 years ago in reply to christinemparker

Dear Christine,

I have to do some additional research on this, but for the benefit of others who use poverty variables and who are reading this here is some additional information. Different Federal (and state) programs use slightly different definitions for "poverty." I did a small google search and found this HHS web page: https://aspe.hhs.gov/topics/poverty-economic-mobility/poverty-guidelines This webpage uses "family" and "household" interchangeably whereas the Census (ACS) considers them different "concepts."

For the ACS, I believe that a "household" (in contrast to a "group quarters") has a head of household. I think that this is "person 1" on the ACS survey form (please correct me if I have this wrong).

To go back to the Federal Register notice, www.federalregister.gov/.../annual-update-of-the-hhs-poverty-guidelines

"The poverty guidelines are used as an eligibility criterion by Medicaid and a number of other federal programs. The poverty guidelines issued here are a simplified version of the poverty thresholds that the Census Bureau uses to prepare its estimates of the number of individuals and families in poverty. " To continue:

"This notice does not provide definitions of such terms as ‘‘income’’ or ‘‘family’’ as there is considerable variation of these terms among programs that use the poverty guidelines. The legislation or regulations governing each program define these terms and determine how the program applies the poverty guidelines. In cases where legislation or regulations do not establish these definitions, the entity that administers or funds the program is responsible to define such terms as ‘‘income’’ and ‘‘family.’’ Therefore questions such as net or gross income, counted or excluded income, or household size should be directed to the entity that administers or funds the program. Dated: January 12, 2023."

One point that I would like to make is that the POVPIP variable in the PUMS file does not correspond to any of these definitions and should be used with care and modified as needed.

In the past I have communicated with the people at the census and have asked for the SAS code that is used to process the ACS survey form responses to produce the POVPIP variable in the PUMS file. Someone sent an outdated "Annotated Case Report Form" (in the FDA new drug application sense). They are reluctant to share the information presumably because it changes all the time and they don't want it "out there." You could probably make a Freedom of Information Act request (FOIA) to get the code used to process the ACS 1 year 2021 PUMS POVPIP variable. Maybe I'll give it a try.

Best,

Dave

If you want to know what an FDA Annotated case report form looks like, here are the guidelines:

wiki.cdisc.org/.../aCRF_Guideline_v1-0_20201120_publish.pdf

(sorry about the bad link in an earlier version of this post)

If you submit data to the FDA you need to provide an Annotated Case Report Form for every data item you submit. The guidelines give you an idea of the required detail (46 pages) !

David Dorer over 2 years ago

Dear Christine, to get back to your question about 0 household weights (WGTP): records with a weight of 0 are group quarters, e.g. college dorm, nursing home, prison, military barracks, etc. The group quarters codes are here:

https://www2.census.gov/programs-surveys/acs/tech_docs/code_lists/2021_ACS_Code_Lists.pdf

see pdf page 31 for detailed group quarters codes.

Note due to disclosure avoidance only the institutional/non-institutional breakdown is in the PUMS file variable TYPEHUGQ

Here is the cross tabulation for one PUMA from the PUMS file. Since you use R, here is the R code.

State 25 Massachusetts PUMA 03304 (Part of Boston?? -- quite a few College dormitories)

TYPEHUGQ   Character 1 Type of unit

1 .Housing unit
2 .Institutional group quarters
3 .Noninstitutional group quarters

WGTP Housing unit weight 0== group quarters

z    data.frame with PUMS person level data with housing data merged back in on SERIALNO

povn <- as.numeric(z$POVPIP)

poverty_NA <- povn < 0   # the API codes a missing POVPIP as -1

htype <- factor(z$TYPEHUGQ, levels = c(1, 2, 3), labels = c("house", "inst", "non-inst"))

table(poverty_NA, htype)

          htype
poverty_NA house inst non-inst
     FALSE  1129    0       44
     TRUE      3   22       45

house.wt0 <- z$WGTP == 0
table(house.wt0, htype)

         htype
house.wt0 house inst non-inst
    FALSE  1132    0        0
    TRUE      0   22       89

My comment in an earlier post below is incorrect. Some POVPIP records have POVPIP undefined/missing.

Note PUMS variables that are positive usually have missing defined as lowest defined value -1 (comment on this forum from person at Census)

Note from the tables above: 3 individual records from households have POVPIP "undefined", and 44 individual records from non-institutional group quarters have POVPIP defined. So some group quarters (non-institutional, e.g. maybe adult treatment centers, what else?) have POVPIP defined. What I said below in a previous post, that all group quarters have POVPIP missing, is incorrect. Compare this to:

https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html

This web page says college dorms and military barracks do not have POVPIP defined

The type of group quarters with a non-missing POVPIP values cannot be determined from the PUMS data alone because the detailed group quarters type is not in the PUMS data. See ACS code list document above for the list of detailed group quarters codes.

Here are some of the non-institutional codes from the codebook

OTHER NONINSTITUTIONAL FACILITIES

EMERGENCY AND TRANSITIONAL SHELTERS (WITH SLEEPING FACILITIES) FOR PEOPLE EXPERIENCING HOMELESSNESS (701)
GROUP HOMES INTENDED FOR ADULTS (801)
RESIDENTIAL TREATMENT CENTERS FOR ADULTS (802)
WORKERS’ GROUP LIVING QUARTERS AND JOB CORPS CENTERS (901)
OTHER NONINSTITUTIONAL GROUP QUARTERS (904)

If I understand the USAC Lifeline program (Google search), it provides phone and internet service for lower-income people.

I guess a question is "are people in group quarters, for example a nursing home, eligible for the USAC program?" I was thinking that some assisted living situations might be group quarters, but I think that if you have your own room this would be the same as an apartment building. There is quite a bit of documentation about determining if a living situation is a group quarters or a household.

Hope this helps !

Dave


christinemparker over 2 years ago in reply to David Dorer

Hey Dave,

I believe I understand the point you're making, but USAC explicitly states that they use POVPIP in their eligibility calculations. I'm comparing my calculated estimates to USAC's and mine are incorrect, so I'm trying to understand how they arrive at their estimates. I'm not comparing my estimates to other aggregate tables. As far as I know, their estimates do not include folks in other living situations. These are the specific qualifications listed in their methods.

Medicaid, Medical Assistance, or any kind of government-assistance plan for those with low incomes or a disability	HINS4	1 -- Yes
Yearly food stamp/Supplemental Nutrition Assistance Program (SNAP) recipiency	FS	1 -- Yes
Public assistance income over past 12 months (any amount)	PAP	1 to 30000 -- $1 to $30000 (Rounded)
Supplemental Security Income over past 12 months (any amount)	SSIP	1 to 30000 -- $1 to $30000 (Rounded)
Poverty status recode indicating household income below the 135% poverty threshold	POVPIP	0:135 (inclusive)

David Dorer over 2 years ago in reply to christinemparker

Dear Christine,

I think that I understand what numbers you are trying to reproduce. I can't do it either.

Here is what I get for AL (FIPS code "01") using the state csv files from

https://www2.census.gov/programs-surveys/acs/data/pums/2021/1-Year/ (2021 1-year); I also tried the 2021 5-year data.

Eligible households, AL      2021 1-year PUMS   2021 5-year PUMS
Including variable POVPIP    773318             732903
Excluding variable POVPIP    566248             539818

A couple of issues:

POVPIP is defined for individuals but is based on the family. I think that the reason for this is that if you have an "unrelated" person in the household their poverty level is undefined. The poverty variable is defined for members of the family and is computed from the family income, the number of members in the family and the number of children in the family. Presumably non-family members of the household are not eligible for the USAC program.

FS is a household variable. Presumably it means that someone in the HOUSEHOLD receives food stamps, but the PUMS codebook is not totally clear. Perhaps it means that someone in the FAMILY receives SNAP.

SSIP is a person variable. I added up the SSIP values for the members of each household. 30,000 is the maximum value for a person in the PUMS file, but there could be two or more people receiving assistance. Should those values be added, or should you take the maximum? I tried it both ways.

HINS4 is a "person" variable. Presumably the household is eligible if any member of the household has HINS4==1.
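
To make the SSIP sum-versus-maximum question concrete, here is a small sketch on hypothetical toy records (the data frame `persons` and the helper `in_range` are invented for illustration). It shows that the choice only matters if the upper bound of the "1 to 30000" rule is enforced at the household level:

```r
# Toy person records: household A has two recipients of 20000 each.
persons <- data.frame(
  SERIALNO = c("A", "A", "B", "B", "C"),
  SSIP     = c(20000, 20000, 0, 5000, 0)
)

# Household-level SSIP, aggregated two ways
ssip_sum <- tapply(persons$SSIP, persons$SERIALNO, sum)
ssip_max <- tapply(persons$SSIP, persons$SERIALNO, max)

# Hypothetical helper: the "1 to 30000" rule taken literally
in_range <- function(x) x >= 1 & x <= 30000

eligible_sum <- in_range(ssip_sum)  # A fails: 40000 exceeds the upper bound
eligible_max <- in_range(ssip_max)  # A passes: 20000 is within the range
any_positive <- ssip_sum > 0        # "any amount" rule: sum and max agree
```

If the rule is read as "any amount", which is how the USAC criteria describe it, summing and taking the maximum flag the same households.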

So much for the details. I call it data archaeology. You might try emailing them and asking their data person for their computer code. SAS perhaps.

Best,

Dave

David Dorer over 2 years ago in reply to David Dorer

Dear Christine,

Since I use PUMS data and the variables that are included in the Lifeline eligibility criteria quite a bit, I went ahead and wrote an R function to do the "dirty work" of making the calculation. For AL 2021 1-year PUMS data this code produces 774,694 eligible households. The Lifeline spreadsheet Lifeline-Participation-Rate.xlsx shows a value of 626,087. Hence the calculations differ by almost 150,000. Since this is off by such a large amount, let me know about any errors or suspicious code.

[NOTE 2-18-2023: I modified the code so I can get a value for Lifeline eligibility for individuals in Group Quarters. The WGTP housing weight variable is 0 for these housing records. For the people in Group Quarters you need to sum the person weights PWGTP across the people in each Group Quarters. Only one variable in the eligibility criteria is defined at the housing record level: FS (Food Stamps/SNAP). If you ignore the value of this variable in the Lifeline calculation you get weights (the sum of person weights in the Group Quarters). The net result is that you get a value for eligible people (not households) who reside in Group Quarters (TYPEHUGQ values 2 and 3). For my calculation this only increases the divergence between my calculation and the spreadsheet on the USAC website, but it might help Christine's calculation. The data handling rules in the USAC/Lifeline spreadsheet are not clear on this point. I don't know if individuals who reside in Group Quarters are eligible for the Lifeline program.]
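
The group quarters point can be sketched on hypothetical toy records (the data frame `dat` and the eligibility flag `life` below are made up for illustration). Because WGTP is 0 for group quarters, the housing weight contributes nothing, and the count of eligible people has to come from the person weights:

```r
# Toy merged person/housing records; values are invented.
dat <- data.frame(
  SERIALNO = c("H1", "H1", "G1", "G2", "G3"),
  TYPEHUGQ = c(1, 1, 2, 3, 3),            # 1 = housing unit, 2/3 = group quarters
  WGTP     = c(40, 40, 0, 0, 0),          # housing weight is 0 in group quarters
  PWGTP    = c(40, 35, 12, 9, 20),        # person weight
  life     = c(TRUE, FALSE, TRUE, FALSE, TRUE)  # person-level eligibility flag
)

gq <- dat$TYPEHUGQ %in% c(2, 3)

# Eligible PEOPLE (not households) living in group quarters
gq_eligible_persons <- sum(dat$PWGTP[gq & dat$life])

# The housing weight adds nothing for these records
gq_housing_weight <- sum(dat$WGTP[gq])
```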

The calculation that I did does not use "tidycensus" so the code is somewhat more primitive than your code. I use the Census FTP site zipped csv files. You need two files for each state, the person file and the housing file. These files need to be merged on the SERIALNO variable to "pull across" the household/group quarters variables into the person level dataset. I believe that tidycensus does this by default but I'm not enough of an expert on tidycensus to be sure.
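
The merge step can also be sketched with base R `merge()`. This is an illustration on hypothetical toy records, not the code used for the numbers in this post (which uses `match()`); the row-count check is the kind of guard that catches a bad join early:

```r
# Toy housing and person files keyed on SERIALNO; values are invented.
house <- data.frame(
  SERIALNO = c("H1", "H2", "G1"),
  TYPEHUGQ = c(1, 1, 3),
  WGTP     = c(12, 30, 0),
  FS       = c(1, 2, NA)    # FS is blank for group quarters in this toy example
)
person <- data.frame(
  SERIALNO = c("H1", "H1", "H2", "G1"),
  SPORDER  = c(1, 2, 1, 1),
  PWGTP    = c(12, 10, 31, 7)
)

# Left join: keep every person record and pull the housing variables across
dat <- merge(person, house, by = "SERIALNO", all.x = TRUE)

# A person-level merge should never add or drop person records
stopifnot(nrow(dat) == nrow(person))
```

Note that `merge()` sorts the result by the key, so any vectors computed from `person` beforehand no longer line up with `dat` row for row.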

Based on my reading of the posted application form https://www.usac.org/wp-content/uploads/lifeline/documents/forms/LI_Application_NVstates.pdf, I believe that the logical OR "|" is across all variables (suitably normalized) and all persons in the household. For example, if one person in the household is eligible for or receives Medicaid, then the household is eligible for the service. I use a loop on the unique SERIALNOs in the merged file. I then | (logical or) across all records where the unique SERIALNO matches the SERIALNO in the merged person/housing data.frame, i.e. across all members of the household. Errors often occur when you do a merge, so there could be an error in that step. I used some "defensive" coding to handle missing values. This is often a source of errors/differences in results.
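
For readers who prefer to avoid the explicit loop, the same "OR across all members of the household" step can be sketched with `tapply()`. The toy data frame and the flag `life` below are hypothetical; this is an alternative formulation under those assumptions, not the function posted here:

```r
# Toy merged records with a person-level eligibility flag; values are invented.
dat <- data.frame(
  SERIALNO = c("A", "A", "B", "B", "B", "C"),
  WGTP     = c(10, 10, 20, 20, 20, 5),
  life     = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Household is eligible if ANY of its person records is eligible
hh_eligible <- tapply(dat$life, dat$SERIALNO, any)

# WGTP is constant within a household, so take the first value
hh_weight <- tapply(dat$WGTP, dat$SERIALNO, function(w) w[1])

# Weighted number of eligible households
qhouse <- sum(hh_weight[hh_eligible])
```

Both `tapply()` calls group by the same key, so their results come back in the same household order and can be indexed against each other.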

I have many years of experience in submitting code and datasets to the FDA. The regulations require a lot of quality checks before you can submit to the FDA. Usually you do "double programming" for these checks. Often it requires over a day of work by two programmers before the code and dataset pass the required checks. Most data people in the nonprofit world do not use this level of rigor. In any case, when you use different programs you usually get different answers and it takes a lot of work to reconcile the results of different calculations.

If you send an email to info@dorerfoundation.org, I can email the code as an attachment.

#
# USAC lifeline_pums_analysis_021623.R
#
# R function to compute number of eligible Lifeline Households from ACS PUMS data
#
# www.usac.org/.../LI_Application_NVstates.pdf
#
# By David J Dorer, Ph.D.
#
# copyright 2023 by Dorer Community Service Foundation Inc of Massachusetts
#
# info@dorerfoundation.org
#
# v1.0 djd 16 Feb 2023 12:40
#
#
# z<-usac()
#
usac<-function(vintage=2021,state="AL",poverty.level=135,fips="01",period=1) {
#
# ARGUMENTS:
#
# vintage: PUMS vintage
# period: PUMS period
# state: state abbreviation
# fips: state FIPS Code
# poverty.level: poverty threshhold to use in calculation
#
# VALUE:
#
# list with
#
# households: number of eligible households in state
# vintage: argument
# period: argument
# url.person: Census FTP site url for PUMS Person records
# url.house: Census FTP site url for PUMS House records
# state: argument
# fips argument
# data: Derived data.frame used to make calculation.
#

state<-tolower(state);

error<-1;
if((vintage==2021) & (period==1)) {
note<-paste("vintage: ",vintage,"period: ",period," year PUMS");
urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
error<-0;
}; # if

if((vintage==2021)&(period==5)) {
urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
note<-"2021 5-year PUMS";
error<-0;
}; #
if(error) {
state<-toupper(state);
msg<- paste("usac: ERROR bad arguments: state=",state,"FIPS=",fips,"vintage=",vintage,"period=",period);
cat(msg,"\n");
return(list(error=error,state=state,fips=fips,vintage=vintage,period=period,message=msg));
}; # if error

# note this code downloads zip file to your "working directory" use getwd() to see the directory/folder
download.file(urlh,"pumsh.zip"); # housing records from FTP site
download.file(urlp,"pumsp.zip"); # person records from FTP site

house<-read.csv(unz("pumsh.zip",paste0("psam_h",fips,".csv"))); # load housing records directly from zip file
person<-read.csv(unz("pumsp.zip",paste0("psam_p",fips,".csv"))); # load person records directly from zip file

# trim down house and person to the variables of interest
hvar0<-c("SERIALNO","TYPEHUGQ","WGTP","FS"); hvar<-c("TYPEHUGQ","WGTP","FS");
pvar<-c("SERIALNO","HINS4","PAP","SSIP","POVPIP");
house<-house[,hvar0];
person<-person[,pvar];

# merge person and housing records
m<-match(person$SERIALNO,house$SERIALNO);

# data.frame like the one returned by tidycensus function get_pums() ??
dat<-data.frame(person,house[m,c("TYPEHUGQ","WGTP","FS")]);

# create logical variables in the merged "dat" data.frame
pov<-dat$POVPIP;pov[pov<0]<-NA; pov[pov<=poverty.level]<-1; # <= poverty.level argument poverty level clause

pov[pov>poverty.level]<-0; pov[is.na(pov)]<-0; # at this point 0 is above poverty.level or dat$POVPIP is missing

ssip<-dat$SSIP; ssip[is.na(ssip)]<-0; # Receive Supplemental Social Security Income SSI >= 1 otherwise 0

pap<-dat$PAP; pap[is.na(pap)]<-0; # Public Assistance Income missing value set to zero

# HINS4 Health Insurance Medicaid etc. ==1 TRUE otherwise FALSE no missing
hins<-dat$HINS4; hins[is.na(hins)]<-0; # result 1==receives Medicaid etc. 0 or missing does not receive Medicaid, etc.

# FS Receives Food Stamps/SNAP == 1 (yes) == 2 (no) no missing values
# some defensive coding
fs<-dat$FS==1; fs[is.na(fs)]<-0; fs[fs==2]<-0; # recode 1=receives SNAP yes, 0 or missing no

# life<-(fs==1)|(pov>0)|((pap<=30000)&(pap>0))|((ssip<=30000)&(ssip>0))|(hins==1); # logic at household person record level
life<-(fs==1)|(pov>0)|(pap>0)|(ssip>0)|(hins==1); # logic at household person record level

# no missing values for all derived variables including "life"
dfs<-data.frame(dat,pov,pap,hins,fs,ssip,lifeline=life); # data.frame to make sure everything aligns and check logic

cat("\nusac: Summary of derived per person (for households) derived data frame:\n");
print(summary(dfs[is.na(dfs$POVPIP),])); # check logic on pov variable POVPIP missing pov set to 0 (no)

serialH<-sort(unique(dat$SERIALNO)); # unique household SERIALNO in dat data.frame

mhp<-match(house$SERIALNO,serialH); # look for records in house data.frame that don't match records in dat data.frame
multi.person<-house[is.na(mhp),];
cat("\nusac: PUMS Housing Records with no matching SERIALNO in the Person Records\n");
cat(" (should all be Household records)\n");
# all records have TYPEHUGQ set to 1 i.e. household records.
print(summary(multi.person));

#
# now apply | (or) logic across values of life for all persons in a household
#
# application form that lifeline applicants fill out
# www.usac.org/.../LI_Application_NVstates.pdf
#
N<-length(serialH); # loop over serialH (unique dat$SERIALNO values)
lifeline<-rep(0,N);

for(i in seq(1,N)) {
if(!(i%%2000)) cat("i=",i,"\n");
housei<-serialH[i]==dat$SERIALNO;
lifeline[i]<-any(life[housei])
}; # for i
weight<-dat$WGTP[match(serialH,dat$SERIALNO)];
datH<-data.frame(serialH,weight,lifeline);

# number of qualified households

qhouse<-sum(datH$weight[datH$lifeline==1],na.rm=TRUE);
cat("\nusac: Number of households qualified for Lifeline Households=",qhouse,note,"\n");
rtn<-list(households=qhouse,poverty.level=poverty.level,state=toupper(state),fips=fips
,url.house=urlh,url.person=urlp,vintage=vintage,period=period,data=dfs);
invisible(rtn);
}; # usac

## END ##

David Dorer over 2 years ago

Dear Christine,

MAKING PROGRESS

I was able to reproduce the AL row in the Lifeline spreadsheet. The value is 626087. I used the 2021 1-year PUMS. I added SPORDER==1 to the code posted earlier. The code here computes the result both with only the SPORDER==1 person records and with all person records. If you can tell me how to paste in the code like you did so it looks OK, then I'll post it like you did. If you send an email to info@dorerfoundation.org I'll email the code as an attachment.

I don't use the %>% format in my code, but I believe that if you put the eligibility tests in parentheses, add & SPORDER == 1 to the first subset line and eliminate the second subset line, you will get the right thing. I'm a long-time FORTRAN programmer so I still use loops with an integer index. The other issue is that you used 2020 5-year data and the spreadsheet uses 2021 1-year PUMS data.

You might also ask why the USAC Lifeline people didn't "|" across all members of the household. The application seems to say that if someone in the household is eligible then the entire household is eligible. So if anyone receives Medicaid then the household is eligible. The ACS has the concept of a single Householder (person 1 on the form). If there is a husband and wife, for example, and the wife fills out the form as "person 1" and the husband is on Medicaid, then the ACS-based calculation with SPORDER==1 is wrong and will count the household as NOT eligible. This makes the spreadsheet's computed participation rate too high. They should be using the denominator of 774694 (for AL 2021 1-year data), which looks at all persons in the household.
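
The husband-and-wife case can be worked through on hypothetical toy records (all values below, including the subscriber count, are invented for illustration). Household A has the wife as person 1 and the husband on Medicaid:

```r
# Toy merged records; household A: wife is person 1, husband has HINS4 == 1.
dat <- data.frame(
  SERIALNO = c("A", "A", "B", "C", "C"),
  SPORDER  = c(1, 2, 1, 1, 2),
  WGTP     = c(100, 100, 50, 80, 80),
  HINS4    = c(2, 1, 1, 2, 2)
)

elig <- dat$HINS4 == 1

# Householder-only rule: only person 1's own answers count
householder_only <- sum(dat$WGTP[elig & dat$SPORDER == 1])

# Any-member rule: household counts if any person record is eligible.
# The weight is taken once per household, from the person 1 record.
hh_with_any <- unique(dat$SERIALNO[elig])
any_member  <- sum(dat$WGTP[dat$SPORDER == 1 & dat$SERIALNO %in% hh_with_any])

# Hypothetical subscriber count, to show the effect on the participation rate
subscribers <- 30
rate_householder_only <- subscribers / householder_only
rate_any_member       <- subscribers / any_member
```

In this toy example the householder-only rule misses household A, so the eligible total is smaller and the same number of subscribers yields a higher participation rate.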

Another thing is that the csv and API downloaded data have different rules for missing values. The csv files have a ",," blank field for missing. The API cannot handle a "blank" data field, so the API puts in a value. The rule that a Census person posted on the forum is "lowest possible value - 1" for missing, with the caveat of "usually". This detail does not appear in the PUMS pdf codebook. Your code with POVPIP==0:200 should take care of this in the API case but I'm not sure. Also remember to use a 135 limit for POVPIP to match the Lifeline spreadsheet.
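
One R detail is worth checking here. In base R, `==` against a range such as `0:200` compares element by element and recycles the shorter vector, which is different from asking whether a value falls inside the range. A small sketch on made-up values (not real PUMS records) shows the difference, and also how `%in%` treats the API's -1 missing code:

```r
# 402 toy records, every one at 150% of the poverty level
povpip <- rep(150, 402)

# `==` recycles 0:200, so record i is compared with (i - 1) %% 201
n_equals <- sum(povpip == 0:200)

# %in% tests set membership, which is the intended "is it in the range" check
n_in <- sum(povpip %in% 0:200)

# Missing codes: -1 (API) and NA (blank csv field) are both outside 0:200
missing_codes <- c(-1, NA)
kept <- missing_codes %in% 0:200
```

Here only 2 of the 402 records match under `==`, while all 402 match under `%in%`. If the same pattern applies to the `PAP`, `SSIP` and `POVPIP` tests in the original `subset()` call, it could account for a substantial undercount; a range comparison such as `POVPIP >= 0 & POVPIP <= 200` would be an equivalent alternative to `%in%`.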

Reference - Lifeline Federal Regulations giving qualification for Lifeline: (uses 135% of poverty GUIDELINE)

https://www.law.cornell.edu/cfr/text/47/54.409

"The consumer, one or more of the consumer's dependents, or the consumer's household must receive benefits from one of the following federal assistance programs: Medicaid; Supplemental Nutrition Assistance Program; Supplemental Security Income; Federal Public Housing Assistance; or Veterans and Survivors Pension Benefit."

ACP (Affordable Connectivity Program) uses 200% of the Poverty GUIDELINE (not the Poverty Threshold, as the ACS does)

https://www.law.cornell.edu/cfr/text/47/54.1805

Both Lifeline and ACP use "Family"

Household and income are defined here:

https://www.law.cornell.edu/cfr/text/47/54.400#f

#
# USAC lifeline_pums_analysis_021823.R
#
# R function to compute number of eligible Lifeline Households from ACS PUMS data
#
# www.usac.org/.../LI_Application_NVstates.pdf
#
# By David J Dorer, Ph.D.
#
# copyright 2023 by Dorer Community Service Foundation Inc of Massachusetts
#
# info@dorerfoundation.org
#
# v1.0 djd 16 Feb 2023 12:40
# v1.1 djd 18 Feb 2023 13:38 added group quarters eligible persons calculation
# v1.2 djd 19 Feb 2023 18:43 added calculation using only person records with SPORDER==1
#
#
# z1<-usac(vintage=2021,period=1)
# z2<-usac(vintage=2020,period=1);
# z3<-usac(vintage=2020,period=5);
# z4<-usac(vintage=2021,period=5);
# z5<-usac(vintage=2021,period=1,state="MA",fips="25");

usac<-function(vintage=2021,state="AL",poverty.level=135,fips="01",period=1) {
#
# ARGUMENTS:
#
# vintage: PUMS vintage
# period: PUMS period
# state: state abbreviation
# fips: state FIPS Code
# poverty.level: poverty threshold to use in calculation
#
# VALUE:
#
# list with
#
# households: number of eligible households in state

# householders: number of eligible householders (SPORDER==1) records only
# vintage: argument
# period: argument
# url.person: Census FTP site url for PUMS Person records
# url.house: Census FTP site url for PUMS House records
# state: argument
# fips argument
# data: Derived data.frame used to make calculation.
#

state<-tolower(state);

error<-0; note<-paste0(vintage," ",period,"- PUMS");
msg<-paste("State=",state,"fips=",fips,"PUMS vintage=",vintage,"period=",period);

note<-paste("vintage: ",vintage,"period: ",period," year PUMS");
urlh<-paste0("www2.census.gov/.../",period,"-Year/csv_h",state,".zip");
urlp<-paste0("www2.census.gov/.../",period,"-Year/csv_p",state,".zip");

if((vintage==2020)&(period==1)) { # because of covid 2020 1 year PUMS was released with experimental weights
urlh<-paste0("www2.census.gov/.../csv_h",state,".zip");
urlp<-paste0("www2.census.gov/.../csv_p",state,".zip");
}; # if

# note this code downloads zip file to your "working directory" use getwd() to see the directory/folder
vh<-try(download.file(urlh,"pumsh.zip")); # housing records from FTP site
vp<-try(download.file(urlp,"pumsp.zip")); # person records from FTP site

if(class(vh)[1]=="try-error") {
msg<-c(msg,paste0("ERROR downloading housing file url=",urlh,"vintage=",vintage,"period=",period));
error<-error+1;
};

if(class(vp)[1]=="try-error") {
msg<-c(msg,paste0("ERROR downloading person file url=",urlp));
error<-error+1;
};
if(error>0) {cat("usac: ERRORS\n");return(msg); };

hfile<-paste0("psam_h",fips,".csv");
pfile<-paste0("psam_p",fips,".csv");

house<-try(read.csv(unz("pumsh.zip",hfile))); # load housing records directly from zip file
person<-try(read.csv(unz("pumsp.zip",pfile))); # load person records directly from zip file

if(class(house)[1]=="try-error") {
msg<-c(msg,paste0("ERROR reading zip archive: pumsh.zip file=",hfile))
error<-error+1;
};

if(class(person)[1]=="try-error") {
msg<-c(msg,paste0("ERROR reading zip archive: pumsp.zip file=",pfile))
error<-error+1;
};

if(error>0) {cat("usac: ERRORS\n");return(msg); };

# trim down house and person to the variables of interest
hvar0<-c("SERIALNO","TYPEHUGQ","WGTP","FS"); hvar<-c("TYPEHUGQ","WGTP","FS");
pvar<-c("SERIALNO","SPORDER","HINS4","PAP","SSIP","POVPIP","PWGTP");
house<-house[,hvar0];
cat("\nsummary house data.frame=\n");print(summary(house));
person<-person[,pvar];

# merge person and housing records
m<-match(as.character(person$SERIALNO),as.character(house$SERIALNO));

# data.frame like the on returned by tidycensus function get_pums() ??
# note tidycensus get_pums() appears to use the API.
# The API cannot handle blank fields, missing is lowest value - 1 (typically)

dat<-data.frame(person,house[m,c("TYPEHUGQ","WGTP","FS")]);
cat("\nWGTP==0 by TYPEHUGQ\n");print(table(wgtp0=addNA(house$WGTP==0),typehq=addNA(as.factor(house$TYPEHUGQ))));

cat("\nusac: summary dat merged person/house data.frame:\n");print(summary(dat));
cat("\nusac: summary dat merged person/house data.frame with house weight (WGTP)==0:\n");print(summary(dat[dat$WGTP==0,]));

# create logical variables in the merged "dat" data.frame
pov<-dat$POVPIP;pov[pov<0]<-NA; pov[pov<=poverty.level]<-1; # <= poverty.level argument poverty level clause
pov3<-rep(NA,length(pov)); pov3[pov<0]<-3;pov3[is.na(pov)]<-3; # POVPIP missing
flag<-(pov>=0)&(pov<=poverty.level);flag[is.na(flag)]<-FALSE;pov3[flag]<-1; # at or below poverty.level
flag<-pov>poverty.level;flag[is.na(flag)]<-FALSE;pov3[flag]<-2; # above poverty.level
pov3f<-factor(pov3,levels=c(1,2,3),labels=c("below","above","undefined"));
pov[pov>poverty.level]<-0;
sporder<-dat$SPORDER;

ssip<-dat$SSIP; ssip[is.na(ssip)]<-0; # Receive Supplemental Social Security Income SSI >= 1 otherwise 0

pap<-dat$PAP; pap[is.na(pap)]<-0; # Public Assistance Income missing value set to zero

# HINS4 Health Insurance Medicaid etc. ==1 TRUE otherwise FALSE no missing
hins<-dat$HINS4; hins[is.na(hins)]<-0; # result 1==receives Medicaid etc. 0==missing 2==no
# FS Receives Food Stamps/SNAP == 1 (yes) == 2 (no) no missing values
cat("summary of PAP>0\n"); print(summary(dat$PAP[dat$PAP>0]));
cat("summary of pap>0\n"); print(summary(pap[pap>0]));
cat("summary PAP==0/pap==0\n");print(table(PAP0=addNA(dat$PAP==0),pap0=addNA(pap==0)));
cat("summary of SSIP>0\n"); print(summary(dat$SSIP[dat$SSIP>0]));
cat("summary of ssip>0\n"); print(summary(ssip[ssip>0]));
cat("summary SSIP==0/ssip==0\n");print(table(SSIP00=addNA(dat$SSIP==0),ssip0=addNA(ssip==0)));

# some defensive coding
fs<-dat$FS==1; fs[is.na(fs)]<-0; fs[fs==2]<-0; # recode 1=receives SNAP yes, 0 missing or no
cat("summary of FS/fs\n"); print(table(addNA(as.factor(fs)),addNA(as.factor(dat$FS))));

life<-(fs==1)|(pov3f=="below")|(pap>0)|(ssip>0)|(hins==1); # logic at household person record level

# no missing values for all derived variables including "life"
dfs<-data.frame(dat,pov3f,pap,hins,fs,ssip,lifeline=life); # data.frame to make sure everything aligns and check logic

cat("\nusac: Summary of derived per person data frame (dat with derived variables)\n");print(summary(dfs));

cat("\nusac: Summary of derived per person (for households) data frame with POVPIP missing.:\n");
print(summary(dfs[is.na(dfs$POVPIP),])); # check logic on pov variable POVPIP missing pov set to 0 (no)

serialH<-sort(unique(as.character(dat$SERIALNO))); # unique household SERIALNO in dat data.frame

mhp<-match(house$SERIALNO,serialH); # look for records in house data.frame that don't match records in dat data.frame
multi.person<-house[is.na(mhp),];
cat("\nusac: PUMS Housing Records that match multiple SERIALNO id's from Person Records\n");
cat(" (should be all Household records)\n");
# all records have TYPEHUGQ set to 1 i.e. household records.
print(summary(multi.person));

# now apply | (or) logic across values of life for all persons in a household
#
# application form that lifeline applicants fill out
# www.usac.org/.../LI_Application_NVstates.pdf
# This application seems to indicate that the Household is eligible if any one individual person in the Household
# is eligible, not just the ACS SPORDER==1 (Head of Household -- ACS Questionnaire Person 1),
# so the calculation should OR across all person records in the Household
#

N<-length(serialH); # loop over serialH (unique dat$SERIALNO values)
lifeline1<-lifeline<-rep(0,N);
pweight<-rep(NA,N);

for(i in seq(1,N)) {
if(!(i%%2000)) cat("i=",i,"\n");
housei<-serialH[i]==dat$SERIALNO;
lifeline[i]<-any(life[housei]);
lifeline1[i]<-any(life[housei&(sporder==1)]);
pweight[i]<-sum(person$PWGTP[serialH[i]==person$SERIALNO]); # sum person weights
}; # for i
weight<-dat$WGTP[match(serialH,dat$SERIALNO)];
weight2<-house$WGTP[match(serialH,as.character(house$SERIALNO))];

dataH<-data.frame(SERIALNO=serialH,weight,pweight,lifeline,lifeline1,house[match(serialH,house$SERIALNO),]);
cat("\nusac: compare weights: weight-weight2\n");print(summary(weight-weight2));
cat("\nsummary dataH: \n");print(summary(dataH));
cat("\nsummary 0 weights \n");print(summary(dataH[dataH$weight==0,]));

# number of qualified households
cat("\nsummary dataH: for weight==0\n");print(summary(dataH[dataH$weight==0,]));
qhouse<-sum(dataH$weight[dataH$lifeline==1],na.rm=TRUE); # any qualified person in the Household makes the Household qualified
qhouse1<-sum(dataH$weight[dataH$lifeline1==1],na.rm=TRUE); # qualified Household based on person SPORDER==1
qperson<-sum(dataH$pweight[dataH$TYPEHUGQ!=1]); # qualified Group Quarters persons (sum person weights)
qhouse2<-sum(dataH$weight[(dataH$lifeline==1)&(dataH$TYPEHUGQ==1)],na.rm=TRUE);

cat("\nLifeline/TYPEHUGQ=\n");print(table(H.type=addNA(as.factor(dataH$TYPEHUGQ)),lifeline=addNA(as.factor(dataH$lifeline))));
cat("summary dataH: for Group Quarters=\n");print(summary(dataH[dataH$TYPEHUGQ!=1,]));
cat("\nsummary qualified households\n");print(summary(dataH[dataH$lifeline==1,]));

cat("\nusac: Results: ",date(),note,"\n");
cat("usac: Number of households/group.quarters qualified for Lifeline Households=",qhouse,"\n");
cat("usac: Number of households qualified for Lifeline Households=",qhouse2,"\n");
cat("usac: Number of households qualified for Lifeline based on Householder=",qhouse1,"\n");
cat("usac: Number of Group.Quarters persons qualified for Lifeline Households=",qperson,"\n");

rtn<-list(households=qhouse2,householders=qhouse1,personsGQ=qperson,includeGQ=qhouse
,poverty.level=poverty.level,state=toupper(state),fips=fips,note=note,error=error
,url.house=urlh,url.person=urlp,vintage=vintage,period=period,data.person=dfs,data.house=dataH);
invisible(rtn);
}; # usac

## END ##
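
The household-level OR performed by the loop in the function above can also be written without a loop. The following is only a minimal base R sketch on toy data (the values are made up, not real PUMS records); the column names follow the function above, and it assumes, as the function does, that WGTP is repeated on every person record of a household.

```r
# Toy person-level records (hypothetical values, not real PUMS data)
persons <- data.frame(
  SERIALNO = c("A", "A", "B", "B", "C"),
  SPORDER  = c(1, 2, 1, 2, 1),
  WGTP     = c(10, 10, 20, 20, 5),
  life     = c(FALSE, TRUE, FALSE, FALSE, TRUE)  # per-person eligibility flag
)

# TRUE if any person in the household qualifies (OR across person records)
any_person <- tapply(persons$life, persons$SERIALNO, any)

# One household weight per SERIALNO (WGTP repeats on each person record)
weight <- tapply(persons$WGTP, persons$SERIALNO, function(w) w[1])

# TRUE only if the householder (SPORDER == 1) qualifies
hh <- persons[persons$SPORDER == 1, ]
householder <- setNames(hh$life, hh$SERIALNO)

qhouse_any         <- sum(weight[any_person])
qhouse_householder <- sum(weight[householder[names(weight)]])
```

In this toy data household A qualifies only through its second person, so the "any person" total is larger than the householder-only total.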
christinemparker over 2 years ago in reply to David Dorer

Hi David,

I really appreciate all your effort on this! After trying out several versions of the same datasets (via the tidycensus API, downloaded from the FTP site, and downloaded from IPUMS), I finally got this figured out. I also realized that SPORDER == 1 would limit the count to householder information only. I also noticed the TYPEHUGQ field in your code. Once I included that field AND properly dealt with the NA values that arose when I joined the person & household records...VOILA! I did keep the POVPIP range at 0:200, because that is the range specified for the Affordable Connectivity Program (very similar to Lifeline but focused on Internet access; eligibility-wise, POVPIP is the only difference that I'm aware of).
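
As a rough illustration of the steps described above (join person and household records, treat NA as "not eligible", keep only TYPEHUGQ == 1), here is a minimal base R sketch on toy data. It is not the code from the repository linked below; the data values are made up, and putting FS on the household record is an assumption of this example.

```r
# Toy household and person records (hypothetical values, not real PUMS data)
house <- data.frame(
  SERIALNO = c("A", "B", "C", "D"),
  WGTP     = c(10, 20, 0, 15),
  TYPEHUGQ = c(1, 1, 3, 1),     # 1 = housing unit, other values = group quarters
  FS       = c(1, 2, NA, 2)     # 1 = receives SNAP, 2 = no
)
person <- data.frame(
  SERIALNO = c("A", "A", "B", "C", "D"),
  SPORDER  = c(1, 2, 1, 1, 1),
  POVPIP   = c(250, 250, 300, 50, NA)
)

# Join person records to their household record
merged <- merge(person, house, by = "SERIALNO", all.x = TRUE)

# Eligibility per person; NA results are treated as not eligible
merged$eligible <- (merged$FS == 1) | (merged$POVPIP <= 200)
merged$eligible[is.na(merged$eligible)] <- FALSE

# Keep housing units only, then OR across persons within each household
hh_rows     <- merged[merged$TYPEHUGQ == 1, ]
eligible_hh <- tapply(hh_rows$eligible, hh_rows$SERIALNO, any)
hh_weight   <- tapply(hh_rows$WGTP, hh_rows$SERIALNO, function(w) w[1])

total_eligible <- sum(hh_weight[eligible_hh])
```

Household C would pass the income test but is dropped as group quarters, and household D has a missing POVPIP, so neither contributes to the total.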

I keep the code for this on github here if you're interested: https://github.com/ILSR-GIS-DATA/Affordable-Connectivity-Program-Analysis.

For future reference, if you want to add in code, see the dropdown options in the pic below. Thanks again for brainstorming with me!
David Dorer over 2 years ago in reply to christinemparker

Dear Christine,

I just fixed your tidyverse code. Here is the corrected code:

lifeline.eligible<-function(state="AL",vintage=2021,period=1) {
library(tidyverse)
library(tidycensus)

census_api_key(Sys.getenv("CENSUS_API_KEY"))
pums.vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")

#
# note all.pums data.frame has no missing (NA) values.
# -1 is used as a missing value for PAP SSIP and POVPIP
#
all.pums <- get_pums(variables = pums.vars, state = state, year = vintage, survey = paste0("acs",period));

#
# the following logic drops records with missing (-1) values
# selects SPORDER==1 records
# note this logic is incorrect but matches that used for the Lifeline eligibility spreadsheet

AL.pums <- all.pums %>%
#filter records that meet eligibility requirements with SPORDER==1
filter((SPORDER==1)&(HINS4 == 1 | FS == 1 | between(PAP,1,30000) | between(SSIP,1,30000)| between(POVPIP,0,135))) %>%
summarize(hh.eligible = sum(WGTP)) #calculate total number of households eligible
AL.pums;
};

ADDED 3:05 PM EST

Working on this has gotten me interested in the ACP and Lifeline. I have a friend in DC who worked with the OMB and now works with the CBO. Apparently he has worked on the program and is familiar with it -- presumably on the Congressional budget side. In any case, working with 501(c)(3) organizations is part of the mission/purpose of the Dorer Community Service Foundation. So if you need any more help, we can create a "Project" description and formalize a relationship between DCSF and the ILSR for pro bono consulting services. Email info@dorerfoundation.org if you want to proceed along this line.

The main mistake in your original code was using "subset" instead of "filter" and using the sequence operator 1:30000. An expression like PAP == 1:30000 compares the two vectors element by element, recycling the shorter one, rather than testing whether each value falls in the range. You need a logical AND of >= and <= (which is what between does). I've learned a lot about tidyverse and tidycensus; I tend to use the R "base" package as much as possible. You might also connect with the people at USAC and point out that their spreadsheet overestimates the participation rate by quite a bit.
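
To make the difference concrete, here is a small base R sketch on a made-up vector (a shorter range, 1:3, is used so the recycling is easy to follow). dplyr's between(x, 1, 3) should give the same result as the explicit AND shown here.

```r
# Toy values standing in for an income variable such as PAP (made up)
pap <- c(3, 2, 1, 500, 2, 0)

# Sequence comparison: 1:3 is recycled to 1,2,3,1,2,3 and compared by position
by_position <- pap == 1:3

# Range test: each value is checked against both bounds
in_range <- pap >= 1 & pap <= 3

print(by_position)
print(in_range)
```

The positional comparison is TRUE only where a value happens to equal the recycled sequence at that position, so values that are inside the range are silently dropped.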

Note that get_pums uses the API to download the data, so what are blank (missing) fields in the FTP files have a numeric value, in this case -1, as a placeholder for NA.
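
If you prefer to work with real NA values, one option is to recode the placeholder after download. This is only a sketch on a toy data frame; it assumes -1 is the placeholder for PAP, SSIP and POVPIP as described in this thread, which is worth checking against the data dictionary for your vintage.

```r
# Toy data frame mimicking API output, with -1 standing in for missing (made up)
pums <- data.frame(
  PAP    = c(-1, 0, 1200),
  SSIP   = c(-1, -1, 800),
  POVPIP = c(-1, 135, 501)
)

# Variables assumed to use -1 as the missing-value placeholder
placeholder_vars <- c("PAP", "SSIP", "POVPIP")

# Replace -1 with NA in each of those columns
pums[placeholder_vars] <- lapply(
  pums[placeholder_vars],
  function(x) replace(x, x == -1, NA)
)

print(colSums(is.na(pums)))
```

After the recode, comparisons on these columns return NA for missing records, so any filter needs to handle NA explicitly.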

Best,

Dave

UPDATE: the email is info@dorerfoundation.org. I left off the .org in the earlier post.
David Dorer over 2 years ago in reply to christinemparker

Code pasted as plain text

lifeline.eligible<-function(state="AL",vintage=2021,period=1) {

# Compute Lifeline eligible households from ACS PUMS data
#
# state: abbreviation for state
# vintage: year in get_pums
# period: either 1 (1-year PUMS) or 5 (5-year PUMS)

library(tidyverse)
library(tidycensus)

census_api_key(Sys.getenv("CENSUS_API_KEY"))
pums.vars <- c("HINS4", "FS", "PAP", "SSIP", "POVPIP")

# Note: get_pums uses API to download PUMS data
# with the API there are no missing values. Numeric values are used for missing
# lowest non-missing value -1 usually
#
# all.pums data.frame has no missing (NA) values.
# -1 is used as a missing value for PAP SSIP and POVPIP
#
all.pums <- get_pums(variables = pums.vars, state = state, year = vintage, survey = paste0("acs",period));

#
# selects SPORDER==1 records
# note this logic is incorrect but matches that used for the Lifeline eligibility spreadsheet
# www.usac.org/.../Lifeline-Participation-Rate.xlsx
#
# -1 values in | (Logical OR) are out of range so a missing value in PAP SSIP or POVPIP is
# counted as FALSE in logic calculation
#

AL.pums <- all.pums %>%
#filter records that meet eligibility requirements with SPORDER==1
filter((SPORDER==1)&(HINS4 == 1 | FS == 1 | between(PAP,1,30000) | between(SSIP,1,30000)| between(POVPIP,0,135))) %>%
summarize(hh.eligible = sum(WGTP)) #calculate total number of households eligible
AL.pums;
};

## END ##
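
Since the only eligibility difference between Lifeline and the ACP mentioned in this thread is the POVPIP cutoff (135 versus 200), the same filter can take the cutoff as an argument. The following is a base R sketch on toy data; count_eligible and its pov_max argument are hypothetical names for this example, and it deliberately reproduces the householder-only (SPORDER == 1) logic of the function above, which the comments there flag as incorrect.

```r
# Hypothetical helper: weighted count of householder records meeting any criterion
count_eligible <- function(df, pov_max) {
  in_range <- function(x, lo, hi) x >= lo & x <= hi
  keep <- df$SPORDER == 1 &
    (df$HINS4 == 1 | df$FS == 1 |
       in_range(df$PAP, 1, 30000) | in_range(df$SSIP, 1, 30000) |
       in_range(df$POVPIP, 0, pov_max))
  sum(df$WGTP[keep])
}

# Toy records (made up); -1 is the API placeholder for missing
toy <- data.frame(
  SPORDER = c(1, 1, 1, 2),
  HINS4   = c(2, 2, 2, 1),
  FS      = c(2, 2, 2, 2),
  PAP     = c(0, 0, -1, 0),
  SSIP    = c(0, 0, -1, 0),
  POVPIP  = c(120, 180, -1, 90),
  WGTP    = c(10, 20, 30, 10)
)

lifeline_total <- count_eligible(toy, pov_max = 135)
acp_total      <- count_eligible(toy, pov_max = 200)
```

The third record shows the point made in the comments above: its -1 values fall outside every range, so the record counts as not eligible without any special handling.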