calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

cbachx over 1 year ago

Using the srvyr package and replicate weights REPWT, I would like to calculate the standard error and margin of error of the proportion (p) of households (HHWT) in the dataframe "data" that are paying over housing tax credit level rents, this is denoted in the dataframe by a "1" in the field OVERLIHTC. I'm able to get as far as shown below, but I don't know how to finish the code and print the results. Ideas?

p <- sum(data$HHWT[data$OVERLIHTC == 1]) / sum(data$HHWT)

svy <- as_survey(data, weight = HHWT , repweights = matches("REPWT[0-9]+"), type = "JK1", scale = 4/ 80 , rscales = rep(1, 80 ), mse = TRUE)

sub_design <- subset(svy, OverLIHTC == 1 )

Elizabeth over 1 year ago

I would personally use dplyr (https://dplyr.tidyverse.org/index.html) to summarize this (since srvyr is designed to work with the dplyr syntax) and do something like this. (I can't see your actual dataset, so you may need to tweak this.) NOTE: You'll need to convert OVERLIHTC to a character vector if it's not one already for this to work correctly.

svy |>

group_by(OVERLIHTC) |>
summarise(Percent = survey_mean())

That should provide a df with the percentage and SE for each percentage.
Cancel
Up 0 Down

Reply

Cancel
Elizabeth over 1 year ago

I would personally use dplyr (https://dplyr.tidyverse.org/index.html) to summarize this (since srvyr is designed to work with the dplyr syntax) and do something like this. (I can't see your actual dataset, so you may need to tweak this.) NOTE: You'll need to convert OVERLIHTC to a character vector if it's not one already for this to work correctly.

svy |>

group_by(OVERLIHTC) |>
summarise(Percent = survey_mean())

That should provide a df with the percentage and SE for each percentage.
Cancel
Up 0 Down

Reply

Cancel
Adam Bickford over 1 year ago in reply to Elizabeth

Here is a great summary of srvyr, which includes the code for outputting SEs and MOEs:

https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html

Cheers--

AB
Cancel
Up 0 Down

Reply

Cancel
David Dorer over 1 year ago

Here is an example using the R survey package and the variable SEX

Download PUMS data (using API)
with variable SEX PWGTP and paste0("PWGTP",seq(1,80)) and save in data.frame pums
(SEX 1=male 2=female)

Create an indicator variable for the category "male"
pums$male<-as.numeric(pums$SEX==1);
require(survey)

design<-svrepdesign(ids=~1,data=pums$male,weights=pums$PWGTP,repweights=pums[,paste0("PWGTP",seq(1,80))])

sm<-svymean(~male,design);
mean SE
male 0.45799 0.0094

For an indicator (0/1) variable the mean is just the proportion.
MoE<-sqrt(as.numeric(attr(sm,"var")))*qnorm(0.95);

print(MoE)

0.01200602
Cancel
Up 0 Down

Reply

Cancel
cbachx over 1 year ago

Thanks everyone. Elizabeth, I did add the code you suggested and it produced a percent_se of .0275. I presume the is a standard error of 2.75% (not .0275%), correct? And presuming all this is on the right track, what is the code to get the margin of error (let's say for the 90% confidence interval with its factor of 1.645)?
Cancel
Up 0 Down

Reply

Cancel
David Dorer over 1 year ago in reply to cbachx

The calculation above should use qnorm(0.95) for a 90% CI
Cancel
Up 0 Down

Reply

Cancel
cbachx over 1 year ago in reply to David Dorer

Thanks David, and in my original code with the additions provided by Elizabeth, how do I then utilize qnorm(0.95) to get the margin of error?

And do you agree that standard error is 2.75% (not .0275%)?
Cancel
Up 0 Down

Reply

Cancel
David Dorer over 1 year ago in reply to cbachx

The "output" from the calculation is a fraction so you need to multiply by 100 to get a percent.

the number qnorm(0.95) is 1.644854 If you look in the ACS handbook chapter 8 they use 1.645 to convert a standard error (SE) to a margin of error (MoE) which is what you get when you round qnorm(0.95) up. This gives a conservative estimate for the MoE. In my survey package example the output from the svymean function is complicated (use str to see the actual structure of the return from svymean). The variance of the mean is in an attribute. The SE is the square root (sqrt) of the variance. You then scale the SE to get the MoE. FYI you can use svymean on a categorical variable and it will give the fractions for the various categories along with the SE for each category. You still need to use the attr function to extract the variance covariance matrix. In the case of multiple categories the SE is sqrt(diag(attr(svymean(x),"var"))) The advantage of using svymean is that it will give a better estimate than using svytable and then applying the ratio calculation in chapter 8 of the ACS handbook.

As a note I like to use the "oldest" and simplest package to make a calculation. This means that I prefer the "survey" package over "srvyr" package
Cancel
Up 0 Down

Reply

Cancel
cbachx over 1 year ago in reply to cbachx

I just completed the calculation of SE and SOE manually, following the Census guide. And what I got was the Variance is .0275, the SE was .166 and the MOE is .273. So it appears that the code provided by Elizabeth yielded the Variance -- Although eyeing the data that seemed to be what I would expect for a margin of error. Suggestions?
Cancel
Up 0 Down

Reply

Cancel
cbachx over 1 year ago in reply to cbachx

Forget the most recent post, I made an error in the excel spreadsheet with the manual calculation -- with a correction, I do get a SE of .0275. My question on getting to the MOE still stands however.
Cancel
Up 0 Down

Reply

Cancel
David Dorer over 1 year ago in reply to cbachx

what state and puma are you using ?
Cancel
Up 0 Down

Reply

Cancel
cbachx over 1 year ago in reply to David Dorer

Nevada, 2021 5YR ACS, all PUMAs
Cancel
Up 0 Down

Reply

Cancel
cbachx over 1 year ago in reply to cbachx

I double checked, my table was drawn from NV PUMA #200, 2021 5YR ACS microdata
Cancel
Up 0 Down

Reply

Cancel
David Dorer over 1 year ago in reply to cbachx

Thanks for the info. I assume that you mean PUMA FIPS 00200 (PUMAs have 5 digits). Also where did you get OVERLIHTC LIHTC is a HUD "variable" which is based on the AMI (Area median income). Where did you get the AMI ? I would like to recreate your calculation using my R program.

Thanks,

Dave
Cancel
Up 0 Down

Reply

Cancel
Elizabeth over 1 year ago in reply to cbachx

I think David has much more statistical expertise than I do, so if he disagrees I would defer to him. I typically estimate the MOE by multiplying the SE by the critical value. I typically use 95% CI in my work and use 1.96.
Cancel
Up 0 Down

Reply

Cancel