# calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

Using the srvyr package and replicate weights REPWT, I would like to calculate the standard error and margin of error of the proportion (p) of households (HHWT) in the dataframe "data" that are paying over housing tax credit level rents, this is denoted in the dataframe by a "1" in the field OVERLIHTC. I'm able to get as far as shown below, but I don't know how to finish the code and print the results. Ideas?

p <- sum(data\$HHWT[data\$OVERLIHTC == 1]) / sum(data\$HHWT)

svy <- as_survey(data, weight = HHWT , repweights = matches("REPWT[0-9]+"), type = "JK1", scale = 4/ 80 , rscales = rep(1, 80 ), mse = TRUE)

sub_design <- subset(svy, OverLIHTC == 1 )

• I would personally use dplyr (https://dplyr.tidyverse.org/index.html) to summarize this (since srvyr is designed to work with the dplyr syntax) and do something like this. (I can't see your actual dataset, so you may need to tweak this.) NOTE: You'll need to convert OVERLIHTC to a character vector if it's not one already for this to work correctly.

svy |>

group_by(OVERLIHTC) |>
summarise(Percent = survey_mean())

That should provide a df with the percentage and SE for each percentage.

• I would personally use dplyr (https://dplyr.tidyverse.org/index.html) to summarize this (since srvyr is designed to work with the dplyr syntax) and do something like this. (I can't see your actual dataset, so you may need to tweak this.) NOTE: You'll need to convert OVERLIHTC to a character vector if it's not one already for this to work correctly.

svy |>

group_by(OVERLIHTC) |>
summarise(Percent = survey_mean())

That should provide a df with the percentage and SE for each percentage.

• Here is a great summary of  srvyr, which includes the code for outputting SEs and MOEs:

https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html

Cheers--

AB

• Here is an example using the R survey package and the variable SEX

with variable SEX PWGTP and paste0("PWGTP",seq(1,80)) and save in data.frame pums
(SEX 1=male 2=female)

Create an indicator variable for the category "male"
pums\$male<-as.numeric(pums\$SEX==1);
require(survey)

design<-svrepdesign(ids=~1,data=pums\$male,weights=pums\$PWGTP,repweights=pums[,paste0("PWGTP",seq(1,80))])

sm<-svymean(~male,design);
mean    SE
male 0.45799 0.0094

For an indicator (0/1) variable the mean is just the proportion.
MoE<-sqrt(as.numeric(attr(sm,"var")))*qnorm(0.95);

print(MoE)

0.01200602

• Thanks everyone. Elizabeth, I did add the code you suggested and it produced a percent_se of .0275. I presume the is a standard error of 2.75% (not .0275%), correct? And presuming all this is on the right track, what is the code to get the margin of error (let's say for the 90% confidence interval with its factor of 1.645)?

• The calculation above should use qnorm(0.95) for a 90% CI

• Thanks David, and in my original code with the additions provided by Elizabeth, how do I then utilize qnorm(0.95) to get the margin of error?

And do you agree that standard error is 2.75% (not .0275%)?

• The "output" from the calculation is a fraction so you need to multiply by 100 to get a percent.

the number qnorm(0.95) is 1.644854  If you look in the ACS handbook chapter 8 they use 1.645 to convert a standard error (SE) to a margin of error (MoE) which is what you get when you round qnorm(0.95) up.  This gives a conservative estimate for the MoE.  In my survey package example the output from the svymean function is complicated (use str to see the actual structure of the return from svymean).  The variance of the mean is in an attribute.  The SE is the square root (sqrt) of the variance.  You then scale the SE to get the MoE.  FYI you can use svymean on a categorical variable and it will give the fractions for the various categories along with the SE for each category.  You still need to use the attr function to extract the variance covariance matrix.   In the case of multiple categories the SE is sqrt(diag(attr(svymean(x),"var")))  The advantage of using svymean is that it will give a better estimate than using svytable and then applying the ratio calculation in chapter 8 of the ACS  handbook.

As a note I like to use the "oldest" and simplest package to make a calculation.  This means that I prefer the "survey" package over "srvyr" package

• I just completed the calculation of SE and SOE manually, following the Census guide. And what I got was the Variance is .0275, the SE was .166 and the MOE is .273. So it appears that the code provided by Elizabeth yielded the Variance -- Although eyeing the data that seemed to be what I would expect for a margin of error. Suggestions?

• Forget the most recent post, I made an error in the excel spreadsheet with the manual calculation -- with a correction, I do get a SE of .0275. My question on getting to the MOE still stands however.

• what state and puma are you using ?

• Nevada, 2021 5YR ACS, all PUMAs

• I double checked, my table was drawn from NV PUMA #200, 2021 5YR ACS microdata

• Thanks for the info.  I assume that you mean PUMA FIPS 00200 (PUMAs have 5 digits).  Also where did you get OVERLIHTC   LIHTC is a HUD "variable" which is based on the AMI (Area median income).  Where did you get the AMI ?  I would like to recreate your calculation using my R program.

Thanks,

Dave

• I think David has much more statistical expertise than I do, so if he disagrees I would defer to him. I typically estimate the MOE by multiplying the SE by the critical value. I typically use 95% CI in my work and use 1.96.