Using the srvyr package and replicate weights REPWT, I would like to calculate the standard error and margin of error of the proportion (p) of households (HHWT) in the dataframe "data" that are paying rents above the housing tax credit level. This is denoted in the dataframe by a "1" in the field OVERLIHTC. I'm able to get as far as shown below, but I don't know how to finish the code and print the results. Ideas?
p <- sum(data$HHWT[data$OVERLIHTC == 1]) / sum(data$HHWT)
svy <- as_survey(data, weights = HHWT, repweights = matches("REPWT[0-9]+"), type = "JK1", scale = 4 / 80, rscales = rep(1, 80), mse = TRUE)
sub_design <- subset(svy, OVERLIHTC == 1)  # column names are case-sensitive in R
Thanks everyone. Elizabeth, I did add the code you suggested and it produced a percent_se of .0275. I presume this is a standard error of 2.75% (not .0275%), correct? And presuming all this is on the right track, what is the code to get the margin of error (let's say for the 90% confidence interval with its factor of 1.645)?
The calculation above should use qnorm(0.95) for a 90% CI.
Thanks David, and in my original code with the additions provided by Elizabeth, how do I then utilize qnorm(0.95) to get the margin of error? And do you agree that standard error is 2.75% (not .0275%)?
The "output" from the calculation is a fraction so you need to multiply by 100 to get a percent.
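For the srvyr route, here is a minimal sketch of how the SE could be scaled to a 90% MoE and converted to percent. It is not the code suggested earlier in the thread; the simulated data frame, the replicate-weight values, and the column names percent_moe, percent_pct and moe_pct are assumptions made only so the example runs on its own. The indicator is converted to 0/1 so that its weighted mean is the proportion.

```r
library(srvyr)
library(dplyr)

# Hypothetical stand-in for the real microdata (simulated values only)
set.seed(1)
n <- 200
data <- data.frame(HHWT = runif(n, 50, 150),
                   OVERLIHTC = rbinom(n, 1, 0.3))
# 80 made-up replicate weights named REPWT1 ... REPWT80
for (i in 1:80) data[[paste0("REPWT", i)]] <- data$HHWT * runif(n, 0.5, 1.5)

svy <- as_survey_rep(data, weights = HHWT,
                     repweights = matches("REPWT[0-9]+"),
                     type = "JK1", scale = 4 / 80,
                     rscales = rep(1, 80), mse = TRUE)

result <- svy %>%
  # mean of a 0/1 indicator = weighted proportion; vartype = "se" adds percent_se
  summarise(percent = survey_mean(as.numeric(OVERLIHTC == 1), vartype = "se")) %>%
  mutate(percent_moe = qnorm(0.95) * percent_se,  # 90% margin of error (a fraction)
         percent_pct = 100 * percent,             # same quantities in percent
         moe_pct     = 100 * percent_moe)

print(result)
```

With the real data, only the as_survey_rep() call and the summarise() step would be needed; the simulation lines are there just to make the sketch self-contained.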
The number qnorm(0.95) is 1.644854. If you look in the ACS handbook, chapter 8, they use 1.645 to convert a standard error (SE) to a margin of error (MoE), which is what you get when you round qnorm(0.95) up. This gives a conservative estimate for the MoE. In my survey package example, the output from the svymean function is complicated (use str to see the actual structure of the return from svymean). The variance of the mean is in an attribute. The SE is the square root (sqrt) of the variance. You then scale the SE to get the MoE. FYI, you can use svymean on a categorical variable and it will give the fractions for the various categories along with the SE for each category. You still need to use the attr function to extract the variance-covariance matrix. In the case of multiple categories, the SE is sqrt(diag(attr(svymean(x),"var"))). The advantage of using svymean is that it will give a better estimate than using svytable and then applying the ratio calculation in chapter 8 of the ACS handbook.
As a note, I like to use the "oldest" and simplest package to make a calculation. This means that I prefer the "survey" package over the "srvyr" package.
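A rough sketch of the svymean approach described above, using the survey package. The data frame dat and its values are simulated purely for illustration (they are not ACS data), and the design settings simply mirror the ones in the original question, so treat this as an outline under those assumptions rather than a verified ACS workflow.

```r
library(survey)

# Hypothetical stand-in data with 80 made-up replicate weights
set.seed(1)
n <- 200
dat <- data.frame(HHWT = runif(n, 50, 150),
                  OVERLIHTC = rbinom(n, 1, 0.3))
for (i in 1:80) dat[[paste0("REPWT", i)]] <- dat$HHWT * runif(n, 0.5, 1.5)

# Replicate-weight design; repweights is a regular expression on column names
des <- svrepdesign(data = dat, weights = ~HHWT,
                   repweights = "REPWT[0-9]+",
                   type = "JK1", scale = 4 / 80, rscales = rep(1, 80),
                   mse = TRUE, combined.weights = TRUE)

# svymean on a factor returns the share of each category
est <- svymean(~factor(OVERLIHTC), des)

# The variance-covariance matrix lives in the "var" attribute
se  <- sqrt(diag(attr(est, "var")))  # SE for each category
moe <- qnorm(0.95) * se              # 90% margin of error

data.frame(share = coef(est), se = se, moe_90 = moe)
```

The survey package also provides an SE() helper that returns the same standard errors without calling attr directly.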
I just completed the calculation of SE and MOE manually, following the Census guide. What I got was a variance of .0275, an SE of .166 and an MOE of .273. So it appears that the code provided by Elizabeth yielded the variance, although eyeing the data, that seemed to be what I would expect for a margin of error. Suggestions?
Forget the most recent post, I made an error in the Excel spreadsheet with the manual calculation. With a correction, I do get an SE of .0275. My question on getting to the MOE still stands, however.
What state and PUMA are you using?
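As a quick arithmetic check in base R, the SE of .0275 reported above can be scaled to a 90% MOE as follows. The variable names are just illustrative.

```r
se  <- 0.0275        # standard error reported above (a fraction, not a percent)
z90 <- qnorm(0.95)   # 1.644854; the ACS handbook rounds this to 1.645
moe <- z90 * se      # 90% margin of error, still a fraction

# Multiply by 100 to express both as percentages
round(100 * c(se = se, moe = moe), 2)
# se = 2.75, moe = 4.52
```

So an SE of 2.75% corresponds to a 90% MOE of roughly plus or minus 4.5 percentage points around the estimated proportion.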
Nevada, 2021 5YR ACS, all PUMAs
I double checked, my table was drawn from NV PUMA #200, 2021 5YR ACS microdata
Thanks for the info. I assume that you mean PUMA FIPS 00200 (PUMAs have 5 digits). Also, where did you get OVERLIHTC? LIHTC is a HUD "variable" which is based on the AMI (area median income). Where did you get the AMI? I would like to recreate your calculation using my R program.
Thanks,
Dave