Hello, I'm trying to figure out how to calculate margins of error on counts in R. So far I've had no luck. I'm hoping that there's an R package out there that will make this a bit easier, but I've not seen anything as yet.
At the moment I'm working with 5-Yr PUMS data, and have extracted "Artists" (using Occupation codes), for the Massachusetts, Greater Boston and Boston geographies. I've further broken those up by race (white artists X, Black artists Y and so on). Especially with PUMS data I expect there to be some error here, but I've not been able to figure out how to calculate it through R. If anyone had any good pointers/code to share, I'm all ears.
Thanks,
Peter
Hi -
I learned how to do this recently myself. You can use R packages srvyr or survey to use replicate weights to show uncertainty (as used in the PUMS files, these are all the variables with PWGTP[#] in the person file). Here is an example using srvyr:
library(srvyr)library(tidyverse)personsRaw <- read.csv("ACS_PUMS_Person_File.csv")persons_svy <- personsRaw %>% as_survey_rep( weight = PWGTP, repweights = matches("PWGTP[0-9]+") , scale = 4 / 80, rscales = rep(1 , 80), mse = TRUE, type = "JK1", variables = c(RAC1P,OCCP) )persons_svy %>% # update the next line to include all desired occupation codes filter(OCCP >=2600 & OCCP < 2800) %>% group_by(RAC1P) %>% # vartype: Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), variance ("var") or coefficient of variation ("cv"). summarize(Employees = survey_total(vartype="ci"))
library(srvyr)
library(tidyverse)
personsRaw <- read.csv("ACS_PUMS_Person_File.csv")
persons_svy <- personsRaw %>% as_survey_rep(
weight = PWGTP,
repweights = matches("PWGTP[0-9]+") ,
scale = 4 / 80,
rscales = rep(1 , 80),
mse = TRUE,
type = "JK1",
variables = c(RAC1P,OCCP)
)
persons_svy %>%
# update the next line to include all desired occupation codes
filter(OCCP >=2600 & OCCP < 2800) %>%
group_by(RAC1P) %>%
# vartype: Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), variance ("var") or coefficient of variation ("cv").
summarize(Employees = survey_total(vartype="ci"))
Note that `library(srvyr)` is a tidyverse wrapper for `library(survey)`. If the rest of your workflow is in base R, you can obtain the results with that basic syntax.
Great yeah, I've looked those up a bit more as well.