Calculating Margins of Error for PUMS Estimates in R

Hello, I'm trying to figure out how to calculate margins of error on counts in R. So far I've had no luck. I'm hoping that there's an R package out there that will make this a bit easier, but I've not seen anything as yet.

At the moment I'm working with 5-Yr PUMS data, and have extracted "Artists" (using Occupation codes), for the Massachusetts, Greater Boston and Boston geographies. I've further broken those up by race (white artists X, Black artists Y and so on). Especially with PUMS data I expect there to be some error here, but I've not been able to figure out how to calculate it through R. If anyone had any good pointers/code to share, I'm all ears.

Thanks,

Peter

  • Hi -

    I learned how to do this recently myself. You can use R packages srvyr or survey to use replicate weights to show uncertainty (as used in the PUMS files, these are all the variables with PWGTP[#] in the person file). Here is an example using srvyr:


    library(srvyr)
    library(tidyverse)

    personsRaw <- read.csv("ACS_PUMS_Person_File.csv")

    persons_svy <- personsRaw %>% as_survey_rep(
          weight = PWGTP,
          repweights = matches("PWGTP[0-9]+") ,
          scale = 4 / 80,
          rscales = rep(1 , 80),
          mse = TRUE,
          type = "JK1",
          variables = c(RAC1P,OCCP)
          )

    persons_svy %>%
      # update the next line to include all desired occupation codes
      filter(OCCP >=2600 & OCCP < 2800) %>%
      group_by(RAC1P) %>%
      # vartype: Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), variance ("var") or coefficient of variation ("cv").
      summarize(Employees = survey_total(vartype="ci"))

  • Note that `library(srvyr)` is a tidyverse wrapper for `library(survey)`. If the rest of your workflow is in base R, you can obtain the results with that basic syntax. 

  • Great yeah, I've looked those up a bit more as well.