Calculating median household income for PUMA from PUMS files in R

Hello, 

I am currently working on calculating the median household income for a specific PUMA.

I am using the 2017 1-year ACS PUMS for my state (person and household files merged by SERIALNO). I have also converted household income to 2017 dollars with the following code:

puma$hinc2017 <- puma$HINCP * (puma$ADJINC / 1000000)

So far, so good. 

Additionally, I've generated a flag variable that identifies householders:
puma$hholder <- factor(ifelse(puma$RELP == 0, 1, NA))
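A toy illustration of why the householder flag matters. It assumes the merged file has one row per person, so each household's HINCP, ADJINC and WGTP repeat for every member. The data frame is made up (including the ADJINC value) and is not taken from the 2017 PUMS. It also uses a plain logical flag instead of factor(ifelse(RELP == 0, 1, NA)): subsetting a data frame with NA in the row index returns an all-NA row for each NA rather than dropping it.

```r
# Made-up stand-in for a merged PUMS extract: one row per person,
# so household-level columns repeat for every household member
puma <- data.frame(
  SERIALNO = c("A", "A", "B", "C", "C", "C"),
  RELP     = c(0, 1, 0, 0, 2, 2),
  HINCP    = c(50000, 50000, 30000, 80000, 80000, 80000),
  ADJINC   = rep(1010000, 6),   # hypothetical value, not the real 2017 factor
  WGTP     = c(10, 10, 20, 5, 5, 5)
)

# ADJINC carries six implied decimal places, hence the division by 1e6
puma$hinc2017 <- puma$HINCP * (puma$ADJINC / 1e6)

# Logical flag: TRUE for the householder (reference person), FALSE otherwise
puma$hholder <- puma$RELP == 0

# Keeping householders only leaves exactly one row per household
hh <- puma[puma$hholder, ]
nrow(hh)                       # 3 households
anyDuplicated(hh$SERIALNO)     # 0: no household counted twice

# The factor/NA version leaves NA in the index, and data-frame
# subsetting turns each NA into an all-NA row
flag_fac <- factor(ifelse(puma$RELP == 0, 1, NA))
nrow(puma[flag_fac == 1, ])    # 6: 3 householders plus 3 all-NA rows
```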

Now, I need to calculate the median household income. In Stata, I have seen the code written like this:

sum hinc2017 if hholder==1 [fweight=wgtp], detail
gen hinc2017_all=r(p50)

How is this done in R?

I know there is a median() function in base R, but when I run the following code, it returns NA:

median(puma$hinc2017)

[1] NA
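A minimal illustration of where the NA comes from. One likely source is that HINCP is blank for people living in group quarters, so in a merged person-level file some rows carry a missing income. The values below are hypothetical.

```r
# Hypothetical person-level incomes; NA marks persons with no HINCP
# (for example, people in group quarters)
hinc2017 <- c(52000, NA, 31000, 80500, NA)

median(hinc2017)                 # NA: a single missing value propagates
median(hinc2017, na.rm = TRUE)   # 52000: missing values are dropped first
```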

Any advice would be appreciated.

  • median() returns NA if the vector contains any missing values. Start debugging by adding na.rm = TRUE, as in median(dataframe$varname, na.rm = TRUE).

  • Thank you, Ani. This helped. I now get a number, but it doesn't match the value from the Stata code. That code restricts the calculation to householders and applies household weights:

    sum hinc2017 if hholder==1 [fweight=wgtp], detail
    gen hinc2017_all=r(p50)

    How would I do this in R?

  • You need to rely on other packages, for example survey or srvyr. See here for an example that uses these together with tidycensus and others:

    https://walker-data.com/tidycensus/articles/pums-data.html 

    With this route, you start by downloading the data, labeling or otherwise modifying the variables of interest, and specifying and creating the survey design object. All analyses are then run off the survey design object.
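For just the point estimate, a base-R sketch of what the Stata line computes. The unweighted median(puma$hinc2017, na.rm = TRUE) differs because it counts every household once per member and ignores WGTP. The function below is a hypothetical helper that follows my reading of Stata's documented rule for weighted percentiles in summarize, detail: sort, accumulate the weights, and take the value where the running total reaches half the total weight, averaging two neighbours on an exact tie. The incomes and weights are made up.

```r
# Hypothetical helper: frequency-weighted median following (as an assumption)
# Stata's rule for weighted percentiles in `summarize, detail`
stata_wmedian <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)      # drop missing values, as Stata does
  x <- x[keep]
  w <- w[keep]
  o <- order(x)                      # sort values, carrying weights along
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w)                    # running total of weights
  half <- sum(w) / 2
  i <- which(cw >= half)[1]          # first value whose running total reaches half
  if (cw[i] == half && i < length(x)) {
    (x[i] + x[i + 1]) / 2            # exact tie at the midpoint: average neighbours
  } else {
    x[i]
  }
}

# Made-up householder records: adjusted income and household weight
hinc2017 <- c(50500, 30300, 80800)
wgtp     <- c(10, 20, 5)

median(hinc2017)                 # 50500: unweighted, each household counts once
stata_wmedian(hinc2017, wgtp)    # 30300: the weight-20 household dominates

# With integer frequency weights this matches expanding each record w times
median(rep(hinc2017, wgtp))      # 30300
```

On the merged file, assuming the weight column is named WGTP as in the PUMS CSVs, the equivalent of "if hholder==1 [fweight=wgtp]" would be stata_wmedian(puma$hinc2017[puma$RELP == 0], puma$WGTP[puma$RELP == 0]). This reproduces only the point estimate; standard errors or margins of error need the replicate weights (WGTP1 to WGTP80) through the survey/srvyr route.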

