Calculating median household income for PUMA from PUMS files in R

regonzalez over 3 years ago

Hello,

I am currently working on calculating the median household income for a specific PUMA.

I am using the 2017 1-year ACS PUMS for my state (merged person and household files by SERIALNO). Also, I have converted household income dollars to 2017 dollars via the following code:

puma$hinc2017 <- puma$HINCP * (puma$ADJINC / 1000000)

So far, so good.

Additionally, I've generated a flag variable which identifies householders:
puma$hholder <- factor(ifelse(puma$RELP == 0, 1, NA))

Now, I need to calculate the median household income. In Stata, I have seen the code written like this:

sum hinc2017 if hholder==1 [fweight=wgtp], detail
gen hinc2017_all=r(p50)

How is this done in R?

I know there is a median() function within base R, but when I execute the following code, the result returns NA:

median(puma$hinc2017)

[1] NA

Any advice would be appreciated.

Ani over 3 years ago

R will issue NA if you have missing values. Start debugging by adding na.rm = TRUE as in median(dataframe$varname, na.rm = TRUE)
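
As a small illustration of the point above, here is a sketch on a made-up vector (the name `hinc` and its values are hypothetical, not taken from the PUMS file). In a merged PUMS file, missing HINCP values are a likely reason for the NA, but that is an assumption worth checking in your own data.

```r
# Hypothetical income vector with one missing value
hinc <- c(30000, 50000, NA, 80000)

# How many values are missing?
n_missing <- sum(is.na(hinc))

# median() propagates NA unless told to drop missing values
med_default <- median(hinc)
med_na_rm   <- median(hinc, na.rm = TRUE)

print(n_missing)
print(med_default)
print(med_na_rm)
```

Note that this is still an unweighted median over every row, so it will not match a weighted, householder-only figure.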

regonzalez over 3 years ago in reply to Ani

Thank you Ani. This helped. The output now returns a number, which is good, but it isn't the number that the Stata code returned. That code seems to use a condition and apply weights:

sum hinc2017 if hholder==1 [fweight=wgtp], detail
gen hinc2017_all=r(p50)

How would I do this in R?

Ani over 3 years ago in reply to regonzalez

You need to rely on some other packages. For example, the survey package or the srvyr package. See here for an example using these + tidycensus and others:

https://walker-data.com/tidycensus/articles/pums-data.html

With this route you would start by downloading the data, labeling things or otherwise modifying variables of interest, specifying and creating the survey design object, and then running all analyses off the survey design object.
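
A minimal sketch of that workflow with the survey package, assuming it is installed. The toy data frame stands in for the merged PUMS file; its values, including the ADJINC figure, are made up for illustration. It keeps one row per household (the householder, RELP == 0), builds a design object weighted by WGTP, and asks for the weighted median.

```r
library(survey)

# Toy stand-in for the merged person + household PUMS data (illustrative values only)
puma <- data.frame(
  SERIALNO = c("A", "A", "B", "C", "C", "D"),
  RELP     = c(0, 1, 0, 0, 2, 0),
  WGTP     = c(10, 10, 25, 30, 30, 40),
  HINCP    = c(30000, 30000, 50000, 80000, 80000, NA),
  ADJINC   = 1011189   # illustrative adjustment factor; use the value in your file
)
puma$hinc2017 <- puma$HINCP * (puma$ADJINC / 1000000)

# One record per household: keep only the householder rows
hh <- subset(puma, RELP == 0)

# Survey design object using the household weight
des <- svydesign(ids = ~1, weights = ~WGTP, data = hh)

# Weighted median household income; na.rm drops households with missing income
med <- svyquantile(~hinc2017, des, quantiles = 0.5, na.rm = TRUE)
print(med)
```

This design ignores the PUMS replicate weights, so only the point estimate is meaningful here, not the standard error. The estimate may also differ slightly from Stata's r(p50), since packages handle ties in the cumulative weight differently.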

Tim Henderson over 3 years ago

Try this weighted median function — worked for me www.rdocumentation.org/.../weighted.median
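
For anyone who prefers not to add a package, here is a base R sketch of a weighted median. The helper name `weighted_median` is hypothetical and is not the function linked above. It treats the weights as frequencies, so for whole-number weights it should give the same answer as taking the median of the data with each row repeated WGTP times, which is how Stata's fweight is usually described.

```r
# Hypothetical helper: weighted median with frequency-style weights
weighted_median <- function(x, w) {
  # Drop rows with missing values or non-positive weights
  keep <- !is.na(x) & !is.na(w) & w > 0
  x <- x[keep]
  w <- w[keep]

  # Sort values and carry the weights along
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]

  cum  <- cumsum(w)
  half <- sum(w) / 2
  i <- which(cum >= half)[1]

  # If the cumulative weight lands exactly on half (exact for whole-number
  # weights), average with the next value, as the median of the expanded
  # data would
  if (cum[i] == half && i < length(x)) (x[i] + x[i + 1]) / 2 else x[i]
}

# Made-up householder incomes and household weights
inc <- c(30000, 50000, 80000, NA)
wt  <- c(10, 20, 30, 40)

wm <- weighted_median(inc, wt)
print(wm)
```

With the real data this would be called on householder rows only, for example `weighted_median(puma$hinc2017[puma$RELP == 0], puma$WGTP[puma$RELP == 0])`.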

regonzalez over 3 years ago in reply to Ani

Thank you.