Hello,
I am currently working on calculating the median household income for a specific PUMA.
I am using the 2017 1-year ACS PUMS for my state (merged person and household files by SERIALNO). Also, I have converted household income dollars to 2017 dollars via the following code:
puma$hinc2017 <- puma$HINCP * (puma$ADJINC / 1000000)
So far, so good.
Additionally, I've generated a flag variable which identifies householders:
puma$hholder <- factor(ifelse(puma$RELP == 0, 1, NA))
Now, I need to calculate the median household income. In Stata, I have seen the code written like this:
sum hinc2017 if hholder==1 [fweight=wgtp], detail
gen hinc2017_all=r(p50)
How is this done in R?
I know there is a median() function within base R, but when I execute the following code, the result returns NA:
median(puma$hinc2017)
[1] NA
Any advice would be appreciated.
R returns NA if the variable has missing values. Start debugging by adding na.rm = TRUE, as in median(dataframe$varname, na.rm = TRUE).
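To illustrate the NA behavior, here is a tiny sketch with made-up numbers (the vector x is hypothetical, not PUMS data):

```r
# Made-up incomes with one missing value
x <- c(52000, NA, 31000, 78000)

med_default <- median(x)               # NA, because one value is missing
med_na_rm   <- median(x, na.rm = TRUE) # 52000, the middle of the three known values
```

Note that this is still an unweighted median over every row, so it should not be expected to match a weighted result.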
You need to rely on some other packages, for example the survey package or the srvyr package. See here for an example using these, tidycensus, and others:
https://walker-data.com/tidycensus/articles…
Try this weighted median function; it worked for me: www.rdocumentation.org/.../weighted.median
Thank you Ani. This helped. The output now provides an integer, which is good, but it isn't the integer that was provided in the output of the Stata code. That code seems to use a condition (if hholder==1) and apply weights ([fweight=wgtp]).
How would I do this in R?
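One way to combine the condition and the weights in base R is sketched below. The helper weighted_median() and the toy data frame are hypothetical, written for illustration. The helper sorts the values, accumulates the weights, and takes the first value whose cumulative weight exceeds half the total, averaging with the previous value on an exact tie. As far as I know this is the same rule Stata documents for weighted percentiles, but please treat an exact match with r(p50) as an assumption to verify on your own data.

```r
# Hypothetical helper: weighted median using cumulative weights
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w) & w > 0   # drop missing values and zero weights
  x <- x[keep]
  w <- w[keep]
  if (length(x) == 0) return(NA_real_)
  ord <- order(x)                         # sort values, carry weights along
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w)
  half <- sum(w) / 2
  i <- which(cum_w > half)[1]             # first value past half the total weight
  # On an exact tie at the halfway point, average the two neighbouring values
  if (i > 1 && cum_w[i - 1] == half) (x[i - 1] + x[i]) / 2 else x[i]
}

# Toy stand-in for the merged person-household file (values are made up)
toy <- data.frame(
  RELP     = c(0, 1, 0, 0, 2),
  WGTP     = c(15, 15, 20, 30, 30),
  hinc2017 = c(50000, 50000, 80000, 20000, 20000)
)

# Keep one row per household (the householder), then apply household weights
hh <- toy[toy$RELP == 0, ]
toy_median <- weighted_median(hh$hinc2017, hh$WGTP)
```

On the real data the last two lines would become something like hh <- puma[which(puma$RELP == 0), ] followed by weighted_median(hh$hinc2017, hh$WGTP). Filtering to householders matters because the merged file repeats each household's income once per person.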
https://walker-data.com/tidycensus/articles/pums-data.html
With this route you would start by downloading the data, labeling things or otherwise modifying variables of interest, specifying and creating the survey design object, and then running all analyses off the survey design object.
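A minimal sketch of that workflow with the survey package, assuming it is installed. The toy data frame is made up, and the design here uses only the household weight WGTP. Proper ACS standard errors would need the replicate weights, which this sketch leaves out.

```r
library(survey)

# Toy stand-in for the merged person-household file (values are made up)
toy <- data.frame(
  RELP     = c(0, 1, 0, 0, 2, 0),
  WGTP     = c(15, 15, 20, 30, 30, 25),
  hinc2017 = c(50000, 50000, 80000, 20000, 20000, 65000)
)

# Specify the design once, using the household weight
des <- svydesign(ids = ~1, weights = ~WGTP, data = toy)

# Restrict the design to householders so each household is counted once
hh_des <- subset(des, RELP == 0)

# Weighted median of household income, run off the design object
med <- svyquantile(~hinc2017, hh_des, quantiles = 0.5, ci = FALSE)
print(med)
```

The point estimate can differ slightly from other tools depending on the quantile rule in use, so small discrepancies with Stata are not necessarily errors.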
Thank you.