Calculating Weighted Income Quintiles for PUMS Households in R

Peter.C over 4 years ago

Hi Folks,

I'm interested in creating a dummy value in R to identify which income quintile a household is in using PUMS data. However, I'm not sure what the code should look like. I'm guessing that I need to get the number of households at each income - so weighting the households and grouping them by income. Then, I have to calculate the quintile breakdown, and create a (1,2,3...) dummy variable to use in further analysis.

But I'm somewhat at a loss as to what that might look like - in particular, whether or not there is a quintile calculation function for R.

Anybody have a sense of what that code might look like / where to start looking?

Thanks!

-Peter

Matt Schroeder over 4 years ago

Both the `reldist` and `Hmisc` packages have a function called `wtd.quantile()`. `reldist::wtd.quantile()` is simpler (and will accept a vector of quantiles to calculate, though the documentation suggests otherwise); `Hmisc::wtd.quantile()` has more flexibility.

So the weighted quintile cut points for household income might look like: `reldist::wtd.quantile(pums$HINCP, q = c(0.2, 0.4, 0.6, 0.8), weight = pums$WGTP)`
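
Once you have the four cut points, turning them into a 1-5 quintile variable is one call to base R's `findInterval()`. A minimal sketch, assuming a toy `pums` data frame with made-up values and hard-coded `breaks` standing in for the result of the call above (in a real file they would come from `wtd.quantile()`):

```r
# Toy stand-in for a PUMS household file; the values are made up for illustration.
pums <- data.frame(
  HINCP = c(12000, 30000, 45000, 61000, 88000, 150000),
  WGTP  = c(20, 20, 20, 20, 10, 10)
)

# Hypothetical cut points at the 20th/40th/60th/80th weighted percentiles.
# With real data these would be the output of the weighted quantile call.
breaks <- c(12000, 30000, 45000, 61000)

# findInterval() counts how many cut points lie below each income;
# left.open = TRUE keeps a household sitting exactly on a cut point
# in the lower quintile. Adding 1 gives labels 1 through 5.
pums$quintile <- findInterval(pums$HINCP, breaks, left.open = TRUE) + 1L

print(pums)
```

Whether a household exactly on a cut point belongs to the lower or upper quintile is a choice; drop `left.open = TRUE` if you would rather push ties upward.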

If you prefer more transparency in the calculations, you can consult https://stackoverflow.com/questions/62439652/frequency-weighted-percentile-in-dataframe-with-dplyr . That method occasionally produces different results from `reldist::wtd.quantile()`. I believe the Stack Overflow approach is more accurate, but `reldist::wtd.quantile()` should be fine for most purposes, and it's a lot easier to use in `dplyr::mutate()` and `dplyr::summarize()`.
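
For a sense of what a transparent version can look like, here is a rough base-R sketch of a weighted quantile computed from cumulative weight shares. The function name `weighted_quantile()` and the toy vectors are my own, and this is not necessarily the rule that the linked answer or either package uses (implementations differ on interpolation and ties), so treat it as an illustration rather than a drop-in replacement:

```r
# Weighted quantile by cumulative weight share: for each probability p,
# return the smallest value of x at which the running share of total
# weight reaches p. No interpolation between observations.
weighted_quantile <- function(x, w, probs) {
  keep <- !is.na(x) & !is.na(w)          # drop records missing income or weight
  ord <- order(x[keep])                  # sort by income
  x_sorted <- x[keep][ord]
  cum_share <- cumsum(w[keep][ord]) / sum(w[keep])
  vapply(probs, function(p) x_sorted[which(cum_share >= p)[1]], numeric(1))
}

# Made-up incomes and household weights, for illustration only.
hincp <- c(12000, 30000, 45000, 61000, 88000, 150000)
wgtp  <- c(20, 20, 20, 20, 10, 10)

breaks <- weighted_quantile(hincp, wgtp, c(0.2, 0.4, 0.6, 0.8))
print(breaks)
```

Because each household counts `WGTP` times, the cut points fall where the weighted share of households crosses 20%, 40%, 60% and 80%, not where the share of sample records does.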

Peter.C over 4 years ago in reply to Matt Schroeder

Excellent, thank you very much!