Calculating Weighted Income Quintiles for PUMS Households in R

Hi Folks, 

I'm interested in creating a dummy variable in R to identify which income quintile a household is in using PUMS data. However, I'm not sure what the code should look like. I'm guessing that I need to get the number of households at each income level - so weighting the households and grouping them by income. Then, I have to calculate the quintile breakpoints and create a (1,2,3...) dummy variable to use in further analysis.

But I'm somewhat at a loss as to what that might look like - in particular, whether or not there is a quintile calculation function for R.

Anybody have a sense of what that code might look like / where to start looking?

Thanks!

-Peter

  • Both the `reldist` and `Hmisc` packages have a function called `wtd.quantile()`. `reldist::wtd.quantile()` is simpler (and will accept a vector of quantiles to calculate, though the documentation suggests otherwise); `Hmisc::wtd.quantile()` has more flexibility.

    So the weighted quintile cut points for household income might look like: `reldist::wtd.quantile(pums$HINCP, q = c(0.2, 0.4, 0.6, 0.8), weight = pums$WGTP)`

    If you prefer more transparency in the calculations, you can consult https://stackoverflow.com/questions/62439652/frequency-weighted-percentile-in-dataframe-with-dplyr . The method shown there occasionally produces different results from `reldist::wtd.quantile()`. I believe the Stack Overflow method is more accurate, but `reldist::wtd.quantile()` should be fine for most purposes, and it's a lot easier to use in `dplyr::mutate()` and `dplyr::summarize()`.
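
To make the whole workflow concrete (cut points first, then the 1-5 variable), here is a minimal base-R sketch. It is not the code from the Stack Overflow answer and not what `reldist::wtd.quantile()` does internally: the helper `weighted_quantile()` is a made-up name, it uses one simple definition (the smallest income whose cumulative share of the weights reaches each probability), and the `pums` data frame is toy data standing in for a real PUMS household file with `HINCP` and `WGTP` columns.

```r
# Hypothetical helper (not from any package): weighted quantiles taken from
# the cumulative share of the weights. For each probability p it returns the
# smallest x whose cumulative weight share is at least p. Other definitions
# (for example, ones that interpolate) can give slightly different answers.
weighted_quantile <- function(x, w, probs) {
  keep <- !is.na(x) & !is.na(w)   # drop records missing income or weight
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                 # sort incomes, carrying the weights along
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  sapply(probs, function(p) x[which(cum_share >= p)[1]])
}

# Toy stand-in for a PUMS household file; the values are invented.
pums <- data.frame(
  HINCP = c(12000, 25000, 38000, 52000, 70000, 95000, 130000, 210000),
  WGTP  = c(30, 20, 25, 15, 40, 20, 30, 20)
)

# Four cut points separate the five quintiles.
breaks <- weighted_quantile(pums$HINCP, pums$WGTP,
                            probs = c(0.2, 0.4, 0.6, 0.8))

# Assign 1-5. With left.open = TRUE, a household whose income equals a
# cut point is placed in the lower quintile.
pums$quintile <- findInterval(pums$HINCP, breaks, left.open = TRUE) + 1L

print(breaks)
print(pums)
```

With real PUMS data, the same two steps should work with `reldist::wtd.quantile()` supplying `breaks` instead of the helper. Because incomes are lumpy and weights vary, the weighted share of households in each group will usually be close to, but not exactly, 20%.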
