I'm interested in creating a dummy variable in R to identify which income quintile a household is in, using PUMS data. However, I'm not sure what the code should look like. I'm guessing that I need to get the number of households at each income level, so weighting the households and grouping them by income. Then I have to calculate the quintile breakpoints and create a (1, 2, 3, ...) dummy variable to use in further analysis.
But I'm somewhat at a loss as to what that might look like - in particular, whether or not there is a quintile calculation function for R.
Anybody have a sense of what that code might look like / where to start looking?
I would try `survey::svyquantile()`. The vignette for `library(ipumsr)` (which is what I hope you use) gives some indication as to what to do with the `survey::svydesign` specifications.
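As a rough sketch of what that could look like, assuming the `survey` package is installed: the toy data frame below is made up for illustration and only borrows the PUMS column names `HINCP` (household income) and `WGTP` (household weight). The simple weights-only design is also an assumption. It should give sensible point estimates, but proper PUMS standard errors need the replicate weights (`WGTP1` to `WGTP80`), which this sketch leaves out.

```r
library(survey)

# Toy stand-in for a PUMS household file (made-up values).
pums <- data.frame(
  HINCP = c(12000, 25000, 38000, 52000, 61000, 75000, 90000, 120000, 180000, 250000),
  WGTP  = c(30, 25, 20, 15, 10, 25, 20, 20, 25, 10)
)

# Weights-only design: no clusters or strata, just the household weight.
hh_design <- svydesign(ids = ~1, weights = ~WGTP, data = pums)

# Weighted cut points between the five income quintiles.
quintile_cuts <- svyquantile(~HINCP, hh_design, quantiles = c(0.2, 0.4, 0.6, 0.8))
print(quintile_cuts)
```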
We also have some tools in tidycensus that help with getting PUMS data and converting it into survey objects for analysis.
Both the `reldist` and `Hmisc` packages have a function called `wtd.quantile()`. `reldist::wtd.quantile()` is simpler (and will accept a vector of quantiles to calculate, though the documentation suggests otherwise); `Hmisc::wtd.quantile()` has more flexibility.
So weighted quintiles for household income might look like: `reldist::wtd.quantile(pums$HINCP, q = c(0.2, 0.4, 0.6, 0.8), weight = pums$WGTP)`
If you prefer more transparency in the calculations, you can consult https://stackoverflow.com/questions/62439652/frequency-weighted-percentile-in-dataframe-with-dplyr . That method occasionally produces different results from `reldist::wtd.quantile()`. I believe the Stack Overflow method is more accurate, but `reldist::wtd.quantile()` should be fine for most purposes, and it's a lot easier to use in `dplyr::mutate()` and `dplyr::summarize()`.
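To get from cut points to the 1 to 5 variable the original question asks for, here is a minimal base R sketch. The toy data frame is made up and only mirrors the PUMS column names, and `weighted_cutpoints()` is a hypothetical helper that uses one simple weighted-quantile definition (the smallest income at which the cumulative weight share reaches each probability), so its results may differ slightly from either method mentioned above. It also assumes there are no missing incomes or weights.

```r
# Toy stand-in for a PUMS household file (made-up values).
pums <- data.frame(
  HINCP = c(12000, 25000, 38000, 52000, 61000, 75000, 90000, 120000, 180000, 250000),
  WGTP  = c(30, 25, 20, 15, 10, 25, 20, 20, 25, 10)
)

# Hypothetical helper: smallest value of x at which the cumulative share of
# the weights reaches each probability in probs. Assumes no NAs in x or w.
weighted_cutpoints <- function(x, w, probs) {
  ord <- order(x)
  x_sorted <- x[ord]
  cum_share <- cumsum(w[ord]) / sum(w)
  vapply(probs, function(p) x_sorted[which(cum_share >= p)[1]], numeric(1))
}

cuts <- weighted_cutpoints(pums$HINCP, pums$WGTP, c(0.2, 0.4, 0.6, 0.8))

# Assign quintiles 1 to 5; a household exactly at a cut point goes in the
# lower quintile because the intervals are open on the left.
pums$quintile <- findInterval(pums$HINCP, cuts, left.open = TRUE) + 1L

print(cuts)
print(pums)
```

With real PUMS data the same two steps apply: compute the cut points once from the weighted household file, then use `findInterval()` (or `cut()`) to label each household.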
Excellent, thank you very much!