I'm attempting to use the PUMS data to compute earnings percentiles for each undergraduate degree, for various subsets of the population.
For instance, "what are the 10th, 25th, 50th, 75th and 90th percentile incomes for non-Hispanic white men aged 35-40 with a physics degree?"
Is there a straightforward way to compute percentile values over the weighted person records?
It feels very wrong, but the first thing that comes to mind is to create a new dataset with each record duplicated N times, where N = the record's PWGTP value.
For instance, with the person records loaded into a table (and a second table mapping FOD1P codes to descriptions), I'm hoping to do something like the following, but with PWGTP correctly applied:
percentile_cont(0.10) within group (order by p.pernp_adjusted) as pernp_pct_10,
percentile_cont(0.25) within group (order by p.pernp_adjusted) as pernp_pct_25,
percentile_cont(0.50) within group (order by p.pernp_adjusted) as pernp_pct_50,
percentile_cont(0.75) within group (order by p.pernp_adjusted) as pernp_pct_75,
percentile_cont(0.90) within group (order by p.pernp_adjusted) as pernp_pct_90
acs2016_2020_pums_pus p on p.fod1p = m.code or p.fod2p = m.code
p.agep >= 35 and p.agep < 40
IIRC this was the issue that first made me turn to R years ago (never turned back) but I believe this is the package I used rdrr.io/.../weighted_ntile.html