Estimating percentiles from range data other than the median/50th percentile

Hello fellow ACS users, 

I have been estimating medians from range data (table B19001, household income) by finding the midpoint household, assuming a uniform distribution within the range that contains it, and then finding the standard error as outlined on pages 17 and 18 of the 2017-2021 PUMS Accuracy document. This has got me thinking: does this procedure apply only to the median (50th percentile), or could it also apply to other percentiles?

Say I were interested in estimating the HH income at the 20th percentile for a particular geography from the range data available in table B19001. Could I find the point estimate by identifying the HH income of the 20th percentile household (again assuming an even distribution within the income range group that household is in), and then modify the SE(50 percent) formula given on page 17 (and also shown below) to be appropriate for a 20 percent SE?

Current SE Formula Used...

SE(50 percent)=DF × sqrt((95/5B)×50^2)

Proposed alternative formula for 20th percentile SE

SE(20 percent)=DF × sqrt((95/5B)×20×80)

Thank you for your time!
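
A minimal Python sketch of the procedure described in the question, for readers who want to try it. It reads (95/5B) as 95/(5 × B), with B the total number of households in the table, and it converts the percent SE into dollars by interpolating at p minus and plus one SE and halving the width, which is an assumption about how the PUMS document's median procedure carries over. The bin edges, counts, and design factor are hypothetical, and the open-ended top bracket of B19001 ("$200,000 or more") is given an arbitrary upper edge here, so percentiles falling in it cannot be trusted.

```python
import math

def percentile_from_bins(edges, counts, p):
    """Value at percentile p (0-100), assuming households are spread
    uniformly within each bin. edges has len(counts) + 1 entries."""
    target = sum(counts) * p / 100.0
    cum = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        if n > 0 and cum + n >= target:
            # Linear interpolation inside the bin holding the target household.
            return lo + (target - cum) / n * (hi - lo)
        cum += n
    return edges[-1]

def se_percent(p, base, design_factor):
    """SE of a percentage, in percentage points:
    DF * sqrt((95 / (5 * B)) * p * (100 - p)). For p = 50 this is the 50^2 form."""
    return design_factor * math.sqrt(95.0 / (5.0 * base) * p * (100.0 - p))

def se_percentile(edges, counts, p, design_factor):
    """Dollar SE of the p-th percentile: half the distance between the
    values interpolated at p - SE(p) and p + SE(p)."""
    se_p = se_percent(p, sum(counts), design_factor)
    lower = percentile_from_bins(edges, counts, max(p - se_p, 0.0))
    upper = percentile_from_bins(edges, counts, min(p + se_p, 100.0))
    return (upper - lower) / 2.0

# Hypothetical data: four income brackets and household counts.
edges = [0, 10000, 20000, 30000, 40000]
counts = [100, 100, 200, 100]
design_factor = 1.0  # placeholder; use the DF published for your geography and year

p20 = percentile_from_bins(edges, counts, 20)
p50 = percentile_from_bins(edges, counts, 50)
se_p20 = se_percentile(edges, counts, 20, design_factor)
```

With these made-up numbers the 20th percentile falls exactly on the upper edge of the first bracket and the median sits a quarter of the way into the third.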

  • You can use the cumulative distribution function (CDF) of household income to estimate the standard error for the 20th percentile as follows:

    1. Determine the income range group for the 20th percentile.

    2. Calculate the CDF for that income range group.

    3. Use the CDF to estimate the proportion of households within the range group below the 20th percentile.

    4. Calculate the standard error using this proportion and the formula for the standard error (SE) in the PUMS documentation.

    So, the formula would look something like this:

    SE(20 percent) = DF × sqrt((95/5B) × (Proportion at 20th percentile) × (Proportion at 80th percentile))

    "Proportion at 20th percentile" is the proportion of households below the 20th percentile within the relevant income range group, and "Proportion at 80th percentile" is the proportion of households below the 80th percentile within the same range group.

    You would need to calculate these proportions based on the CDF for the range group.

    This approach considers the non-uniform distribution within the income range group and should provide a more accurate standard error estimate for the 20th percentile.

  • Thank you for your response! This is a very helpful perspective.

    I should have mentioned that, although I am using the formula found in the PUMS Accuracy document, I am studying a few aggregated census tracts in my city. Consequently, my data is limited to the range data from table B19001. This is why I am assuming a uniform distribution within range intervals, essentially as a necessary evil. My rationale for applying the formula from the PUMS Accuracy document to published ACS tables comes primarily from this document from the Department of Finance for the state of CA, which encouraged this usage back in 2011: https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/How_to_Recalculate_a_Median.pdf. If I were using microdata samples, finding the CDF would be a great approach. Given my assumption of uniformity, the proportion at the 20th percentile should be 20 and similarly 80 for the 80th percentile. Again, thank you!
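
To make the last point concrete, here is a small Python check of how the 20 × 80 term compares with the 50^2 term. The base and design factor are hypothetical placeholders that cancel out of the ratio, and (95/5B) is read as 95/(5 × B).

```python
import math

def se_percent(p, base, design_factor):
    """DF * sqrt((95 / (5 * B)) * p * (100 - p)), in percentage points."""
    return design_factor * math.sqrt(95.0 / (5.0 * base) * p * (100.0 - p))

# Hypothetical base (households) and design factor; both cancel in the ratio.
base, design_factor = 1000, 1.5

# sqrt(20 * 80) / sqrt(50 * 50) = 40 / 50
ratio = se_percent(20, base, design_factor) / se_percent(50, base, design_factor)
```

Under this reading, the percent SE at the 20th percentile is 0.8 times the percent SE at the median for the same base and design factor. The dollar SE can still differ, because it depends on how wide the surrounding income brackets are.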
