Approximating the Median for a group of census tracts

Hello,

A question I get from time to time is how to calculate, estimate, or approximate the median (age, income, home value, monthly rent, etc.) for a group of several census tracts. I'd like to give people a better answer than "do it using PUMS," because often they're really just looking for a quick approximation.

Here's what I'm thinking, and I would love feedback on it from this group of experts.

Step 1. Get the table for the full distribution with the most detailed intervals/smallest ranges you can find (for age, B01001; for household income, B19001) 

Step 2. For each tract you're interested in, create a list/array of values in which the midpoint of each range is repeated once for every person (or household) counted in that range.

(For example, if a tract had 100 people age 0 to 4 and 105 people age 5 to 9, ..., my list would look like [2, 2, ... 2 (100 times), 7, 7, 7, ... (105 times), ... etc.])

Step 3. Combine into 1 list/set/array/dictionary. So if you have 4 tracts of interest, combine the values from all 4 into 1 big one.

Step 4. Sort and take the median (50th percentile).
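
Steps 2 through 4 could be sketched in Python roughly as follows. This is only an illustration: the tract names, bins, and counts are made up, and the midpoint is taken as (low + high) / 2 to match the 2 and 7 in the example above. An open-ended top range (such as "85 years and over") has no midpoint, so it would need an assumed value.

```python
from statistics import median

# Hypothetical data: (low, high, count) for each range in each tract.
# These counts are invented for illustration, not real ACS estimates.
tract_a = [(0, 4, 100), (5, 9, 105), (10, 14, 90)]
tract_b = [(0, 4, 80), (5, 9, 60), (10, 14, 120)]

def expand_midpoints(bins):
    """Step 2: repeat each range's midpoint once per person counted in it."""
    values = []
    for low, high, count in bins:
        midpoint = (low + high) / 2
        values.extend([midpoint] * count)
    return values

# Step 3: combine the expanded values from every tract into one list.
combined = expand_midpoints(tract_a) + expand_midpoints(tract_b)

# Step 4: statistics.median sorts internally and returns the middle value.
approx_median = median(combined)
print(approx_median)
```

Note that the result can only ever be one of the midpoints (or the average of two adjacent ones), which is the main limitation of this approach.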

Advanced question: is it even possible to approximate an MOE for this?

  • One thing you could do is combine the distribution tables (like the ones from step 1) into one distribution table, and then calculate the linear interpolated median from it. This is the same methodology used to produce many of the median statistics from the ACS.

    Find the bin that contains the cumulative 50th percentile (i.e., the cumulative percent through the previous bin is below 50%, and the cumulative percent through this bin is at or above 50%). Let's call the cumulative percent up to the start of this bin A and the percent within this bin B.

    The interpolated median is then:

    median = Bin width * (50 - A) / B + Bin base

    Consider the table:

    Age group    Pct.    Cumulative pct.
    0-18         25%     25%
    19-29        17%     42%
    30-39        15%     57%
    40-49        13%     70%
    50+          30%     100%

    The median falls in the 30-39 group (base = 30, width = 10), with A = 42 and B = 15:

    = 10 * (50-42)/15 + 30 = 35.3 years
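
A minimal Python sketch of the same calculation, for anyone who wants to script it. The two tracts and their counts are hypothetical, chosen so that the combined table matches the example above; the function names are mine, and whole-year ages are assumed, so the 0-18 bin is given a width of 19 and the 19-29 bin a width of 11.

```python
# (base, width) for each bin; width is None for the open-ended 50+ bin.
BINS = [(0, 19), (19, 11), (30, 10), (40, 10), (50, None)]

# Hypothetical counts per bin for two tracts (invented for illustration).
tract_1 = [10, 7, 5, 3, 15]
tract_2 = [15, 10, 10, 10, 15]

def combine_counts(tracts):
    """Add the tract tables together bin by bin."""
    return [sum(column) for column in zip(*tracts)]

def interpolated_median(bins, counts):
    """Linear interpolation within the bin that contains the 50% point."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    cumulative = 0.0  # A: cumulative percent before the current bin
    for (base, width), count in zip(bins, counts):
        pct = 100.0 * count / total  # B: percent in the current bin
        if count > 0 and cumulative + pct >= 50.0:
            if width is None:
                raise ValueError("median falls in the open-ended bin")
            return base + width * (50.0 - cumulative) / pct
        cumulative += pct
    raise ValueError("cumulative percent never reached 50")

combined = combine_counts([tract_1, tract_2])
median_age = interpolated_median(BINS, combined)
print(round(median_age, 1))
```

If the 50% point lands in the open-ended top bin, there is no width to interpolate over, so the sketch raises an error rather than guessing.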

  • Thank you so much. Such a better approach! This way the possibility space for the median is continuous, not just the midpoint values.

    Of course after I posted this question, I found this document from the California Dept of Finance that goes over this same method: https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/How_to_Recalculate_a_Median.pdf

  • By chance, does anyone have a nice little SAS program that will do this calculation?

  • Hi Doug- 

    We do this automatically in Social Explorer, but we use the categorical data and impute the median by approximating the proportion in the category that contains the midpoint. You can also do a weighted average, but that does not work very well if there are outliers.