Hi All,
Occasionally I need a median income or median age value for either an aggregated area or for a population that only has the data available in a distribution table (like in attached screenshot). I created a VBA macro that will run in Excel and calculate the median based on a table range. In the example provided, it is for median household income. I've ran it for known/given median income values and the calculated numbers come up pretty close to those provided in the ACS. The smaller the "bins" (ranges), the better the calculated value.
I have documented each step with comments in the VBA code. I'm not a VBA expert by any means, so there may be better ways to perform some of the steps, but this is works. this is something I've been needing myself for awhile and used to do it with a bunch of formulas in an Excel worksheet. I thought some of you may find it helpful so thought I would share it. I also have a Python version that reads in the data from a CSV file, so if you're interested in that let me know and I can provide it.
Here is the VBA code:
Sub GetMedian() 'have user select the range for the input data 'the first column must be the number for the beginning of the range 'for the bin 'the second column (can have more than one if doing mulitple areas) 'has the population for that bin Dim UserRange As Range Prompt = "Select the input range." & vbCrLf & _ vbCrLf & "The first column must be the beginning of the " & _ vbCrLf & "range for the bin. The second column has the " & _ vbCrLf & "population for the bin. You can have more than " & _ vbCrLf & "one column of populations." & _ vbCrLf Title = "Select a range" 'Display the Input Box Set UserRange = Application.InputBox( _ Prompt:=Prompt, _ Title:=Title, _ Default:=Selection.Address, _ Type:=8) 'Range selection 'get current selected range Dim myString As String 'myString = Selection.Address myString = UserRange.Address 'go though the columns 'how many columns are in the UserRange? NoOfCol = Range(myString).Columns.count 'get the output range from the user Dim OutRange As Range Prompt = "Select a cell for the output" & vbCrLf & _ vbCrLf & "Values for multiple medians will be " & _ vbCrLf & "returned in a row beginning with the " & _ vbCrLf & "selected cell " & _ vbCrLf ' Display the Input Box On Error Resume Next Set OutRange = Application.InputBox( _ Prompt:=Prompt, _ Title:=Title, _ Default:=ActiveCell.Address, _ Type:=8) 'Range selection 'now get to work 'for loop starts with the first population column (2nd in range) 'end is how many pop columns are in the user input range For col = 2 To NoOfCol 'set range as an array Dim myArray() As Variant myArray = Range(myString).Value 'get sum of pop and divide by 2 to get halfpoint 'initialize a total to aggregate values popSum = 0 For i = 1 To UBound(myArray) popSum = popSum + myArray(i, col) Next 'MsgBox popSum MedianIndex = popSum / 2 'MsgBox MedianIndex 'initialize for running total runtotal = 0 'step through each row For i = 1 To UBound(myArray) 'add the cumulative total pop runtotal = runtotal + myArray(i, col) 'check if the runtotal exceeds the medianIndex If runtotal > MedianIndex Then 'get the cumulative pop for the bin just before we exceeded 'so subtract the current array value from the current runTotal prevTotal = runtotal - myArray(i, col) 'determine how much into the bin it is to reach the medianIndex howMany = MedianIndex - prevTotal 'get the pop value in the previous bin Binpop = myArray(i, col) 'determine the pct of how far into this bin the medianIndex falls pctInto = howMany / Binpop 'CORRECTED BLOCK 'determine the span of the bin BinSpan 'and multiply it by the pct into the bin BinSpan = (myArray(i + 1, 1) - myArray(i, 1)) MultiSpan = BinSpan * pctInto 'calculate the median by adding the result to 'the number for the beginning of hte range Median = myArray(i, 1) + MultiSpan 'put the median in the output cell OutRange.Offset(0, col - 2).Value = Median Exit For End If Next Next End Sub
I tested this out with a few different data sets and seem to be getting calculated medians that are pretty different than those provided in the ACS. For example, when I used the distribution table for household income for the United States (2013-17 ACS 5-Year Estimates), the macro gave me a calculated median of $53,987, while the median household income provided in the ACS is $57,652 +/- $138. I ran the macro using the numbers provided for income in Arizona, as per the screenshot above, and was able to replicate the pictured results, so I'm pretty sure that the macro is working as it is intended to. Do you know why my calculations might be so off? Thank you, Bailey Werner