I'm trying to calculate the average (mean) 2021 income of the state of Minnesota using PUMS to see if I can come close to the data in the ACS table S1901.
From S1901, I find that in 2021, 1YR ACS, there are 2,281,033 households in the state and households have a mean income of $103,033.
From microdata file PSAM_h27, I am calculating mean income (in a Power BI Measure) in two steps. First, I summarize for all records a product calculation for each record of HINCP*ADJINC*WGTP/1000000; and second, I divide that summation by the sum of WGTP of all records. In this case I get a total of 2,516,964 households with a mean income of $93,370. This does not seem close enough to the ACS table results.
However, if I filter out all records with a minimum HINCP of 1, and use the same formula as above, I get a total of 2,262,459 households and a mean income of $104,990. This seems pretty close to the Table results.
My question, is my general approach to getting mean income correct, and should I be filtering for HINCP?
(While the question above is my main one, I would also be interested in knowing how the IPUM household data can be used to calculate the Median Income.)
Hi cbachx--
You're right to validate your results with published tabulations: a good way to identify there's a problem.
When using the PSAM_H27 table, be sure that you've excluded vacant housing units. The "universe" of the H table is: Housing units combined with households characteristics for the occupied housing. It should be easy to filter those records of vacant units. Those records will be coded as vacant; and they'll have no households characteristics.
I'll add... to make this post useful to everyone... also know that the "universe" of the P table includes both population in households and population in group quarters. If you were using the P table to get household characteristics (like, for example, grouping by age of reference person) then you would need to filter out the pop in GQs. (Though I don't think that's cbachx's issue.)
Hope that helps,
Todd Graham
Metropolitan Council Research
Thanks very much, Todd. Good to know that you are still out there with all your data knowledge!
When I filtered on HINCP being at least 1, that must have filtered out the vacant units and maybe a few more that shouldn't have been excluded. I'll modify the filtering to just exclude vacant units.
Do you have any ideas regarding how this table, PSAM_H27 can be used to find the Median Income?
Chip Halbach
Right. HINCP could be zero or negative; that would be valid.
You're asking how to calc a weighted median? It depends on the stats analysis software you're using.
Personally, I use SAS. If you're a SAS user, try the PROC MEANS process step; and be sure to invoke the WEIGHT option. PROC MEANS offers (despite its name) results that include the (weighted) 5th, 25th, 50th, 75th, and 95th percentile results.
(I'm sure R offers some similar function, but I don't know it off the top of my head.)
cheers,Todd Graham
Thanks again Todd. I use the Microsoft Power BI program to work with databases so I'll have to probe regarding getting a median number, but my main interest was in making sure that I was approaching the microdata in the right way and you helped get me there.
Hi!!
absolutely agree with your statement
Filtering for HINCP could be a good idea, as that could help you to remove any outliers that might be skewing your results. In general, your approach seems sound, but I would recommend doing some additional research to see if there are any other factors that might be affecting your results.
regards,
grace