Use of microdata file for average state income

I'm trying to calculate the average (mean) 2021 income of the state of Minnesota using PUMS to see if I can come close to the data in the ACS table S1901.

From S1901, I find that in 2021, 1YR ACS, there are 2,281,033 households in the state and households have a mean income of $103,033.

From microdata file PSAM_h27, I am calculating mean income (in a Power BI Measure) in two steps. First, I summarize for all records a product calculation for each record of HINCP*ADJINC*WGTP/1000000; and second, I divide that summation by the sum of WGTP of all records. In this case I get a total of 2,516,964 households with a mean income of $93,370. This does not seem close enough to the ACS table results.

However, if I filter out all records with a minimum HINCP of 1, and use the same formula as above, I get a total of 2,262,459 households and a mean income of $104,990. This seems pretty close to the Table results.

My question, is my general approach to getting mean income correct, and should I be filtering for HINCP?

(While the question above is my main one, I would also be interested in knowing how the IPUM household data can be used to calculate the Median Income.)

Parents
  • Hi cbachx-- 

    You're right to validate your results with published tabulations: a good way to identify there's a problem.

    When using the PSAM_H27 table, be sure that you've excluded vacant housing units. The "universe" of the H table is: Housing units combined with households characteristics for the occupied housing. It should be easy to filter those records of vacant units. Those records will be coded as vacant; and they'll have no households characteristics. 

    I'll add... to make this post useful to everyone... also know that the "universe" of the P table includes both population in households and population in group quarters.  If you were using the P table to get household characteristics (like, for example, grouping by age of reference person) then you would need to filter out the pop in GQs.  (Though I don't think that's cbachx's issue.) 


    Hope that helps,

    Todd Graham

    Metropolitan Council Research

  • Thanks very much, Todd. Good to know that you are still out there with all your data knowledge!

    When I filtered on HINCP being at least 1, that must have filtered out the vacant units and maybe a few more that shouldn't have been excluded. I'll modify the filtering to just exclude vacant units.

    Do you have any ideas regarding how this table, PSAM_H27 can be used to find the Median Income?

    Chip Halbach

  • Right. HINCP could be zero or negative; that would be valid.

    You're asking how to calc a weighted median? It depends on the stats analysis software you're using.

    Personally, I use SAS.  If you're a SAS user, try the PROC MEANS process step; and be sure to invoke the WEIGHT option. PROC MEANS offers (despite its name) results that include the (weighted) 5th, 25th, 50th, 75th, and 95th percentile results.

    (I'm sure R offers some similar function, but I don't know it off the top of my head.)

    cheers,
    Todd Graham

Reply
  • Right. HINCP could be zero or negative; that would be valid.

    You're asking how to calc a weighted median? It depends on the stats analysis software you're using.

    Personally, I use SAS.  If you're a SAS user, try the PROC MEANS process step; and be sure to invoke the WEIGHT option. PROC MEANS offers (despite its name) results that include the (weighted) 5th, 25th, 50th, 75th, and 95th percentile results.

    (I'm sure R offers some similar function, but I don't know it off the top of my head.)

    cheers,
    Todd Graham

Children
  • Thanks again Todd. I use the Microsoft Power BI program to work with databases so I'll have to probe regarding getting a median number, but my main interest was in making sure that I was approaching the microdata in the right way and you helped get me there.