Why don't PUMS household and person weights sum up? WGTP * NP != sum(PWGTP)

In the household data WGTP is the weight of the household and NP is the number of people in the household.

In the person data PWGTP is the weight of the person.

I expect these to sum up when the data is joined. The weighted number of people in the household data (WGTP * NP) should equal the weighted number of people in the person data (sum of PWGTP), but often it does not.

For example, in household 2019HU0627001, WGTP * NP is 5 * 4 = 20, but in the person data for that household sum(PWGTP) is 18 (5 + 4 + 4 + 5).

Am I misunderstanding these fields?

Thanks -

Rob Seed

  • In general, the PUMS household weights (WGTP) apply to the household variables, for example the number of bedrooms (BDSP). These weights are related to the sampling probability of choosing that house. The ACS sample is drawn from the Master Address File (MAF) that the US Census Bureau maintains, subject to certain conditions such as not selecting the same house 2 years in a row (see pages 4 and 5 of the reference below for sample selection details). The person weights (PWGTP) apply to the person variables, such as age (AGEP). Once the household (address) is selected, everyone at the address answers the questions. The person weights are related to the probability of selecting an individual person.

    The ACS is a 2-stage design with certain "bells and whistles." For a 2-stage design, the sampling probability (proportional to 1/weight) is the product of the probability of selection at stage 1 and the probability of selection at stage 2, which for the ACS is 1 since all persons in the household are surveyed. Hence you would expect the person weight PWGTP to be the same as the household weight WGTP.

    I just looked at a PUMS data set for a SERIALNO (household identifier) with 4 corresponding SPORDER records:

    The household weight (WGTP) is 18

    while the corresponding PWGTP values are

    18 13 14 14

    By the above reasoning, all the PWGTP weights for this household should be 18.
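
    To run the same comparison on your own extract, a minimal sketch along the following lines may help. It uses only the two households mentioned in this thread, typed in by hand as plain Python records. The ID "EXAMPLE-HH-2" is a placeholder (the thread does not give that SERIALNO), the assignment of person weights to SPORDER values is assumed, and the function name compare_weights is made up for illustration.

```python
from collections import defaultdict

# Toy records built from the two households discussed in this thread.
# "EXAMPLE-HH-2" is a placeholder ID; the thread does not give that SERIALNO.
# Which PWGTP belongs to which SPORDER is an assumption.
households = [
    {"SERIALNO": "2019HU0627001", "WGTP": 5, "NP": 4},
    {"SERIALNO": "EXAMPLE-HH-2", "WGTP": 18, "NP": 4},
]
persons = [
    {"SERIALNO": "2019HU0627001", "SPORDER": 1, "PWGTP": 5},
    {"SERIALNO": "2019HU0627001", "SPORDER": 2, "PWGTP": 4},
    {"SERIALNO": "2019HU0627001", "SPORDER": 3, "PWGTP": 4},
    {"SERIALNO": "2019HU0627001", "SPORDER": 4, "PWGTP": 5},
    {"SERIALNO": "EXAMPLE-HH-2", "SPORDER": 1, "PWGTP": 18},
    {"SERIALNO": "EXAMPLE-HH-2", "SPORDER": 2, "PWGTP": 13},
    {"SERIALNO": "EXAMPLE-HH-2", "SPORDER": 3, "PWGTP": 14},
    {"SERIALNO": "EXAMPLE-HH-2", "SPORDER": 4, "PWGTP": 14},
]


def compare_weights(households, persons):
    """Return one summary row per household comparing WGTP with its PWGTP values."""
    # Group the person weights by household identifier.
    weights_by_household = defaultdict(list)
    for person in persons:
        weights_by_household[person["SERIALNO"]].append(person["PWGTP"])

    rows = []
    for hh in households:
        weights = weights_by_household[hh["SERIALNO"]]
        rows.append({
            "SERIALNO": hh["SERIALNO"],
            "WGTP": hh["WGTP"],
            "WGTP_x_NP": hh["WGTP"] * hh["NP"],
            "sum_PWGTP": sum(weights),
            # Vacant units have no person records, so guard against dividing by zero.
            "mean_PWGTP": sum(weights) / len(weights) if weights else None,
        })
    return rows


summary = compare_weights(households, persons)
for row in summary:
    print(row)
```

    For the first household this gives WGTP * NP = 20 against sum(PWGTP) = 18, and for the second 72 against 59, so in both cases the person weights do not simply repeat the household weight.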

    However, the ACS does some magic and adjusts the weights using outside data, such as the current population estimates from the Population Estimates Program (PEP). It also adjusts the weights for various person-level covariates. For example, it adjusts things like the age distribution so that the age distribution for an area agrees with the age distribution for a larger overall area. See

    https://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/MultiyearACSAccuracyofData2023.pdf

    section WEIGHTING METHODOLOGY

    for all the details.

    There are tables giving the unweighted sample counts

    B98001 Unweighted Housing Unit Sample

    B98003 Unweighted Total Population Sample

    These are counts of the number of households who completed the survey form and the number of people reported on those surveys.

    As you can see the average person weight for the household should be close to the household weight.

    Best,

    Dave Dorer

    To see more details look at

    www.census.gov/.../acs_pums_handbook_2021.pdf

  • Thanks for the detailed response, that makes sense.

    This stuff is more complicated than it looks -- and it looks complicated!

  • Dear Rob,

    I've located a reference giving the ACS method used to adjust person weights. These adjustments are why the person weights for the members of a household don't agree with the household weight for that household:

    www.asasrms.org/.../JSM2005-000335.pdf

    The goal of the research is to integrate administrative record
    data into ACS estimation, specifically data from the Master
    Address File Auxiliary Reference File. For the 36 ACS test
    counties, an extract has been drawn from this file with
    characteristics for year 2000. The data set includes
    demographic characteristics such as age, sex, race, and
    ethnicity, but omits any sensitive items concerning income
    or benefits. The data set also does not include individual
    names or Social Security numbers.

    Here is the method

    The Major Steps to the Weighting Process
    1. Prepare the files for weighting.
    2. Swap housing units for disclosure avoidance.
    3. Form the collapsed estimation strata
    4. Apply the base weights and CAPI sub-sampling weights.
    5. Apply a monthly adjustment to make the total weighted
    number of responses agree with the actual weighted mail
    out each month. (Monthly sample factor).
    6. Apply a non-interview factor (1) by tract and building type.
    7. Apply another non-interview factor (2) by month and
    building type.
    8. Apply another non-interview factor for CAPI cases only
    using month and building type.
    9. Apply a non-interview factor (mode bias factor) by tenure,
    month, and marital status.
    10. Control the housing unit (HU) counts to a larger
    geographic level.
    11. Form the population control weighting cells.
    12. Apply the HU weights to all people in a HU and control
    their weights to the population controls.
    13. Apply the principal person weight to the HU and apply the
    housing unit controls again.
    14. Round the housing unit and person weights.
    15. Identify and down-weight outliers.
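
    Step 12 in the list above is the one that makes person weights within a household differ. A minimal sketch of the idea, not the Census Bureau's actual procedure: every person starts with the weight of their housing unit, then the weights in each control cell are scaled so they sum to an independent population total. The people, the age-group cells, the control totals, and the function name control_to_population are all hypothetical, and the real process uses more detailed cells and additional steps.

```python
from collections import defaultdict

# Hypothetical people in two housing units (A and B). Each person starts
# with the weight of the housing unit they live in.
people = [
    {"id": "A1", "hu_weight": 18.0, "age_group": "0-17"},
    {"id": "A2", "hu_weight": 18.0, "age_group": "18-64"},
    {"id": "B1", "hu_weight": 10.0, "age_group": "18-64"},
    {"id": "B2", "hu_weight": 10.0, "age_group": "65+"},
]

# Hypothetical population controls: independent estimates of the number
# of people in each age group for the area.
controls = {"0-17": 15.0, "18-64": 35.0, "65+": 8.0}


def control_to_population(people, controls):
    """Scale starting weights so each cell's weights sum to its control total."""
    # Weighted total of each cell before adjustment.
    cell_totals = defaultdict(float)
    for person in people:
        cell_totals[person["age_group"]] += person["hu_weight"]

    # Each person's weight is multiplied by control / current total for their cell.
    adjusted = {}
    for person in people:
        cell = person["age_group"]
        adjusted[person["id"]] = person["hu_weight"] * controls[cell] / cell_totals[cell]
    return adjusted


person_weights = control_to_population(people, controls)
for person_id, weight in person_weights.items():
    print(person_id, round(weight, 2))
```

    In this toy example both members of housing unit A start at 18, but they end up with different weights because they fall in different control cells.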

    I think that you should be able to get an idea of the ACS post-collection weighting calculation.

    If you have questions you can contact me at info@dorerfoundation.org

    Best,

    Dave

  • Here is another Robert Fay reference

    www.census.gov/.../2007_Fay_01.html

    www.census.gov/.../2007_Fay_01.pdf

    As I usually say "on the first read -- skip the equations."

    If you read the right column of page 2947 you can get the basic idea. One step is roughly adjusting the ACS totals so that they agree with administrative data. Since you are using PUMS data, a simple instance of "ratio estimation" comes from using a larger area (say a PUMA) to adjust to a smaller area (say a tract). What you can do is take a multi-way PUMS cross-tabulation and adjust the counts in the table cells by the ratio of tract total population to PUMA total population. People use the Geocorr crosswalk files to do this. This is a simple example where you make the total N for the adjusted tabulation agree with the total population of the tract. You can do other adjustments. For example, you can adjust a tabulation of SEX (a PUMS variable) to agree with the counts by sex for a census tract.
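
    The two adjustments just described can be sketched as follows. This is only an illustration under stated assumptions: the cross-tabulation, the tract total, and the tract counts by sex are made-up numbers, the PUMA total is taken to be the sum of the table, and the function names scale_to_total and scale_to_margin are hypothetical. In practice the tract figures would come from ACS tables and the PUMA-to-tract relationship from a crosswalk such as Geocorr.

```python
from collections import defaultdict

# Hypothetical weighted PUMS cross-tabulation for one PUMA:
# (SEX, age group) -> weighted person count. SEX uses the PUMS coding
# 1 = male, 2 = female. All counts are made up.
puma_table = {
    (1, "0-17"): 12000.0, (1, "18-64"): 30000.0, (1, "65+"): 8000.0,
    (2, "0-17"): 11000.0, (2, "18-64"): 31000.0, (2, "65+"): 8000.0,
}

# Hypothetical tract figures to adjust to.
tract_total = 4000.0
tract_by_sex = {1: 1900.0, 2: 2100.0}


def scale_to_total(table, target_total):
    """Multiply every cell by target_total / (current table total)."""
    ratio = target_total / sum(table.values())
    return {cell: count * ratio for cell, count in table.items()}


def scale_to_margin(table, margin_totals, axis):
    """Scale cells so the sums over one key position match margin_totals."""
    current = defaultdict(float)
    for cell, count in table.items():
        current[cell[axis]] += count
    return {
        cell: count * margin_totals[cell[axis]] / current[cell[axis]]
        for cell, count in table.items()
    }


# Adjustment 1: make the table total agree with the tract population.
tract_table = scale_to_total(puma_table, tract_total)

# Adjustment 2: make the counts by SEX agree with the tract counts by sex.
tract_table_by_sex = scale_to_margin(tract_table, tract_by_sex, axis=0)

for cell in sorted(tract_table_by_sex):
    print(cell, round(tract_table[cell], 1), round(tract_table_by_sex[cell], 1))
```

    The first adjustment keeps the PUMA's internal distribution and only changes the total. The second also changes the male/female split, while leaving the age distribution within each sex as it was in the PUMA.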

    A simple example of a ratio-type adjustment comes from epidemiology, where death rates are adjusted based on the age (and sex) distributions of 2 populations. For example, you can adjust the US deaths in 2025 to the expected number of deaths under the 2020 age-sex distribution. I like to trace this form of estimation back to the 17th century. This basic idea was used by Halley (as in the comet) to construct a life table that he published in 1693. The life table can be used to compute age-adjusted death rates and counts.
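
    The epidemiology example is usually called direct standardization: apply one population's age-specific death rates to another population's age distribution. A minimal sketch with made-up numbers follows. The age groups, rates, standard population, and function name expected_deaths are all hypothetical and are not real US figures.

```python
# Hypothetical age-specific death rates for the study year,
# expressed as deaths per 100,000 people per year.
rates_per_100k = {"0-44": 100, "45-64": 600, "65+": 4500}

# Hypothetical standard population (for example, an earlier year's
# age distribution) to which the rates are applied.
standard_population = {"0-44": 600_000, "45-64": 250_000, "65+": 150_000}


def expected_deaths(rates_per_100k, population):
    """Deaths expected if the given rates applied to the given population."""
    return {
        age: rates_per_100k[age] * population[age] / 100_000
        for age in population
    }


by_age = expected_deaths(rates_per_100k, standard_population)
total_expected = sum(by_age.values())

# Age-adjusted rate per 100,000: expected deaths over the standard population.
adjusted_rate_per_100k = total_expected / sum(standard_population.values()) * 100_000

print(by_age)
print(total_expected, adjusted_rate_per_100k)
```

    Because the same standard population is used for every year or area being compared, differences in the adjusted rate reflect differences in the age-specific rates and not in the age mix.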

    I have a computer program that adjusts a multi-way PUMS tabulation to a census tract using ACS detail tables for the tract.