Why don't PUMS household and person weights sum up? WGTP * NP != sum(PWGTP)

robseed 1 month ago

In the household data WGTP is the weight of the household and NP is the number of people in the household.

In the person data PWGTP is the weight of the person.

I expect these to sum up when the data is joined. The number of people in the household data should equal the number of people in the person data, but often it does not.

For example, in household 2019HU0627001, WGTP * NP is 5 * 4 = 20, but in the person data for that household sum(PWGTP) is 5 + 4 + 4 + 5 = 18.
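
A minimal sketch of this comparison in Python, using only the numbers quoted above. The record layout (plain dicts joined on SERIALNO) is an assumption for illustration; real PUMS data would be read from the housing and person files.

```python
from collections import defaultdict

# Household record for the example above (values from the post).
households = [{"SERIALNO": "2019HU0627001", "WGTP": 5, "NP": 4}]

# Person records for the same household (PWGTP values from the post).
persons = [
    {"SERIALNO": "2019HU0627001", "PWGTP": 5},
    {"SERIALNO": "2019HU0627001", "PWGTP": 4},
    {"SERIALNO": "2019HU0627001", "PWGTP": 4},
    {"SERIALNO": "2019HU0627001", "PWGTP": 5},
]

# Sum the person weights within each household.
pwgtp_sum = defaultdict(int)
for p in persons:
    pwgtp_sum[p["SERIALNO"]] += p["PWGTP"]

# Pair WGTP * NP with sum(PWGTP) for each household.
comparison = {
    h["SERIALNO"]: (h["WGTP"] * h["NP"], pwgtp_sum[h["SERIALNO"]])
    for h in households
}
print(comparison)
```

The two numbers in each pair are what I expected to match.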

Am I misunderstanding these fields?

Thanks -

Rob Seed

David Dorer 1 month ago

In general, the PUMS household weights (WGTP) apply to the household variables, for example number of bedrooms (BDSP). These weights are related to the sampling probability for choosing that house. The ACS sample is drawn from the Master Address File (MAF) that the US Census Bureau maintains, subject to certain conditions such as not selecting the same house 2 years in a row (see pages 4 and 5 of the reference below for sample selection details). The person weights (PWGTP) apply to the person variables such as age (AGEP). Once the household (address) is selected, everyone at the address answers the questions. The person weights are related to the probability of selecting an individual person.

The ACS is a 2-stage design with certain "bells and whistles." For a 2-stage design, the sampling probability (proportional to 1/weight) is the product of the probability of selection at stage 1 and the probability of selection at stage 2, which for the ACS is 1 since all persons in the household are surveyed. Hence you would expect the person weight PWGTP to be the same as the household weight WGTP.
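
As a rough sketch of that reasoning in Python: the weight is taken as the inverse of the overall selection probability. The stage-1 probability of 1/18 is a made-up value for illustration, and design_weight is a hypothetical helper, not anything from the Census Bureau.

```python
from fractions import Fraction


def design_weight(p_stage1, p_stage2=Fraction(1)):
    """Inverse of the overall selection probability in a 2-stage design."""
    return 1 / (p_stage1 * p_stage2)


# Hypothetical stage-1 probability: the address is sampled with chance 1/18.
p_household = Fraction(1, 18)

# Weight of the household from stage 1 alone.
household_weight = design_weight(p_household)

# Stage-2 probability is 1 because every person at the address is surveyed,
# so the person weight comes out equal to the household weight.
person_weight = design_weight(p_household, Fraction(1))

print(household_weight, person_weight)
```

Under the pure design, with no later adjustments, both weights are the same.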

I just looked at a PUMS data set for a SERIALNO (household identifier) with 4 corresponding SPORDER records:

The housing weight is 18

while the corresponding PWGTP values are

18 13 14 14

By the above reasoning, all the PWGTP weights for this household should be 18.

However, the ACS does some magic and adjusts the weights using other outside data, such as the current population number from the Population Estimates Program (PEP). They also adjust the weights for various person-level covariates. For example, they adjust things like the age distribution so that the age distribution for an area agrees with the age distribution for a larger overall area. See

https://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/MultiyearACSAccuracyofData2023.pdf

section WEIGHTING METHODOLOGY

for all the details.
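
To give a feel for why such an adjustment pulls person weights away from the household weight, here is a minimal sketch of a ratio adjustment to age controls. It is not the actual ACS procedure: the two households, the age cells, and the control totals are all invented, and control_to_totals is a hypothetical helper.

```python
from collections import defaultdict


def control_to_totals(persons, controls):
    """Scale weights within each cell so weighted counts match the controls."""
    weighted = defaultdict(float)
    for p in persons:
        weighted[p["cell"]] += p["weight"]
    return [
        {**p, "weight": p["weight"] * controls[p["cell"]] / weighted[p["cell"]]}
        for p in persons
    ]


# Hypothetical sample: every person starts at their household's weight.
persons = [
    {"hh": "A", "cell": "under 18", "weight": 18.0},
    {"hh": "A", "cell": "under 18", "weight": 18.0},
    {"hh": "A", "cell": "18 and over", "weight": 18.0},
    {"hh": "B", "cell": "18 and over", "weight": 10.0},
]

# Hypothetical independent population estimates for each age cell.
controls = {"under 18": 27.0, "18 and over": 35.0}

adjusted = control_to_totals(persons, controls)
for p in adjusted:
    print(p["hh"], p["cell"], p["weight"])
```

After the adjustment the three people in household A no longer share one weight, even though they all started at 18.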

There are tables giving the unweighted sample counts

B98001 Unweighted Housing Unit Sample

B98003 Unweighted Total Population Sample

These are counts of the number of households who completed the survey form and the number of people reported on those surveys.

As you can see the average person weight for the household should be close to the household weight.
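
A quick way to check this for any one household, sketched in Python with the weights from the SERIALNO example above (the variable names are just for illustration):

```python
from statistics import mean

wgtp = 18                 # household weight from the example above
pwgtp = [18, 13, 14, 14]  # person weights for the same SERIALNO

avg_person_weight = mean(pwgtp)   # mean of the four person weights
ratio = avg_person_weight / wgtp  # mean person weight relative to WGTP

print(avg_person_weight, round(ratio, 2))
```

The same comparison can be repeated over many households to see how closely the two track each other.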

Best,

Dave Dorer

To see more details look at

www.census.gov/.../acs_pums_handbook_2021.pdf

robseed 1 month ago in reply to David Dorer

Thanks for the detailed response, that makes sense.

This stuff is more complicated than it looks -- and it looks complicated!

David Dorer 1 month ago in reply to robseed

Dear Rob,

I've located a reference giving the ACS method to adjust person weights:

This is why the person weights for the members of the household don't agree with the household weight for the household corresponding to that person.

www.asasrms.org/.../JSM2005-000335.pdf

The goal of the research is to integrate administrative record
data into ACS estimation, specifically data from the Master
Address File Auxiliary Reference File. For the 36 ACS test
counties, an extract has been drawn from this file with
characteristics for year 2000. The data set includes
demographic characteristics such as age, sex, race, and
ethnicity, but omits any sensitive items concerning income
or benefits. The data set also does not include individual
names or Social Security numbers.

Here is the method

The Major Steps to the Weighting Process
1. Prepare the files for weighting.
2. Swap housing units for disclosure avoidance.
3. Form the collapsed estimation strata
4. Apply the base weights and CAPI sub-sampling weights.
5. Apply a monthly adjustment to make the total weighted
number of responses agree with the actual weighted mail
out each month. (Monthly sample factor).
6. Apply a non-interview factor (1) by tract and building type.
7. Apply another non-interview factor (2) by month and
building type.
8. Apply another non-interview factor for CAPI cases only
using month and building type.
9. Apply a non-interview factor (mode bias factor) by tenure,
month, and marital status.
10. Control the housing unit (HU) counts to a larger
geographic level.
11. Form the population control weighting cells.
12. Apply the HU weights to all people in a HU and control
their weights to the population controls.
13. Apply the principal person weight to the HU and apply the
housing unit controls again.
14. Round the housing unit and person weights.
15. Identify and down-weight outliers.
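
The multiplicative structure of these steps can be sketched roughly as below. This is only an illustration of how factors stack up, not the ACS computation: every number is made up, housing_unit_weight is a hypothetical helper, plain rounding stands in for step 14, and steps 13 and 15 are left out.

```python
from math import prod


def housing_unit_weight(base_weight, factors):
    """Multiply the base weight by each adjustment factor in turn."""
    return base_weight * prod(factors.values())


# Hypothetical factors, chosen only to make the arithmetic visible.
factors = {
    "capi_subsampling": 1.0,  # step 4
    "monthly_sample": 1.5,    # step 5
    "noninterview": 1.25,     # steps 6 to 9, collapsed into one factor
}
hu_weight = housing_unit_weight(8.0, factors)

# Step 12: each person starts from the HU weight, then a person-level
# population-control factor is applied (hypothetical values).
person_control_factors = [1.0, 0.75, 0.875]

# Step 14: round the person weights (simple rounding as a stand-in).
person_weights = [round(hu_weight * f) for f in person_control_factors]

print(hu_weight, person_weights)
```

The point is only that persons in one HU share a starting weight and then diverge at step 12.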

I think that you should be able to get an idea of the ACS post collection weighting calculation.

If you have questions you can contact me at info@dorerfoundation.org

Best,

Dave

David Dorer 1 month ago in reply to robseed

Here is another Robert Fay reference

www.census.gov/.../2007_Fay_01.html

www.census.gov/.../2007_Fay_01.pdf

As I usually say "on the first read -- skip the equations."

If you read the right column of page 2947 you can get the basic idea. One step is roughly adjusting the ACS totals so that they agree with administrative data. Since you are using PUMS data, a simple instance of "ratio estimation" comes from using a larger area (say a PUMA) to adjust to a smaller area (say a tract). What you can do is take a multi-way PUMS cross tabulation and adjust the counts in the table cells by the ratio of tract total population to PUMA total population. People use the Geocorr crosswalk files to do this. This is a simple example where you make the total N for the adjusted tabulation agree with the total population of the tract. You can do other adjustments. For example, adjust a tabulation of SEX (PUMS variable) to agree with the counts by sex for a census tract.
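
Here is a minimal sketch of both adjustments in Python. The table, the tract total, and the tract counts by sex are invented numbers, the helper names are hypothetical, and the sketch assumes the weighted PUMS table sums to the PUMA total population.

```python
def scale_to_total(table, target_total):
    """Scale every cell so the table sums to target_total."""
    factor = target_total / sum(table.values())
    return {cell: count * factor for cell, count in table.items()}


def adjust_to_margin(table, margin, axis):
    """Scale cells so the totals along one key position match the margin."""
    current = {}
    for cell, count in table.items():
        current[cell[axis]] = current.get(cell[axis], 0.0) + count
    return {
        cell: count * margin[cell[axis]] / current[cell[axis]]
        for cell, count in table.items()
    }


# Hypothetical weighted PUMA cross tabulation of SEX by age group.
puma_table = {
    ("Male", "under 18"): 1000.0,
    ("Male", "18 and over"): 3000.0,
    ("Female", "under 18"): 1200.0,
    ("Female", "18 and over"): 2800.0,
}

# Ratio adjustment: tract total population / PUMA total population.
tract_total = 2000.0  # hypothetical tract population
tract_table = scale_to_total(puma_table, tract_total)

# Further adjustment so the SEX margin agrees with hypothetical tract counts.
tract_sex = {"Male": 900.0, "Female": 1100.0}
tract_by_sex = adjust_to_margin(tract_table, tract_sex, axis=0)

print(tract_table)
print(tract_by_sex)
```

The first step only fixes the total N; the second also fixes the counts by sex while keeping the age split within each sex from the PUMA.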

A simple example of a ratio-type adjustment comes from epidemiology, where death rates are adjusted based on the age (and sex) distribution for two populations. For example, adjusting the US deaths in 2025 to the expected number of deaths for the 2020 age-sex distribution. I like to trace this form of estimation back to the 17th century. This basic idea was used by Halley (as in the comet) to construct a life table that he published in 1693. The life table can be used to compute age-adjusted death rates and counts.
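
A small sketch of that kind of age adjustment in Python: age-specific death rates from one population are applied to the age distribution of a standard population. The age groups, rates, and population counts are all invented, and expected_deaths is a hypothetical helper.

```python
def expected_deaths(rates_per_1000, standard_pop):
    """Deaths expected if the standard population had these age-specific rates."""
    return sum(rates_per_1000[g] * standard_pop[g] / 1000 for g in rates_per_1000)


# Hypothetical age-specific death rates (deaths per 1,000 people per year).
rates_per_1000 = {"0-39": 1, "40-69": 5, "70+": 50}

# Hypothetical standard population by age group.
standard_pop = {"0-39": 500_000, "40-69": 300_000, "70+": 100_000}

deaths = expected_deaths(rates_per_1000, standard_pop)

# Age-adjusted death rate per 1,000 for the standard population.
adjusted_rate = deaths / sum(standard_pop.values()) * 1000

print(deaths, round(adjusted_rate, 2))
```

Swapping in a different standard population changes the expected count even though the age-specific rates stay the same.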

I have a computer program that adjusts a multi-way PUMS tabulation to a census tract using ACS detail tables for the tract.