In the household data WGTP is the weight of the household and NP is the number of people in the household.
In the person data PWGTP is the weight of the person.
I expect these to sum up when the data is joined. The number of people in the household data should equal the number of people in the person data, but often it does not.
For example, in household 2019HU0627001, WGTP * NP is 5 * 4 = 20, but in the person data for that household sum(PWGTP) is 5 + 4 + 4 + 5 = 18.
Am I misunderstanding these fields?
Thanks -
Rob Seed
In general, the PUMS household weights (WGTP) apply to the household variables, for example the number of bedrooms (BDSP). These weights are related to the sampling probability for choosing that house. The ACS sample is drawn from the Master Address File (MAF) that the US Census Bureau maintains, subject to certain conditions such as not selecting the same house 2 years in a row (see pages 4 and 5 of the reference below for sample selection details). The person weights (PWGTP) apply to the person variables, such as age (AGEP). Once the household (address) is selected, everyone at the address answers the questions. The person weights are related to the probability of selecting an individual person.
The ACS is a 2-stage design with certain "bells and whistles." For a 2-stage design, the sampling probability (proportional to 1/weight) is the product of the probability of selection at stage 1 and the probability of selection at stage 2, which for the ACS is 1 since all persons in the household are surveyed. Hence you would expect the person weight PWGTP to be the same as the household weight WGTP.
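The reasoning above can be sketched in a few lines of Python. This is only an illustration of the textbook 2-stage relationship, not the ACS weighting code: the function name `design_weight` and the stage 1 probability of 1/18 are assumptions chosen for the example.

```python
from fractions import Fraction

def design_weight(p_stage1, p_stage2=Fraction(1)):
    """Base weight = 1 / (P(select at stage 1) * P(select at stage 2))."""
    return 1 / (p_stage1 * p_stage2)

# Hypothetical stage 1 probability: 1 address in 18 is sampled.
p_household = Fraction(1, 18)

# Stage 2 probability is 1 because everyone at a sampled address is surveyed.
household_weight = design_weight(p_household)
person_weight = design_weight(p_household, Fraction(1))

print(household_weight, person_weight)  # 18 18
```

Before any later adjustment, the two weights coincide, which is the expectation stated above.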
I just looked at a PUMS data set for a SERIALNO (household identifier) with 4 corresponding SPORDER records:
The housing weight is 18
while the corresponding PWGTP values are
18 13 14 14
By the above reasoning, all the PWGTP weights for this household should be 18.
However, the ACS does some magic and adjusts the weights using other outside data, such as the current population figures from the Population Estimates Program (PEP). They also adjust the weights for various person-level covariates. For example, they adjust things like the age distribution so that the age distribution for an area agrees with the age distribution for a larger overall area. See
https://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/MultiyearACSAccuracyofData2023.pdf
section WEIGHTING METHODOLOGY
for all the details.
There are tables giving the unweighted sample counts.
These are counts of the number of households who completed the survey form and the number of people reported on those surveys.
As you can see the average person weight for the household should be close to the household weight.
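To make the comparison concrete, here is a small Python sketch that uses only the weights quoted in this thread. The label `reply_example` is made up, because the SERIALNO of that household was not given, and the helper name `compare_weights` is an assumption for illustration.

```python
from statistics import mean

# Weights quoted in this thread. "reply_example" is a made-up label because
# the SERIALNO of that household was not given.
households = {
    "2019HU0627001": {"WGTP": 5, "PWGTP": [5, 4, 4, 5]},
    "reply_example": {"WGTP": 18, "PWGTP": [18, 13, 14, 14]},
}

def compare_weights(wgtp, pwgtp):
    """Return WGTP * NP, sum(PWGTP) and the mean person weight for one household."""
    return {
        "wgtp_times_np": wgtp * len(pwgtp),
        "sum_pwgtp": sum(pwgtp),
        "mean_pwgtp": mean(pwgtp),
    }

summary = {sid: compare_weights(h["WGTP"], h["PWGTP"]) for sid, h in households.items()}
for sid, row in summary.items():
    print(sid, row)
# 2019HU0627001 {'wgtp_times_np': 20, 'sum_pwgtp': 18, 'mean_pwgtp': 4.5}
# reply_example {'wgtp_times_np': 72, 'sum_pwgtp': 59, 'mean_pwgtp': 14.75}
```

For a single household the mean person weight can sit noticeably below WGTP, as in the second row.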
Best,
Dave Dorer
To see more details look at
www.census.gov/.../acs_pums_handbook_2021.pdf
Thanks for the detailed response, that makes sense.
This stuff is more complicated than it looks -- and it looks complicated!
Dear Rob,
I've located a reference giving the ACS method to adjust person weights:
www.asasrms.org/.../JSM2005-000335.pdf
This is why the person weights for the members of a household don't agree with the household weight for that household.
The goal of the research is to integrate administrative record data into ACS estimation, specifically data from the Master Address File Auxiliary Reference File. For the 36 ACS test counties, an extract has been drawn from this file with characteristics for year 2000. The data set includes demographic characteristics such as age, sex, race, and ethnicity, but omits any sensitive items concerning income or benefits. The data set also does not include individual names or Social Security numbers.
Here is the method
The Major Steps to the Weighting Process
1. Prepare the files for weighting.
2. Swap housing units for disclosure avoidance.
3. Form the collapsed estimation strata.
4. Apply the base weights and CAPI sub-sampling weights.
5. Apply a monthly adjustment to make the total weighted number of responses agree with the actual weighted mailout each month. (Monthly sample factor).
6. Apply a non-interview factor (1) by tract and building type.
7. Apply another non-interview factor (2) by month and building type.
8. Apply another non-interview factor for CAPI cases only using month and building type.
9. Apply a non-interview factor (mode bias factor) by tenure, month, and marital status.
10. Control the housing unit (HU) counts to a larger geographic level.
11. Form the population control weighting cells.
12. Apply the HU weights to all people in a HU and control their weights to the population controls.
13. Apply the principal person weight to the HU and apply the housing unit controls again.
14. Round the housing unit and person weights.
15. Identify and down-weight outliers.
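Step 12 is the one that pulls person weights away from the household weight. A heavily simplified Python sketch of that idea follows. It is not the Census Bureau's procedure: the cell labels, the control totals, and the function name `control_person_weights` are all hypothetical, and the real process involves more dimensions and cell collapsing.

```python
from collections import defaultdict

# Hypothetical persons: each starts with the weight of their housing unit (HU)
# and belongs to one population control cell (here a made-up sex-by-age cell).
persons = [
    {"hu_weight": 18.0, "cell": "female_18plus"},
    {"hu_weight": 18.0, "cell": "male_18plus"},
    {"hu_weight": 18.0, "cell": "female_under18"},
    {"hu_weight": 18.0, "cell": "male_under18"},
    {"hu_weight": 20.0, "cell": "female_18plus"},
    {"hu_weight": 20.0, "cell": "male_18plus"},
]

# Hypothetical population controls (independent population totals per cell).
controls = {
    "female_18plus": 40.0,
    "male_18plus": 30.0,
    "female_under18": 14.0,
    "male_under18": 14.0,
}

def control_person_weights(persons, controls):
    """Scale HU weights so each cell's weighted total equals its control."""
    cell_totals = defaultdict(float)
    for p in persons:
        cell_totals[p["cell"]] += p["hu_weight"]
    return [
        p["hu_weight"] * controls[p["cell"]] / cell_totals[p["cell"]]
        for p in persons
    ]

person_weights = control_person_weights(persons, controls)
for p, w in zip(persons, person_weights):
    print(p["cell"], p["hu_weight"], round(w, 2))
```

In this toy data the four members of the first HU all start at 18 but end with different weights, because each falls in a cell with a different ratio of control total to weighted sample total.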
I think that you should be able to get an idea of the ACS post collection weighting calculation.
If you have questions you can contact me at info@dorerfoundation.org
Dave
Here is another Robert Fay reference
www.census.gov/.../2007_Fay_01.html
www.census.gov/.../2007_Fay_01.pdf
As I usually say "on the first read -- skip the equations."
If you read the right column of page 2947 you can get the basic idea. One step is roughly adjusting the ACS totals so that they agree with administrative data. Since you are using PUMS data, a simple instance of "ratio estimation" comes from using a larger area (say a PUMA) to adjust to a smaller area, say a tract. What you can do is take a multi-way PUMS cross tabulation and adjust the counts in the table cells by the ratio tract total population / PUMA total population. People use the Geocorr crosswalk files to do this. This is a simple example where you make the total N for the adjusted tabulation agree with the total population of the tract. You can do other adjustments. For example, you can adjust a tabulation of SEX (PUMS variable) to agree with the counts by sex for a census tract.
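A minimal Python sketch of both adjustments described above, under stated assumptions: the PUMA table, the tract total, and the tract counts by sex are invented numbers, and the function names `scale_to_total` and `adjust_to_sex_counts` are hypothetical helpers, not part of any Census or Geocorr tool.

```python
# Hypothetical weighted PUMS cross tabulation for one PUMA: (SEX, age group) -> count.
puma_table = {
    ("F", "0-17"): 9000.0,
    ("F", "18+"): 42000.0,
    ("M", "0-17"): 9500.0,
    ("M", "18+"): 39500.0,
}

def scale_to_total(table, target_total):
    """Multiply every cell by target_total / table total (simple ratio adjustment)."""
    ratio = target_total / sum(table.values())
    return {cell: count * ratio for cell, count in table.items()}

def adjust_to_sex_counts(table, sex_counts):
    """Rescale cells within each sex so the sex margins match the given counts."""
    margin = {}
    for (sex, _age), count in table.items():
        margin[sex] = margin.get(sex, 0.0) + count
    return {
        (sex, age): count * sex_counts[sex] / margin[sex]
        for (sex, age), count in table.items()
    }

# Hypothetical tract figures (for example taken from ACS detail tables).
tract_total = 4000.0
tract_by_sex = {"F": 2100.0, "M": 1900.0}

tract_table = scale_to_total(puma_table, tract_total)
tract_table_by_sex = adjust_to_sex_counts(tract_table, tract_by_sex)

for cell in puma_table:
    print(cell, round(tract_table[cell], 1), round(tract_table_by_sex[cell], 1))
```

The first step only fixes the total N. The second step also forces the SEX margin to agree with the tract while keeping the PUMA's age split within each sex.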
A simple example of a ratio-type adjustment comes from epidemiology, where death rates are adjusted based on the age (and sex) distributions of 2 populations, for example adjusting the US deaths in 2025 to the expected number of deaths under the 2020 age-sex distribution. I like to trace this form of estimation back to the 17th century. This basic idea was used by Halley (as in the comet) to construct a life table that he published in 1693. The life table can be used to compute age-adjusted death rates and counts.
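The age adjustment mentioned above can be sketched as direct standardization: apply one population's age-specific death rates to a standard population's age distribution. Every number below is invented for illustration, the three age groups are an assumption, and `expected_deaths` is a hypothetical helper name.

```python
# Hypothetical age-specific death rates per 100,000 in the study population.
rates_per_100k = {"0-44": 100, "45-64": 600, "65+": 4000}

# Hypothetical standard population counts by age group.
standard_pop = {"0-44": 200_000, "45-64": 80_000, "65+": 50_000}

def expected_deaths(rates, population):
    """Expected deaths if `population` experienced the age-specific `rates`."""
    return sum(rates[age] * population[age] / 100_000 for age in population)

deaths = expected_deaths(rates_per_100k, standard_pop)
adjusted_rate = deaths / sum(standard_pop.values()) * 100_000

print(deaths)                   # 2680.0
print(round(adjusted_rate, 1))  # 812.1
```

The adjusted rate is what the crude death rate would be if the study population had the standard population's age distribution.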
I have a computer program that adjusts a multi-way PUMS tabulation to a census tract using ACS detail tables for the tract.