Thank you for your insights in advance. I would like to better understand the PUMS data variables, missing data, and allocated questions. This is the first time I am working with PUMS data and analyzing such a large dataset.
2. Why do some questions have allocation variables associated with them while others do not? For example, among the household unit variables, the multigenerational household question does not have an associated allocation flag variable.
3. What is the relationship between an allocation flag variable and its corresponding question? For questions that have allocation flags, are the allocated answers already filled in within the observations, or do the missing values in the responses represent the people who chose not to answer or gave an implausible response?
Multigenerational household (https://usa.ipums.org/usa-action/variables/MULTGEN) is a derived variable: everything is taken from the `relate` variable (https://usa.ipums.org/usa-action/variables/RELATE). The Census Bureau fully resolves it from that information, which is why there are no missing values and therefore no allocation flag.
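To illustrate what "derived" means here, the following is a simplified sketch of how a generation count could be built from each person's relationship to the householder and then rolled up to the household. Assumptions: the relationship labels, the generation offsets, and the `SERIALNO`/`relate` field names are hypothetical stand-ins for the actual RELATE codes, and this is not the exact IPUMS or Census Bureau algorithm.

```python
from collections import defaultdict

# Hypothetical mapping from relationship-to-householder labels to a generation
# offset (parents +1, householder's own generation 0, children -1, grandchildren -2).
GENERATION_OFFSET = {
    "householder": 0,
    "spouse": 0,
    "sibling": 0,
    "parent": 1,
    "parent_in_law": 1,
    "child": -1,
    "child_in_law": -1,
    "grandchild": -2,
}


def count_generations(relates):
    """Count distinct generations in a household from its members' relationship labels.

    Labels that cannot be placed in a generation (e.g. "roommate") are ignored.
    """
    levels = {GENERATION_OFFSET[r] for r in relates if r in GENERATION_OFFSET}
    return len(levels)


def is_multigenerational(relates):
    """Treat a household as multigenerational when it spans three or more generations."""
    return count_generations(relates) >= 3


# Usage: group hypothetical person records by household ID, then derive the household value.
persons = [
    {"SERIALNO": "A", "relate": "householder"},
    {"SERIALNO": "A", "relate": "child"},
    {"SERIALNO": "A", "relate": "grandchild"},
    {"SERIALNO": "B", "relate": "householder"},
    {"SERIALNO": "B", "relate": "spouse"},
]
households = defaultdict(list)
for p in persons:
    households[p["SERIALNO"]].append(p["relate"])

multgen = {serial: count_generations(rels) for serial, rels in households.items()}
print(multgen)  # {'A': 3, 'B': 1}
```

Because the value is computed entirely from relationship information that is already present for every person, there is nothing left to impute at the household level.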
The "allocated" items are true missing data. If I were to ask you what year your house was built, some people would answer in a snap, some would have to go look for records, and most renters probably would not bother figuring that out. This item has one of the higher item-missing data rates (https://www.census.gov/acs/www/methodology/sample-size-and-data-quality/item-allocation-rates/). The construction and editing of this variable include a number of plausibility checks and the borrowing of values from other records (https://usa.ipums.org/usa-action/variables/BUILTYR2#editing_procedure_section). The allocation (imputation) procedure uses donors with similar ownership status and house value (which is itself frequently imputed).
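To make the relationship between an allocated item and its flag concrete, here is a simplified sequential hot-deck sketch: each missing value is filled from the most recent reported value among records in the same donor cell (here, the same ownership status and house-value band), and a flag marks which values were imputed. Assumptions: the column names `YRBUILT`, `FYRBUILT`, `tenure`, and `value_band` are hypothetical, and the real Census Bureau allocation procedure is considerably more involved. In this sketch, after allocation the imputed value is stored in the variable itself and the flag only records that it was imputed.

```python
def hot_deck_impute(records, target, keys):
    """Fill missing `target` values from the most recent donor sharing the same `keys`.

    Returns new records (inputs are not modified). Each output record gets a flag
    field "F" + target: 1 = value was allocated (imputed), 0 = value as reported.
    A missing value with no earlier donor in its cell stays None with flag 0.
    """
    last_donor = {}  # maps donor-cell tuple -> last reported value seen in that cell
    flag = "F" + target
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is untouched
        cell = tuple(rec[k] for k in keys)
        if rec[target] is None:
            donor_value = last_donor.get(cell)
            rec[target] = donor_value
            rec[flag] = 1 if donor_value is not None else 0
        else:
            last_donor[cell] = rec[target]  # reported value becomes the next donor
            rec[flag] = 0
        out.append(rec)
    return out


def allocation_rate(records, flag):
    """Share of records whose value was allocated, according to the flag column."""
    if not records:
        return 0.0
    return sum(1 for r in records if r[flag] == 1) / len(records)


# Usage with hypothetical housing records (None = item nonresponse).
records = [
    {"tenure": "owner", "value_band": "high", "YRBUILT": 1990},
    {"tenure": "renter", "value_band": "low", "YRBUILT": 1975},
    {"tenure": "owner", "value_band": "high", "YRBUILT": None},
    {"tenure": "renter", "value_band": "low", "YRBUILT": None},
    {"tenure": "owner", "value_band": "low", "YRBUILT": None},
]
imputed = hot_deck_impute(records, "YRBUILT", ["tenure", "value_band"])
rate = allocation_rate(imputed, "FYRBUILT")
print(rate)  # 0.4
```

Comparing results with and without the flagged rows is one way to check how sensitive an estimate is to allocated values.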