Hello. I can't seem to find documentation about this anywhere. I know that the released PUMS datasets are approximately two thirds of the total ACS sample in any given year, but how does the Census Bureau go about subsetting it?

The main thing I'm concerned about is this: is it possible that the PUMS includes partial households? That is, some people in a household were removed going from ACS to PUMS, while other people in the same household were kept. That would seriously limit many applications of the PUMS, but I've long since learned not to assume that the PUMS data work the way I want.

  • (Turns out, they don't split up households, as expected.)
    Glad you found what you were looking for. I'm adding some info for the record in case it's helpful to you or others. I have a hard time remembering where things are located because the information is spread over so many documents. I was recently trying to answer a similar question, and the most descriptive documentation I could find was in the 1-year ACS PUMS Accuracy document, on page 5:

    The 2016 PUMS was designed to include one percent of the housing units and one percent of the
    GQ persons in the United States and Puerto Rico. The PUMS sample was selected from the full
    sample ACS records separately for Housing Units (HUs) and GQ persons. The PUMS sample
    sizes were based on the Population Estimates Program estimates for housing units and GQ persons.
    The PUMS sample of persons in households was selected by keeping all persons in selected
    PUMS HUs. The systematic sampling method used sampling intervals chosen to yield the
    sample sizes given in Table 1 and Table 2 by state, DC and PR. The sampling interval for each
    state and HU/GQ sample is the ratio of the number of interviewed records available for sampling
    and the required sample size (sampling intervals are not rounded to integers).
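    The selection described in that excerpt can be sketched in Python roughly as follows. This is a minimal illustration under my own assumptions, not the Census Bureau's production code: the record sort order, the single uniform random start, and the field names `serialno`, `np`, and `sporder` (loosely modeled on the PUMS SERIALNO, NP, and SPORDER variables) are choices made for the example. Housing units are drawn with an unrounded interval, and persons are then pulled in by serial number, so a household is either kept whole or dropped whole. GQ persons would go through `systematic_sample` separately, each person treated as one record.

```python
import math
import random
from collections import defaultdict


def systematic_sample(records, sample_size, rng):
    """Systematic sample with a fractional (unrounded) interval.

    Assumption: `records` is already in whatever sort order the sampler
    walks; the excerpt does not say how Census orders its records.
    """
    n_available = len(records)
    if not 0 < sample_size <= n_available:
        raise ValueError("sample_size must be between 1 and len(records)")
    # Interval = records available / required sample size, not rounded.
    interval = n_available / sample_size
    # Assumed: a single uniform random start in [0, interval).
    start = rng.random() * interval
    picks = []
    for i in range(sample_size):
        # Step through the list by the fractional interval; min() guards
        # against floating-point rounding past the last index.
        idx = min(math.floor(start + i * interval), n_available - 1)
        picks.append(records[idx])
    return picks


def sample_households(housing_units, persons, sample_size, rng):
    """Sample housing units, then keep every person in each chosen unit."""
    chosen = systematic_sample(housing_units, sample_size, rng)
    chosen_ids = {hu["serialno"] for hu in chosen}
    # Persons are never sampled individually here, so a household is
    # either kept whole or dropped whole.
    kept = [p for p in persons if p["serialno"] in chosen_ids]
    return chosen, kept


def households_intact(chosen, kept):
    """True if each chosen unit has exactly `np` persons in the sample."""
    counts = defaultdict(int)
    for p in kept:
        counts[p["serialno"]] += 1
    return all(counts[hu["serialno"]] == hu["np"] for hu in chosen)


if __name__ == "__main__":
    rng = random.Random(2016)
    # Toy frame (hypothetical data): 300 interviewed units, 1-4 persons each.
    hus = [{"serialno": f"HU{i:03d}", "np": 1 + i % 4} for i in range(300)]
    persons = [
        {"serialno": hu["serialno"], "sporder": j + 1}
        for hu in hus
        for j in range(hu["np"])
    ]
    # Keep about two thirds of the units, i.e. an interval of 1.5.
    chosen, kept = sample_households(hus, persons, 200, rng)
    print(len(chosen), "housing units,", len(kept), "persons")
    print("households intact:", households_intact(chosen, kept))
```

    Because the interval is at least 1, consecutive picks always land on different records, which is why the unrounded interval still yields exactly the required sample size without duplicates.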

    For comparison, see this link about overall ACS sample sizes, along with table B00002 (also B98001):