Implied household population by household size inconsistencies

Working with block groups, I've noticed discrepancies between the households by size and the household population. For example, suppose I have a block group with the following:

- population in occupied households: 1454

- households by household size:

  • 1: 89
  • 2: 273
  • 3: 72
  • 4: 97
  • 5: 31
  • 6: 0
  • 7+: 0

To get household population by household size, I would think that I could take each household count for sizes 1-6 and multiply it by the corresponding household size. The population in 7+ households would then be the total household population minus the sum of the calculated populations for household sizes 1-6. So for this case:

  • 1: 89
  • 2: 546
  • 3: 216
  • 4: 388
  • 5: 155
  • 6: 0
  • total pop for household sizes 1 - 6: 1394
  • remaining for 7+: 60 (1454 - 1394)

Note the inconsistency here: we have zero 7+ households, but to match the household population we would theoretically need 60 people in this group. I've also noticed cases where the inverse is true: there are 7+ households, but the implied total across the 1-6 household size groupings is larger than the entire household population for the block group (i.e. the population in 7+ households would then be negative).
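
For anyone who wants to run the same check across many block groups, here is a minimal Python sketch of the calculation described above. The function name and the consistency rule (a zero top bin must leave zero residual, and a non-empty top bin must leave at least 7 people per household) are my own assumptions for illustration, not anything defined by the ACS.

```python
def residual_for_top_bin(households_by_size, households_top_bin,
                         household_population, top_bin_min_size=7):
    """Return (implied population in sizes 1-6, residual for 7+, consistent?)."""
    # Multiply each household count by its household size and sum.
    implied = sum(size * count for size, count in households_by_size.items())
    # Whatever is left of the household population must live in 7+ households.
    residual = household_population - implied
    if households_top_bin == 0:
        # No 7+ households, so nobody should be left over.
        consistent = residual == 0
    else:
        # Each 7+ household needs at least top_bin_min_size people.
        consistent = residual >= top_bin_min_size * households_top_bin
    return implied, residual, consistent


# Counts from the example block group in this post.
counts = {1: 89, 2: 273, 3: 72, 4: 97, 5: 31, 6: 0}
implied, residual, consistent = residual_for_top_bin(counts, 0, 1454)
print(implied, residual, consistent)
```

A negative residual flags the inverse case mentioned above, where the 1-6 groupings alone imply more people than the block group's household population.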

So I guess I'm looking for suggestions as to why this is occurring and how to make this more consistent.

My first inclination is to scale the population variables so that they match the implied population in the household sizes, but I'm curious if anyone has dealt with this differently.


  • The discrepancy here is likely the result of differences between household and person weights. The total population number (1454) is estimated using person weights. Person weights have been adjusted (post-stratified) to match independent population estimates of key demographic groups. This process may cause respondents within the same household to have different weights.

    When you estimate a total population by adding the number of 1-person households to 2x the number of 2-person households, plus 3x the number of 3-person households, etc., this is effectively equivalent to estimating the population with person weights that are all equal to the respondent's household weight.

    I hope this helps.
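
To make the weighting point above concrete, here is a toy Python sketch. The three sampled households and all of their weights are made-up numbers chosen only for illustration; they are not ACS values.

```python
# Hypothetical sample: each household has one household weight, and each
# person in it has a person weight that post-stratification may have
# moved away from the household weight.
sample = [
    {"hh_weight": 20.0, "person_weights": [20.0]},
    {"hh_weight": 15.0, "person_weights": [15.0, 18.0]},
    {"hh_weight": 10.0, "person_weights": [10.0, 12.0, 13.0]},
]

# Population estimated the way a published total would be: sum of person weights.
pop_from_person_weights = sum(sum(h["person_weights"]) for h in sample)

# Population implied by households-by-size: household weight times household size.
pop_from_household_weights = sum(
    h["hh_weight"] * len(h["person_weights"]) for h in sample
)

print(pop_from_person_weights, pop_from_household_weights)
```

The two totals agree only when every person weight equals the weight of that person's household, which is the situation the reply describes as not holding after post-stratification.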
  • Hi Scott -
    I first came across this problem about 10 years ago. It stems from controlling routines (my understanding is that it stems from the technique used to control group quarters). In fact, this problem was such an issue for the work that I was doing that I wrote a paper describing a technique for creating outside-of-ACS estimates of households by household size.

    The solution you use will be dependent upon your end goal with the data. (For example, do you need a robust estimate of household by size for some infrastructure planning?)

    Anyway - I'm happy to share more information. Please feel free to send me an email (bjarosz (at) prb (dot) org) and I'll share insights and papers.
  • In reply to Beth Jarosz:

    Hi Beth,

    This is an old question, but I'm running into this same issue while trying to generate a synthetic population for transportation modeling purposes. I need to meet total household and population control totals, while also controlling the household size distribution. I would appreciate any insights or resources you can provide on how to adjust or create household size data that will line up with total households and population. Thank you!
  • In reply to Maribeth Todd:

    History repeating itself (that's the exact same application where I first ran into this problem 10 years ago)!

    So... There are two possible approaches: 1 - control to population and household controls (and drop the HH size controls), or 2 - create your own households-by-household size distribution and ignore what's in ACS because it's logically inconsistent with the household population estimates. (If you choose approach 2, you will keep the avg. household size info from ACS, it's just the distribution by size category that's a mess.)

    Given the importance of household size for transportation modeling, option 2 was the approach I took when working on this problem. Developing your own household size bins sounds complicated (a derivation of the Poisson distribution) but the math is actually pretty straightforward.

    The original version of the methodological approach is here:
    And I posted a revised version here:
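
For readers who want a feel for what a Poisson-based household size distribution can look like, here is a rough Python sketch. It is not the method from the papers mentioned above. It assumes, purely for illustration, that household size minus one follows a Poisson distribution whose mean is the average household size minus one, with the upper tail collapsed into a 7+ bin. The function name is hypothetical, and the inputs are the household counts and household population from the original question.

```python
import math


def poisson_household_shares(avg_household_size, top_bin=7):
    """Shares of households of size 1..top_bin-1, plus a final top_bin+ share.

    Assumption (illustrative only): size - 1 ~ Poisson(avg_household_size - 1).
    """
    if avg_household_size < 1:
        raise ValueError("average household size must be at least 1")
    lam = avg_household_size - 1.0
    # P(size = k + 1) for k = 0 .. top_bin - 2, i.e. sizes 1 .. top_bin - 1.
    shares = [math.exp(-lam) * lam**k / math.factorial(k)
              for k in range(top_bin - 1)]
    # Everything in the upper tail goes into the open-ended top bin.
    shares.append(1.0 - sum(shares))
    return shares


# Inputs taken from the block group in the original question.
acs_counts = [89, 273, 72, 97, 31, 0, 0]
total_households = sum(acs_counts)
household_population = 1454

shares = poisson_household_shares(household_population / total_households)
modeled_counts = [total_households * s for s in shares]

for label, n in zip(["1", "2", "3", "4", "5", "6", "7+"], modeled_counts):
    print(f"{label}: {n:.1f}")
```

Because the distribution is built from the average household size, the modeled bins stay tied to the household and population totals by construction. The fractional counts would still need rounding or controlling before use in a synthetic population.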
  • In reply to Beth Jarosz:

    Thank you so much for the prompt response! I'll take a look at the Poisson distribution approach.