# Multigenerational Households In PUMS

I have been working on reproducing the state-level estimates of multigenerational households (households with three or more generations living together) so that I can calculate them by PUMA. It took me quite some time, but my results are now within 0.1 percentage point of what's available in Table B11017 for 2016-2020, so I think I've got it! (My understanding is that these slight differences are expected because PUMS is a sample of the full dataset.) I'd like to share this code/logic with anyone else who's interested, seeing as how it took me so long! Hope this is helpful for someone out there.

My SAS code is below:

/*multigenerational...code the three possibilities from census, checks out with the state values!*/
*
Multigenerational households are family households consisting of three or more generations.
These households include (1) a householder, a parent or parent-in-law of the householder, and an own child of the householder,
(2) a householder, an own child of the householder, and a grandchild of the householder, or
(3) a householder, a parent or parent-in-law of the householder, an own child of the householder, and a grandchild of the householder.
The householder is a person in whose name the home is owned, being bought, or rented, and who answers the survey questionnaire as person 1. (relshipp = 20)
;

data work.parents_of_hhdr;
set out.pums_file;
if relshipp in ('29','31'); *parent or parent-in-law;
keep serialno;
run;

data work.grandchildren_of_hhdr;
set out.pums_file;
if relshipp in ('30'); *grandchild;
keep serialno;
run;

data work.children_of_hhdr;
set out.pums_file;
if relshipp in ('25','26','27'); *own child: biological, adopted, or stepchild;
keep serialno;
run;

data work.householders;
set out.pums_file;
if relshipp = '20';
run;

data work.multigen_1;
merge work.householders (in = a) work.parents_of_hhdr (in = b) work.children_of_hhdr (in = c) work.grandchildren_of_hhdr (in = d);
by serialno;
if (a and b and c); *(1) a householder, a parent or parent-in-law of the householder, and an own child of the householder;
run;

data work.multigen_2;
merge work.householders (in = a) work.parents_of_hhdr (in = b) work.children_of_hhdr (in = c) work.grandchildren_of_hhdr (in = d);
by serialno;
if (a and c and d); *(2) a householder, an own child of the householder, and a grandchild of the householder;
run;

data work.multigen_3;
merge work.householders (in = a) work.parents_of_hhdr (in = b) work.children_of_hhdr (in = c) work.grandchildren_of_hhdr (in = d);
by serialno;
if (a and b and c and d); *(3) a householder, a parent or parent-in-law of the householder, an own child of the householder, and a grandchild of the householder;
run;

data work.multigen;
set work.multigen_1 work.multigen_2 work.multigen_3;
run;

proc sort data = work.multigen nodupkey out=work.multigen_dedup; by serialno; run;  *keep one row per serialno (nodupkey dedups on the BY variable, noduprecs only drops adjacent identical records);
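
Because case (3) is a subset of cases (1) and (2), the three merges above reduce to a single condition: a householder and an own child, plus either a parent or a grandchild. A minimal sketch of that single-step version, which may be handy as a cross-check, is below. The dataset name `work.multigen_alt` is made up for illustration, and the sketch assumes the inputs are already sorted by `serialno`, just as the merges above do.

```sas
/* Hypothetical single-step equivalent of multigen_1 through multigen_3 */
data work.multigen_alt;
merge work.householders (in = a) work.parents_of_hhdr (in = b) work.children_of_hhdr (in = c) work.grandchildren_of_hhdr (in = d);
by serialno;
/* one row per household: householder + own child + (parent or grandchild) */
if first.serialno and a and c and (b or d);
run;

/* the two row counts should match if the logic is equivalent */
proc sql;
select count(*) as n_original from work.multigen_dedup;
select count(*) as n_single_step from work.multigen_alt;
quit;
```

Keeping only `first.serialno` means no separate deduplication step is needed for this version.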

*do state first to check the values;
proc freq data = work.multigen_dedup;
tables st / out=work.multigen_hhds (rename = (COUNT = multigen_hhds) drop = percent) noprint;
weight wgtp; *using the housing unit (household) weight instead of the pwgtp;
run;

*get denominator;
proc freq data = work.householders;
tables st / out=work.total_hhds (rename = (COUNT = total_hhds) drop = percent) noprint;
weight wgtp;
run;

proc sort data = work.multigen_hhds; by st; run;
proc sort data = work.total_hhds; by st; run;

*merge by state;
data work.multigen_test;
merge work.multigen_hhds work.total_hhds;
by st;
run;

data work.multigen_test;
set work.multigen_test;
multigen_pct = round((100 * multigen_hhds / total_hhds),.1);
run;

*compared with the published summary table, the percentage is off by 0.1 point for many states, but DC matched exactly in both counts and percentage, and some states were off by only a few households;
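
To go from the state check to PUMA-level estimates, the same tabulation can be repeated with `st*puma` as the table request. The following is only a sketch: it assumes the PUMS extract carries the `puma` variable, and the output dataset names are placeholders I picked. PUMA codes repeat across states, so the state code has to stay in the key.

```sas
/* numerator: weighted multigenerational households by state and PUMA */
proc freq data = work.multigen_dedup;
tables st*puma / out=work.multigen_puma (rename = (COUNT = multigen_hhds) drop = percent) noprint;
weight wgtp;
run;

/* denominator: all weighted households by state and PUMA */
proc freq data = work.householders;
tables st*puma / out=work.total_puma (rename = (COUNT = total_hhds) drop = percent) noprint;
weight wgtp;
run;

proc sort data = work.multigen_puma; by st puma; run;
proc sort data = work.total_puma; by st puma; run;

data work.multigen_by_puma;
merge work.total_puma (in = t) work.multigen_puma;
by st puma;
if t;
/* a PUMA with no multigenerational households in the sample gets 0, not missing */
if multigen_hhds = . then multigen_hhds = 0;
multigen_pct = round(100 * multigen_hhds / total_hhds, .1);
run;
```

Driving the merge from the denominator table keeps every PUMA in the output, including any that have no multigenerational households in the sample.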