PUMS ACS codebook and the mysterious "b" code

Does anyone know how the Census Bureau maps the "b" code in the PUMS codebook to an actual value in the data downloaded through the API?

I was looking at the PUMS variable LNGI, "Limited English speaking household". When you fetch PUMS data via the API, the variable values are text fields.

It turns out that for LNGI, the codebook entry "b .N/A (GQ/vacant)" (a label in the sense of a SAS format) corresponds to the data value "0".

In the LNGI case, the PUMS data field contains the value "0", which it turns out means N/A (GQ/vacant). You can guess this because there are no " " fields in the PUMS data and the value "0" doesn't appear in the codebook. A little detective work with the codebook and some downloaded data is enough to figure this out. However, "0" doesn't stand in for the codebook's "b" in every variable, because sometimes "0" is a valid non-missing value (i.e. not N/A).
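For anyone who wants to automate that detective work, here is a minimal Python sketch. It assumes you have already typed the codebook entries for one variable into a dict and pulled a sample of values from the API; the function names, the abbreviated labels, and the sample values are all made up for illustration and are not anything the Census Bureau publishes.

```python
def unlisted_values(observed, codebook):
    """Return data values seen in the download that have no codebook entry.

    `codebook` maps codes (as strings) to labels. The "b" entry is dropped
    first, because "b" is the placeholder we are trying to decode.
    """
    documented = set(codebook) - {"b"}
    return sorted(set(observed) - documented)


def guess_blank_value(observed, codebook):
    """Guess which data value stands in for "b", or None if it is unclear."""
    if "b" not in codebook:
        return None
    candidates = unlisted_values(observed, codebook)
    # Only trust the guess when exactly one undocumented value shows up.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical, abbreviated codebook entries for LNGI (labels shortened).
lngi_codebook = {
    "b": "N/A (GQ/vacant)",
    "1": "At least one person speaks English only or 'very well'",
    "2": "No one speaks English only or 'very well'",
}

# Hypothetical sample of LNGI values as returned by the API (text fields).
sample = ["1", "1", "0", "2", "0", "1"]

print(unlisted_values(sample, lngi_codebook))
print(guess_blank_value(sample, lngi_codebook))
```

The guess is only as good as the sample. If the N/A value never shows up in the downloaded rows, or if "0" is itself a documented code for that variable, the function returns None and you are back to asking ACSO.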

When I asked the people at ACSO, they sent me a SAS format statement for LNGI, which makes everything clear for this one variable. But wouldn't it be nice if the Census Bureau published its PUMS SAS format catalog so we don't have to keep guessing which data value corresponds to "N/A"? Do people have another way to find out which data value corresponds to "b" in the codebook? Some PUMS variables have over 100 coded values. If I'm working late on Friday, do I have to wait until Monday to get an answer? How about a holiday weekend?

Any help appreciated.

Dave Dorer

  • Dear Glenn,

    It could help, but when I check the LNGI field in the file you referenced, it shows '    '  (4 blanks) as the "missing value." The comment at the head of the formats file indicates that it is from 2008, so it would not be current. The 5 year 2020 PUMS codebook indicates a 1 character field. The email that I got from the people at ACSO was correct, and it indicates that "missing - GQ/vacant" corresponds to "0." for the 5 year 2020 vintage PUMS data. So at this point the only way that I can see to proceed is to download some data and do what I call "data archaeology" to figure out which code in the downloaded data does not appear in the codebook. For fields with a lot of codes, the data you download might not contain every code in the codebook, and then you will be "stuck." Hopefully someone in this group will have experience with this.

    Best,

    Dave