PUMS ACS codebook and the mysterious "b" code

Does anyone know how the Census maps the PUMS codebook code "b" to a text value in the downloaded (API) data?

I was looking at the LNGI ("Limited English speaking household") PUMS variable.  When you fetch PUMS data via the API, the variable values are returned as text fields.

It turns out that for LNGI the "b" in the codebook corresponds (in the sense of a SAS format) to the label "N/A (GQ/vacant)" and the data value "0".

In the LNGI case, the PUMS data field contains the value "0", which it turns out means N/A (GQ/vacant).  You can guess this because there are no blank ("") fields in the PUMS data and the "0" value doesn't appear in the codebook.  By doing a little detective work with the codebook and some downloaded data you can figure this out.  However, "0" isn't used for the codebook's "b" in all variables, because sometimes "0" is a valid non-missing value (i.e. not N/A).
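The detective work can be sketched in a few lines of R (my own illustration, not official Census code): compare the distinct values in some downloaded data against the codes listed in the codebook, and whatever is left over is a candidate for "b".

```r
# Illustration only (not Census code): flag data values that do not
# appear in the codebook -- candidates for the codebook's "b" code.
undocumented.codes <- function(observed, codebook.codes) {
  setdiff(unique(observed), codebook.codes)
}

# LNGI-like example: the codebook lists codes "1" and "2";
# downloaded data also contains "0", so "0" is the likely "b" value.
undocumented.codes(c("0", "1", "2", "1", "0"), c("1", "2"))  # "0"
```

Of course, as noted below, this only works when every codebook code actually appears in the data you downloaded.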

When I asked the people at ACSO they sent me a SAS format statement for LNGI, which makes everything clear for this one variable.  But wouldn't it be nice if the Census published their PUMS SAS format catalog so we don't have to keep guessing which data value corresponds to "N/A"?  Do people have another way to find out which data value corresponds to "b" in the codebook?  Some PUMS variables have over 100 coded values.  If I'm working late on a Friday, do I have to wait until Monday to get an answer?  How about a holiday weekend?

Any help appreciated.

Dave Dorer

  • I can't help, but I can say that all the different values for 0 or N/A etc. are truly terrible.  I had a conversation with Donna Daily a few weeks back and she said that data.census.gov was a work in progress. However, it's not just that platform; regardless of the tool, there are lots of issues.  I sent a follow-up email with the following text:

    "With a department with hundreds of PhDs working the math, it makes me think that we're not getting our money's worth from your budget of 4-5 billion."

    She was nice and upfront about acknowledging issues. However, there's no fire to fix all this stuff.

    I'd like to get her on a Zoom meeting with a few key people in this group (not me, as I'm clueless). There is a specific person in her department who is supposed to be working on these issues (I wasn't given a name). I just want all the percentages formatted with a % and a field name with the word percent in it (which should be low-hanging fruit).

    Tom

  • Dear Glenn,

    It could help, but when I check the LNGI field in the file you referenced, it shows "    " (four blanks) as the missing value.  The comment at the head of the formats file indicates that it is from 2008, and thus it would not be current.  The 5-year 2020 PUMS codebook indicates a 1-character field.  The email that I got from the people at ACSO was correct: it indicates that "missing - GQ/vacant" corresponds to "0" for the 5-year 2020 vintage PUMS data.  So at this point the only way I can see to proceed is to download some data and do what I call "data archaeology" to figure out which code in the downloaded data does not appear in the codebook.  For fields with a lot of codes, all the codes in the codebook might not appear in the data you download, and you will be "stuck."  Hopefully someone in this group will have experience with this.

    Best,

    Dave

  • Dear Tom,

    I realize that when dealing with "the government" it can be frustrating at times.  The last two sentences at the end of my initial posting expressed some of this.  My comments were inappropriate, and I apologize to all of the intelligent, hard-working, dedicated people who work at the Census.  All of my interactions with the people at the Census have been very professional, and their responses prompt and detailed.  Earlier in my career I spent some time at NASA; the people there were the same.  I've also emailed, talked with, or interviewed with people at the FDA (Food and Drug Administration), the NCI (National Cancer Institute), and the CDC (Centers for Disease Control and Prevention).  Even the IRS.

    When I get frustrated, I "take out a piece of paper," i.e. start a word-processing document, and I write my two Senators and my Representative.  I let my letter "sit" for a day or two.  Then I end the letter with my suggestions on how to improve the situation.  I sign the letter with my name and address and then email a scan of the signed letter.  Ultimately Congress is responsible for how our Federal agencies are funded and whether there is adequate staffing.  I know this reply is "off topic," but I think that it needs to be said.

    Best,

    Dave

  • Hi Dave!

    The API does not handle 'blank' as a value, so when the data is processed into the data API format, those blank values get assigned a new value. For numeric fields, this new value is typically the variable's minimum value minus 1; character fields may get stored as an 'N'. The best way to know how the values are defined for the API is to use the API discovery documentation (for info on this, see https://www.census.gov/data/developers/updates/new-discovery-tool.html).
     
    For example, for the ACS 1-Year 2021 PUMS file, you can look here: https://api.census.gov/data/2021/acs/acs1/pums.html and go to the 'variables' link.
    This brings you to a page listing every available item, and each item name is clickable, taking you to the definition of the item and its values.
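    As a rough sketch of that recoding rule (my illustration only, assuming the "minimum minus 1" convention for numeric fields described above; not official Census code):

```r
# Illustration of the described rule: for a numeric field, the API
# recodes 'blank' to the variable's minimum codebook value minus 1.
api.blank.code <- function(codebook.values) {
  min(as.numeric(codebook.values)) - 1
}

# LNGI's codebook lists codes 1 and 2, so blank would become 0 --
# consistent with the "0" = N/A (GQ/vacant) mapping seen in the data.
api.blank.code(c("1", "2"))  # 0
```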
    Thank you!
    -Amanda
  • Dear Amanda,

    Thank You,

    I'm writing an R program to read/decode

    https://api.census.gov/data.json

    I'll post how things go!

    Professional, intelligent, prompt and detailed!

    I expect that PUMS users will find this post useful,

    Thank you again.

    Dave

  • Thank you, please do keep us posted! We will have some opportunities coming for data users to share the scripts they create!

  • Trying to keep this positive (and failing): I'm not a researcher or in the industry, but I'm old and have probably pulled 100 API sources, and I have never seen such a funky system. Even the PubMed API, which uses strange 20-year-old syntax, has a clear methodology for nulls, blanks, and zero values. Taking a random value based on the minimum is a mix between Dilbert and Stephen King. Accepting this is a failure on our part.

  • For those of you who know R and who want to look up the code/label for a single PUMS variable, here is some code that uses the "metadata" API.

    You need "jsonlite"; run install.packages("jsonlite") to install it.

    # Example call:
    #   gpcv <- get.pums.codebook.variable("PERNP", vintage = 2020, period = 5, debug = 1)
    get.pums.codebook.variable <- function(vname = "LNGI", vintage = "2021",
                                           period = "1", debug = 0) {
      vname <- toupper(vname)
      url <- paste("https://api.census.gov/data/", vintage, "/acs/acs", period,
                   "/pums/variables.json", sep = "")
      if (debug) cat("get.pums.codebook.variable: url=\n", url, "\n", sep = "")
      require("jsonlite")
      r <- try(jsonlite::fromJSON(url))
      if (class(r)[1] == "try-error") {
        cat("get.pums.codebook.variable: Download error: url=\n", url, "\n", sep = "")
        return(2)
      }
      variables <- r$variables
      nmv <- toupper(names(variables))
      m <- match(vname, nmv)
      if (is.na(m)) {
        cat("get.pums.codebook.variable: ERROR variable", vname, "not matched.\n")
        return(1)
      }
      vm <- variables[[m]]
      val <- vm[["values"]]
      items <- val[["item"]]
      nmi <- names(items)
      cb <- NULL
      M <- length(nmi)
      if (M > 0) {
        lab <- rep("", M)
        for (j in seq(1, M)) lab[j] <- items[[j]]  # flatten the code labels
        cb <- data.frame(code = nmi, value = lab)
      }
      rng <- val[["range"]]
      invisible(list(variable = vname, label = vm$label, codes = cb,
                     range = rng, type = vm$predicateType))
    } # get.pums.codebook.variable

    Enjoy

    If you want to contact me with comments or bug reports:

      info@dorerfoundation.org

    Note this was updated 12-21-2022 at 3:47 pm EST

    Note I've found two bugs so far.  If you downloaded earlier, please update.

    Dave