Does anyone know how the Census maps the PUMS codebook code "b" to a text value in the data downloaded via the API?
I was looking at the LNGI "Limited English speaking household" PUMS variable. When you fetch PUMS data via the API the variable values are text fields.
It turns out that for LNGI the "b" in the codebook, labeled "b .N/A (GQ/vacant)" (a text string in the sense of a SAS format), corresponds to the data value "0".
In the LNGI case, the PUMS data field contains the value "0", which it turns out means N/A (GQ/vacant). You can guess this because there are no " " fields in the PUMS data and the "0" value doesn't appear in the codebook. A little detective work with the codebook and some downloaded data is enough to figure this out. However, "0" isn't used for the "b" in the codebook for all variables, because sometimes "0" is a valid non-missing value (i.e., not N/A).
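The detective work described above can be sketched in R as a simple set comparison. This is only an illustration: the function name compare_codes is made up for this post, and the observed values below are invented sample data, not a real PUMS extract.

```r
# Hypothetical helper (not part of any Census tool): compare the values
# seen in a downloaded PUMS column with the codes listed in the codebook.
compare_codes <- function(observed, codebook_codes) {
  observed <- unique(as.character(observed))
  codebook_codes <- unique(as.character(codebook_codes))
  list(
    # values present in the data but missing from the codebook
    data_only = sort(setdiff(observed, codebook_codes)),
    # codebook codes that never show up in the data
    codebook_only = sort(setdiff(codebook_codes, observed))
  )
}

# Invented sample values for LNGI; the codebook lists "b", "1" and "2".
lngi_observed <- c("1", "2", "0", "1", "0")
lngi_codebook <- c("b", "1", "2")

res <- compare_codes(lngi_observed, lngi_codebook)
print(res)
```

When exactly one value appears only in the data and "b" appears only in the codebook, that value is a reasonable candidate for "b". It is still a guess, and it breaks down for variables where "0" is a valid non-missing value.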
When I asked the people at ACSO, they sent me a SAS format statement for LNGI, which makes everything clear in the case of this one variable. But wouldn't it be nice if the Census published their PUMS SAS format catalog so we don't have to keep guessing what data value corresponds to "N/A"? Do people have another way to find out what data value corresponds to "b" in the codebook? Some PUMS variables have over 100 coded values. If I'm working late on Friday, do I have to wait until Monday to get an answer? How about a holiday weekend?
Any help appreciated.
Dave Dorer
Dear Tom,
I realize that when dealing with "the government" it can be frustrating at times. The last 2 sentences at the end of my initial posting expressed some of this. My comments were inappropriate…
Hi Dave!
Dear Amanda,
Thank You,
I'm writing an R program to read/decode
https://api.census.gov/data.json
I'll post how things go!
Professional, intelligent, prompt and detailed!
I expect that PUMS users will find this post useful.
Thank you again.
Dave
Thank you, please do keep us posted! We will have some opportunities coming for data users to share the scripts they create!
Trying to keep this positive (and failing): I'm not a researcher or in the industry. But I'm old and have probably pulled 100 API sources, and I have never seen such a funky system. Even the PubMed API, which uses strange 20-year-old syntax, has a clear methodology for nulls, blanks, and zero values. Taking a random value based on min is a mix between Dilbert and Stephen King. Accepting this is a failure on our part.
For those of you who know R and want to look up the codes and labels for a single PUMS variable, here is some code that uses the "metadata" API.
You need the "jsonlite" package; run install.packages("jsonlite") to install it.
# Example call: gpcv <- get.pums.codebook.variable("PERNP", vintage=2020, period=5, debug=1)
get.pums.codebook.variable <- function(vname="LNGI", vintage="2021", period="1", debug=0) {
  vname <- toupper(vname)
  # Build the URL of the variables.json metadata file for this vintage/period
  url <- paste("https://api.census.gov/data/", vintage, "/acs/acs", period, "/pums/variables.json", sep="")
  if (debug) cat("get.pums.codebook.variable: url=\n", url, "\n", sep="")
  require("jsonlite")
  r <- try(jsonlite::fromJSON(url))
  if (inherits(r, "try-error")) {
    cat("get.pums.codebook.variable: Download error: url=\n", url, "\n", sep="")
    return(2)
  } # if download failed
  variables <- r$variables
  # Match case-insensitively against the variable names in the metadata
  nmv <- toupper(names(variables))
  m <- match(vname, nmv)
  if (any(is.na(m))) {
    cat("get.pums.codebook.variable: ERROR variable: ", vname, "not matched.\n")
    return(1)
  }
  # Index by position so names that are not all upper case still resolve
  vm <- variables[[m]]
  val <- vm[["values"]]
  items <- val[["item"]]
  nmi <- names(items)
  cb <- NULL
  M <- length(nmi)
  if (M > 0) {
    lab <- rep("", M)
    for (j in seq(1, M)) {
      lab[j] <- items[[j]]
    } # for j
    # One row per coded value: the data value and its codebook label
    cb <- data.frame(code=nmi, value=lab)
  } # M > 0
  rng <- val[["range"]]
  invisible(list(variable=vname, label=vm$label, codes=cb, range=rng, type=vm$predicateType))
} # get.pums.codebook.variable
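Once you have the codes data frame (columns code and value) that the function returns, decoding a downloaded column is a lookup. Here is a minimal sketch. The helper name decode.pums.values is made up for this post, and the little codebook below is typed in by hand for illustration: only the "0" row comes from the LNGI discussion above, and the other labels are placeholders.

```r
# Hypothetical helper: translate raw PUMS data values into codebook labels.
# 'codebook' is assumed to have the columns 'code' and 'value', like the
# 'codes' element returned by get.pums.codebook.variable.
decode.pums.values <- function(values, codebook) {
  idx <- match(as.character(values), as.character(codebook$code))
  # Values with no matching code come back as NA
  as.character(codebook$value)[idx]
}

# Hand-typed stand-in for a downloaded codebook (placeholder labels).
cb <- data.frame(
  code = c("0", "1", "2"),
  value = c("N/A (GQ/vacant)", "label for code 1", "label for code 2"),
  stringsAsFactors = FALSE
)

decoded <- decode.pums.values(c("1", "0", "2", "9"), cb)
print(decoded)
```

Any NA in the result flags a data value that has no entry in the codebook, which is worth checking before you trust the decoded column.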
Enjoy
If you want to contact me with comments or bug reports:
info@dorerfoundation.org
Note this was updated 12-21-2022 at 3:47 pm EST
Note I've found 2 bugs so far. If you downloaded earlier, please update.