The ACS Office at the Census Bureau is currently testing a new format for the ACS Summary File, which is a comma-delimited text file that contains all the Detailed Tables for the ACS.
The proposed updates to the ACS Summary File are described on the Census Bureau's website.
We are starting this new Discussion Thread so that ACS data users can post any comments or questions about the proposed changes. ACS Summary File users are also encouraged to participate in the webinar scheduled for this afternoon on this topic.
This seems like something we could adapt to fairly readily.
I'd like to make a plea for structured metadata which is published in something other than a variety of XLSX files. Things that application…
As a longtime ACS Summary File user, I think this is a huge and welcome change. Perhaps the best improvement is having column headers in the data files. This not only reduces the complexity in using the files…
The FTP site includes a file that I think is the complete data file (acsdt5y2018.zip), but it's listed as 11 gigabytes. After unzipping, that's a TON of data to sift through. I also appreciate having…
The proposed new naming convention (e.g., B01001_001E) is consistent with the Census API, which my organization makes great use of. We use the summary file a lot as well, and the first step we do with the summary file is convert the field names into the API format, so that we're using one naming convention across our work. I think the new naming convention is a welcome change.
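For readers who have not done this conversion before, here is a minimal sketch of renaming summary file-style fields to the API style. It is not the poster's actual code. It assumes the existing names follow the pattern shown later in this thread (B01001e1), with "e" for estimates, and it further assumes "m" marks margins of error; the function name `to_api_name` is hypothetical.

```python
import re

# Assumed legacy pattern: table ID, then 'e' (estimate) or 'm' (margin of
# error, an assumption), then an unpadded line number, e.g. B01001e1.
_LEGACY = re.compile(r"^([A-Za-z]\d{5}[A-Za-z]*)([em])(\d+)$", re.IGNORECASE)


def to_api_name(legacy_name):
    """Convert a summary file-style name to the API style (B01001_001E)."""
    match = _LEGACY.match(legacy_name)
    if match is None:
        raise ValueError(f"unrecognized field name: {legacy_name!r}")
    table, kind, line = match.groups()
    # The API style zero-pads the line number to three digits and
    # uses an uppercase suffix.
    return f"{table.upper()}_{int(line):03d}{kind.upper()}"


if __name__ == "__main__":
    for name in ["B01001e1", "B01001m12"]:
        print(name, "->", to_api_name(name))
```

Applied to a list of column headers, this gives one naming convention across API pulls and summary file extracts.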
I'm sure that having one scheme would make it easier for Census Bureau staff, and for users who might need to join API and summary file data.
At the same time, I use the API and the summary files very differently, and I personally don't need the naming conventions to be consistent. The API is great when I just need a few tables for a single set of geographies, but not if I need many tables for many kinds of geographies (which is more often the case). I and many others have so much code that depends on the existing naming convention--specifically the ability to refer to ranges of variables by numbers, which will be much harder with the proposed framework. This is true for users of SAS, R, Stata, and probably other programs. I realize that we users can always convert the API-style names back to existing summary file-style names (B01001_001E --> B01001e1), but that extra work for users (which would be quite difficult for novices) seems to undermine one of the reasons for this change.
I'd love any information that would assuage my concerns, though--have you found advantages to using the API's naming convention rather than the summary file's naming convention?
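To make the concern above concrete, here is a rough sketch of the two workarounds it implies: converting API-style names back to the existing style (B01001_001E --> B01001e1), and building a numeric range of API-style names in code rather than relying on a program's built-in range syntax. The helper names `to_legacy_name` and `api_range` are hypothetical, and the "M" suffix for margins of error is an assumption not stated in this thread.

```python
import re

# API-style pattern as shown in this thread: table ID, underscore,
# three-digit line number, then E (estimate) or M (assumed: margin of error).
_API = re.compile(r"^([A-Z]\d{5}[A-Z]*)_(\d{3})([EM])$")


def to_legacy_name(api_name):
    """Convert an API-style name back to the existing style (B01001e1)."""
    match = _API.match(api_name)
    if match is None:
        raise ValueError(f"unrecognized field name: {api_name!r}")
    table, line, kind = match.groups()
    # Drop the zero padding and lowercase the suffix.
    return f"{table}{kind.lower()}{int(line)}"


def api_range(table, first, last, kind="E"):
    """List API-style names for lines first..last (inclusive) of a table."""
    return [f"{table}_{n:03d}{kind}" for n in range(first, last + 1)]


if __name__ == "__main__":
    print(to_legacy_name("B01001_001E"))
    print(api_range("B01001", 3, 5))
```

A list built this way can be passed wherever a program accepts an explicit list of column names, which partly stands in for range references, though it is still extra work compared with the existing convention.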