The ACS Office at the Census Bureau is currently testing a new format for the ACS Summary File, which is a comma-delimited text file that contains all the Detailed Tables for the ACS.
Information about the proposed updates to the ACS Summary File is available on the Census Bureau's website.
We are starting this new Discussion Thread so that ACS data users can post any comments or questions about the proposed changes. ACS Summary File users are also encouraged to participate in the webinar scheduled for this afternoon on this topic.
This seems like something we could adapt to fairly readily.
I'd like to make a plea for structured metadata which is published in something other than a variety of XLSX files. Things that application…
As a longtime ACS Summary File user, this is a huge and welcome change. Perhaps the best improvement is having column headers in the data files. This not only reduces the complexity in using the files, but it lessens the possibility of errors.
One request: When the data is made available, it would be helpful for the files to be available as a bulk download, for people that want the entire dataset for all geography levels for the whole country. This could mean a single compressed file (or a set of them), or folders on a real FTP page, so they could be accessed like a file system (rather than a web page).
Another suggestion: Perhaps the data files could have a column with the time frame of the data. Even though this would only be one value for the whole file (the year of the data), it would help when loading it into a database.
Thank you for this change!
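Until a time-frame column exists in the files themselves, it can be added at load time. Here is a minimal Python sketch of that workaround. The column name PERIOD, the period value, and the sample rows are my own placeholders, not anything from the proposed format.

```python
import csv
import io

def add_period_column(rows, period, column="PERIOD"):
    """Yield rows with a constant period column appended.

    The first row is treated as the header. The column name "PERIOD"
    is a hypothetical choice, not part of the proposed file format.
    """
    rows = iter(rows)
    header = next(rows)
    yield header + [column]
    for row in rows:
        yield row + [period]

# Tiny in-memory stand-in for a data file with a header row (values are made up).
sample = io.StringIO("GEO_ID,B01001_001E\n0400000US01,100\n0400000US02,200\n")
tagged = list(add_period_column(csv.reader(sample), "2014-2018"))
```

Each row in `tagged` then carries the period, so files from different releases can go into one database table without losing track of which release a row came from.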
The FTP site includes a file that I think is the complete data file (acsdt5y2018.zip), but it's listed as 11 gigabytes. After unzipping, that's a TON of data to sift through. I also appreciate having the column headers, and I like your suggestion to add the time period of the data. But for people (like me) who need data from many tables but only one or two states, it's going to be extremely difficult to have the files separated by tables rather than by states. I would not look forward to reading data for the entire country for every single table I need.
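For what it's worth, if the files do end up organized by table, the national file can be streamed row by row and filtered to a few states without loading everything into memory. This is a rough Python sketch under several assumptions of mine: a GEO_ID column exists, its first three characters are the summary level, and the code after "US" starts with the 2-digit state FIPS code for geographies nested in a state. The sample rows are made up.

```python
import csv
import io

# Assumed summary-level codes for geographies nested within a state:
# 040 = state, 050 = county, 140 = tract, 150 = block group.
STATE_NESTED_LEVELS = {"040", "050", "140", "150"}

def rows_for_states(lines, state_fips, geo_column="GEO_ID",
                    levels=STATE_NESTED_LEVELS):
    """Stream a table file and yield only rows for the requested states.

    Rows are kept when their summary level is in `levels` and the code
    after "US" begins with one of the requested state FIPS codes.
    """
    wanted = set(state_fips)
    for row in csv.DictReader(lines):
        geo_id = row[geo_column]
        _, sep, code = geo_id.partition("US")
        if sep and geo_id[:3] in levels and code[:2] in wanted:
            yield row

# In-memory stand-in for one national table file (values are made up).
sample = io.StringIO(
    "GEO_ID,B01001_001E\n"
    "0400000US06,10\n"
    "0500000US36061,20\n"
    "0400000US36,30\n"
    "3100000US35620,40\n"  # metro area, not nested in a state, so skipped
)
new_york = list(rows_for_states(sample, {"36"}))
```

This still means reading every national file once per table, so it eases the memory problem rather than the download problem.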
Oh for sure, it shouldn't be available exclusively as a single compressed file! The way the files are currently available, there are compressed folders for all larger geographies and tracts and block groups, as well as folders with individual files for each state/table. Continuing to have these various options would be great.
I do understand that the proposal is for the ACS, but will these changes potentially also apply to 2020 Census data products?
As someone who uses SAS to build datasets from the raw ACS data files and perform subsequent data analysis, I would strongly advise against naming the fields/variables with an "E" or "M" at the end of the name. This would make it more difficult to use a range of variables in calculations, for example, when collapsing a table into broader categories like age groups, educational attainment, etc. So instead of fields/variables formatted like this:
B01001_001E
B01001_001M
B01001_002E
B01001_002M
B01001_003E
B01001_003M
I would suggest a naming convention more like this:
B01001_E001
B01001_M001
B01001_E002
B01001_M002
B01001_E003
B01001_M003
Just my .02
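If the files do ship with the E/M at the end, the columns could be renamed on load. Below is a minimal Python sketch of that mapping, assuming names follow the pattern table ID, underscore, 3-digit item, then E or M. The function name and the regular expression are mine, not anything published by the Census Bureau.

```python
import re

# Matches suffix-style names such as B01001_001E or B01001A_002M.
# The pattern is an assumption based on the examples in this thread.
SUFFIX_STYLE = re.compile(r"^([A-Z]\d+[A-Z]*)_(\d{3})([EM])$")

def to_prefix_style(name):
    """Turn B01001_001E into B01001_E001; leave other names unchanged."""
    m = SUFFIX_STYLE.match(name)
    if m is None:
        return name
    table, item, kind = m.groups()
    return f"{table}_{kind}{item}"

cols = ["B01001_001E", "B01001_001M", "B01001_002E", "B01001_002M"]
renamed = sorted(to_prefix_style(c) for c in cols)
# renamed == ["B01001_E001", "B01001_E002", "B01001_M001", "B01001_M002"]
```

With the number last, all estimates sort together ahead of all margins of error, which is the layout that a numbered range of variables relies on.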
Yes, yes, YES to this. I'm also a "power" SAS user; this change would really impact named ranges in SAS programs.
It'll be a bit confusing since some tables have letter suffixes (like B01001E_001). Someone could easily confuse B01001E_E001 with B01001_E001. Not to say it shouldn't be done, but something to consider. Also, it would kill continuity of column headers with previous years' data (which I assume will not be re-released in the new format).
Also, I've been using this data since the beginning of ACS, and I still think "Error" and not "Estimate" every time I see that E. Am I the only one?
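On the B01001E_E001 versus B01001_E001 point: the two are easy for a person to mix up, but a program can still tell them apart as long as the underscore separates the table ID from the rest. A small Python sketch, assuming the prefix-style names suggested above (the pattern and the function name are hypothetical):

```python
import re

# Prefix-style names: table ID, underscore, E or M, 3-digit item number.
# The pattern is an assumption drawn from the examples in this thread.
PREFIX_STYLE = re.compile(r"^([A-Z]\d+[A-Z]*)_([EM])(\d{3})$")

def parse_prefix_style(name):
    """Split a prefix-style column name into table, kind, and item number."""
    m = PREFIX_STYLE.match(name)
    if m is None:
        raise ValueError(f"not a prefix-style column name: {name!r}")
    table, kind, item = m.groups()
    return {
        "table": table,
        "kind": "estimate" if kind == "E" else "margin_of_error",
        "item": int(item),
    }

a = parse_prefix_style("B01001E_E001")  # table "B01001E"
b = parse_prefix_style("B01001_E001")   # table "B01001"
```

Both names describe an estimate for item 1, but they resolve to different tables, so the confusion is a human-readability issue rather than a parsing one.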
you're not the only one!
In reference to your statement,
"Also, it would kill continuity of column headers with previous years' data (which I assume will not be re-released in the new format).",
were the previous data ever released with the "E" or "M" appended to the end of the field/variable name? We use custom SAS programs to build the SAS datasets from the raw ACS data (not the CB provided SAS programs). I don't recall any of the previous data files having a header row with field/variable names. From the CB provided SAS programs, it appears the variables are named in the xxxe001 manner and not as xxx001e, but I could be mistaken.
Regardless, it looks like any change that includes both the estimates and MOEs in the same data will necessitate naming the variables in such a way that it may "break" continuity with previous data releases, unless the end-user built the datasets to account for that.
Good point; my bad. The E/M was made as a prefix to the data filenames and worksheet tab name in the data templates. So it will necessitate a change, as you say.
NO!
Jeffrey Jordan said: were the previous data ever released with the "E" or "M" appended to the end of the field/variable name?
As for Bernie's comment -- Lots of us depend on these files to be machine-readable. Please let's not sacrifice any of that for a minor reduction in confusion for human readers.
Jeffrey Jordan said: From the CB provided SAS programs, it appears the variables are named in the xxxe001 manner and not as xxx001e, but I could be mistaken.
Yes, you're correct, e.g.
/*SEX BY AGE (WHITE ALONE) */
/*Universe: People who are White alone */
B01001Ae1='Total:'
B01001Ae2='Male:'
B01001Ae3='Under 5 years'
B01001Ae4='5 to 9 years'
B01001Ae5='10 to 14 years'
B01001Ae6='15 to 17 years'
B01001Ae7='18 and 19 years'

/*SEX BY AGE (WHITE ALONE) */
/*Universe: People who are White alone */
B01001Am1='Total:'
B01001Am2='Male:'
B01001Am3='Under 5 years'
B01001Am4='5 to 9 years'
B01001Am5='10 to 14 years'
B01001Am6='15 to 17 years'
Maybe even something like:
B01001_E_001
B01001_E_002
B01001_E_003
B01001_M_001
B01001_M_002
B01001_M_003
B01001A_E_001
B01001A_E_002
...would accommodate SAS users while still reducing confusion
I do like the underscore to separate the table id from the table item and I do prefer the table item padded with zeros. I could go either way with the second underscore between the E/M and the table item.
We should also identify any name length restrictions in the software packages people use to work with the data, and flag variable names that may exceed those limits. For example, I believe old DBF files had a 10-character field name restriction.
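A quick way to check candidate names against such limits is sketched below in Python. The 10-character DBF figure is the one mentioned above. The 32-character SAS figure is my understanding of the usual SAS variable name limit, so treat it as an assumption and adjust the table for whatever software you use.

```python
# Maximum name lengths to check against. The DBF value comes from this
# thread; the SAS value is an assumption about the usual SAS limit.
LIMITS = {"DBF field name": 10, "SAS variable name": 32}

def names_over_limit(names, limits=LIMITS):
    """Return, for each limit, the names that are longer than it allows."""
    return {
        label: [n for n in names if len(n) > limit]
        for label, limit in limits.items()
    }

# One example name from each naming style discussed in this thread.
candidates = ["B01001_001E", "B01001_E001", "B01001_E_001", "B01001A_E_001"]
over = names_over_limit(candidates)
```

All four example names are 11 to 13 characters long, so under these assumptions every style discussed here already exceeds a 10-character limit, while none comes close to 32.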