Proposed changes to the ACS Summary File format

The ACS Office at the Census Bureau is currently testing a new format for the ACS Summary File, which is a comma-delimited text file that contains all the Detailed Tables for the ACS.  

Information about the proposed updates to the ACS Summary File is available on the Census Bureau's website.

We are starting this new Discussion Thread so that ACS data users can post any comments or questions about the proposed changes. ACS Summary File users are also encouraged to participate in the webinar scheduled for this afternoon on this topic.

Parents
  • As someone who uses SAS to build datasets from the raw ACS data files and perform subsequent data analysis, I would strongly advise against naming the fields/variables with an "E" or "M" at the end of the name.  This would make it more difficult to use a range of variables in calculations, for example, when collapsing a table into broader categories like age groups, educational attainment, etc.  So instead of fields/variables formatted like this:

    B01001_001E
    B01001_001M
    B01001_002E
    B01001_002M
    B01001_003E
    B01001_003M

    I would suggest a naming convention more like this:

    B01001_E001
    B01001_M001
    B01001_E002
    B01001_M002
    B01001_E003
    B01001_M003

    Just my .02
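
For anyone who wants to try the suggested names before any official change, here is a minimal Python sketch of the rename. It assumes the proposed headers look exactly like the examples above (table ID, underscore, zero-padded line number, then E or M). The regular expression, the helper name `to_suffix_last`, and the `GEO_ID` column are illustrative assumptions, not part of any Census Bureau specification.

```python
import re

# Assumed shape of a proposed header: table ID (one letter, five digits,
# optional trailing letters), underscore, line number, then E or M.
PROPOSED = re.compile(r"^([A-Z]\d{5}[A-Z]*)_(\d+)([EM])$")

def to_suffix_last(name: str) -> str:
    """Rewrite 'B01001_001E' as 'B01001_E001'; leave other names unchanged."""
    m = PROPOSED.match(name)
    if m is None:
        return name
    table, line, kind = m.groups()
    return f"{table}_{kind}{line}"

# Hypothetical header row; GEO_ID stands in for any non-table column.
columns = ["GEO_ID", "B01001_001E", "B01001_001M", "B01001_002E", "B01001_002M"]
renamed = [to_suffix_last(c) for c in columns]

# With the line number last, one prefix selects all estimates of a table.
estimates = sorted(c for c in renamed if c.startswith("B01001_E"))
```

With the line number at the end, the estimate columns of a table share a prefix and a numeric suffix, which is the pattern that numbered variable ranges in SAS depend on.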

Children
  • Yes, yes, YES to this. I'm also a "power" SAS user; this change would really impact named ranges in SAS programs.

  • It'll be a bit confusing since some tables have letter suffixes (like B01001E_001). Someone could easily confuse B01001E_E001 with B01001_E001. Not to say it shouldn't be done, but something to consider. Also, it would kill continuity of column headers with previous years' data (which I assume will not be re-released in the new format).

    Also, I've been using this data since the beginning of ACS, and I still think "Error" and not "Estimate" every time I see that E. Am I the only one?
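
On the worry about confusing B01001E_E001 with B01001_E001: the two are easy to misread by eye, but they can be told apart mechanically as long as the underscore is kept. Below is a rough Python sketch, assuming table IDs are one letter, five digits, and optional trailing letters (an assumption based only on the examples in this thread). `FieldName` and `parse` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class FieldName(NamedTuple):
    table: str  # e.g. "B01001" or "B01001E"
    kind: str   # "E" for estimate, "M" for margin of error
    line: int   # line number within the table

# Assumed shape of a suggested name: table ID, underscore, E or M, line number.
SUGGESTED = re.compile(r"^([A-Z]\d{5}[A-Z]*)_([EM])(\d+)$")

def parse(name: str) -> Optional[FieldName]:
    """Split a name such as 'B01001E_E001' into its parts, or return None."""
    m = SUGGESTED.match(name)
    if m is None:
        return None
    table, kind, line = m.groups()
    return FieldName(table, kind, int(line))

a = parse("B01001E_E001")  # table with a letter suffix
b = parse("B01001_E001")   # base table
```

Here `a.table` is "B01001E" and `b.table` is "B01001", so a program never mixes them up even if a human reader might.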

  • You're not the only one!

  • In reference to your statement,

    "Also, it would kill continuity of column headers with previous years' data (which I assume will not be re-released in the new format).",

    were the previous data ever released with the "E" or "M" appended to the end of the field/variable name?  We use custom SAS programs to build the SAS datasets from the raw ACS data (not the CB provided SAS programs).  I don't recall any of the previous data files having a header row with field/variable names.  From the CB provided SAS programs, it appears the variables are named in the xxxe001 manner and not as xxx001e, but I could be mistaken.

    Regardless, it looks like any change that includes both the estimates and MOEs in the same data will necessitate naming the variables in such a way that it may "break" continuity with previous data releases, unless the end-user built the datasets to account for that.

  • Good point; my bad. The E/M was made as a prefix to the data filenames and worksheet tab name in the data templates. So it will necessitate a change, as you say.

  • NO!

    "were the previous data ever released with the "E" or "M" appended to the end of the field/variable name?"

    As for Bernie's comment -- Lots of us depend on these files to be machine-readable. Please let's not sacrifice any of that for a minor reduction in confusion for human readers.

  • "From the CB provided SAS programs, it appears the variables are named in the xxxe001 manner and not as xxx001e, but I could be mistaken."

    Yes, you're correct, e.g.

    /*SEX BY AGE (WHITE ALONE) */
    /*Universe: People who are White alone */

    B01001Ae1='Total:'
    B01001Ae2='Male:'
    B01001Ae3='Under 5 years'
    B01001Ae4='5 to 9 years'
    B01001Ae5='10 to 14 years'
    B01001Ae6='15 to 17 years'
    B01001Ae7='18 and 19 years'

    /*SEX BY AGE (WHITE ALONE) */
    /*Universe: People who are White alone */

    B01001Am1='Total:'
    B01001Am2='Male:'
    B01001Am3='Under 5 years'
    B01001Am4='5 to 9 years'
    B01001Am5='10 to 14 years'
    B01001Am6='15 to 17 years'

  • Maybe even something like:

    B01001_E_001
    B01001_E_002
    B01001_E_003
    B01001_M_001
    B01001_M_002
    B01001_M_003
    B01001A_E_001
    B01001A_E_002

    ...would accommodate SAS users while still reducing confusion.

  • I do like the underscore to separate the table id from the table item and I do prefer the table item padded with zeros.  I could go either way with the second underscore between the E/M and the table item.

  • We should also identify any name length restrictions of any software packages users are using to work with the data and variable names that may exceed these limits.  For example, I believe old DBF files had a 10-character field name restriction.
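
A quick way to screen candidate names against such limits is sketched below in Python. The 10-character DBF figure is the one mentioned above, and the 32-character SAS figure is the commonly documented maximum for SAS variable names; please check both against your own software. The `LIMITS` table and the `too_long` helper are hypothetical.

```python
# Assumed maximum field/variable name lengths, keyed by format.
LIMITS = {"DBF": 10, "SAS": 32}

def too_long(names, limit):
    """Return the names whose length exceeds the given limit."""
    return [n for n in names if len(n) > limit]

# Sample names taken from the conventions discussed in this thread.
names = ["B01001_001E", "B01001_E001", "B01001A_E_001"]

dbf_problems = too_long(names, LIMITS["DBF"])
sas_problems = too_long(names, LIMITS["SAS"])
```

In this sample every name is longer than 10 characters (B01001_001E is already 11), so all three land in `dbf_problems`, while none exceeds the assumed SAS limit.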

  • Sure. I merely added the second underscore as a possible mitigation for the confusion problem Bernie mentioned, one that would still be usable in SAS programs with only minor modification.