How to read the ACS data into SAS?

Greetings! How to read the data from Table DP05 of the 2019 American Community Survey (ACS) 5-year estimates into SAS? When I deleted the second row on variable definition and read it into SAS, errors came (copied below). When I read all lines starting the third row, all numbers became characters. I reviewed the user groups but could not find a good solution. 

I am new to ACS. I plan to run a multilevel analysis of 391 Latinx immigrants interviewed around 2019 to 2021 in Durham North Carolina. I want to generate a neighborhood concentrated disadvantage index using 2015–2019 5-year estimates of American Community Survey (ACS).

Thank you. Best wishes, Qing

NOTE: Invalid data for DP05_0019PE in line 1511 83-83.
NOTE: Invalid data for DP05_0019PM in line 1511 85-86.
NOTE: Invalid data for DP05_0038PE in line 1511 88-88.
NOTE: Invalid data for DP05_0038PM in line 1511 90-91.
NOTE: Invalid data for DP05_0071PE in line 1511 93-93.
NOTE: Invalid data for DP05_0071PM in line 1511 95-96.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
1511 1400000US37129990100,"Census Tract 9901, New Hanover County, North Carolina",0,12,-,**,-,**,-,** 96
VAR1=1400000US37129990100 NAME=Census Tract 9901, New Hanover County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0019PE=.
DP05_0019PM=. DP05_0038PE=. DP05_0038PM=. DP05_0071PE=. DP05_0071PM=. _ERROR_=1 _N_=1510
NOTE: 2195 records were read from the infile 'C:\Rosa\Data\agerace1.csv'.
The minimum record length was 89.
The maximum record length was 115.
NOTE: The data set WORK.BB has 2195 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.07 seconds

  • I'm not a SAS expert, but have you seen the SAS programs they provide for the ACS summary files here:

    www.census.gov/.../summary-file-documentation.html

  • Thank you! Trying. Best, 

  • I pre-edit the CSV file using a text editor.  I do a set of global search and replace operations.  For example, changing occurrences of ",**" to "," (change a comma-asterisk-asterisk to just a comma).  SAS will then treat these as missing values.

  •  Thank you, David, Jamie, and other readers. I tried to use % of under 18, Black, and Hispanic from DP05.CVS. How to fix the errors below? Thank you. Best, Qing

    PROC IMPORT OUT= agerace1
    DATAFILE= "C:\DP05.csv"
    DBMS=CSV REPLACE;
    GETNAMES=YES;
    DATAROW=3;
    RUN;

     

    13698  data agerace;

    13699  set agerace1;

    13700  Keep GEO   Under18 Black Hispanic;

    13701  GEO=INPUT(SUBSTR (GEO_ID,10,11),11.);

    13702  Under18=input(DP05_0019PE, BEST12.)/100;

    13703  Black=input(DP05_0038PE, BEST12.)/100;

    13704  Hispanic=input(DP05_0071PE, BEST12.)/100;

    13705  run;

     

    NOTE: Invalid argument to function INPUT at line 13702 column 9.

    NOTE: Invalid argument to function INPUT at line 13703 column 7.

    NOTE: Invalid argument to function INPUT at line 13704 column 10.

    GEO_ID=1400000US37019990100 NAME=Census Tract 9901, Brunswick County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0 DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**

    . . . .DP05_0088PM=** DP05_0089E=0 DP05_0089M=12

    DP05_0089PE=- DP05_0089PM=** GEO=37019990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=117

     

    OTE: Invalid argument to function INPUT at line 13702 column 9.

    NOTE: Invalid argument to function INPUT at line 13703 column 7.

    NOTE: Invalid argument to function INPUT at line 13704 column 10.

    WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.

    GEO_ID=1400000US37133990100 NAME=Census Tract 9901, Onslow County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0

    DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**

    DP05_0089PE=- DP05_0089PM=** GEO=37133990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=1547

    NOTE: Missing values were generated as a result of performing an operation on missing values.

          Each place is given by: (Number of times) at (Line):(Column).

          25 at 13702:36   25 at 13703:34   25 at 13704:37

    NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.

          Each place is given by: (Number of times) at (Line):(Column).

          25 at 13702:9    25 at 13703:7    25 at 13704:10

    NOTE: There were 2195 observations read from the data set WORK.AGERACE1.

    NOTE: The data set WORK.AGERACE has 2195 observations and 4 variables.

    NOTE: DATA statement used (Total process time):

          real time           0.26 seconds

          cpu time            0.20 seconds

  • I don't think the INPUT statement works that way.

    You'll need to read in the whole record from DP05, then do your math

    INPUT
    GEO_ID $11
    NAME $60.
    DP05_0001E BEST12.
    DP05_0001M BEST12.
    DP05_0001PE BEST12.
    DP05_0001PM BEST12.
    DP05_0002E BEST12.
    DP05_0002M BEST12.
    DP05_0002PE BEST12.
    DP05_0002PM BEST12.
    [etc]
    ;
    Under18= DP05_0019PE / 100;
    Black = DP05_0038PE / 100;
    Hispanic = DP05_0071PE / 100;
    geoid = SUBSTR(GEO_ID,10,11);
  • Hi Qing -

    I think the problem is that some of the rows contain text annotation so that they cannot be read with a numeric informat like BEST12.  Your approach could work (PROC IMPORT followed by a DATA step).  However, you will need to check to be sure the value is a number before doing the conversion from text to number.  One way to check is to use the verify function:

    if verify(DP05_0019PE,'0123456789. ') = 0 then
        Under18=input(DP05_0019PE, BEST12.)/100;
    if verify(DP05_0038PE,'0123456789. ') = 0 then
        Black=input(DP05_0038PE, BEST12.)/100;
    if verify(DP05_0071PE,'0123456789. ') = 0 then
        Hispanic=input(DP05_0071PE, BEST12.)/100;

    I hope this help!

    -- Dave

  • My mistake, you're trying to use an INPUT function, not statement. @David's answer below should work.

  • Thank you David and Glenn. The verify function solved the problem. With your help, I programmed neighborhood measures of concentrated disadvantage and ethnic enclaves, which was a dichotomous variable with Latinx/Hispanic concentration at the 90th percentile or higher of the distribution at census tracts. Among 215 Latinx participants in 42 census tracts in a survey study in NC, 22 Latinx participants lived in ethnic enclaves in five census tracts with Latinx concentration (% of Hispanic) ≥ 29.2% in a multilevel analysis of intimate partner violence for a conference abstract due 8/31. I am very grateful to your expertise and help to me in this data users group. Best regards, Qing