How to read the ACS data into SAS?

Qing Li over 4 years ago

Greetings! How to read the data from Table DP05 of the 2019 American Community Survey (ACS) 5-year estimates into SAS? When I deleted the second row on variable definition and read it into SAS, errors came (copied below). When I read all lines starting the third row, all numbers became characters. I reviewed the user groups but could not find a good solution.

I am new to ACS. I plan to run a multilevel analysis of 391 Latinx immigrants interviewed around 2019 to 2021 in Durham North Carolina. I want to generate a neighborhood concentrated disadvantage index using 2015–2019 5-year estimates of American Community Survey (ACS).

Thank you. Best wishes, Qing

NOTE: Invalid data for DP05_0019PE in line 1511 83-83.
NOTE: Invalid data for DP05_0019PM in line 1511 85-86.
NOTE: Invalid data for DP05_0038PE in line 1511 88-88.
NOTE: Invalid data for DP05_0038PM in line 1511 90-91.
NOTE: Invalid data for DP05_0071PE in line 1511 93-93.
NOTE: Invalid data for DP05_0071PM in line 1511 95-96.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
1511 1400000US37129990100,"Census Tract 9901, New Hanover County, North Carolina",0,12,-,**,-,**,-,** 96
VAR1=1400000US37129990100 NAME=Census Tract 9901, New Hanover County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0019PE=.
DP05_0019PM=. DP05_0038PE=. DP05_0038PM=. DP05_0071PE=. DP05_0071PM=. _ERROR_=1 _N_=1510
NOTE: 2195 records were read from the infile 'C:\Rosa\Data\agerace1.csv'.
The minimum record length was 89.
The maximum record length was 115.
NOTE: The data set WORK.BB has 2195 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.07 seconds

Top Replies

Dave Stinchcomb over 4 years ago in reply to Qing Li +1

Hi Qing -

I think the problem is that some of the rows contain text annotation so that they cannot be read with a numeric informat like BEST12. Your approach could work (PROC IMPORT followed by a DATA…

Parents

Dave Stinchcomb over 4 years ago

I pre-edit the CSV file using a text editor. I do a set of global search and replace operations. For example, changing occurrences of ",**" to "," (change a comma-asterisk-asterisk to just a comma). SAS will then treat these as missing values.
Cancel
Up 0 Down

Reply

Cancel
Qing Li over 4 years ago in reply to Dave Stinchcomb

Thank you, David, Jamie, and other readers. I tried to use % of under 18, Black, and Hispanic from DP05.CVS. How to fix the errors below? Thank you. Best, Qing

PROC IMPORT OUT= agerace1
DATAFILE= "C:\DP05.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=3;
RUN;

13698 data agerace;

13699 set agerace1;

13700 Keep GEO   Under18 Black Hispanic;

13701 GEO=INPUT(SUBSTR (GEO_ID,10,11),11.);

13702 Under18=input(DP05_0019PE, BEST12.)/100;

13703 Black=input(DP05_0038PE, BEST12.)/100;

13704 Hispanic=input(DP05_0071PE, BEST12.)/100;

13705 run;

NOTE: Invalid argument to function INPUT at line 13702 column 9.

NOTE: Invalid argument to function INPUT at line 13703 column 7.

NOTE: Invalid argument to function INPUT at line 13704 column 10.

GEO_ID=1400000US37019990100 NAME=Census Tract 9901, Brunswick County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0 DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**

. . . .DP05_0088PM=** DP05_0089E=0 DP05_0089M=12

DP05_0089PE=- DP05_0089PM=** GEO=37019990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=117

OTE: Invalid argument to function INPUT at line 13702 column 9.

NOTE: Invalid argument to function INPUT at line 13703 column 7.

NOTE: Invalid argument to function INPUT at line 13704 column 10.

WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.

GEO_ID=1400000US37133990100 NAME=Census Tract 9901, Onslow County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0

DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**

DP05_0089PE=- DP05_0089PM=** GEO=37133990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=1547

NOTE: Missing values were generated as a result of performing an operation on missing values.

      Each place is given by: (Number of times) at (Line):(Column).

      25 at 13702:36   25 at 13703:34   25 at 13704:37

NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.

      Each place is given by: (Number of times) at (Line):(Column).

      25 at 13702:9    25 at 13703:7    25 at 13704:10

NOTE: There were 2195 observations read from the data set WORK.AGERACE1.

NOTE: The data set WORK.AGERACE has 2195 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.26 seconds

      cpu time            0.20 seconds
Cancel
Up 0 Down

Reply

Cancel

Glenn Rice over 4 years ago in reply to Qing Li

I don't think the INPUT statement works that way.

You'll need to read in the whole record from DP05, then do your math

INPUT
 GEO_ID $11
 NAME $60.
 DP05_0001E BEST12.
 DP05_0001M BEST12.
 DP05_0001PE BEST12.
 DP05_0001PM BEST12.
 DP05_0002E BEST12.
 DP05_0002M BEST12.
 DP05_0002PE BEST12.
 DP05_0002PM BEST12.
 [etc]
 ;
Under18= DP05_0019PE / 100;
Black = DP05_0038PE / 100;
Hispanic = DP05_0071PE / 100;
geoid = SUBSTR(GEO_ID,10,11);

Reply

Glenn Rice over 4 years ago in reply to Qing Li

I don't think the INPUT statement works that way.

You'll need to read in the whole record from DP05, then do your math

INPUT
 GEO_ID $11
 NAME $60.
 DP05_0001E BEST12.
 DP05_0001M BEST12.
 DP05_0001PE BEST12.
 DP05_0001PM BEST12.
 DP05_0002E BEST12.
 DP05_0002M BEST12.
 DP05_0002PE BEST12.
 DP05_0002PM BEST12.
 [etc]
 ;
Under18= DP05_0019PE / 100;
Black = DP05_0038PE / 100;
Hispanic = DP05_0071PE / 100;
geoid = SUBSTR(GEO_ID,10,11);

Children

Glenn Rice over 4 years ago in reply to Glenn Rice

My mistake, you're trying to use an INPUT function, not statement. @David's answer below should work.
Cancel
Up 0 Down

Reply

Cancel
Qing Li over 4 years ago in reply to Glenn Rice

Thank you David and Glenn. The verify function solved the problem. With your help, I programmed neighborhood measures of concentrated disadvantage and ethnic enclaves, which was a dichotomous variable with Latinx/Hispanic concentration at the 90th percentile or higher of the distribution at census tracts. Among 215 Latinx participants in 42 census tracts in a survey study in NC, 22 Latinx participants lived in ethnic enclaves in five census tracts with Latinx concentration (% of Hispanic) ≥ 29.2% in a multilevel analysis of intimate partner violence for a conference abstract due 8/31. I am very grateful to your expertise and help to me in this data users group. Best regards, Qing
Cancel
Up 0 Down

Reply

Cancel