Greetings! How to read the data from Table DP05 of the 2019 American Community Survey (ACS) 5-year estimates into SAS? When I deleted the second row on variable definition and read it into SAS, errors came (copied below). When I read all lines starting the third row, all numbers became characters. I reviewed the user groups but could not find a good solution.
I am new to ACS. I plan to run a multilevel analysis of 391 Latinx immigrants interviewed around 2019 to 2021 in Durham North Carolina. I want to generate a neighborhood concentrated disadvantage index using 2015–2019 5-year estimates of American Community Survey (ACS).
Thank you. Best wishes, Qing
NOTE: Invalid data for DP05_0019PE in line 1511 83-83.NOTE: Invalid data for DP05_0019PM in line 1511 85-86.NOTE: Invalid data for DP05_0038PE in line 1511 88-88.NOTE: Invalid data for DP05_0038PM in line 1511 90-91.NOTE: Invalid data for DP05_0071PE in line 1511 93-93.NOTE: Invalid data for DP05_0071PM in line 1511 95-96.WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.1511 1400000US37129990100,"Census Tract 9901, New Hanover County, North Carolina",0,12,-,**,-,**,-,** 96VAR1=1400000US37129990100 NAME=Census Tract 9901, New Hanover County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0019PE=.DP05_0019PM=. DP05_0038PE=. DP05_0038PM=. DP05_0071PE=. DP05_0071PM=. _ERROR_=1 _N_=1510NOTE: 2195 records were read from the infile 'C:\Rosa\Data\agerace1.csv'. The minimum record length was 89. The maximum record length was 115.NOTE: The data set WORK.BB has 2195 observations and 10 variables.NOTE: DATA statement used (Total process time): real time 0.09 seconds cpu time 0.07 seconds
Hi Qing -
I think the problem is that some of the rows contain text annotation so that they cannot be read with a numeric informat like BEST12. Your approach could work (PROC IMPORT followed by a DATA…
I pre-edit the CSV file using a text editor. I do a set of global search and replace operations. For example, changing occurrences of ",**" to "," (change a comma-asterisk-asterisk to just a comma). SAS will then treat these as missing values.
Thank you, David, Jamie, and other readers. I tried to use % of under 18, Black, and Hispanic from DP05.CVS. How to fix the errors below? Thank you. Best, Qing
PROC IMPORT OUT= agerace1 DATAFILE= "C:\DP05.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=3;RUN;
13698 data agerace;
13699 set agerace1;
13700 Keep GEO Under18 Black Hispanic;
13701 GEO=INPUT(SUBSTR (GEO_ID,10,11),11.);
13702 Under18=input(DP05_0019PE, BEST12.)/100;
13703 Black=input(DP05_0038PE, BEST12.)/100;
13704 Hispanic=input(DP05_0071PE, BEST12.)/100;
13705 run;
NOTE: Invalid argument to function INPUT at line 13702 column 9.
NOTE: Invalid argument to function INPUT at line 13703 column 7.
NOTE: Invalid argument to function INPUT at line 13704 column 10.
GEO_ID=1400000US37019990100 NAME=Census Tract 9901, Brunswick County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0 DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**
. . . .DP05_0088PM=** DP05_0089E=0 DP05_0089M=12
DP05_0089PE=- DP05_0089PM=** GEO=37019990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=117
OTE: Invalid argument to function INPUT at line 13702 column 9.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
GEO_ID=1400000US37133990100 NAME=Census Tract 9901, Onslow County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0
DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**
DP05_0089PE=- DP05_0089PM=** GEO=37133990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=1547
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
25 at 13702:36 25 at 13703:34 25 at 13704:37
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
25 at 13702:9 25 at 13703:7 25 at 13704:10
NOTE: There were 2195 observations read from the data set WORK.AGERACE1.
NOTE: The data set WORK.AGERACE has 2195 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.20 seconds
I don't think the INPUT statement works that way.
You'll need to read in the whole record from DP05, then do your math
INPUT GEO_ID $11 NAME $60. DP05_0001E BEST12. DP05_0001M BEST12. DP05_0001PE BEST12. DP05_0001PM BEST12. DP05_0002E BEST12. DP05_0002M BEST12. DP05_0002PE BEST12. DP05_0002PM BEST12. [etc] ;Under18= DP05_0019PE / 100;Black = DP05_0038PE / 100;Hispanic = DP05_0071PE / 100;geoid = SUBSTR(GEO_ID,10,11);
I think the problem is that some of the rows contain text annotation so that they cannot be read with a numeric informat like BEST12. Your approach could work (PROC IMPORT followed by a DATA step). However, you will need to check to be sure the value is a number before doing the conversion from text to number. One way to check is to use the verify function:
if verify(DP05_0019PE,'0123456789. ') = 0 then Under18=input(DP05_0019PE, BEST12.)/100;if verify(DP05_0038PE,'0123456789. ') = 0 then Black=input(DP05_0038PE, BEST12.)/100;if verify(DP05_0071PE,'0123456789. ') = 0 then Hispanic=input(DP05_0071PE, BEST12.)/100;
I hope this help!
-- Dave