Greetings! How to read the data from Table DP05 of the 2019 American Community Survey (ACS) 5-year estimates into SAS? When I deleted the second row on variable definition and read it into SAS, errors came (copied below). When I read all lines starting the third row, all numbers became characters. I reviewed the user groups but could not find a good solution.
I am new to ACS. I plan to run a multilevel analysis of 391 Latinx immigrants interviewed around 2019 to 2021 in Durham North Carolina. I want to generate a neighborhood concentrated disadvantage index using 2015–2019 5-year estimates of American Community Survey (ACS).
Thank you. Best wishes, Qing
NOTE: Invalid data for DP05_0019PE in line 1511 83-83.NOTE: Invalid data for DP05_0019PM in line 1511 85-86.NOTE: Invalid data for DP05_0038PE in line 1511 88-88.NOTE: Invalid data for DP05_0038PM in line 1511 90-91.NOTE: Invalid data for DP05_0071PE in line 1511 93-93.NOTE: Invalid data for DP05_0071PM in line 1511 95-96.WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.1511 1400000US37129990100,"Census Tract 9901, New Hanover County, North Carolina",0,12,-,**,-,**,-,** 96VAR1=1400000US37129990100 NAME=Census Tract 9901, New Hanover County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0019PE=.DP05_0019PM=. DP05_0038PE=. DP05_0038PM=. DP05_0071PE=. DP05_0071PM=. _ERROR_=1 _N_=1510NOTE: 2195 records were read from the infile 'C:\Rosa\Data\agerace1.csv'. The minimum record length was 89. The maximum record length was 115.NOTE: The data set WORK.BB has 2195 observations and 10 variables.NOTE: DATA statement used (Total process time): real time 0.09 seconds cpu time 0.07 seconds
Hi Qing -
I think the problem is that some of the rows contain text annotation so that they cannot be read with a numeric informat like BEST12. Your approach could work (PROC IMPORT followed by a DATA…
I pre-edit the CSV file using a text editor. I do a set of global search and replace operations. For example, changing occurrences of ",**" to "," (change a comma-asterisk-asterisk to just a comma). SAS will then treat these as missing values.
Thank you, David, Jamie, and other readers. I tried to use % of under 18, Black, and Hispanic from DP05.CVS. How to fix the errors below? Thank you. Best, Qing
PROC IMPORT OUT= agerace1 DATAFILE= "C:\DP05.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=3;RUN;
13698 data agerace;
13699 set agerace1;
13700 Keep GEO Under18 Black Hispanic;
13701 GEO=INPUT(SUBSTR (GEO_ID,10,11),11.);
13702 Under18=input(DP05_0019PE, BEST12.)/100;
13703 Black=input(DP05_0038PE, BEST12.)/100;
13704 Hispanic=input(DP05_0071PE, BEST12.)/100;
13705 run;
NOTE: Invalid argument to function INPUT at line 13702 column 9.
NOTE: Invalid argument to function INPUT at line 13703 column 7.
NOTE: Invalid argument to function INPUT at line 13704 column 10.
GEO_ID=1400000US37019990100 NAME=Census Tract 9901, Brunswick County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0 DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**
. . . .DP05_0088PM=** DP05_0089E=0 DP05_0089M=12
DP05_0089PE=- DP05_0089PM=** GEO=37019990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=117
OTE: Invalid argument to function INPUT at line 13702 column 9.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
GEO_ID=1400000US37133990100 NAME=Census Tract 9901, Onslow County, North Carolina DP05_0001E=0 DP05_0001M=12 DP05_0001PE=0
DP05_0001PM=(X) DP05_0002E=0 DP05_0002M=12 DP05_0002PE=- DP05_0002PM=** DP05_0003E=0 DP05_0003M=12 DP05_0003PE=- DP05_0003PM=**
DP05_0089PE=- DP05_0089PM=** GEO=37133990100 Under18=. Black=. Hispanic=. _ERROR_=1 _N_=1547
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
25 at 13702:36 25 at 13703:34 25 at 13704:37
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
25 at 13702:9 25 at 13703:7 25 at 13704:10
NOTE: There were 2195 observations read from the data set WORK.AGERACE1.
NOTE: The data set WORK.AGERACE has 2195 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.20 seconds
I don't think the INPUT statement works that way.
You'll need to read in the whole record from DP05, then do your math
INPUT GEO_ID $11 NAME $60. DP05_0001E BEST12. DP05_0001M BEST12. DP05_0001PE BEST12. DP05_0001PM BEST12. DP05_0002E BEST12. DP05_0002M BEST12. DP05_0002PE BEST12. DP05_0002PM BEST12. [etc] ;Under18= DP05_0019PE / 100;Black = DP05_0038PE / 100;Hispanic = DP05_0071PE / 100;geoid = SUBSTR(GEO_ID,10,11);
My mistake, you're trying to use an INPUT function, not statement. @David's answer below should work.
Thank you David and Glenn. The verify function solved the problem. With your help, I programmed neighborhood measures of concentrated disadvantage and ethnic enclaves, which was a dichotomous variable with Latinx/Hispanic concentration at the 90th percentile or higher of the distribution at census tracts. Among 215 Latinx participants in 42 census tracts in a survey study in NC, 22 Latinx participants lived in ethnic enclaves in five census tracts with Latinx concentration (% of Hispanic) ≥ 29.2% in a multilevel analysis of intimate partner violence for a conference abstract due 8/31. I am very grateful to your expertise and help to me in this data users group. Best regards, Qing