Clean data downloads?

I'm a non-profit researcher and policy analyst and I feel like I'm constantly banging my head against a wall to download ACS data in a usable format. I have decent statistical analysis and data analysis chops but I am not a coder.

For example -- I'm currently trying to download S2701, Selected Characteristics of Health Insurance Coverage in the US. I just want to produce an excel spreadsheet where I can quickly reference the percent of residents in a certain age group who are uninsured, by county. But I cannot figure out how to download the excel data in a way that the spreadsheet is usable. It produces really gnarly tables with tons of merged cells, and it's impossible to use that format of table and then sort the data or use it for a vlookup. I need to pull the 254 counties in my state which is too much to go through manually.

I want to produce a table where I can have the counties in column A and the values in column B. Maybe margin of error in column C. Instead I have the county name every 10th cell down in column A.

Is there a fix that doesn't involve me writing macros or coding to get usable data?

  • I'd take a look at the API. You can just specify which variables you want at a specific geographic level by just altering a URL.

    Here are some examples for ACS 5-year 2020: Census Data API: /data/2020/acs/acs5/subject/examples.

  • This link:

    ...will give you geoids and counties (or whatever your selected geography is) in columns A & B and estimates, MOEs and annotations in the other columns. 

  • Can you get geography level by city (not MSA) using API?  If not, can you get city geography using MDAT?  Just curious because I need to download all the cities in the state of Florida and only need one variable for an easy formatted download.

  • You're asking me? I don't know. I don't use the API.

  • That's what I thought too but had to double check.  

    Thank you!

  • I have three suggestions.

    1. For your immediate problem of health insurance info for Texas (the only state with 254 counties) use the SAHIE interactive data tool. That's from a separate Census program aimed specifically at health insurance. The table underneath the map should give you exactly what you want. 

    SAHIE (census.gov)

    2. Use Excel's PowerQuery (also known as get and transform data) subsystem to format the spreadsheet the way you wanted. I'm guessing (could be wrong, you didn't say) that you entered S2701 into data.census.gov, set the geography to counties, picked Texas, and then clicked the transpose button to get the counties into columns and that messed up the formatting. I'm suggesting you download the table as Excel without the transpose, open it in Excel PowerQuery, delete the headers that would break the formatting there, and then use PowerQuery for the transpose. FWIW is a fantastic tool for cleaning up spreadsheets and there are lots of short tutorials on Youtube and the Internet on how to use it. Also good for doing SQL-type joins through a GUI without any coding.

    3. Use one of the Census API packages in R or Python. I am very fond of tidycensus in R which can produce a dataframe that contains both data and map info (latter optional). You can then export that dataframe as Excel or CSV. That only involves a wee bit of coding (no algorithms). It's really well documented at tidycensus (walker-data.com)

  • SAHIE is an excellent suggestion.

    EDIT: All are good suggestions!

  • OK, I do things a little differently, and will someday use R etc. I have excel sheets for each table and fields I want to pull down. For S2701 (which is a good choice) I have a list of all the fields (from Census data, advanced, table download - Col Metadata file) and then I have a little formula. So I just put the field names in rows (up to 49 names) and the formula creates the API for those fields for all counties, all states and the US, it takes about 1 minute. For example; (all counties)

    https://api.census.gov/data/2021/acs/acs5/subject?get=S2701_C01_001E,S2701_C01_017E,S2701_C01_023E,S2701_C01_024E,S2701_C01_032E,S2701_C01_035E,S2701_C05_001E,S2701_C05_002E,S2701_C05_011E,NAME&for=county:*

    S2701_C01_001E Estimate!!Total!!Civilian noninstitutionalized population
    S2701_C01_017E Estimate!!Total!!Civilian noninstitutionalized population!!RACE AND HISPANIC OR LATINO ORIGIN!!Black or African American alone
    S2701_C01_023E Estimate!!Total!!Civilian noninstitutionalized population!!RACE AND HISPANIC OR LATINO ORIGIN!!Hispanic or Latino (of any race)
    S2701_C01_024E Estimate!!Total!!Civilian noninstitutionalized population!!RACE AND HISPANIC OR LATINO ORIGIN!!White alone, not Hispanic or Latino
    S2701_C01_032E Estimate!!Total!!Civilian noninstitutionalized population!!NATIVITY AND U.S. CITIZENSHIP STATUS!!Foreign born
    S2701_C01_035E Estimate!!Total!!Civilian noninstitutionalized population!!DISABILITY STATUS!!With a disability
    S2701_C05_001E Estimate!!Percent Uninsured!!Civilian noninstitutionalized population
    S2701_C05_002E Estimate!!Percent Uninsured!!Civilian noninstitutionalized population!!AGE!!Under 6 years
    S2701_C05_011E Estimate!!Percent Uninsured!!Civilian noninstitutionalized population!!AGE!!Under 19 years

    I can send you the file, you can use it for any table it takes 5 minutes to get the metadata file copy it into the sheet and then choose which fields you want.  There are 305 fields in S2701, otherwise, I'd just paste the API for all fields here. I think for short pulls (no heavy lifting) this works great. For heavy lifting tidycensus and other solutions that this group knows in their sleep are best.

    Again, I can send you the file, let me know and good luck.

    Tom