ACS 2013 5-year data available in MS Access 2007

Hi all,

I will be importing the 5-year estimate summary files for Oregon, Arizona, and Washington D.C. into an Access 2007 database. I've set up a macro to run the import routine, so it wouldn't be too difficult to do additional states. The files are large, so as you can imagine it takes multiple Access files to hold them all. Ideally these data belong on a SQL server, but I don't currently have access to one, so this is my method for now.

If anyone would like the Access files for any state, let me know and I can import it for you or send you documentation on how to do it yourself. I will need to test my import routine when the data are released on Thursday, so I don't want to give out the documentation files until I know everything works properly. I did this for the 2012 5-year estimates, so I know the approach works, but I've had to make adjustments for changes in the sequence files, and I want to make sure those adjustments are correct before I distribute anything.
  • Here are the 5 main scripts I use to import flat Bureau data files, combine the seq files and populate a MySQL database. Some caveats:

    1. These were written for our processing environment and include references to custom Python libraries that you won't have. Chief among these is a database helper library. You would need to replace any references to it with your own DB library, or add code directly to the scripts for DB authentication, opening a connection, and running queries. (A minimal stand-in is sketched just after this list.)

    2. MySQL limits the number of columns a table can have and imposes a 64KB row size limit; I found the practical limit to be around 1,000 columns. Therefore, you can't have a single giant MySQL table with all of the ACS variables. My scripts break the ACS data up into multiple MySQL tables (e.g. T_00_05 has columns for all variables in ACS tables B00001 through B05999). A toy version of the grouping idea appears in the second sketch after this list.

    3. import_basetables.py uses Parallel Python (PP) to process state data in parallel; the more CPUs you can use, the faster it goes. It also requires running a webserver (start_webserver.py) on localhost to serve the state files to the parallel workers as needed. The third sketch after this list shows the rough shape.
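
    Sketch 1 - a minimal stand-in for the database helper library. This is hypothetical, not our actual helper: the function names (db_connect, db_query) and all credentials are made up, and it assumes the MySQLdb package.

        import MySQLdb

        def db_connect(host="localhost", user="acs", passwd="secret",
                       db="acs2013_5yr"):
            # Open a MySQL connection; every credential here is a placeholder.
            return MySQLdb.connect(host=host, user=user, passwd=passwd, db=db)

        def db_query(conn, sql, params=None):
            # Run one statement and return all result rows.
            cur = conn.cursor()
            cur.execute(sql, params or ())
            rows = cur.fetchall()
            cur.close()
            return rows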
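
    Sketch 2 - the column-splitting idea from caveat 2. The real scripts presumably size each group by actual column counts; this fixed-span version is just to show the shape of mapping an ACS table ID to a MySQL table name.

        def mysql_table_for(acs_table_id, span=6):
            # "B01003" -> numeric prefix 01 -> group 00-05 -> "T_00_05".
            num = int(acs_table_id[1:3])
            lo = (num // span) * span
            return "T_%02d_%02d" % (lo, lo + span - 1)

        assert mysql_table_for("B01003") == "T_00_05"
        assert mysql_table_for("B07013") == "T_06_11"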
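
    Sketch 3 - the rough shape of the Parallel Python setup in import_basetables.py. The worker function, URL, and file-name pattern are assumptions for illustration; only the pp.Server/submit calls and the localhost webserver idea come from the description above. start_webserver.py could be as simple as Python's built-in SimpleHTTPServer module.

        import pp

        def import_state(state, base_url):
            # Hypothetical worker: fetch one of the state's merged seq files
            # from the local webserver, then parse and load it into MySQL
            # (DB code omitted). The filename pattern here is made up.
            import urllib2
            data = urllib2.urlopen("%s/%s_seq0001.txt" % (base_url, state)).read()
            return state, len(data)

        job_server = pp.Server()  # defaults to the number of local CPUs
        jobs = [job_server.submit(import_state, (st, "http://localhost:8000"))
                for st in ("or", "az", "dc")]
        for job in jobs:
            print job()  # blocks until that worker finishes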

    merge_state_seqs.py is a helper script that combines the paired sequence files used in the 5-year data releases. That is, for each state it combines the "all geographies except tracts/BGs" file with the "tracts and block groups only" file, matching on seq number and type (estimates/MOEs), roughly as in the sketch below.
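
    A rough sketch of that merge, assuming the Bureau's two download directories are unzipped side by side. The directory and file-name patterns are my assumptions about the release layout, so check them against the actual files:

        import glob, os, shutil

        BIG_GEOS = "All_Geographies_Not_Tracts_Block_Groups"
        SMALL_GEOS = "Tracts_Block_Groups_Only"

        def merge_state_seqs(state, out_dir="merged"):
            if not os.path.isdir(out_dir):
                os.makedirs(out_dir)
            # Each estimate (e...) or MOE (m...) seq file in the first group
            # has a same-named partner in the second; append the pair into
            # one file covering all geographies.
            for path in glob.glob(os.path.join(BIG_GEOS, "[em]*%s*.txt" % state)):
                name = os.path.basename(path)
                with open(os.path.join(out_dir, name), "wb") as dst:
                    for src_dir in (BIG_GEOS, SMALL_GEOS):
                        with open(os.path.join(src_dir, name), "rb") as src:
                            shutil.copyfileobj(src, dst)

        merge_state_seqs("or")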

    I don't worry about data types up front. All data columns start out as floats, and I go back later and use MySQL's PROCEDURE ANALYSE to retroactively determine the best data type for each column. (That's what optimize_basetables.py does.)
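
    For reference, the core of that step can be done with one query per table. The table name and the (16, 256) limits below are just examples, and you'd still have to issue ALTER TABLE ... MODIFY statements based on the suggestions:

        import MySQLdb

        conn = MySQLdb.connect(host="localhost", user="acs", passwd="secret",
                               db="acs2013_5yr")  # placeholder credentials
        cur = conn.cursor()
        # PROCEDURE ANALYSE examines every row; the last column of each
        # result row is Optimal_fieldtype, MySQL's suggested type for that
        # field. The arguments cap the distinct values/memory used when
        # considering ENUM suggestions.
        cur.execute("SELECT * FROM T_00_05 PROCEDURE ANALYSE(16, 256)")
        for row in cur.fetchall():
            print row[0], "->", row[-1]  # Field_name -> Optimal_fieldtype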


