I'm looking for a resource or some ideas about handling the metadata that indent levels provide in ACS row names. The use of indents to represent different categories often creates repeated names for rows, under different parent categories.
Row names are often structured in a nested way, e.g. in table B01001, Sex by Age, the first several rows are
So Row 3 is named just named "Under 5 years" with an indent. That's the same as Row 27, under Female. The Excel table downloads and templates don't provide the names for parent categories in the data row - and categories can go at least a couple of layers deep. CSV's do the same, with multiple spacing instead of tabs. This issue can make working with individual rows from downloaded data tricky.
When you download csv files in a ZIP format, you do get metadata and data files with complete data column names like "Estimate!!Total:!!Male:!!Under 5 years", which can be converted into something legible. But that's an extra step for each distinct table you download. I don't know of a way to do it if you want to work with metadata for all tables.
Another inelegant solution would be to find a cell's indent level in files downloaded directly from the UI and infer that the previous indent level(s) represents the parent category/categories. That's doable in data analytic software, even in Excel with VBA, but relying on typography to infer data categories doesn't seem like the best idea.
So, does anyone know of a resource that would provide columns named something like "Male: Under 5 years" directly for a all tables, or a set of tables? Or to be complete, a name like "Total: Male: Under 5 years" might be better in general.
Does anyone have any better ideas or resources? Am I missing something?
Thanks -
Jon
That's weird, the indents are not in the Excel versions of the table shells lists. They were included in the text version of the file, which for some reason the Bureau has removed from their site.
Someone on the SCD email list just mentioned that all versions of the table-shells file are available on the FTP site at https://www2.census.gov/programs-surveys/acs/summary_file/2022/table-based-SF/documentation…
TidyCensus does a nice job of aggregating the various subparts with the load_variables function, for example
x <- load_variables(2022, "acs5")
You get output like