Better stand-alone names for categories like age groups

I'm looking for a resource or some ideas about handling the metadata that indent levels provide in ACS row names.  The use of indents to represent different categories often creates repeated names for rows, under different parent categories.

Row names are often structured in a nested way, e.g. in table B01001, Sex by Age, the first several rows are

1 Total: 
2 Male: 
3 Under 5 years 
4 5 to 9 years 
5 10 to 14 years 
6 15 to 17 years 
7 18 and 19 years 
8 20 years 
9 21 years 

So Row 3 is named just named "Under 5 years" with an indent. That's the same as Row 27, under Female. The Excel table downloads and templates don't provide the names for parent categories in the data row - and categories can go at least a couple of layers deep. CSV's do the same, with multiple spacing instead of tabs.  This issue can make working with individual rows from downloaded data tricky.

When you download csv files in a ZIP format, you do get metadata and data files with complete data column names like "Estimate!!Total:!!Male:!!Under 5 years", which can be converted into something legible. But that's an extra step for each distinct table you download.  I don't know of a way to do it if you want to work with metadata for all tables.

Another inelegant solution would be to find a cell's indent level in files downloaded directly from the UI and infer that the previous indent level(s) represents the parent category/categories. That's doable in data analytic software, even in Excel with VBA, but relying on typography to infer data categories doesn't seem like the best idea.

So, does anyone know of a resource that would provide columns named something like "Male: Under 5 years" directly for a all tables, or a set of tables?  Or to be complete, a name like "Total: Male: Under 5 years" might be better in general.

Does anyone have any better ideas or resources?  Am I missing something? 

Thanks -

Jon

Parents
  • TidyCensus does a nice job of aggregating the various subparts with the load_variables function, for example

    x <- load_variables(2022, "acs5")

    You get output like

    name label
    B01001A_001 Estimate!!Total:
    B01001A_002 Estimate!!Total:!!Male:
    B01001A_003 Estimate!!Total:!!Male:!!Under 5 years
    B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years
    B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years
Reply
  • TidyCensus does a nice job of aggregating the various subparts with the load_variables function, for example

    x <- load_variables(2022, "acs5")

    You get output like

    name label
    B01001A_001 Estimate!!Total:
    B01001A_002 Estimate!!Total:!!Male:
    B01001A_003 Estimate!!Total:!!Male:!!Under 5 years
    B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years
    B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years
Children
No Data