Better stand-alone names for categories like age groups

jnigrine over 1 year ago

I'm looking for a resource or some ideas about handling the metadata that indent levels provide in ACS row names. The use of indents to represent different categories often creates repeated names for rows, under different parent categories.

Row names are often structured in a nested way, e.g. in table B01001, Sex by Age, the first several rows are

1	Total:
2	    Male:
3	        Under 5 years
4	        5 to 9 years
5	        10 to 14 years
6	        15 to 17 years
7	        18 and 19 years
8	        20 years
9	        21 years

So Row 3 is named just "Under 5 years", with an indent. That's the same name as Row 27, under Female. The Excel table downloads and templates don't provide the names of parent categories in the data row - and categories can go at least a couple of layers deep. CSVs do the same, with multiple spaces instead of tabs. This can make working with individual rows from downloaded data tricky.

When you download CSV files in a ZIP format, you do get metadata and data files with complete data column names like "Estimate!!Total:!!Male:!!Under 5 years", which can be converted into something legible. But that's an extra step for each distinct table you download, and I don't know of a way to do it if you want to work with metadata for all tables at once.
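For what it's worth, the conversion step itself is small. Here is a minimal Python sketch, assuming the labels always use "!!" as the separator and begin with a measure prefix such as "Estimate" or "Margin of Error". The function name `readable_label` and the prefix list are my own, not anything the Census Bureau provides.

```python
def readable_label(raw, drop_prefix=True):
    """Convert a '!!'-delimited ACS column label into a readable name.

    Assumption: the first segment may be a measure prefix such as
    'Estimate' or 'Margin of Error', which is dropped by default.
    """
    # Split on the '!!' separator and remove the trailing colons
    # that mark parent categories (e.g. 'Total:', 'Male:').
    parts = [p.strip().rstrip(":") for p in raw.split("!!")]
    if drop_prefix and parts and parts[0] in ("Estimate", "Margin of Error"):
        parts = parts[1:]
    # Skip empty segments so stray separators don't leave gaps.
    return ": ".join(p for p in parts if p)


print(readable_label("Estimate!!Total:!!Male:!!Under 5 years"))
# prints: Total: Male: Under 5 years
```

Passing `drop_prefix=False` keeps the "Estimate" segment, which may be useful if estimates and margins of error sit in the same file.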

Another inelegant solution would be to find a cell's indent level in files downloaded directly from the UI and infer that the previous indent level(s) represents the parent category/categories. That's doable in data analytic software, even in Excel with VBA, but relying on typography to infer data categories doesn't seem like the best idea.
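To make the indent idea concrete, here is a rough Python sketch. It assumes each nesting level is a fixed number of leading spaces (4 here, which is a guess; the real files may use a different width or tabs), and `qualify_rows` is just a name I made up. It keeps a stack of the currently open parent labels and trims it whenever the indent drops back.

```python
def qualify_rows(rows, indent_unit=4):
    """Build fully qualified names from indented row labels.

    Assumption: nesting depth = leading spaces // indent_unit.
    """
    stack = []   # stack[i] holds the label currently open at depth i
    names = []
    for raw in rows:
        stripped = raw.lstrip(" ")
        if not stripped:
            continue  # ignore blank rows
        depth = (len(raw) - len(stripped)) // indent_unit
        label = stripped.strip().rstrip(":")
        # Discard labels at this depth or deeper, then open the new one.
        del stack[depth:]
        stack.append(label)
        names.append(": ".join(stack))
    return names


rows = [
    "Total:",
    "    Male:",
    "        Under 5 years",
    "        5 to 9 years",
    "    Female:",
    "        Under 5 years",
]
for name in qualify_rows(rows):
    print(name)
```

With this input, the two "Under 5 years" rows come out as "Total: Male: Under 5 years" and "Total: Female: Under 5 years", so they are no longer ambiguous. It still depends on typography, though, so the concern above stands.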

So, does anyone know of a resource that would provide columns named something like "Male: Under 5 years" directly for all tables, or a set of tables? Or to be complete, a name like "Total: Male: Under 5 years" might be better in general.

Does anyone have any better ideas or resources? Am I missing something?

Thanks -

Jon

Jonathan Schroeder over 1 year ago
A couple other sources you could use:

IPUMS NHGIS ACS data files, which come with concatenated variable labels in metadata files ("codebooks") and in a descriptive header row (if you choose that option). You can request multiple tables in a single NHGIS file, so you wouldn't have to do "an extra step for each distinct table you download." We also have an API you could use to get our metadata.

The Census Bureau's API for ACS includes endpoints for variable labels with concatenated categories in HTML, XML or JSON format. You can find links to these endpoints on any of the ACS API pages (e.g., 2022 5-year is here.)

FWIW, when we add the data to NHGIS, we generally try to use the API endpoints to get the concatenated labels, but unfortunately, we start processing the 5-year data during the 2-day embargo period before the public data release, and new endpoints aren't available until then. This year we got started on 2022 5-year processing by using a 2022 1-year variables list and adding in 2021 5-year labels for the 10 5-year tables that aren't in 1-year data.

Also: the API list is apparently randomly ordered, so it may take a little extra effort to select and order the labels for your variables of interest.
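
As a sketch of that selecting and ordering step in Python: this assumes the JSON variables list has already been downloaded and parsed, and that it has a top-level "variables" object keyed by variable name, with "label" and "group" fields on each entry. Check that layout against the actual endpoint for your dataset. The function name `table_labels` and the inline sample are illustrative only.

```python
def table_labels(variables_json, table_id):
    """Return (variable name, label) pairs for one table, in name order.

    Assumptions: entries live under a top-level 'variables' key, each
    has 'label' and 'group' fields, and estimate variables end in 'E'.
    """
    variables = variables_json.get("variables", {})
    picked = {
        name: meta.get("label", "")
        for name, meta in variables.items()
        if meta.get("group") == table_id and name.endswith("E")
    }
    # Names are zero-padded (e.g. B01001_003E), so a plain string sort
    # restores the table's row order.
    return [(name, picked[name]) for name in sorted(picked)]


# Tiny hand-written sample standing in for the parsed JSON.
sample = {
    "variables": {
        "B01001_003E": {"label": "Estimate!!Total:!!Male:!!Under 5 years", "group": "B01001"},
        "NAME": {"label": "Geographic Area Name", "group": "N/A"},
        "B01001_001E": {"label": "Estimate!!Total:", "group": "B01001"},
        "B01001_002M": {"label": "Margin of Error!!Total:!!Male:", "group": "B01001"},
        "B01001_002E": {"label": "Estimate!!Total:!!Male:", "group": "B01001"},
    }
}
for name, label in table_labels(sample, "B01001"):
    print(name, label)
```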