For the latest 5-year ACS estimates, I noticed a discrepancy in the values for MEDIAN YEAR STRUCTURE BUILT BY TENURE (B25037_001) between the 5-year state summary file records and the matching data on data.census.gov.
For example, Lincoln County, Kansas (FIPS: 105) shows a median year of “0” in the summary file (All_Geographies_Not_Tracts_Block_Groups\Kansas_All_Geographies_Not_Tracts_Block_Groups\e20195ks0113000.txt) but a median year of "1939-" online (https://data.census.gov/cedsci/table?q=median%20year%20house%20built&g=0500000US20105,36047&tid=ACSDT5Y2019.B25037&hidePreview=true).
This pattern appears consistent across the 2014-2018 and 2015-2019 ACS data: values of "0" in the summary files correspond to "1939-" entries on the website. Since neither summary file contains any values of "1939", I am guessing that the zero entries represent median years of 1939 or earlier. However, I’ve reached out to the Census Bureau to confirm this.
Is anyone else finding similar issues with the summary file and website values not matching?
Yes, "jam values" such as the top end and bottom end of these types of year categories are different in the summary file data / API data, vs. data.census.gov. Data.census.gov has the true value - 1939 and older, in this case - whereas the other formats need to keep it as numeric rather than string. Here's the logic of how I handled the jam values in the summary file and API formats: (If I've missed something, I'd love to know!)
define new string variable B25037_calc_txt001E
if B25037_001E == 0, then return "1939 or earlier"
else if B25037_001E == 19, then return "2014 or later"
else, return B25037_001E
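For anyone who wants to apply the same logic in a script, here is a minimal Python sketch of the recoding above. The function name, parameter names, and the choice to pass nulls through unchanged are my own assumptions, not anything from Census documentation. The defaults (0 as the bottom marker, 19 as the top marker) only reflect what has been observed in this thread for the 2015-2019 data.

```python
from typing import Optional, Union


def decode_median_year_built(
    value: Optional[Union[int, str]],
    top_code: int = 19,
    top_label: str = "2014 or later",
    bottom_label: str = "1939 or earlier",
) -> Optional[str]:
    """Recode a raw B25037_001E value into a display string.

    Assumptions (observed in this thread, not officially documented):
    0 marks the bottom jam value and `top_code` marks the top jam value.
    """
    # Nulls are passed through unchanged rather than guessed at.
    if value is None:
        return None

    # API responses often deliver numbers as strings, so normalize first.
    year = int(value)

    if year == 0:
        return bottom_label
    if year == top_code:
        return top_label
    # Any other value is treated as an ordinary median year.
    return str(year)


if __name__ == "__main__":
    for raw in (0, 19, 1975, None):
        print(raw, "->", decode_median_year_built(raw))
```

For other data years, pass the top marker and its label explicitly (for example `top_code=18` for the 2014-2018 data, with whatever label matches that release).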
What's weird is that the ACS documentation itself says the jam value for age of housing is 1939 (not 0). Here 1939 doesn't mean the year 1939; it means built before 1940. Are the jam values working correctly for other median variables?
Thanks Diana, it's really helpful to know how you handled this! I'm hesitant to assume that "0" is the jam value marker since it's not officially listed, but I think you're right that the jam value simply changed starting in 2014-2018.
Sorry, another point to make here is that the display at data.census.gov is very misleading. When it says "1939-", this could easily be interpreted to mean "1939-present", meaning it's a maximum jam value. If it said "2013-", this is exactly what you'd think it means. I assume the dash is in fact meant as a minus sign, as in "1939 minus" (the counterpart would be "2013+"), but there's no legend to explain this.
Take a look at these three cases; I was able to match them with my logic above using the API data. The value for the DC tract came in as zero, the value for the Queens, NY tract came in as null, and the value for the Fort Bend, Texas tract came in as 18 (this is 2018 5-year data; use 19 with the most recent data).