Hello, I am interested in joining person and household data using the SERIALNO variable from the 2019 5-year ACS. Here is my code for this action using RStudio Version 1.4.1106:
# Load Person and Household data for 2019 ACS
pums2019pop <- read.csv("C:\\psam_p48.csv")pums2019hsg <- read.csv("C:\\psam_h48.csv")
# Join person and household data using SERIALNOpums2019 <- inner_join(pums2017pop, pums2017hsg, by = c("SERIALNO"))
Once I have completed this action, I check the output for pums2019 and immediately notice that some variables such as PUMA, DIVISION, and ADJINC, have ".x" appended to the end of their names so that they now appear as: "PUMA.x", "DIVISION.x", and "ADJINC.x".
While I can work with the file without issue, I just have to spell out the variable name with the appended ".x", I am wondering if there is anyway to keep this from happening in the first place?
Any help would be appreciated.
Thank you.
Hi - this is happening because the housing and person records have some variables in common, and `dplyr::inner_join()` will not overwrite values in one dataset with values in another dataset.
Two simple ways to handle this are:
Good luck!