I want to recode values in one dataset based on values in another dataset. My overall goal is to apply recode across multiple columns of the data frame.
Data:
    df <- data.frame(
      gender = c(1, 2, 1, 2),
      condition = c(1, 1, 2, 2)
    )
    df
      gender condition
    1      1         1
    2      2         1
    3      1         2
    4      2         2

Other dataset:
    codes <- data.frame(
      gender_values = c("`1`='male', `2`='female'"),
      condition_values = c("`1`='exp', `2`='control'")
    )
    codes
                 gender_values         condition_values
    1 `1`='male', `2`='female' `1`='exp', `2`='control'

Attempt:
    df %>%
      dplyr::mutate(
        gender = dplyr::recode(gender, cat(noquote(codes[1, "gender_values"])), .default = NA_character_)
      )
    `1`='male', `2`='female'
      gender condition
    1   <NA>         1
    2   <NA>         1
    3   <NA>         2
    4   <NA>         2

Wanted:
      gender condition
    1   male       exp
    2 female       exp
    3   male   control
    4 female   control

- In your codes, you have c("1='male',2='female'") which is one long, single string. Was this intentional, or is this supposed to be two elements (1="male" and 2="female")? Something like c("1='male'", "2='female'")? – jpsmith, Jan 18, 2023 at 14:17
- Are you sure that codes has this weird form? Why not long form (i.e. variable_name, variable_value, variable_text)? Or wide form? The most straightforward solution would be to use a join to bind the "recoded" values to the numbers... – dario, Jan 18, 2023 at 14:18
- Does this answer your question? Recoding values in second data frame based on values in a different data frame – dario, Jan 18, 2023 at 14:19
- @jpsmith I intended a long string so recode can use the values. For some reason, recode is not using the values in the same way as recode(gender, 1="male"...). – EML, Jan 18, 2023 at 14:28
- @dario I believe it does not, because the value labels for my dataset are in one cell for each variable. – EML, Jan 18, 2023 at 15:31
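The long-form layout suggested in the comments could look something like the sketch below. The table codes_long, its column names (variable, value, label), and the helper recode_from_long are hypothetical and not part of the original data. The lookup uses base R's match() rather than merge() so that row order is preserved.

```r
# Example data from the question
df <- data.frame(gender = c(1, 2, 1, 2), condition = c(1, 1, 2, 2))

# Hypothetical long-form code book: one row per variable/value pair
codes_long <- data.frame(
  variable = c("gender", "gender", "condition", "condition"),
  value = c(1, 2, 1, 2),
  label = c("male", "female", "exp", "control"),
  stringsAsFactors = FALSE
)

# Look up the label for each value of one variable; unmatched values become NA
recode_from_long <- function(x, variable, lookup) {
  rows <- lookup[lookup$variable == variable, ]
  rows$label[match(x, rows$value)]
}

# Apply the lookup to every column of df
recoded <- df
for (col in names(df)) {
  recoded[[col]] <- recode_from_long(df[[col]], col, codes_long)
}
recoded
```

Under these assumptions, recoded should match the "Wanted" table, and any value missing from the code book comes back as NA.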
1 Answer
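If you want to use dplyr::recode, you can exploit the splice operator !!! to evaluate the values in codes. With a numeric vector, recode matches unnamed replacements by position, so the first label replaces 1 and the second replaces 2. It helps if you simplify your codes data: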
    codes <- data.frame(
      gender_values = c("male", "female"),
      condition_values = c("exp", "control")
    )

Then, for example, on a single column you can do:
    dplyr::recode(df$gender, !!!codes$gender_values)
    # [1] "male"   "female" "male"   "female"

One way to apply it across columns given your example data is to use sapply:
    sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes[,paste0(x, "_values")]))
    #      gender   condition
    # [1,] "male"   "exp"
    # [2,] "female" "exp"
    # [3,] "male"   "control"
    # [4,] "female" "control"

(Note that this specific example assumes the column in codes for a variable "x" in df is named "x_values", as in your example data.)
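Because sapply returns a character matrix here, a variant that keeps a data frame may be more convenient. The following is a minimal sketch, assuming dplyr is installed and the simplified codes layout; the name recoded is only illustrative.

```r
library(dplyr)

# Example data from the question and the simplified code book
df <- data.frame(gender = c(1, 2, 1, 2), condition = c(1, 1, 2, 2))
codes <- data.frame(
  gender_values = c("male", "female"),
  condition_values = c("exp", "control")
)

# Recode each column by position, then assign the results back column by
# column so the result stays a data frame with the original column names
recoded <- df
recoded[] <- lapply(names(df), function(x) {
  dplyr::recode(df[[x]], !!!codes[[paste0(x, "_values")]])
})
recoded
```

Assigning into recoded[] keeps the data frame structure, which sapply would otherwise simplify away.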
Also, if you wanted to keep your codes values exactly as is, you could do:
    codes <- data.frame(
      gender_values = c("`1`='male', `2`='female'"),
      condition_values = c("`1`='exp', `2`='control'")
    )

    # single column example
    dplyr::recode(df$gender, !!!strsplit(codes$gender_values, ",")[[1]])
    # [1] "`1`='male'"    " `2`='female'" "`1`='male'"    " `2`='female'"

    # multiple columns
    sapply(names(df), function(x) dplyr::recode(df[,x], !!!strsplit(codes[,paste0(x, "_values")], ",")[[1]]))
    #      gender          condition
    # [1,] "`1`='male'"    "`1`='exp'"
    # [2,] " `2`='female'" "`1`='exp'"
    # [3,] "`1`='male'"    " `2`='control'"
    # [4,] " `2`='female'" " `2`='control'"

To clean it up a bit, though, you could preprocess the original codes:
    codes2 <- lapply(codes[], function(x) gsub("`|'|=|[[:digit:]]+", "", trimws(unlist(strsplit(x, ",")))))
    sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes2[[paste0(x, "_values")]]))
    #      gender   condition
    # [1,] "male"   "exp"
    # [2,] "female" "exp"
    # [3,] "male"   "control"
    # [4,] "female" "control"

I would also advise changing "exp" to "exposure" or "expos" or something else, as exp is a function in R to calculate exponentials, and it's good practice not to confuse things!
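The cleanup above strips the numeric keys and relies on the labels being listed in the order 1, 2, and so on. If the keys might be out of order or non-consecutive, one option is to keep each key paired with its label. This is a rough sketch in base R only. The helper parse_codes and the name recoded are hypothetical, and it assumes every entry has the form `key`='label' with no commas or single quotes inside the labels.

```r
# Example data from the question
df <- data.frame(gender = c(1, 2, 1, 2), condition = c(1, 1, 2, 2))
codes <- data.frame(
  gender_values = c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'"),
  stringsAsFactors = FALSE
)

# Turn one code string into a named character vector: names are keys, values are labels
parse_codes <- function(spec) {
  pairs <- trimws(strsplit(spec, ",")[[1]])
  keys <- sub("^`?([^`=]+)`?[[:space:]]*=.*$", "\\1", pairs)
  labels <- sub("^[^=]*=[[:space:]]*'(.*)'$", "\\1", pairs)
  setNames(labels, keys)
}

# Recode every column by looking up each value (as a string) in its named vector;
# values without a matching key become NA
recoded <- df
for (col in names(df)) {
  lookup <- parse_codes(codes[[paste0(col, "_values")]])
  recoded[[col]] <- unname(lookup[as.character(df[[col]])])
}
recoded
```

Indexing a named vector by as.character(value) matches on the key itself, so reordering the entries in the code string would not change the result.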