
I want to recode values in one dataset based on values in another dataset. My overall goal is to apply recode across multiple columns of the data frame.

Data:

    df <- data.frame(
      gender = c(1, 2, 1, 2),
      condition = c(1, 1, 2, 2)
    )
    df
      gender condition
    1      1         1
    2      2         1
    3      1         2
    4      2         2

Other dataset:

    codes <- data.frame(
      gender_values = c("`1`='male', `2`='female'"),
      condition_values = c("`1`='exp', `2`='control'")
    )
    codes
                 gender_values         condition_values
    1 `1`='male', `2`='female' `1`='exp', `2`='control'

Attempt:

    df %>%
      dplyr::mutate(
        gender = dplyr::recode(gender, cat(noquote(codes[1, "gender_values"])), .default = NA_character_)
      )
    `1`='male', `2`='female'
      gender condition
    1   <NA>         1
    2   <NA>         1
    3   <NA>         2
    4   <NA>         2

Wanted:

      gender condition
    1   male       exp
    2 female       exp
    3   male   control
    4 female   control
asked Jan 18, 2023 at 14:06 by EML
  • In your codes, you have c("1='male',2='female'"), which is one long, single string. Was this intentional, or is this supposed to be two elements (1="male" and 2="female")? Something like c("1='male'", "2='female'")? Commented Jan 18, 2023 at 14:17
  • Are you sure that codes has this weird form? Why not long form (i.e. variable_name, variable_value, variable_text)? Or wide form? The most straightforward solution would be to use a join to bind the "recoded" values to the numbers... Commented Jan 18, 2023 at 14:18
  • Does this answer your question? Recoding values in second data frame based on values in a different data frame. Commented Jan 18, 2023 at 14:19
  • @jpsmith I intended a long string so recode can use the values. For some reason, recode is not using the values in the same way as recode(gender, 1="male"...). Commented Jan 18, 2023 at 14:28
  • @dario I believe it does not, because the value labels for my dataset are in one cell for each variable. Commented Jan 18, 2023 at 15:31
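For reference, the long-form lookup suggested in the comments could look roughly like the sketch below. The `lookup` table and its column names (`variable`, `value`, `label`) are hypothetical and not part of the asker's data, and base R `match()` is used here as a simple stand-in for a join.

```r
df <- data.frame(gender = c(1, 2, 1, 2), condition = c(1, 1, 2, 2))

# Hypothetical long-form lookup table: one row per variable/value pair
lookup <- data.frame(
  variable = c("gender", "gender", "condition", "condition"),
  value    = c(1, 2, 1, 2),
  label    = c("male", "female", "exp", "control"),
  stringsAsFactors = FALSE
)

recoded <- df
for (v in names(df)) {
  key <- lookup[lookup$variable == v, ]                # rows for this variable
  recoded[[v]] <- key$label[match(df[[v]], key$value)] # look up each value's label
}
recoded
```

Values without a matching row in `lookup` would come back as NA, which is the same behaviour the asker requested via `.default = NA_character_`.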

1 Answer

If you want to use dplyr::recode, you can exploit the splice operator !!! to help out here and evaluate the values in codes. It helps if you simplify your codes data:

    codes <- data.frame(
      gender_values = c("male", "female"),
      condition_values = c("exp", "control")
    )

Then, for example, on a single column you can do:

    dplyr::recode(df$gender, !!!codes$gender_values)
    # [1] "male"   "female" "male"   "female"
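As a rough illustration of why the original attempt returned only NA: `recode()` expects its replacements as separate arguments, and `cat()` only prints its input and returns NULL, so nothing is passed on. The sketch below also shows the named form of the splice, where the names carry the old values; the `labels` vector is an illustrative name, not something from the question.

```r
library(dplyr)

gender <- c(1, 2, 1, 2)

# cat() prints the string to the console and returns NULL (invisibly),
# so recode() never receives any replacement values from it
res <- cat("`1`='male', `2`='female'\n")
is.null(res)

# A named vector spliced with !!! behaves like writing
# recode(gender, `1` = "male", `2` = "female")
labels <- c(`1` = "male", `2` = "female")
recoded <- recode(gender, !!!labels)
recoded
```

With a named vector the order of the labels no longer matters, whereas the unnamed version above matches purely by position.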

One way to apply it across columns given your example data is to use sapply:

    sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes[,paste0(x, "_values")]))
    #      gender   condition
    # [1,] "male"   "exp"
    # [2,] "female" "exp"
    # [3,] "male"   "control"
    # [4,] "female" "control"

(Note that this specific example assumes that, for each variable "x" in df, the matching column in codes is named "x_values", as in your example data.)
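Note that sapply returns a character matrix rather than a data frame. If a data frame is wanted, one possible variation (a sketch under the same "x_values" naming assumption, with `recoded` as an illustrative name) is to assign the result of lapply back into a copy of df:

```r
library(dplyr)

df <- data.frame(gender = c(1, 2, 1, 2), condition = c(1, 1, 2, 2))
codes <- data.frame(
  gender_values = c("male", "female"),
  condition_values = c("exp", "control"),
  stringsAsFactors = FALSE
)

# Copy df, then overwrite every column with its recoded version;
# `recoded[] <-` keeps the data frame structure and column names
recoded <- df
recoded[] <- lapply(names(df), function(x) {
  dplyr::recode(df[[x]], !!!codes[[paste0(x, "_values")]])
})
str(recoded)
```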

Also, if you wanted to keep your codes values exactly as they are, you could do:

    codes <- data.frame(
      gender_values = c("`1`='male', `2`='female'"),
      condition_values = c("`1`='exp', `2`='control'")
    )

    # single column example
    dplyr::recode(df$gender, !!!strsplit(codes$gender_values, ",")[[1]])
    # [1] "`1`='male'"    " `2`='female'" "`1`='male'"    " `2`='female'"

    # multiple columns
    sapply(names(df), function(x) dplyr::recode(df[,x], !!!strsplit(codes[,paste0(x, "_values")], ",")[[1]]))
    #      gender          condition
    # [1,] "`1`='male'"    "`1`='exp'"
    # [2,] " `2`='female'" "`1`='exp'"
    # [3,] "`1`='male'"    " `2`='control'"
    # [4,] " `2`='female'" " `2`='control'"

To clean it up a bit, though, you could preprocess the original codes:

    codes2 <- lapply(codes[], function(x)
      gsub("`|'|=|[[:digit:]]+", "", trimws(unlist(strsplit(x, ",")))))

    sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes2[[paste0(x, "_values")]]))
    #      gender   condition
    # [1,] "male"   "exp"
    # [2,] "female" "exp"
    # [3,] "male"   "control"
    # [4,] "female" "control"
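The gsub() cleanup drops the numeric keys (and any digits inside the labels), so the recoding relies on the labels being listed in order. A possible alternative, sketched below, parses each string into a named vector so that the keys are kept. `parse_codes` is a hypothetical helper written for this example, and it assumes the labels themselves contain no commas or equals signs.

```r
library(dplyr)

df <- data.frame(gender = c(1, 2, 1, 2), condition = c(1, 1, 2, 2))
codes <- data.frame(
  gender_values = c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "`1`='male', `2`='female'" into
# c(`1` = "male", `2` = "female")
parse_codes <- function(s) {
  parts <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])  # one "key=value" per element
  kv <- strsplit(parts, "=", fixed = TRUE)              # split each into key and value
  keys <- gsub("`", "", vapply(kv, `[`, character(1), 1), fixed = TRUE)
  vals <- gsub("'", "", vapply(kv, `[`, character(1), 2), fixed = TRUE)
  stats::setNames(vals, keys)
}

recoded <- df
recoded[] <- lapply(names(df), function(x) {
  dplyr::recode(df[[x]], !!!parse_codes(codes[[paste0(x, "_values")]]))
})
recoded
```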

I would also advise changing "exp" to "exposure" or "expos" or something else, as exp is a function in R to calculate exponentials, and it's good practice not to confuse things!

answered Jan 18, 2023 at 15:53 by jpsmith

1 Comment

I’d probably need the dput - this is best suited for its own question as it may be valuable to others. Ping me if you create a new one - Good luck!
