Open
Description
Cross-posting this here since on SO this question is not getting much traction. What is the appropriate way to pass to .f a list of character vectors in .x that reference global values, so that future_map in multisession can locate the global objects referenced?
The reason I'm assigning the df values to my global environment is to try and lower the size of the globals exported, as this is significantly slowing the code when running multisession on remote clusters. In reality, my df list containing variable values is a large file that kills any benefits of running multisession on remote clusters.
Consider the following reproducible example:
    library(future)
    library(furrr)
    library(kit)
    library(tictoc)
    library(tidyverse)

    options(future.rng.onMisuse = "ignore")
    plan(multisession)

    ## reprex data
    vars <- paste0(letters, 1:10)
    bestVars <- combn(vars, 5, simplify = F)
    df <- data.frame(
      matrix(data = rnorm(50000 * length(vars), 200, 500), nrow = 50000, ncol = length(vars))
    )
    names(df) <- vars
    df$actual_value <- rnorm(n = nrow(df), 350, 300)
    df <- df %>% dplyr::select(actual_value, everything(.))
    df <- lapply(split.default(x = df, names(df)), function(x) x[[1]])
    list2env(df, globalenv())
    rm(df)

    ## function to run variable combinations in parallel on the local machine
    run_sim_in_par <- function(vars_to_sim){
      sampled_rows <- sample(x = length(actual_value), size = 50, replace = F)
      varname <- paste(names(vars_to_sim), collapse = "*")
      best <- Reduce(vars_to_sim, f = '*')[sampled_rows]
      row_idx <- kit::topn(best, n = 5, decreasing = T, hasna = FALSE, index = TRUE)
      best_row_actual_value <- actual_value[sampled_rows][row_idx]
      sim <- data.frame(var = varname, mean_actual_value = mean(best_row_actual_value))
      return(sim)
    }

    ## testing to ensure the run_sim_in_par function works
    x <- bestVars[[1]]
    simulated_res <- run_sim_in_par(vars_to_sim = mget(x))
    simulated_res
    #>              var mean_actual_value
    #> 1 a1*b2*c3*d4*e5            361.17

    ## attempt to pass .x to custom function (doesn't know where to find .x values)
    future_map(
      .x = bestVars,
      .f = ~run_sim_in_par(vars_to_sim = mget(.x))
    )
    #> Error in (function (.x, .f, ..., .progress = FALSE) :
    #> ℹ In index: 1.
    #> Caused by error:
    #> ! value for 'a1' not found

    ## what if I explicitly declare the .x values in furrr_options(globals = ...); test just one list element first?
    future_map(
      .x = bestVars[[1]],
      .f = ~run_sim_in_par(vars_to_sim = mget(.x)),
      .options = furrr_options(globals = c(bestVars[[1]], "run_sim_in_par", "actual_value"))
    )
    #> Error in future$envir$...future.seeds_ii :
    #>   $ operator is invalid for atomic vectors
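One thing that may be contributing to the first error, shown here as a base-R sketch only: `mget()` defaults to `envir = as.environment(-1)` and `inherits = FALSE`, so when it is called inside a function (such as the `~` lambda) it searches only that function's own frame, whereas the working test above calls `mget(x)` at top level. The names `lookup_local` and `lookup_global` and the toy `a1` vector below are hypothetical, for illustration. Whether pointing `mget()` at the global environment is enough on multisession workers is not verified here, since the referenced objects would still have to be exported to each worker.

```r
# Hypothetical stand-in for one of the vectors assigned to the global environment.
a1 <- c(1, 2, 3)

# Default mget(): looks only in the calling function's frame (inherits = FALSE),
# so a global object is not found when mget() runs inside a function.
lookup_local <- function(nms) mget(nms)
res_local <- tryCatch(
  lookup_local("a1"),
  error = function(e) conditionMessage(e)  # capture the error message instead of stopping
)

# Naming the environment explicitly resolves the same name.
lookup_global <- function(nms) mget(nms, envir = globalenv())
res_global <- lookup_global("a1")
```

In this sketch `res_local` holds the captured error message and `res_global` is a named list containing `a1`.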