# furrr

Apply Mapping Functions in Parallel using Futures
The goal of furrr is to combine purrr’s family of mapping functions with future’s parallel processing capabilities. The result is near-drop-in replacements for purrr functions, such as `map()` and `map2_dbl()`, which can be replaced with their furrr equivalents, `future_map()` and `future_map2_dbl()`, to map in parallel.
The code draws heavily from the implementations of purrr and future.apply, and this package would not be possible without either of them.
Every variant of the following functions has been implemented:

- `map()`
- `map2()`
- `pmap()`
- `walk()`
- `imap()`
- `modify()`

This includes atomic variants (like `map_dbl()` through `future_map_dbl()`) and predicate variants (like `map_at()` through `future_map_at()`).
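As a minimal sketch of an atomic variant (assuming furrr is installed and a multisession plan is available), `future_map_dbl()` works like `map_dbl()` but distributes the work across workers and returns a double vector:

```r
library(furrr)

# Two background R sessions as workers
plan(multisession, workers = 2)

# Like `map_dbl()`, the atomic variant returns a double vector, not a list
res <- future_map_dbl(c(1, 4, 9), sqrt)
res
#> [1] 1 2 3
```
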
## Installation

You can install the released version of furrr from CRAN with:

```r
install.packages("furrr")
```

And the development version from GitHub with:

```r
# install.packages("remotes")
remotes::install_github("futureverse/furrr")
```
The easiest way to learn about furrr is to browse the website. In particular, the function reference page can be useful to get a general overview of the functions in the package, and its vignettes are deep dives into various parts of furrr.
## Example

furrr has been designed to function as identically to purrr as possible, so that you can be immediately familiar with it.
```r
library(furrr)
library(purrr)

map(c("hello", "world"), ~ .x)
#> [[1]]
#> [1] "hello"
#>
#> [[2]]
#> [1] "world"

future_map(c("hello", "world"), ~ .x)
#> [[1]]
#> [1] "hello"
#>
#> [[2]]
#> [1] "world"
```
The default backend for future (and through it, furrr) is a sequential one. This means that the above code will run out of the box, but it will not run in parallel. The design of future makes it incredibly easy to change this so that your code runs in parallel.
```r
# Set a "plan" for how the code should run.
plan(multisession, workers = 2)

# This does run in parallel!
future_map(c("hello", "world"), ~ .x)
#> [[1]]
#> [1] "hello"
#>
#> [[2]]
#> [1] "world"
```
If you are still skeptical, here is some proof that we are running in parallel.
```r
library(tictoc)

# This should take 6 seconds in total, running sequentially
plan(sequential)

tic()
nothingness <- future_map(c(2, 2, 2), ~ Sys.sleep(.x))
toc()
#> 6.08 sec elapsed
```
```r
# This should take ~2 seconds running in parallel, with a little overhead
# in `future_map()` from sending data to the workers. There is generally also
# a one-time cost from `plan(multisession)` setting up the workers.
plan(multisession, workers = 3)

tic()
nothingness <- future_map(c(2, 2, 2), ~ Sys.sleep(.x))
toc()
#> 2.212 sec elapsed
```
It’s important to remember that data has to be passed back and forth between the workers. This means that whatever performance gain you might have gotten from your parallelization can be crushed by moving large amounts of data around. For example, if you are moving large data frames to the workers, running models in parallel, and returning large model objects back, the shuffling of data can take a large chunk of the total time. Rather than returning the entire model object, consider returning only a performance metric, or the smaller specific pieces of the model that you are most interested in.
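To make this concrete, here is a hypothetical sketch (the `groups` split and the choice of R-squared are illustrative, not from the package docs): each worker fits a model but ships back only a single number, rather than the full `lm` object.

```r
library(furrr)

plan(multisession, workers = 2)

# Hypothetical example: one model per group of mtcars.
groups <- split(mtcars, mtcars$cyl)

# Return only the R-squared, not the fitted model, so very little data
# travels back from the workers.
r_squared <- future_map_dbl(groups, function(df) {
  fit <- lm(mpg ~ wt, data = df)
  summary(fit)$r.squared
})

r_squared
```

The same idea applies to any large return value: extract what you need on the worker, and send back only that.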
This performance drop can be especially prominent when using `future_pmap()` to iterate over rows and return large objects at each iteration.
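For row-wise iteration, the same advice holds: keep per-row return values small. A minimal sketch, assuming furrr is installed (the `params` data frame and column names here are made up for illustration):

```r
library(furrr)

plan(multisession, workers = 2)

# Hypothetical parameter grid: one row per simulation.
params <- data.frame(mu = c(0, 5), sigma = c(1, 2), n = c(100, 100))

# `future_pmap_dbl()` passes each row's columns as named arguments.
# Returning one number per row keeps the data shipped back tiny.
means <- future_pmap_dbl(params, function(mu, sigma, n) {
  mean(rnorm(n, mean = mu, sd = sigma))
})
```
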