Simplified R caching for reproducible big data projects
databio/simpleCache
simpleCache is an R package providing functions for caching R objects. Its purpose is to encourage writing reusable, restartable, and reproducible analysis pipelines for projects with massive data and computational requirements.
As its name indicates, simpleCache is intended to be simple. You choose a location to store your caches, and then provide the function with nothing more than a cache name and instructions (R code) for how to produce the R object. While simple, simpleCache also provides some advanced options like environment assignments, recreating caches, reloading caches, and even cluster compute bindings (using the batchtools package), making it flexible enough for use in large-scale data analysis projects.
simpleCache is on CRAN and can be installed as usual:

    install.packages("simpleCache")

simpleCache comes with a single primary function (`simpleCache()`) that will do almost everything you need. In short, you run it with a few lines like this:

    library(simpleCache)
    setCacheDir(tempdir())
    simpleCache("normSample", { rnorm(1e7, 0, 1) }, recreate=TRUE)
    simpleCache("normSample", { rnorm(1e7, 0, 1) })

simpleCache also interfaces with the batchtools package to let you build caches on any cluster resource manager.
- `simpleCache()`: Creates and caches or reloads cached results of provided R instruction code
- `listCaches()`: Lists all of the caches available in the `cacheDir`
- `deleteCaches()`: Deletes cache(s) from the `cacheDir`
- `setCacheDir()`: Sets a global option for a cache directory so you don't have to specify one in each `simpleCache` call
- `simpleCacheOptions()`: Views all of the `simpleCache` global options that have been set
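
As a rough illustration of how the helper functions fit around the main call, the sketch below sets a cache directory, prints the current options, builds one small cache, and lists what is stored. The directory name `functionTour` and the cache name `smallSample` are made up for this example, and the exact messages printed may differ between package versions.

```r
library(simpleCache)

# Hypothetical scratch directory for this example only
cacheDir <- file.path(tempdir(), "functionTour")
dir.create(cacheDir, showWarnings = FALSE)

# Set the global cache directory once, so later calls do not need it
setCacheDir(cacheDir)

# Show the simpleCache global options that are currently set
simpleCacheOptions()

# Build (or load, if it already exists) a small cache
simpleCache("smallSample", { rnorm(100, 0, 1) })

# List the cache files now present in the cache directory
cachedFiles <- listCaches()
print(cachedFiles)
```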
The use case I had in mind for simpleCache is that you find yourself constantly recalculating the same R object in several different scripts, or repeatedly in the same script, every time you open it and want to continue that project. simpleCache is well-suited for interactive analysis, allowing you to pick up right where you left off in a new R session, without having to recalculate everything. It is equally useful in automatic pipelines, where separate scripts may benefit from loading, instead of recalculating, the same R objects produced by other scripts.
R provides some base functions (`save`, `serialize`, and `load`) to let you save and reload such objects, but these low-level functions are a bit cumbersome. simpleCache simply provides a convenient, user-friendly interface to these functions, streamlining the process. For example, a single `simpleCache` call will check for a cache and load it if it exists, or create it if it does not. With the base R `save` and `load` functions, you can't just write a single function call and then run the same thing every time you start the script -- even this simple use case requires additional logic to check for an existing cache. simpleCache just does all this for you.
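
For comparison, here is a minimal sketch of the extra logic you would need with base R alone. The helper `loadOrCreate()` and the file name `manualCache.RData` are hypothetical and not part of simpleCache; this is only meant to show the check-then-load-or-create pattern, not how the package is implemented internally.

```r
# Hypothetical helper: load a cached object if its file exists,
# otherwise compute it and save it for the next run.
loadOrCreate <- function(cacheFile, compute) {
  if (file.exists(cacheFile)) {
    # load() restores objects by name, so load into a private
    # environment and pull out the one object we saved
    env <- new.env()
    load(cacheFile, envir = env)
    return(env$ret)
  }
  # No cache yet: run the computation and store the result as `ret`
  ret <- compute()
  save(ret, file = cacheFile)
  ret
}

cacheFile <- file.path(tempdir(), "manualCache.RData")
unlink(cacheFile)  # start from a clean state for this demonstration

# First call computes and saves; second call loads the saved copy
normSample <- loadOrCreate(cacheFile, function() rnorm(1e5, 0, 1))
again <- loadOrCreate(cacheFile, function() rnorm(1e5, 0, 1))
print(identical(normSample, again))
```

Every script that wants to share the object would need to carry this helper or repeat the same `file.exists()` check, which is the bookkeeping a single `simpleCache` call is meant to replace.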
The thing to keep in mind with simpleCache is that the cache name is paramount. simpleCache assumes that your name for an object is a perfect identifier for that object; in other words, don't cache things that you plan to change.
simpleCache is licensed under the 2-Clause BSD License. Questions, feature requests and bug reports are welcome via the issue queue. The maintainer will review pull requests and incorporate contributions at his discretion.
For more information refer to the contributing document and pull request / issue templates in the `.github` folder of this repository.
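
A small sketch of what this means in practice, assuming default options: once a cache with a given name exists, a later call with the same name reuses it even if the code inside the braces has changed. The directory name `nameDemo` and the cache name `answer` are made up for this example.

```r
library(simpleCache)

# Use a dedicated, empty directory so no earlier cache interferes
cacheDir <- file.path(tempdir(), "nameDemo")
unlink(cacheDir, recursive = TRUE)
dir.create(cacheDir, showWarnings = FALSE)
setCacheDir(cacheDir)

# First call: no cache named "answer" exists yet, so the code runs
# and its result is saved under that name
simpleCache("answer", { 1 + 1 })

# Same name, different code: the existing cache is found and reused,
# so this new instruction is not evaluated
simpleCache("answer", { 10 + 10 })
staleAnswer <- answer
print(staleAnswer)

# To pick up changed code, rebuild explicitly (or choose a new name)
simpleCache("answer", { 10 + 10 }, recreate = TRUE)
print(answer)
```

If the code behind a name is likely to change, either give each variant its own cache name or pass `recreate=TRUE` when you know the stored result is out of date.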