
Dirk Eddelbuettel

prrd: Parallel Running [of] Reverse Depends


Motivation

R packages available via the CRAN mirror network are of consistently high quality and tend to “Just Work”. One of the many reasons for this is a strong culture of “do not break other packages”, which the CRAN maintainers enforce. Package maintainers are expected to do their part, too: by checking their packages against everything that depends on them!

To take one example, Rcpp is a package with a pretty large tail of reverse dependencies: as of this writing in late 2017, about 1270 other packages use it. So 1270 other packages need to be tested. This takes time, especially when the checks run serially. But it is easy to parallelise.

How

Previously, a few ad-hoc scripts (available here) were used for a number of packages. The scripts were one-offs and did their job. But with the idea of running jobs in parallel, the liteq package by Gabor Csardi fit the requirements nicely.

Enqueuing

The first operation is to enqueue jobs. In the simplest form, assuming the included script is in the PATH, we run:

$ enqueueJobs -q queueDirectory Rcpp

The same operation can also be done from R itself; see help(enqueueJobs). A package name has to be supplied; a directory name (for the queue directory) is optional. This function uses two base R functions to get all available packages and then determine the (non-recursive, first-level) reverse dependencies of the given package. These are then added to the queue as “jobs”.
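To make the enqueue step concrete, here is a minimal Python sketch of the same idea, not prrd's R code. It assumes a hypothetical toy package index (`INDEX`, standing in for the CRAN package database) and a hypothetical `jobs` table in SQLite. liteq also stores its queues in SQLite, but this schema is illustrative only and is not liteq's.

```python
import sqlite3

# Hypothetical toy index: package -> packages it declares as dependencies.
# Stands in for the CRAN package database that prrd queries from R.
INDEX = {
    "pkgA": ["Rcpp"],
    "pkgB": ["Rcpp", "pkgA"],
    "pkgC": ["pkgA"],  # uses Rcpp only indirectly, so it is NOT first-level
    "pkgD": [],
}


def reverse_depends(index, target):
    """First-level (non-recursive) reverse dependencies of `target`."""
    return sorted(pkg for pkg, deps in index.items() if target in deps)


def enqueue_jobs(db_path, index, target):
    """Create a hypothetical job table and add one READY job per reverse dependency."""
    con = sqlite3.connect(db_path)
    with con:  # commits the inserts on success
        con.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " id INTEGER PRIMARY KEY, pkg TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'READY')"
        )
        con.executemany(
            "INSERT INTO jobs (pkg) VALUES (?)",
            [(p,) for p in reverse_depends(index, target)],
        )
    return con


con = enqueue_jobs(":memory:", INDEX, "Rcpp")
print(con.execute("SELECT pkg, status FROM jobs ORDER BY id").fetchall())
# [('pkgA', 'READY'), ('pkgB', 'READY')]
```

Note that `pkgC` is not queued: only packages that list the target directly count, matching the non-recursive, first-level lookup described above.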

Dequeuing

Dequeuing is the second operation, and it can be done in parallel. In other words, in each of several shells, run:

$ dequeueJobs -q queueDirectory Rcpp

which will find the (current) queue file in the specified directory for the given package, here Rcpp. Again, this can also be done from an R prompt if preferred; see help(dequeueJobs).

Each idle worker requests a job from the queue and then tests the reverse dependency it was given. Once done, the worker is idle again and returns to the queue for the next job.

As the tests are entirely independent of one another, this parallelises easily, up to the resource limits of the machine.
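The worker loop can be sketched in Python as follows. This is an illustration under stated assumptions, not prrd's implementation: the `jobs` table and its `READY`/`WORKING`/`DONE`/`FAILED` states are hypothetical, threads stand in for the separate `dequeueJobs` processes, and `check_package` is a hypothetical placeholder for running `R CMD check`. The key point is that claiming a job must be atomic, so no two workers ever test the same package.

```python
import os
import sqlite3
import tempfile
import threading


def make_queue(path, pkgs):
    """Create a hypothetical job table with one READY row per package."""
    con = sqlite3.connect(path)
    with con:
        con.execute(
            "CREATE TABLE jobs (id INTEGER PRIMARY KEY, pkg TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'READY', worker INTEGER)"
        )
        con.executemany("INSERT INTO jobs (pkg) VALUES (?)", [(p,) for p in pkgs])
    con.close()


def claim(con, worker_id):
    """Atomically mark the oldest READY job as WORKING; return (id, pkg) or None."""
    con.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = con.execute(
        "SELECT id, pkg FROM jobs WHERE status = 'READY' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        con.execute(
            "UPDATE jobs SET status = 'WORKING', worker = ? WHERE id = ?",
            (worker_id, row[0]),
        )
    con.execute("COMMIT")
    return row


def check_package(pkg):
    """Hypothetical stand-in for running R CMD check on one reverse dependency."""
    return not pkg.endswith("bad")


def worker(path, worker_id):
    # One connection per worker, as separate processes would have; autocommit
    # mode (isolation_level=None) so BEGIN/COMMIT are issued explicitly.
    con = sqlite3.connect(path, timeout=30, isolation_level=None)
    while (job := claim(con, worker_id)) is not None:
        job_id, pkg = job
        status = "DONE" if check_package(pkg) else "FAILED"
        con.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    con.close()


path = os.path.join(tempfile.mkdtemp(), "queue.db")
make_queue(path, [f"pkg{i}" for i in range(20)] + ["pkgbad"])
threads = [threading.Thread(target=worker, args=(path, w)) for w in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

summary = sqlite3.connect(path).execute(
    "SELECT status, COUNT(*) FROM jobs GROUP BY status ORDER BY status"
).fetchall()
print(summary)  # [('DONE', 20), ('FAILED', 1)]
```

Because each worker simply loops until the queue is empty, adding workers needs no coordination beyond the atomic claim; the practical ceiling is the machine's cores and memory.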

Performance

To illustrate, the “wall time” for a reverse-dependency check of Rcpp decreased from 14.91 hours to 3.75 hours (almost four-fold) using six workers. An earlier run of RcppArmadillo decreased from 5.87 hours to 1.92 hours (just over three-fold) using four workers, and to 1.29 hours (about 4.5-fold) using six workers and a fresh ccache (see here for its impact). In all cases, the machine used was generally busy with other work as well.
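The ratios quoted above can be checked with a few lines of Python. The sketch also computes parallel efficiency (speedup divided by the number of workers), which is a standard measure but is not reported in the original text. Keep in mind that the six-worker RcppArmadillo run also benefited from a fresh ccache, and that the machine was not otherwise idle, so these numbers are not a pure measure of parallel scaling.

```python
# Wall-clock hours quoted above: (package, serial, parallel, workers).
RUNS = [
    ("Rcpp", 14.91, 3.75, 6),
    ("RcppArmadillo", 5.87, 1.92, 4),
    ("RcppArmadillo", 5.87, 1.29, 6),  # this run also used a fresh ccache
]


def speedup(serial_hours, parallel_hours):
    """Ratio of serial to parallel wall time."""
    return serial_hours / parallel_hours


for pkg, serial, parallel, workers in RUNS:
    s = speedup(serial, parallel)
    # Efficiency: share of the ideal linear speedup (= number of workers) achieved.
    print(f"{pkg:<14} {workers} workers: speedup {s:.2f}, efficiency {s / workers:.0%}")
```

The speedups come out at about 3.98, 3.06 and 4.55, i.e. roughly two thirds to three quarters of ideal linear scaling.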

The following screenshot shows a run for RcppArmadillo with six workers. It shows successes in green, skipped jobs in blue (packages whose tests sometimes run away), and no failures (which would be shown in red).

The split screen, as well as the additional tabs, is thanks to the wonderful byobu wrapper around tmux.

Configuration

The scripts read an internal YAML configuration file via the config package by JJ. The following locations are searched: .prrd.yaml, ~/.R/prrd.yaml, ~/.prrd.yaml, and /etc/R/prrd.yaml. For my initial tests I used these values:

## config for prrd package
default:
  setup: "~/git/prrd/local/setup.R"
  workdir: "/tmp/prrd"
  libdir: "/tmp/prrd/lib"
  debug: "false"
  verbose: "false"

The workdir and libdir variables specify where tests are run and which additional library directory is used. A more interesting variable is setup, which points to a helper script that gets sourced. This permits setting the CRAN repository address and any additional environment variables needed by the tests. My current script is in the repository.
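The file lookup can be sketched as below, in Python rather than R. It assumes that the first existing file in the listed order wins. The text above only lists the locations, so prrd's actual resolution logic may differ. `find_config` is a hypothetical helper name.

```python
import os
import tempfile

# Search order listed above; "~" entries are expanded to the home directory.
CANDIDATES = [".prrd.yaml", "~/.R/prrd.yaml", "~/.prrd.yaml", "/etc/R/prrd.yaml"]


def find_config(candidates=CANDIDATES):
    """Return the first candidate path that is an existing file, else None.

    Assumes first-match-wins; prrd's actual behaviour may differ.
    """
    for path in candidates:
        full = os.path.expanduser(path)
        if os.path.isfile(full):
            return full
    return None


# Usage: a project-local file shadows a per-user one when both exist.
with tempfile.TemporaryDirectory() as d:
    local = os.path.join(d, ".prrd.yaml")
    user = os.path.join(d, "prrd.yaml")
    open(user, "w").close()
    print(find_config([local, user]) == user)   # True: only the user file exists
    open(local, "w").close()
    print(find_config([local, user]) == local)  # True: the local file now wins
```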

Status

While the package is new, it has already been used for a few complete reverse-dependency test runs.

Installation

The package is not yet on CRAN, but may be uploaded “soon”.

Authors

Dirk Eddelbuettel

License

GPL (>= 2)

Initially created: Sun Dec 31 14:45:32 CST 2017
Last modified: Sat May 30 08:26:38 CDT 2020

