# Arborist

Scalable decision tree training and inference.
The Arborist project hosts fast, open-source implementations of several decision-tree algorithms. Breiman and Cutler's Random Forest algorithm is implemented, and Friedman's Stochastic Gradient Boosting is available as an alpha release. A spin-off providing Friedman and Fisher's PRIM ("Patient Rule Induction Method") has been developed by Decision Patterns, LLC. Arborist derivatives achieve their speed through parallelized and vectorized inner loops. Parallel, distributed training is also possible for independently-trained trees. Considerable attention has been devoted to minimizing and regularizing data movement, a key challenge in accelerating these algorithms.

Bindings are provided for R. A language-agnostic bridge design supports development of bindings for additional front ends, such as Python and Julia.

CRAN hosts the released package Rborist, which implements the Random Forest algorithm.
Installation of the released version using R:

```r
install.packages('Rborist')
```

Installation of the development version, hosted on this archive, from the top-level directory:

```sh
./Rborist/Package/Rborist.CRAN.sh
R CMD INSTALL Rborist_*.*-*.tar.gz
```

A CRAN-friendly snapshot of the development source is mirrored by the neighboring archive Rborist.CRAN. This archive is intended for remote access by R utilities such as devtools.
- Rborist version 0.3-7 is hosted on CRAN.
- Version 0.1-0 has been archived.
- Test cases sought.
Performance metrics have been measured using benchm-ml. Partial results can be found here.
Some users have reported diminished performance when running single-threaded. We recommend running with at least two cores, as frequently-executed inner loops have been written specifically to take advantage of multiple cores. In particular, when using a scaffold such as caret, prefer to let Rborist claim more cores than the scaffold.
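The core-allocation advice above can be sketched as follows. This is a minimal sketch, assuming that the `nThread` argument of `Rborist()` controls the package's worker count (check `?Rborist` for the exact name in your installed version), and that any outer resampling scaffold is left sequential so Rborist keeps the cores.

```r
# Sketch: let Rborist claim the cores rather than the outer scaffold.
# Assumption: 'nThread' is the thread-count argument; verify via ?Rborist.
library(Rborist)

x <- data.matrix(iris[, 1:4])
y <- iris$Species

# Train with an explicit thread count; keep any outer loop (e.g. a
# caret resampling scaffold) single-threaded.
fit <- Rborist(x, y, nThread = 4)
```

When tuning under caret, the analogous move is to configure caret's backend with a single worker and pass the larger thread count to Rborist itself.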
In a 2015 paper (https://www.jstatsoft.org/article/view/v077i01/v77i01.pdf), Wright and Ziegler compare several R-language implementations of the Random Forest algorithm, including Ranger and Rborist. A key finding was that Rborist failed to achieve the
A 2019 paper compares several categories of regression tools, including Random Forests. Rborist is among the faster packages offering high prediction accuracy: (https://doi.org/10.1109/ACCESS.2019.2933261). Based on the findings, we have updated the package's default settings. In particular, fixed-number predictor sampling (mtry) appears to provide more accurate predictions at low dimension than the current approach of Bernoulli sampling.
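The two predictor-sampling schemes contrasted above can be selected explicitly at training time. This sketch assumes the argument names `predFixed` (fixed-count, mtry-style sampling) and `predProb` (Bernoulli sampling) from the Rborist interface; treat them as assumptions and confirm against the package documentation for your version.

```r
# Sketch contrasting predictor-sampling schemes. The argument names
# predFixed / predProb are assumptions drawn from the Rborist docs.
library(Rborist)

x <- data.matrix(iris[, 1:4])
y <- iris$Species

fitFixed <- Rborist(x, y, predFixed = 2)   # mtry-style: 2 candidate predictors per split
fitProb  <- Rborist(x, y, predProb = 0.5)  # Bernoulli: each predictor sampled w.p. 0.5
```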
- Scalability Issues in Training Decision Trees (video), Nimbix Developer Summit, 2017.
- Controlling for Monotonicity in Random Forest Regressors (PDF), R in Finance, May 2016.
- Accelerating the Random Forest algorithm for commodity parallel hardware (video), PyData, July 2015.
- The Arborist: A High-Performance Random Forest (TM) Implementation, R in Finance, May 2015.
- Training Random Forests on the GPU: Tree Unrolling (PDF), GTC, March 2015.
- New archive sgbArb.CRAN mirrors the sgbArb package for stochastic gradient boosting.
- New archive Rborist.CRAN mirrors the Rborist package source in a form directly amenable to utilities such as devtools.
- New command rfTrain exposes the training component of the compound format/sample/train/validate task performed by rfArb. This provides separate training of sampled, preformatted data.
- New prediction option keyedFrame accesses prediction columns by name, bypassing a previous requirement that training and prediction frames have the same column ordering. In addition to arbitrary ordering, the prediction frame may now include columns not submitted to training.
- New command forestWeight computes Meinshausen's forest-wide weights. Nonterminals are weighted in addition to leaves, both to facilitate post-pruning and to accommodate early exit under prediction with trap-and-bail.
- New prediction option indexing=TRUE records the final node indices of tree walks.
- Training ignores missing predictor values, splitting over appropriately reduced subnodes.
- Quantile estimation supports both leaf and nonterminal (i.e., trap-and-bail) prediction modes.
- Prediction and validation support large (> 32-bit) observation counts.
- Support for training more than 2^32 observations may be enabled by recompiling.
- New option impPermute introduces permutation-based variable importance.
- Following the introduction of standalone sampling, a break in backward compatibility appears in versions 0.3-0 and higher of the Rborist package. Prediction with models trained under earlier versions throws an unidentified-index exception from within the Rcpp glue layer. Older models should therefore be retrained in order to use version 0.3-0 and above.
- Documentation for the validate command was out of date and could lead to errors. The documentation has been revised with version 0.3-8.
- Classification was reporting the oobErr value as accuracy instead of out-of-bag error. This has been repaired in 0.3-8.
- Non-binary classification with moderately-sized factor predictors (>= 64 levels) was failing due to a wraparound shift. Repaired in 0.3-8.
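Several of the options in the notes above can be combined in a short sketch: `rfArb` for compound training, `indexing = TRUE` to record final node indices at prediction, and `forestWeight` for Meinshausen's forest-wide weights. The call signatures here are assumptions inferred from the release notes, not checked against the package documentation; consult the package help pages before relying on them.

```r
# Sketch only: the names rfArb, forestWeight and the indexing option come
# from the release notes above; exact signatures may differ by version.
library(Rborist)

x <- data.matrix(mtcars[, -1])
y <- mtcars$mpg

fit <- rfArb(x, y)                       # compound format/sample/train/validate
pr  <- predict(fit, x, indexing = TRUE)  # record final node index per tree walk
w   <- forestWeight(fit, pr)             # forest-wide prediction weights (assumed call)
```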
Correctness and runtime errors are addressed as received. Given a reproducible test case, repairs are typically uploaded to the Rborist.CRAN repository within several days.
Feature requests are addressed on a case-by-case basis.