Sören Künzel, Theo Saarinen, Simon Walter, Edward Liu, Allen Tang, Jasjeet Sekhon
Rforestry is a fast implementation of Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability.
install.packages("devtools").devtools::has_devel() to check whether you do. If nodevelopment environment exists, Windows users download and installRtools andmacOS users download and installXcode.devtools::install_github("forestry-labs/Rforestry"). ForWindows users, you’ll need to skip 64-bit compilationdevtools::install_github("forestry-labs/Rforestry", INSTALL_opts = c('--no-multiarch'))due to an outstanding gcc issue.set.seed(292315)library(Rforestry)test_idx<-sample(nrow(iris),3)x_train<- iris[-test_idx,-1]y_train<- iris[-test_idx,1]x_test<- iris[test_idx,-1]rf<-forestry(x = x_train,y = y_train)weights=predict(rf, x_test,aggregation ="weightMatrix")$weightMatrixweights%*% y_trainpredict(rf, x_test)A fast implementation of random forests using ridge penalizedsplitting and ridge regression for predictions.
### Ridge Random Forest

A fast implementation of random forests using ridge-penalized splitting and ridge regression for predictions.

Example:
```r
set.seed(49)
library(Rforestry)

# Simulate an outcome that is exactly linear in three features
n <- 100
a <- rnorm(n)
b <- rnorm(n)
c <- rnorm(n)
y <- 4 * a + 5.5 * b - .78 * c
x <- data.frame(a, b, c)

# ridgeRF = TRUE enables ridge-penalized splitting and ridge
# regression for the predictions
forest <- forestry(x, y, ridgeRF = TRUE)
predict(forest, x)
```
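Since the outcome above is exactly linear in the features, a ridge forest should track it more closely in-sample than a default forest. A short illustrative comparison (the `plain_forest` object and the RMSE comparison are ours, not part of the package examples):

```r
# Fit an unpenalized forest on the same data for comparison
plain_forest <- forestry(x, y)

# In-sample RMSE for the ridge forest and the default forest;
# the ridge forest should sit closer to the linear signal
sqrt(mean((predict(forest, x) - y)^2))
sqrt(mean((predict(plain_forest, x) - y)^2))
```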
library(Rforestry)x<-rnorm(150)+5y<- .15*x+ .5*sin(3*x)data_train<-data.frame(x1 = x,x2 =rnorm(150)+5,y = y+rnorm(150,sd = .4))monotone_rf<-forestry(x = data_train%>%select(-y),y = data_train$y,monotonicConstraints =c(-1,-1),nodesizeStrictSpl =5,nthread =1,ntree =25)predict(monotone_rf,feature.new = data_train%>%select(-y))We can return the predictions for the training dataset using only thetrees in which each observation was out of bag. Note that when there arefew trees, or a high proportion of the observations sampled, there maybe some observations which are not out of bag for any trees. Thepredictions for these are returned NaN.
### OOB Predictions

We can return the predictions for the training dataset using only the trees in which each observation was out of bag. Note that when there are few trees, or a high proportion of the observations is sampled, there may be some observations which are not out of bag for any tree; the predictions for these observations are returned as NaN.

Example:

```r
library(Rforestry)

# Train a forest
rf <- forestry(x = iris[, -1],
               y = iris[, 1],
               ntree = 500)

# Get the OOB predictions for the training set
oob_preds <- getOOBpreds(rf)

# This should be equal to the OOB error
sum((oob_preds - iris[, 1])^2)
getOOB(rf)
```
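Because never-out-of-bag observations come back as NaN, it can be worth checking for them before computing the error by hand, especially with a small `ntree`. A base-R sketch (the `valid` mask is ours):

```r
# Count observations that were never out of bag; with ntree = 500
# on 150 rows this should essentially always be zero
sum(is.na(oob_preds))

# If any are NaN, drop them before computing the error by hand
valid <- !is.na(oob_preds)
sum((oob_preds[valid] - iris[valid, 1])^2)
```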