DART booster
XGBoost mostly combines a huge number of regression trees with a small learning rate. In this situation, trees added early are significant and trees added late are unimportant.
Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.
This is an introduction to the new tree booster, dart.
Original paper
Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. “DART: Dropouts meet Multiple Additive Regression Trees.” [PMLR, arXiv].
Features
- Drop trees in order to solve the over-fitting.
  - Trivial trees (to correct trivial errors) may be prevented.

Because of the randomness introduced in the training, expect the following few differences:

- Training can be slower than gbtree because the random dropout prevents usage of the prediction buffer (a rough timing sketch follows this list).
- The early stop might not be stable, due to the randomness.
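For a rough sense of that overhead, here is a minimal sketch (not part of the original tutorial) that times gbtree against dart on synthetic data; the data shape, round count, and rate_drop value are arbitrary assumptions.

import time

import numpy as np
import xgboost as xgb

# Synthetic binary classification data; size and shape are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

common = {'max_depth': 5, 'learning_rate': 0.1, 'objective': 'binary:logistic'}

for booster in ('gbtree', 'dart'):
    params = dict(common, booster=booster)
    if booster == 'dart':
        params['rate_drop'] = 0.1  # enable dropout; the default rate is 0.0
    start = time.perf_counter()
    xgb.train(params, dtrain, num_boost_round=200)
    print(f'{booster}: {time.perf_counter() - start:.2f}s')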
How it works
- In the \(m\)-th training round, suppose \(k\) trees are selected to be dropped.
- Let \(D = \sum_{i \in \mathbf{K}} F_i\) be the leaf scores of dropped trees and \(F_m = \eta \tilde{F}_m\) be the leaf scores of a new tree.
- The objective function is as follows:

  \[\mathrm{Obj} = \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right) + \Omega \left( \tilde{F}_m \right) .\]
- \(D\) and \(F_m\) are overshooting, so a scale factor is applied:

  \[\hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_{ij} + a \left( \sum_{i \in \mathbf{K}} F_{ij} + b F_{mj} \right) .\]
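To make the update concrete, here is a minimal NumPy sketch of one such round, assuming toy leaf scores and the tree-normalization factors derived in the Parameters section below; none of the array names come from XGBoost itself.

import numpy as np

# Toy per-tree leaf scores: 5 existing trees evaluated on 4 data points.
F = np.array([[ 0.3, -0.1,  0.2,  0.0],
              [ 0.1,  0.2, -0.3,  0.1],
              [ 0.2,  0.0,  0.1, -0.2],
              [-0.1,  0.3,  0.0,  0.2],
              [ 0.0, -0.2,  0.1,  0.3]])

K = [1, 3]                                  # indices of the k dropped trees
kept = [i for i in range(len(F)) if i not in K]
F_m = np.array([0.05, 0.1, -0.05, 0.15])    # leaf scores of the new tree (already scaled by eta)

eta, k = 0.1, len(K)
a = k / (k + eta)   # scale factor under 'tree' normalization (derived below)
b = 1.0 / k         # new tree weighted like one of the dropped trees

# Scaled update: kept trees plus rescaled (dropped + new) contribution.
y_hat = F[kept].sum(axis=0) + a * (F[K].sum(axis=0) + b * F_m)
print(y_hat)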
Parameters
The booster dart inherits the gbtree booster, so it supports all parameters that gbtree does, such as eta, gamma, and max_depth.
Additional parameters are noted below:
- sample_type: type of sampling algorithm.
  - uniform: (default) dropped trees are selected uniformly.
  - weighted: dropped trees are selected in proportion to weight.
- normalize_type: type of normalization algorithm (a numeric check of both factors follows this parameter list).
  - tree: (default) new trees have the same weight as each of the dropped trees.

    \[\begin{split}a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
    &= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
    &\sim a \left( 1 + \frac{\eta}{k} \right) D \\
    &= a \frac{k + \eta}{k} D = D , \\
    &\quad a = \frac{k}{k + \eta}\end{split}\]

  - forest: new trees have the same weight as the sum of the dropped trees (forest).

    \[\begin{split}a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
    &= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
    &\sim a \left( 1 + \eta \right) D \\
    &= a (1 + \eta) D = D , \\
    &\quad a = \frac{1}{1 + \eta} .\end{split}\]

- rate_drop: dropout rate. Range: [0.0, 1.0]
- skip_drop: probability of skipping dropout. If a dropout is skipped, new trees are added in the same manner as gbtree. Range: [0.0, 1.0]
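As a quick sanity check on the two derivations above, this snippet (with arbitrary values for eta, k, and D) verifies that each scale factor keeps the combined contribution of the dropped trees plus the new tree equal to \(D\), under the approximation \(\tilde{F}_m \sim D\) used in the derivations.

eta, k, D = 0.1, 3, 2.0  # arbitrary learning rate, dropped-tree count, dropped leaf score sum

# 'tree' normalization: the new tree is weighted like one dropped tree.
a_tree = k / (k + eta)
print(a_tree * (D + (eta / k) * D))  # == D, up to floating-point rounding

# 'forest' normalization: the new tree is weighted like the whole dropped forest.
a_forest = 1.0 / (1.0 + eta)
print(a_forest * (D + eta * D))      # == D, up to floating-point rounding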
Sample Script
import xgboost as xgb

# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train?format=libsvm')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test?format=libsvm')

# specify parameters via map
param = {'booster': 'dart',
         'max_depth': 5,
         'learning_rate': 0.1,
         'objective': 'binary:logistic',
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
num_round = 50
bst = xgb.train(param, dtrain, num_round)

# make prediction
preds = bst.predict(dtest)
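Continuing the script above, one way to check the result is to threshold the predicted probabilities and compare them with the test labels; the 0.5 cut-off is an assumption for this binary task, not part of the original sample.

import numpy as np

# Fraction of test points where the thresholded prediction disagrees
# with the label (0.5 is an arbitrary cut-off for binary:logistic output).
labels = dtest.get_label()
err = np.mean((preds > 0.5) != labels)
print(f'test error: {err:.4f}')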