A collection of boosting algorithms written in Rust 🦀
MiniBoosts is a library for boosting algorithm researchers.
Boosting is a repeated game between a Booster and a Weak Learner. For each round of the game,
- the Booster chooses a distribution over the training examples,
- then the Weak Learner chooses a hypothesis (function) whose accuracy w.r.t. the distribution is slightly better than random guessing.
After sufficiently many rounds, the Booster outputs a hypothesis that performs significantly better on the training examples.
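To make the protocol concrete, here is a toy sketch of the game written as plain Rust (this is *not* the MiniBoosts API). The `weak_learner` closure, the `±1.0` label encoding, and the AdaBoost-style update are illustrative assumptions only.

```rust
// Toy sketch of the boosting game (not the MiniBoosts API; all names are illustrative).
// The booster maintains a distribution over examples, asks a hypothetical
// `weak_learner` for a hypothesis, and combines the hypotheses by a weighted vote.
fn boosting_game(
    labels: &[f64],                                              // labels in {-1.0, +1.0}
    weak_learner: impl Fn(&[f64]) -> Box<dyn Fn(usize) -> f64>,  // distribution -> hypothesis
    rounds: usize,
) -> impl Fn(usize) -> f64 {
    let n = labels.len();
    let mut dist = vec![1.0 / n as f64; n];                      // uniform distribution
    let mut ensemble: Vec<(f64, Box<dyn Fn(usize) -> f64>)> = Vec::new();

    for _ in 0..rounds {
        // The booster hands the current distribution to the weak learner.
        let h = weak_learner(&dist);

        // Weighted error of the returned hypothesis w.r.t. `dist`.
        let err: f64 = (0..n).filter(|&i| h(i) != labels[i]).map(|i| dist[i]).sum();
        let alpha = 0.5 * ((1.0 - err) / err.max(1e-12)).ln();

        // Re-weight the examples: mistakes receive more weight in the next round.
        for i in 0..n {
            dist[i] *= (-alpha * labels[i] * h(i)).exp();
        }
        let z: f64 = dist.iter().sum();
        dist.iter_mut().for_each(|d| *d /= z);

        ensemble.push((alpha, h));
    }

    // The combined hypothesis: a weighted majority vote over all rounds.
    move |i: usize| ensemble.iter().map(|(a, h)| *a * h(i)).sum::<f64>().signum()
}
```

In MiniBoosts, this loop is abstracted behind the `Booster` and `WeakLearner` traits introduced below.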
Write the following in your `Cargo.toml`:
```toml
[dependencies]
miniboosts = { version = "0.4.0" }
```
All boosting algorithms are implemented without Gurobi. However, if you have a Gurobi license, you can use the Gurobi version of the algorithms by setting:
```toml
[dependencies]
miniboosts = { version = "0.4.0", features = ["gurobi"] }
```
**Caution:** Since I am no longer a student, I cannot check whether the compilation succeeds with the `"gurobi"` flag.
Currently, the following boosting algorithms are available:
| BOOSTER | FEATURE FLAG |
|---|---|
| AdaBoost by Freund and Schapire, 1997 | |
| MadaBoost by Domingo and Watanabe, 2000 | |
| GBM (Gradient Boosting Machine) by Jerome H. Friedman, 2001 | |
| LPBoost by Demiriz, Bennett, and Shawe-Taylor, 2002 | gurobi |
| SmoothBoost by Servedio, 2003 | |
| AdaBoostV by Rätsch and Warmuth, 2005 | |
| TotalBoost by Warmuth, Liao, and Rätsch, 2006 | gurobi |
| SoftBoost by Warmuth, Glocer, and Rätsch, 2007 | gurobi |
| ERLPBoost by Warmuth, Glocer, and Vishwanathan, 2008 | gurobi |
| CERLPBoost (Corrective ERLPBoost) by Shalev-Shwartz and Singer, 2010 | gurobi |
| MLPBoost by Mitsuboshi, Hatano, and Takimoto, 2022 | gurobi |
| GraphSepBoost (Graph Separation Boosting) by Alon, Gonen, Hazan, and Moran, 2023 | |
If you invent a new boosting algorithm, you can introduce it by implementing the `Booster` trait. See `cargo doc -F gurobi --open` for details.
Currently, the following weak learners are available:

| WEAK LEARNER |
|---|
| Decision Tree |
| Regression Tree |
| A worst-case weak learner for LPBoost |
| Gaussian Naive Bayes |
| Neural Network (Experimental) |
If you write a paper about boosting algorithms, you need to compare your algorithm against others. At this point, some issues arise:
- Some boosting algorithms, such as LightGBM or XGBoost, are implemented and available for free. They are very easy to use from Python3 but hard to compare against other algorithms, since they are implemented in C++ internally. Implementing your algorithm in Python3 makes the running-time comparison unfair (Python3 is significantly slower than C++), while implementing it in C++ is extremely hard (based on my experience).
- Most boosting algorithms are designed for a decision-tree weak learner, even though the boosting protocol does not demand one.
- There is no implementation for margin-optimization boosting algorithms. Margin optimization is a better goal than empirical risk minimization in binary classification.
MiniBoosts is a crate to address the above issues.
This crate provides the following:
- Two main traits, named `Booster` and `WeakLearner`.
  - If you invent a new boosting algorithm, all you need is to implement `Booster`.
  - If you invent a new weak learning algorithm, all you need is to implement `WeakLearner`.
- Some famous boosting algorithms, including `AdaBoost`, `LPBoost`, `ERLPBoost`, etc.
- Some weak learners, including Decision Tree, Regression Tree, etc.
Sometimes, one wants to log each step of the boosting procedure. You can use the `Logger` struct to write the log to a `.csv` file while printing the current status to the terminal. See the Research feature section below for details.
Write the following to `Cargo.toml`:
```toml
miniboosts = { version = "0.4.0" }
```
If you want to use the `gurobi` features, enable the flag:
```toml
miniboosts = { version = "0.4.0", features = ["gurobi"] }
```
Here is some sample code:
```rust
use miniboosts::prelude::*;

fn main() {
    // Set file name
    let file = "/path/to/input/data.csv";

    // Read the CSV file.
    // The column named `class` corresponds to the labels (targets).
    let sample = SampleReader::new()
        .file(file)
        .has_header(true)
        .target_feature("class")
        .read()
        .unwrap();

    // Set the tolerance parameter as `0.01`.
    let tol: f64 = 0.01;

    // Initialize the Booster.
    let mut booster = AdaBoost::init(&sample)
        .tolerance(tol); // Set the tolerance parameter.

    // Construct the `DecisionTree` Weak Learner from `DecisionTreeBuilder`.
    let weak_learner = DecisionTreeBuilder::new(&sample)
        .max_depth(3)                 // Specify the max depth (default is 2).
        .criterion(Criterion::Twoing) // Choose the split rule.
        .build();                     // Build `DecisionTree`.

    // Run the boosting algorithm.
    // Each booster returns a combined hypothesis.
    let f = booster.run(&weak_learner);

    // Get the batch prediction for all examples in `sample`.
    let predictions = f.predict_all(&sample);

    // You can predict the `i`th instance.
    let i = 0_usize;
    let prediction = f.predict(&sample, i);

    // You can convert the hypothesis `f` to a `String`
    // (this line requires the `serde_json` crate).
    let s = serde_json::to_string(&f);
}
```
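As a quick sanity check, you can compare the batch predictions against the targets to compute the training error. The snippet below is a small sketch that reuses `sample` and `f` from the example above and mirrors the zero-one loss used in the Research feature example further down; it assumes, as in that example, that the predictions can be compared with `y as i64`.

```rust
// Sketch: compute the training error of the combined hypothesis `f`
// (reuses `sample` and `f` from the example above).
let n_sample = sample.shape().0 as f64;
let target = sample.target();
let train_error = f.predict_all(&sample)
    .into_iter()
    .zip(target.into_iter())
    .map(|(fx, &y)| if fx != y as i64 { 1.0 } else { 0.0 })
    .sum::<f64>()
    / n_sample;
println!("training error: {train_error}");
```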
If you use boosting for soft margin optimization, initialize the booster like this:
```rust
let n_sample = sample.shape().0; // Get the number of training examples.
let nu = n_sample as f64 * 0.2;  // Set the upper bound on the number of outliers.
let lpboost = LPBoost::init(&sample)
    .tolerance(tol)
    .nu(nu); // Set a capping parameter.
```
Note that the capping parameter must satisfy `1 <= nu && nu <= n_sample`.
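For example, if you derive `nu` from an outlier fraction, you can clamp it into this range before handing it to the booster. The snippet below is a small sketch that reuses `sample` and `tol` from the earlier example; the 20% fraction is an arbitrary choice for illustration.

```rust
// Sketch: derive `nu` from an outlier fraction and clamp it into the valid range.
let n_sample = sample.shape().0;
let outlier_fraction = 0.2; // arbitrary choice for illustration
let nu = (n_sample as f64 * outlier_fraction).clamp(1.0, n_sample as f64);
let lpboost = LPBoost::init(&sample)
    .tolerance(tol)
    .nu(nu);
```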
Research feature: this crate can output a CSV file recording, for each step, values such as the objective value, training loss, and test loss.
Here is an example:
```rust
use miniboosts::prelude::*;
use miniboosts::{
    Logger,
    LoggerBuilder,
    SoftMarginObjective,
};

// Define a loss function.
fn zero_one_loss<H>(sample: &Sample, f: &H) -> f64
    where H: Classifier
{
    let n_sample = sample.shape().0 as f64;

    let target = sample.target();

    f.predict_all(sample)
        .into_iter()
        .zip(target.into_iter())
        .map(|(fx, &y)| if fx != y as i64 { 1.0 } else { 0.0 })
        .sum::<f64>()
        / n_sample
}

fn main() {
    // Read the training data.
    let path = "/path/to/train/data.csv";
    let train = SampleReader::new()
        .file(path)
        .has_header(true)
        .target_feature("class")
        .read()
        .unwrap();

    // Set some parameters used later.
    let n_sample = train.shape().0 as f64;
    let nu = 0.01 * n_sample;

    // Read the test data.
    let path = "/path/to/test/data.csv";
    let test = SampleReader::new()
        .file(path)
        .has_header(true)
        .target_feature("class")
        .read()
        .unwrap();

    let booster = LPBoost::init(&train)
        .tolerance(0.01)
        .nu(nu);

    let weak_learner = DecisionTreeBuilder::new(&train)
        .max_depth(2)
        .criterion(Criterion::Entropy)
        .build();

    // Set the objective function.
    // You can use your own function by implementing the `ObjectiveFunction` trait.
    let objective = SoftMarginObjective::new(nu);

    let mut logger = LoggerBuilder::new()
        .booster(booster)
        .weak_learner(weak_learner)
        .train_sample(&train)
        .test_sample(&test)
        .objective_function(objective)
        .loss_function(zero_one_loss)
        .time_limit_as_secs(120) // Terminate after 120 seconds.
        .print_every(10)         // Print a log line every 10 rounds.
        .build();

    // Each line of `logfile.csv` contains the following four values:
    // objective value, train loss, test loss, and time per iteration.
    // The returned value `f` is the combined hypothesis.
    let f = logger.run("logfile.csv")
        .expect("Failed to log the boosting procedure");
}
```
- Currently, this crate mainly supports boosting algorithms for binary classification.
- Some boosting algorithms use the Gurobi optimizer, so you must acquire a Gurobi license to use them. If you have a license, you can use these boosting algorithms (boosters) by specifying `features = ["gurobi"]` in `Cargo.toml`. The compilation fails if you try to use the `gurobi` feature without a Gurobi license.
- You can log your own algorithm by implementing the `Research` trait.
- Run `cargo doc -F gurobi --open` to see more information.
- `GraphSepBoost` only supports the aggregation rule shown in Lemma 4.2 of their paper.
Future work includes more boosters and more weak learners, such as Bag of words, TF-IDF, and RBF-Net.