
Commit 7ae42b1: ML in Rust (#306)

1 parent 9c1072d · commit 7ae42b1

2 files changed: +121 −0 lines changed

Lines changed: 120 additions & 0 deletions
---
author: Lev Kokotov
description: Machine learning in Python is slow and error-prone, while Rust makes it fast and reliable.
---

# Oxidizing Machine Learning

<p class="author">
<img width="54px" height="54px" src="/images/team/lev.jpg" alt="Author" />
Lev Kokotov<br/>
September 7, 2022
</p>
Machine learning in Python can be hard to deploy at scale. We all love Python, but it's no secret that its overhead is large:

* Load data from large CSV files
* Do some post-processing with NumPy
* Move and join data into a Pandas dataframe
* Load data into the algorithm

Each step incurs at least one copy of the data in memory; a 4x storage and compute cost just to train a model already sounds inefficient, and once you add the overhead of Python's memory allocation and boxed objects, the price tag climbs even higher.
Even if you could find the money to pay for the compute needed, fitting the dataset we want into the RAM we have becomes difficult.

The status quo needs a shake-up, and along comes Rust.
## The State of ML in Rust

Doing machine learning in anything but Python sounds wild, but if one looks under the hood, ML algorithms are mostly written in C++: `libtorch` (Torch), XGBoost, large parts of Tensorflow, `libsvm` (Support Vector Machines), and the list goes on. A linear regression can be (and is) written in about 10 lines of for-loops.
It should then come as no surprise that the Rust ML community is alive and doing well:

* SmartCore[^1] is rivaling Scikit for commodity algorithms
* XGBoost bindings[^2] work great for gradient boosted trees
* Torch bindings[^3] are first class for building any kind of neural network
* Tensorflow bindings[^4] are also in the mix, although parts of them are still Python (e.g. Keras)
If you start missing NumPy, don't worry, the Rust version[^5] has you covered, and the list of available tools keeps growing.

When you only need 4 bytes to represent a floating point number instead of Python's 26 bytes[^6], suddenly you can do more.
## XGBoost, Rustified

Let's do a quick example to illustrate our point.

XGBoost is a popular decision tree algorithm which uses gradient boosting, a fancy optimization technique, to train models on data that could confuse simpler linear models. It comes with a Python interface, which calls into its C++ primitives, but now it has a Rust interface as well.
_Cargo.toml_

```toml
[dependencies]
xgboost = "0.1"
```
_src/main.rs_

```rust
use xgboost::{parameters, Booster, DMatrix};

fn main() {
    // Data is read directly into the C++ data structure.
    let train = DMatrix::load("train.txt").unwrap();
    let test = DMatrix::load("test.txt").unwrap();

    // Task (regression or classification)
    let learning_params = parameters::learning::LearningTaskParametersBuilder::default()
        .objective(parameters::learning::Objective::BinaryLogistic)
        .build()
        .unwrap();

    // Tree parameters (e.g. depth)
    let tree_params = parameters::tree::TreeBoosterParametersBuilder::default()
        .max_depth(2)
        .eta(1.0)
        .build()
        .unwrap();

    // Gradient boosting parameters
    let booster_params = parameters::BoosterParametersBuilder::default()
        .booster_type(parameters::BoosterType::Tree(tree_params))
        .learning_params(learning_params)
        .build()
        .unwrap();

    // Train on train data, test accuracy on test data
    let evaluation_sets = &[(&train, "train"), (&test, "test")];

    // Final algorithm configuration
    let params = parameters::TrainingParametersBuilder::default()
        .dtrain(&train)
        .boost_rounds(2) // n_estimators
        .booster_params(booster_params)
        .evaluation_sets(Some(evaluation_sets))
        .build()
        .unwrap();

    // Train!
    let model = Booster::train(&params).unwrap();

    // Save and load later in any language that has XGBoost bindings.
    model.save("/tmp/xgboost_model.bin").unwrap();
}
```
<small>Example created from the `rust-xgboost`[^7] documentation and my own experiments.</small>

That's it! You just trained an XGBoost model in Rust, in just a few lines of efficient and ergonomic code.

Unlike Python, Rust compiles and verifies your code, so you'll know that it's likely to work before you even run it. When it can take several hours to train a model, it's great to know that you won't be stopped by a syntax error on your last line.
[^1]: [SmartCore](https://smartcorelib.org/)
[^2]: [XGBoost bindings](https://github.com/davechallis/rust-xgboost)
[^3]: [Torch bindings](https://github.com/LaurentMazare/tch-rs)
[^4]: [Tensorflow bindings](https://github.com/tensorflow/rust)
[^5]: [rust-ndarray](https://github.com/rust-ndarray/ndarray)
[^6]: [Python floating points](https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Include/floatobject.h#L15)
[^7]: [`rust-xgboost`](https://docs.rs/xgboost/latest/xgboost/)

pgml-docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -144,6 +144,7 @@ nav:
     - Data is Living and Relational: blog/data-is-living-and-relational.md
     - Postgres Full Text Search is Awesome: blog/postgres-full-text-search-is-awesome.md
     - Which Database, That is the Question: blog/which-database-that-is-the-question.md
+    - Oxidizing Machine Learning: blog/oxidizing-machine-learning.md
     - About:
       - Team: about/team.md
       - Roadmap: about/roadmap.md
```