For each ML.NET task, there are multiple training algorithms to choose from. Which one to choose depends on the problem you're trying to solve, the characteristics of your data, and the compute and storage resources you have available. Training a machine learning model is an iterative process; you might need to try multiple algorithms to find the one that works best.
Algorithms operate on features. Features are numerical values computed from your input data. They are optimal inputs for machine learning algorithms. You transform your raw input data into features using one or more data transforms. For example, text data is transformed into a set of word counts and word-combination counts. Once the features have been extracted from a raw data type using data transforms, they are referred to as featurized: for example, featurized text or featurized image data.
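For example, here's a minimal sketch of featurizing text with ML.NET's FeaturizeText transform (the column names `Text` and `Features` are illustrative):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Transform a raw string column ("Text") into a numeric feature vector
// ("Features") built from word and character n-gram counts.
var featurization = mlContext.Transforms.Text.FeaturizeText(
    outputColumnName: "Features",
    inputColumnName: "Text");
```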
An algorithm is the math that executes to produce a model. Different algorithms produce models with different characteristics.
With ML.NET, the same algorithm can be applied to different tasks. For example, Stochastic Dual Coordinate Ascent can be used for Binary Classification, Multiclass Classification, and Regression. The difference is in how the output of the algorithm is interpreted to match the task.
For each algorithm/task combination, ML.NET provides a component that executes the training algorithm and makes the interpretation. These components are called trainers. For example, the SdcaRegressionTrainer uses the Stochastic Dual Coordinated Ascent algorithm applied to the Regression task.
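As a sketch of this, the same SDCA algorithm surfaces as a different trainer for each task (all column names use ML.NET's defaults):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// SDCA interpreted for binary classification.
var binary = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression();

// SDCA interpreted for multiclass classification.
var multiclass = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy();

// SDCA interpreted for regression (the SdcaRegressionTrainer).
var regression = mlContext.Regression.Trainers.Sdca();
```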
Linear algorithms produce a model that calculates scores from a linear combination of the input data and a set of weights. The weights are parameters of the model, estimated during training.
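In other words, for a feature vector $x$, learned weights $w$, and a bias term $b$, the model computes:

$$\text{score} = w \cdot x + b$$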
Linear algorithms work well for features that are linearly separable.
Before training with a linear algorithm, the features should be normalized. This prevents one feature from having more influence over the result than others.
In general, linear algorithms are scalable, fast, cheap to train, and cheap to predict. They scale by the number of features and approximately by the size of the training data set.
Linear algorithms make multiple passes over the training data. If your dataset fits into memory, then adding a cache checkpoint to your ML.NET pipeline before appending the trainer makes training run faster.
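A minimal sketch combining both recommendations, with normalization and a cache checkpoint ahead of a linear trainer (`trainingData` is an assumed in-memory IDataView):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Normalize so no single feature dominates the linear combination, then
// cache so the trainer's multiple passes don't re-run earlier transforms.
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

// ITransformer model = pipeline.Fit(trainingData);
```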
Averaged perceptron is best for text classification.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| AveragedPerceptronTrainer | Binary classification | Yes |
Stochastic dual coordinated ascent doesn't need tuning for good default performance.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| SdcaLogisticRegressionBinaryTrainer | Binary classification | Yes |
| SdcaNonCalibratedBinaryTrainer | Binary classification | Yes |
| SdcaMaximumEntropyMulticlassTrainer | Multiclass classification | Yes |
| SdcaNonCalibratedMulticlassTrainer | Multiclass classification | Yes |
| SdcaRegressionTrainer | Regression | Yes |
Use L-BFGS when the number of features is large. It produces logistic regression training statistics, but doesn't scale as well as AveragedPerceptronTrainer.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| LbfgsLogisticRegressionBinaryTrainer | Binary classification | Yes |
| LbfgsMaximumEntropyMulticlassTrainer | Multiclass classification | Yes |
| LbfgsPoissonRegressionTrainer | Regression | Yes |
Symbolic stochastic gradient descent is the fastest and most accurate linear binary classification trainer. It scales well with the number of processors.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| SymbolicSgdLogisticRegressionBinaryTrainer | Binary classification | Yes |
Online gradient descent implements the standard (non-batch) stochastic gradient descent, with a choice of loss functions and an option to update the weight vector using the average of the vectors seen over time.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| OnlineGradientDescentTrainer | Regression | Yes |
Decision tree algorithms create a model that contains a series of decisions: effectively a flow chart through the data values.
Features don't need to be linearly separable to use this type of algorithm, and they don't need to be normalized, because the individual values in the feature vector are used independently in the decision process.
Decision tree algorithms are generally very accurate.
Except for Generalized Additive Models (GAMs), tree models can lack explainability when the number of features is large.
Decision tree algorithms take more resources and don't scale as well as linear algorithms do. They perform well on datasets that can fit into memory.
Boosted decision trees are an ensemble of small trees, where each tree scores the input data and passes the score on to the next tree to produce a better score, and so on. Each tree in the ensemble improves on the previous one.
Light gradient boosted machine is the fastest and most accurate of the binary classification tree trainers. It's highly tunable.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| LightGbmBinaryTrainer | Binary classification | Yes |
| LightGbmMulticlassTrainer | Multiclass classification | Yes |
| LightGbmRegressionTrainer | Regression | Yes |
| LightGbmRankingTrainer | Ranking | No |
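For example, a sketch of a LightGBM binary classification trainer with a few of its tunable hyperparameters (the values are illustrative, not recommendations):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Boosted trees: each added tree improves on the ensemble so far.
// Requires the Microsoft.ML.LightGbm NuGet package.
var trainer = mlContext.BinaryClassification.Trainers.LightGbm(
    numberOfLeaves: 31,       // per-tree complexity (illustrative)
    learningRate: 0.1,        // shrinkage per boosting round (illustrative)
    numberOfIterations: 100); // number of boosting rounds (illustrative)
```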
Use fast tree for featurized image data. It's resilient to unbalanced data and highly tunable.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| FastTreeBinaryTrainer | Binary classification | Yes |
| FastTreeRegressionTrainer | Regression | Yes |
| FastTreeTweedieTrainer | Regression | Yes |
| FastTreeRankingTrainer | Ranking | No |
Fast forest works well with noisy data.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| FastForestBinaryTrainer | Binary classification | Yes |
| FastForestRegressionTrainer | Regression | Yes |
Generalized additive models (GAMs) are best for problems that perform well with tree algorithms but where explainability is a priority.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| GamBinaryTrainer | Binary classification | No |
| GamRegressionTrainer | Regression | No |
Matrix factorization is used for collaborative filtering in recommendation.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| MatrixFactorizationTrainer | Recommendation | No |
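A minimal sketch, assuming user and item ID columns already mapped to keys (the column names `userIdEncoded` and `movieIdEncoded` are hypothetical):

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Collaborative filtering: factorize the (user x item) ratings matrix.
// Requires the Microsoft.ML.Recommender NuGet package.
var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = "userIdEncoded", // hypothetical key column
    MatrixRowIndexColumnName = "movieIdEncoded",   // hypothetical key column
    LabelColumnName = "Label",                     // e.g., a rating
    NumberOfIterations = 20,                       // illustrative value
    ApproximationRank = 100                        // illustrative value
};

var trainer = mlContext.Recommendation().Trainers.MatrixFactorization(options);
```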
Field-aware factorization machines are best for sparse categorical data with large datasets.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| FieldAwareFactorizationMachineTrainer | Binary classification | No |
These trainers create a multiclass trainer from a binary trainer. Use with AveragedPerceptronTrainer, LbfgsLogisticRegressionBinaryTrainer, SymbolicSgdLogisticRegressionBinaryTrainer, LightGbmBinaryTrainer, FastTreeBinaryTrainer, FastForestBinaryTrainer, or GamBinaryTrainer.
One versus all trains one binary classifier for each class, which distinguishes that class from all other classes. It's limited in scale by the number of classes to categorize.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| OneVersusAllTrainer | Multiclass classification | Yes |
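For example, a sketch that wraps AveragedPerceptronTrainer to get a multiclass classifier; the label is mapped to a key type first, as multiclass training requires:

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// One versus all: trains one averaged perceptron per class.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron()));
```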
Pairwise coupling trains a binary classification algorithm on each pair of classes. It's limited in scale by the number of classes, as each combination of two classes must be trained.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| PairwiseCouplingTrainer | Multiclass classification | No |
K-means is used for clustering.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| KMeansTrainer | Clustering | Yes |
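A minimal sketch (the cluster count is illustrative):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Group examples into k clusters based on their feature vectors.
var trainer = mlContext.Clustering.Trainers.KMeans(
    featureColumnName: "Features",
    numberOfClusters: 3); // illustrative; tune for your data
```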
Randomized principal component analysis (PCA) is used for anomaly detection.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| RandomizedPcaTrainer | Anomaly detection | No |
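A minimal sketch (the rank is illustrative):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Examples that reconstruct poorly from the low-rank PCA projection
// receive high anomaly scores.
var trainer = mlContext.AnomalyDetection.Trainers.RandomizedPca(
    featureColumnName: "Features",
    rank: 3); // illustrative number of principal components
```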
Use the naive Bayes multiclass classification algorithm when the features are independent and the training dataset is small.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| NaiveBayesMulticlassTrainer | Multiclass classification | Yes |
Use the prior trainer to baseline the performance of other binary classification trainers. To be effective, the other trainers' metrics should be better than the prior trainer's.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| PriorTrainer | Binary classification | Yes |
Support vector machines (SVMs) are an extremely popular and well-researched class of supervised learning models, which can be used in linear and non-linear classification tasks.
Recent research has focused on ways to optimize these models to efficiently scale to larger training sets.
Linear SVM predicts a target using a linear binary classification model trained over boolean-labeled data. It alternates between stochastic gradient descent steps and projection steps.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| LinearSvmTrainer | Binary classification | Yes |
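A minimal sketch (the iteration count is illustrative; the default is 1):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Linear SVM over boolean-labeled data, alternating SGD and projection steps.
var trainer = mlContext.BinaryClassification.Trainers.LinearSvm(
    numberOfIterations: 20); // illustrative
```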
Local deep SVM predicts a target using a non-linear binary classification model that reduces prediction-time cost: the prediction cost grows logarithmically with the size of the training set rather than linearly, with a tolerable loss in classification accuracy.

| Trainer | Task | ONNX Exportable |
|---|---|---|
| LdSvmTrainer | Binary classification | Yes |
Ordinary least squares (OLS) is one of the most commonly used techniques in linear regression.

Ordinary least squares refers to the loss function, which computes error as the sum of the squared distances from the actual values to the predicted line, and fits the model by minimizing that squared error. This method assumes a strong linear relationship between the inputs and the dependent variable.
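Concretely, for training examples $(x_i, y_i)$, OLS chooses weights $w$ and bias $b$ to minimize:

$$\min_{w,\,b} \sum_i \left(y_i - (w \cdot x_i + b)\right)^2$$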
| Trainer | Task | ONNX Exportable |
|---|---|---|
| OlsTrainer | Regression | Yes |