A machine learning task is a type of prediction or inference that's based on both:
For example, the classification task assigns data to categories, and the clustering task groups data according to similarity.
Machine learning tasks rely on patterns in the data rather than being explicitly programmed.
This article describes the different machine learning tasks that are available in ML.NET and some common use cases.
Once you've decided which task fits your scenario, you need to choose the best algorithm to train your model. The available algorithms are listed in the section for each task.
Binary classification is a supervised machine learning task that's used to predict which of exactly two classes (categories) an instance of data belongs to. The input of a classification algorithm is a set of labeled examples, where each label is an integer of either 0 or 1. The output of a binary classification algorithm is a classifier, which you can use to predict the class of new unlabeled instances. Examples of binary classification scenarios include:
For more information, see the Binary classification article on Wikipedia.
You can train a binary classification model using the following algorithms:
For best results with binary classification, the training data should be balanced (that is, it should contain equal numbers of positive and negative examples). Missing values should be handled before training.
The input label column data must be Boolean. The input features column data must be a fixed-size vector of Single.
These trainers output the following columns:
Output Column Name | Column Type | Description |
---|---|---|
Score | Single | The raw score that the model calculated. |
PredictedLabel | Boolean | The predicted label, based on the sign of the score. A negative score maps to false and a positive score maps to true. |
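
The relationship between the two output columns can be illustrated with a minimal sketch. It is written in Python as a language-neutral illustration and is not ML.NET API code; the function name `predicted_label` is hypothetical, and the handling of a score of exactly zero is an assumption, because the table only covers negative and positive scores.

```python
def predicted_label(score: float) -> bool:
    """Map a raw score to a Boolean label by its sign (positive -> True)."""
    # The table does not say what happens at exactly 0.0.
    # Treating it as False is an assumption of this sketch.
    return score > 0.0


# Hypothetical raw scores, as a trained binary classifier might produce.
scores = [-2.3, 0.7, 4.1, -0.1]
labels = [predicted_label(s) for s in scores]
print(labels)
```

The magnitude of the score is discarded in this mapping; only its sign decides the label.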
Multiclass classification is a supervised machine learning task that's used to classify an instance of data into one of three or more classes (categories). The input of a classification algorithm is a set of labeled examples. Each label normally starts as text. It's then run through the TermTransform, which converts it to the Key (numeric) type. The output of a classification algorithm is a classifier, which you can use to predict the class of new unlabeled instances. Examples of multiclass classification scenarios include:
For more information, see the Multiclass classification article on Wikipedia.
Note
One-vs.-rest upgrades any binary classification learner to act on multiclass datasets.
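
The one-vs.-rest idea can be sketched as follows. This is a conceptual Python illustration, not the ML.NET implementation: the perceptron stands in for "any binary learner", and the names `train_perceptron`, `train_one_vs_rest`, and `predict`, as well as the toy data, are assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[float]], float]


def train_perceptron(xs: List[Sequence[float]], ys: List[bool], epochs: int = 20) -> Scorer:
    """Train a tiny binary learner; a positive score means the positive class."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            target = 1.0 if y else -1.0
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * score <= 0:  # misclassified: nudge the boundary
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_rest(xs: List[Sequence[float]], labels: List[str]) -> Dict[str, Scorer]:
    """Train one binary scorer per class: this class versus everything else."""
    classes = sorted(set(labels))
    return {c: train_perceptron(xs, [label == c for label in labels]) for c in classes}


def predict(scorers: Dict[str, Scorer], x: Sequence[float]) -> str:
    """Pick the class whose binary scorer gives the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))


# Toy, well-separated data with three classes (made up for this sketch).
xs = [(0, 4), (0, 5), (4, -3), (5, -3), (-4, -3), (-5, -3)]
labels = ["a", "a", "b", "b", "c", "c"]
scorers = train_one_vs_rest(xs, labels)
print([predict(scorers, x) for x in xs])
```

Each class gets its own binary problem, so a dataset with N classes trains N binary models, and prediction compares their scores.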
You can train a multiclass classification model using the following training algorithms:
The input label column data must be key type. The feature column must be a fixed-size vector of Single.
This trainer outputs the following:
Output Name | Type | Description |
---|---|---|
Score | Vector of Single | The scores of all classes. A higher value means a higher probability of falling into the associated class. If the i-th element has the largest value, the predicted label index is i. Note that i is a zero-based index. |
PredictedLabel | key | The predicted label's index. If its value is i, the actual label is the i-th category in the key-valued input label type. |
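
A minimal sketch of how text labels, key indices, and the score vector relate to each other. It is a Python illustration of the description above, not ML.NET code: `build_key_map` and `predicted_index` are hypothetical names, and assigning indices in order of first appearance is an assumption of this sketch.

```python
from typing import List, Sequence


def build_key_map(labels: Sequence[str]) -> List[str]:
    """Assign each distinct text label an index, in order of first appearance."""
    categories: List[str] = []
    for label in labels:
        if label not in categories:
            categories.append(label)
    return categories


def predicted_index(scores: Sequence[float]) -> int:
    """Return the zero-based index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


categories = build_key_map(["cat", "dog", "cat", "bird"])
scores = [0.1, 0.7, 0.2]  # hypothetical per-class scores for one instance
i = predicted_index(scores)
print(i, categories[i])
```

Looking up `categories[i]` recovers the original text label from the predicted index.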
Text classification is a subcategory of multiclass classification that deals specifically with raw text. Text poses interesting challenges because you have to account for the context and semantics in which the text occurs. As such, it can be difficult to encode meaning and context.
Deep learning models have emerged as a promising technique to solve natural language problems. More specifically, a type of neural network known as a transformer has become the predominant way of solving natural language problems like text classification, translation, summarization, and question answering. Some popular transformer architectures for natural language tasks include:
The ML.NET text classification API is powered by TorchSharp. TorchSharp is a .NET library that provides access to the library that powers PyTorch. TorchSharp contains the building blocks for training neural networks from scratch in .NET. ML.NET abstracts some of the complexity of TorchSharp to the scenario level. It uses a pretrained version of the NAS-BERT model and fine-tunes it with your data.
For a text classification example, see Get started with the text classification API.
Image classification is a supervised machine learning task that's used to predict the class (category) of an image. The input is a set of labeled examples. Each label normally starts as text. It's then run through the TermTransform, which converts it to the Key (numeric) type. The output of the image classification algorithm is a classifier, which you can use to predict the class of new images. The image classification task is a type of multiclass classification. Examples of image classification scenarios include:
You can train an image classification model using the following training algorithms:
The input label column data must be key type. The feature column must be a variable-sized vector of Byte.
This trainer outputs the following columns:
Output Name | Type | Description |
---|---|---|
Score | Single | The scores of all classes. A higher value means a higher probability of falling into the associated class. If the i-th element has the largest value, the predicted label index is i. (i is a zero-based index.) |
PredictedLabel | Key type | The predicted label's index. If its value is i, the actual label is the i-th category in the key-valued input label type. |
Regression is a supervised machine learning task that's used to predict the value of the label from a set of related features. The label can be of any real value and isn't from a finite set of values as in classification tasks. Regression algorithms model the dependency of the label on its related features to determine how the label will change as the values of the features are varied. The input of a regression algorithm is a set of examples with labels of known values. The output of a regression algorithm is a function, which you can use to predict the label value for any new set of input features. Examples of regression scenarios include:
You can train a regression model using the following algorithms:
The input label column data must be Single.
The trainers for this task output the following:
Output Name | Type | Description |
---|---|---|
Score | Single | The raw score that was predicted by the model. |
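
To make "the output of a regression algorithm is a function" concrete, here is a minimal Python sketch that fits a straight line to one feature by ordinary least squares. It is an illustration only and not one of the ML.NET trainers; `fit_line` and the data are made up for this example.

```python
from typing import Callable, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Fit y = slope * x + intercept by least squares and return the predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Labeled examples: one feature value and its known label.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
predict = fit_line(xs, ys)
print(predict(5.0))  # the score for a new, unlabeled input
```

The returned function plays the role of the trained model: it takes new feature values and returns a real-valued score.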
Clustering is an unsupervised machine learning task that's used to group instances of data into clusters that contain similar characteristics. Clustering can also be used to identify relationships in a dataset that you might not logically derive by browsing or simple observation. The inputs and outputs of a clustering algorithm depend on the methodology chosen. You can take a distribution, centroid, connectivity, or density-based approach. ML.NET currently supports a centroid-based approach using K-Means clustering. Examples of clustering scenarios include:
You can train a clustering model using the following algorithm:
The input features data must be Single. No labels are needed.
This trainer outputs the following:
Output Name | Type | Description |
---|---|---|
Score | Vector of Single | The distances of the given data point to all clusters' centroids. |
PredictedLabel | key type | The closest cluster's index predicted by the model. |
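
The two output columns can be pictured with a short Python sketch that scores one point against a set of already-trained centroids. It is not ML.NET code: `score_point` is a hypothetical name, the centroids are made up, and the use of plain Euclidean distance and a zero-based index are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def score_point(point: Sequence[float],
                centroids: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Return the distance to every centroid and the index of the closest one."""
    # Euclidean distance is an assumption; a trainer may use another measure.
    distances = [math.dist(point, c) for c in centroids]
    nearest = min(range(len(distances)), key=lambda i: distances[i])
    return distances, nearest


centroids = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical trained centroids
distances, nearest = score_point((3.0, 4.0), centroids)
print(distances, nearest)
```

Here `distances` corresponds to the Score vector and `nearest` to the predicted cluster.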
The anomaly detection task creates an anomaly detection model by using principal component analysis (PCA). PCA-based anomaly detection helps you build a model in scenarios where it's easy to obtain training data from one class, such as valid transactions, but difficult to obtain sufficient samples of the targeted anomalies.
An established technique in machine learning, PCA is frequently used in exploratory data analysis because it reveals the inner structure of the data and explains the variance in the data. PCA works by analyzing data that contains multiple variables. It looks for correlations among the variables and determines the combination of values that best captures differences in outcomes. These combined feature values are used to create a more compact feature space called the principal components.
Anomaly detection encompasses many important tasks in machine learning:
Because anomalies are rare events by definition, it can be difficult to collect a representative sample of data to use for modeling. The algorithms included in this category have been especially designed to address the core challenges of building and training models by using imbalanced data sets.
You can train an anomaly detection model using the following algorithm:
The input features must be a fixed-size vector of Single.
This trainer outputs the following:
Output Name | Type | Description |
---|---|---|
Score | Single | The non-negative, unbounded score that was calculated by the anomaly detection model. |
PredictedLabel | Boolean | true if the input is an anomaly or false if it isn't. |
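
One common way to turn PCA into an anomaly score is to measure how far a point lies from the subspace spanned by the principal components. The Python sketch below illustrates that idea with a single component found by power iteration. It is a simplified illustration, not the algorithm ML.NET uses internally; the function names, the training data, and the threshold of 1.0 are all assumptions of this example.

```python
import math
from typing import List, Sequence, Tuple


def top_component(rows: Sequence[Sequence[float]],
                  iterations: int = 100) -> Tuple[List[float], List[float]]:
    """Return the data mean and the first principal component (unit vector)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the direction of largest variance.
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return mean, v


def anomaly_score(point: Sequence[float], mean: Sequence[float],
                  component: Sequence[float]) -> float:
    """Distance between the point and its projection onto the component."""
    centered = [p - m for p, m in zip(point, mean)]
    projection = sum(c * v for c, v in zip(centered, component))
    residual = [c - projection * v for c, v in zip(centered, component)]
    return math.sqrt(sum(r * r for r in residual))


# Training data from the "normal" class only: points close to the line y = x.
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.0)]
mean, component = top_component(normal)

THRESHOLD = 1.0  # hypothetical cut-off chosen for this toy data
for point in [(3.0, 3.0), (1.0, 5.0)]:
    score = anomaly_score(point, mean, component)
    print(point, round(score, 3), score > THRESHOLD)
```

The score in this sketch is non-negative and unbounded, and comparing it with a threshold yields the Boolean label.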
A ranking task constructs a ranker from a set of labeled examples. This example set consists of instance groups that can be scored with given criteria. The ranking labels are { 0, 1, 2, 3, 4 } for each instance. The ranker is trained to rank new instance groups with unknown scores for each instance. ML.NET ranking learners are based on machine-learned ranking.
You can train a ranking model with the following algorithms:
The input label data type must be key type or Single. The value of the label determines relevance, where higher values indicate higher relevance. If the label is a key type, then the key index is the relevance value, where the smallest index is the least relevant. If the label is a Single, larger values indicate higher relevance.
The feature data must be a fixed-size vector of Single, and the input row group column must be key type.
This trainer outputs the following:
Output Name | Type | Description |
---|---|---|
Score | Single | The unbounded score that was calculated by the model to determine the prediction. |
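
A ranker's scores are only meaningful relative to other instances in the same group. The following Python sketch shows how scored rows might be turned into a per-group ordering. It is an illustration rather than ML.NET code; `rank_groups`, the group IDs, and the scores are made up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rank_groups(rows: Sequence[Tuple[str, str, float]]) -> Dict[str, List[str]]:
    """Order the items of each group by score, highest score first.

    Each row is (group_id, item, score).
    """
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for group_id, item, score in rows:
        groups[group_id].append((score, item))
    return {
        group_id: [item for _, item in sorted(pairs, key=lambda p: p[0], reverse=True)]
        for group_id, pairs in groups.items()
    }


# Hypothetical scored rows for two groups (for example, two search queries).
rows = [
    ("q1", "a", 0.2),
    ("q1", "b", 1.5),
    ("q2", "c", -0.3),
    ("q1", "d", 0.9),
    ("q2", "e", 0.4),
]
print(rank_groups(rows))
```

Scores from different groups are never compared with each other, which is why the group column is a required input.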
A recommendation task enables producing a list of recommended products or services. ML.NET uses Matrix factorization (MF), a collaborative filtering algorithm for recommendations when you have historical product rating data in your catalog. For example, you have historical movie rating data for your users and want to recommend other movies they're likely to watch next.
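
The core idea of matrix factorization can be sketched in a few lines of Python: each user and each item gets a small vector of latent factors, and a rating is approximated by their dot product. The sketch below trains those vectors with plain stochastic gradient descent. It is a conceptual illustration, not the ML.NET trainer; the function names, hyperparameters, and ratings are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def factorize(ratings: Sequence[Rating], n_users: int, n_items: int, k: int = 2,
              epochs: int = 500, lr: float = 0.02, reg: float = 0.02,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Learn k latent factors per user (P) and per item (Q) with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mse(ratings: Sequence[Rating], P: List[List[float]], Q: List[List[float]]) -> float:
    """Mean squared error of the factorization on the known ratings."""
    return sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
               for u, i, r in ratings) / len(ratings)


# Hypothetical historical ratings: 4 users, 4 movies, ratings from 1 to 5.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 2.0), (1, 0, 4.0), (1, 2, 1.0),
           (1, 3, 3.0), (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 3, 4.0)]
P, Q = factorize(ratings, n_users=4, n_items=4)

# Predict a rating for a pair that has no historical rating (user 2, movie 0).
print(sum(p * q for p, q in zip(P[2], Q[0])))
print(mse(ratings, P, Q))
```

Predictions for unseen user and item pairs are what drive the recommendations: the items with the highest predicted ratings are the candidates to suggest.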
You can train a recommendation model with the following algorithm:
The forecasting task uses past time-series data to make predictions about future behavior. Scenarios applicable to forecasting include weather forecasting, seasonal sales predictions, and predictive maintenance.
You can train a forecasting model with the following algorithm:
Object detection is a supervised machine learning task that's used to predict the class (category) of an image and also to give a bounding box for where that category is within the image. Instead of classifying a single object in an image, object detection can detect multiple objects within an image. Examples of object detection include:
Object-detection model training is currently only available in Model Builder using Azure Machine Learning.