
What is ML.NET and how does it work?

  • 2024-12-19

ML.NET gives you the ability to add machine learning to .NET applications, in either online or offline scenarios. With this capability, you can make automatic predictions using the data available to your application. Machine learning applications make use of patterns in the data to make predictions rather than needing to be explicitly programmed.

Central to ML.NET is a machine learning model. The model specifies the steps needed to transform your input data into a prediction. With ML.NET, you can train a custom model by specifying an algorithm, or you can import pretrained TensorFlow and Open Neural Network Exchange (ONNX) models.

Once you have a model, you can add it to your application to make the predictions.

ML.NET runs on Windows, Linux, and macOS using .NET, or on Windows using .NET Framework. 64-bit is supported on all platforms. 32-bit is supported on Windows, except for TensorFlow, LightGBM, and ONNX-related functionality.

The following table shows examples of the type of predictions that you can make with ML.NET.

| Prediction type | Example |
|---|---|
| Classification/Categorization | Automatically divide customer feedback into positive and negative categories. |
| Regression/Predict continuous values | Predict the price of houses based on size and location. |
| Anomaly detection | Detect fraudulent banking transactions. |
| Recommendations | Suggest products that online shoppers might want to buy, based on their previous purchases. |
| Time series/sequential data | Forecast the weather or product sales. |
| Image classification | Categorize pathologies in medical images. |
| Text classification | Categorize documents based on their content. |
| Sentence similarity | Measure how similar two sentences are. |

Simple ML.NET app

The code in the following snippet demonstrates the simplest ML.NET application. This example constructs a linear regression model to predict house prices using house size and price data.

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

class Program
{
    public record HouseData
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    public record Prediction
    {
        [ColumnName("Score")]
        public float Price { get; set; }
    }

    static void Main(string[] args)
    {
        MLContext mlContext = new();

        // 1. Import or create training data.
        HouseData[] houseData = [
                new() { Size = 1.1F, Price = 1.2F },
                new() { Size = 1.9F, Price = 2.3F },
                new() { Size = 2.8F, Price = 3.0F },
                new() { Size = 3.4F, Price = 3.7F }
                ];
        IDataView trainingData = mlContext.Data.LoadFromEnumerable(houseData);

        // 2. Specify data preparation and model training pipeline.
        EstimatorChain<RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>> pipeline = mlContext.Transforms.Concatenate("Features", ["Size"])
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Price", maximumNumberOfIterations: 100));

        // 3. Train model.
        TransformerChain<RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>> model = pipeline.Fit(trainingData);

        // 4. Make a prediction.
        HouseData size = new() { Size = 2.5F };
        Prediction price = mlContext.Model.CreatePredictionEngine<HouseData, Prediction>(model).Predict(size);

        Console.WriteLine($"Predicted price for size: {size.Size * 1000} sq ft = {price.Price * 100:C}k");

        // Predicted price for size: 2500 sq ft = $261.98k
    }
}

Code workflow

The following steps, also shown in the diagram below, represent the application code structure and the iterative process of model development:

  • Collect and load training data into an IDataView object
  • Specify a pipeline of operations to extract features and apply a machine learning algorithm
  • Train a model by calling Fit(IDataView) on the pipeline
  • Evaluate the model and iterate to improve
  • Save the model into binary format, for use in an application
  • Load the model back into an ITransformer object
  • Make predictions by calling PredictionEngineBase<TSrc,TDst>.Predict

[Diagram: ML.NET application development flow, including components for data generation, pipeline development, model training, model evaluation, and model usage.]

Let's dig a little deeper into those concepts.

Machine learning model

An ML.NET model is an object that contains transformations to perform on your input data to arrive at the predicted output.

Basic

The most basic model is two-dimensional linear regression, where one continuous quantity is modeled as a linear function of another, as in the house price example shown previously.

[Diagram: Linear regression model with bias and weight parameters.]

The model is simply: $\text{Price} = b + w \cdot \text{Size}$. The parameters $b$ (bias) and $w$ (weight) are estimated by fitting a line to a set of (size, price) pairs. The data used to find the parameters of the model is called training data. The inputs of a machine learning model are called features. In this example, $\text{Size}$ is the only feature. The ground-truth values used to train a machine learning model are called labels. Here, the $\text{Price}$ values in the training data set are the labels.
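To make "estimating $b$ and $w$" concrete, the following C# sketch fits the same four (size, price) training pairs with the closed-form ordinary least squares formulas $w = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b = \bar{y} - w\bar{x}$. This is a swapped-in technique for illustration only: the sample uses ML.NET's Sdca trainer, which estimates the parameters iteratively, so its values (and its predicted price) differ from these. The `LeastSquares` class and its `Fit` method are hypothetical names, not part of ML.NET.

```csharp
using System;

// Illustrative sketch: closed-form ordinary least squares for Price = b + w * Size.
// NOT how ML.NET's Sdca trainer works; Sdca optimizes iteratively, so results differ.
public static class LeastSquares
{
    // Returns the bias b and weight w that minimize the sum of squared errors.
    public static (double Bias, double Weight) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need at least two (x, y) pairs of equal length.");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.Length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.Length;
        meanY /= y.Length;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.Length; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0)
            throw new ArgumentException("All x values are equal; the slope is undefined.");

        double w = sxy / sxx;          // slope (weight)
        double b = meanY - w * meanX;  // the fitted line passes through (meanX, meanY)
        return (b, w);
    }

    public static void Main()
    {
        // Same (size, price) training pairs as the house price sample.
        double[] size = { 1.1, 1.9, 2.8, 3.4 };
        double[] price = { 1.2, 2.3, 3.0, 3.7 };

        var (b, w) = Fit(size, price);
        Console.WriteLine($"b = {b:0.###}, w = {w:0.###}");
        Console.WriteLine($"Prediction for Size = 2.5: {b + w * 2.5:0.###}");
    }
}
```

With this data the sketch gives roughly $w \approx 1.05$ and $b \approx 0.14$.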

More complex

A more complex model classifies financial transactions into categories using the transaction text description.

Each transaction description is broken down into a set of features by removing redundant words and characters, and counting word and character combinations. The feature set is used to train a linear model based on the set of categories in the training data. The more similar a new description is to the ones in the training set, the more likely it will be assigned to the same category.
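The following C# sketch shows the general idea of turning a description into countable features: it normalizes the text, drops a small stop-word list, and counts word unigrams and bigrams plus character trigrams. It is a simplified, hypothetical illustration, not how ML.NET's text featurization is implemented; the `TextFeatures` class, the stop-word list, and the n-gram sizes are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified text featurizer (not ML.NET's implementation).
public static class TextFeatures
{
    // Assumed stop-word list, kept tiny for the example.
    static readonly HashSet<string> StopWords = new() { "the", "a", "at", "to", "for", "of" };

    public static Dictionary<string, int> Featurize(string description)
    {
        var counts = new Dictionary<string, int>();

        // Normalize: lowercase and replace anything that isn't a letter or digit with a space.
        string cleaned = new string(description.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ').ToArray());

        // Remove redundant words (stop words).
        string[] words = cleaned.Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !StopWords.Contains(w)).ToArray();

        // Count word unigrams and bigrams.
        for (int i = 0; i < words.Length; i++)
        {
            Add(counts, "w:" + words[i]);
            if (i + 1 < words.Length) Add(counts, "w:" + words[i] + " " + words[i + 1]);
        }

        // Count character trigrams inside each word.
        foreach (string word in words)
            for (int i = 0; i + 3 <= word.Length; i++)
                Add(counts, "c:" + word.Substring(i, 3));

        return counts;
    }

    static void Add(Dictionary<string, int> counts, string key) =>
        counts[key] = counts.TryGetValue(key, out int n) ? n + 1 : 1;

    public static void Main()
    {
        foreach (var (feature, count) in Featurize("Coffee at the Coffee Shop #12"))
            Console.WriteLine($"{feature}: {count}");
    }
}
```

For "Coffee at the Coffee Shop #12", the counts include `w:coffee` = 2 and `c:cof` = 2, while the stop words "at" and "the" produce no features. A linear model then learns one weight per feature.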

[Diagram: Text classification model.]

Both the house price model and the text classification model are linear models. Depending on the nature of your data and the problem you're solving, you can also use decision tree models, generalized additive models, and others. You can find out more about the models in Tasks.

Data preparation

In most cases, the data that you have available isn't suitable to be used directly to train a machine learning model. The raw data needs to be prepared, or preprocessed, before it can be used to find the parameters of your model. Your data might need to be converted from string values to a numerical representation. You might have redundant information in your input data. You might need to reduce or expand the dimensions of your input data. Your data might need to be normalized or scaled.
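Two of these steps, converting strings to numbers and scaling numeric values, are sketched by hand below in plain C#. The `Prep` class and the `neighborhoods` column are hypothetical and only illustrate the arithmetic; ML.NET provides its own transforms for this (for example, `NormalizeMinMax` and `Categorical.OneHotEncoding` in the transforms catalog), whose exact behavior, such as how zeros are handled, may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helpers illustrating two data-preparation steps by hand.
public static class Prep
{
    // Min-max scaling: (v - min) / (max - min), so the smallest value maps to 0
    // and the largest to 1. Constant input maps to all zeros.
    public static double[] MinMax(double[] values)
    {
        double min = values.Min();
        double max = values.Max();
        double range = max - min;
        return values.Select(v => range == 0 ? 0.0 : (v - min) / range).ToArray();
    }

    // One-hot encoding: a string category becomes a numeric indicator vector
    // with 1 at the category's position. Unknown categories map to all zeros.
    public static double[] OneHot(string value, string[] categories)
    {
        var vector = new double[categories.Length];
        int index = Array.IndexOf(categories, value);
        if (index >= 0) vector[index] = 1.0;
        return vector;
    }

    public static void Main()
    {
        double[] sizes = { 1.1, 1.9, 2.8, 3.4 };
        Console.WriteLine(string.Join(", ", MinMax(sizes).Select(v => v.ToString("0.00"))));

        string[] neighborhoods = { "north", "south", "east" }; // hypothetical categorical column
        Console.WriteLine(string.Join(", ", OneHot("south", neighborhoods)));
    }
}
```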

The ML.NET tutorials teach you about different data processing pipelines for text, image, numerical, and time-series data used for specific machine learning tasks.

How to prepare your data shows you how to apply data preparation more generally.

You can find an appendix of all of the available transformations in the resources section.

Model evaluation

Once you've trained your model, how do you know how well it will make future predictions? With ML.NET, you can evaluate your model against some new test data.

Each type of machine learning task has metrics used to evaluate the accuracy and precision of the model against the test data set.

The house price example shown earlier used the Regression task. To evaluate the model, add the following code to the original sample.

HouseData[] testHouseData =
{
    new HouseData() { Size = 1.1F, Price = 0.98F },
    new HouseData() { Size = 1.9F, Price = 2.1F },
    new HouseData() { Size = 2.8F, Price = 2.9F },
    new HouseData() { Size = 3.4F, Price = 3.6F }
};

var testHouseDataView = mlContext.Data.LoadFromEnumerable(testHouseData);
var testPriceDataView = model.Transform(testHouseDataView);

var metrics = mlContext.Regression.Evaluate(testPriceDataView, labelColumnName: "Price");

Console.WriteLine($"R^2: {metrics.RSquared:0.##}");
Console.WriteLine($"RMS error: {metrics.RootMeanSquaredError:0.##}");

// R^2: 0.96
// RMS error: 0.19

The evaluation metrics show that the error is fairly low (an RMS error of 0.19) and that the model explains most of the variation in the test prices (an R² of 0.96). That was easy! In real applications, it usually takes more tuning to achieve good model metrics.
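Both metrics can be computed by hand from labels $y_i$ and predictions $\hat{y}_i$: $\text{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. The sketch below uses these standard definitions; it is not ML.NET's implementation, `MetricsSketch` is a hypothetical name, and the predicted values in `Main` are made-up placeholders rather than output from the sample's model.

```csharp
using System;
using System.Linq;

// Standard regression metrics written out by hand (not ML.NET's implementation).
public static class MetricsSketch
{
    // RMSE = sqrt(mean((y - yHat)^2)); expressed in the same units as the label.
    public static double Rmse(double[] actual, double[] predicted) =>
        Math.Sqrt(actual.Zip(predicted, (y, yHat) => (y - yHat) * (y - yHat)).Average());

    // R^2 = 1 - SS_res / SS_tot. Undefined (division by zero) if all labels are equal.
    public static double RSquared(double[] actual, double[] predicted)
    {
        double mean = actual.Average();
        double ssRes = actual.Zip(predicted, (y, yHat) => (y - yHat) * (y - yHat)).Sum();
        double ssTot = actual.Sum(y => (y - mean) * (y - mean));
        return 1.0 - ssRes / ssTot;
    }

    public static void Main()
    {
        // Test labels from the evaluation sample.
        double[] actual = { 0.98, 2.1, 2.9, 3.6 };
        // Made-up placeholder predictions, NOT output from the sample's model.
        double[] predicted = { 1.2, 2.0, 3.0, 3.5 };

        Console.WriteLine($"R^2: {RSquared(actual, predicted):0.##}");
        Console.WriteLine($"RMS error: {Rmse(actual, predicted):0.##}");
    }
}
```

A perfect model gives $R^2 = 1$ and an RMSE of 0, while always predicting the mean label gives $R^2 = 0$.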

ML.NET architecture

This section describes the architectural patterns of ML.NET. If you're an experienced .NET developer, some of these patterns will be familiar to you, and some will be less familiar.

An ML.NET application starts with an MLContext object. This singleton object contains catalogs. A catalog is a factory for data loading and saving, transforms, trainers, and model operation components. Each catalog object has methods to create the different types of components.

| Task | Catalog |
|---|---|
| Data loading and saving | DataOperationsCatalog |
| Data preparation | TransformsCatalog |
| Binary classification | BinaryClassificationCatalog |
| Multiclass classification | MulticlassClassificationCatalog |
| Anomaly detection | AnomalyDetectionCatalog |
| Clustering | ClusteringCatalog |
| Forecasting | ForecastingCatalog |
| Ranking | RankingCatalog |
| Regression | RegressionCatalog |
| Recommendation | RecommendationCatalog |
| Time series | TimeSeriesCatalog |
| Model usage | ModelOperationsCatalog |

You can navigate to the creation methods in each of the listed categories. If you use Visual Studio, the catalogs also show up via IntelliSense.

[Screenshot: IntelliSense for regression trainers.]

Build the pipeline

Inside each catalog is a set of extension methods that you can use to create a training pipeline.

var pipeline = mlContext.Transforms.Concatenate("Features", new[] { "Size" })
    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Price", maximumNumberOfIterations: 100));

In the snippet, Concatenate and Sdca are both catalog methods (from the transforms catalog and the regression catalog, respectively). They each create an IEstimator object that's appended to the pipeline.

At this point, the objects have been created, but no execution has happened.

Train the model

Once the objects in the pipeline have been created, data can be used to train the model.

var model = pipeline.Fit(trainingData);

Calling Fit() uses the input training data to estimate the parameters of the model. This is known as training the model. Remember, the linear regression model shown earlier had two model parameters: bias and weight. After the Fit() call, the values of the parameters are known. (Most models will have many more parameters than this.)
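To see the two parameters after training, you can read them from the last transformer in the trained chain. The sketch below assumes the Microsoft.ML NuGet package and reuses the sample's data and pipeline; it reads `Bias` and `Weights` from the `LinearRegressionModelParameters` held by the final `RegressionPredictionTransformer`. The `ParameterSketch` class and `TrainAndGetParameters` method are hypothetical names. Exact values depend on the Sdca trainer's settings, so none are shown here.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Trainers;

class ParameterSketch
{
    public class HouseData
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    // Trains the same Concatenate + Sdca pipeline as the sample and returns
    // the learned bias b and weight w from the final linear predictor.
    public static (float Bias, float Weight) TrainAndGetParameters()
    {
        MLContext mlContext = new(seed: 0);
        HouseData[] houseData =
        {
            new HouseData { Size = 1.1F, Price = 1.2F },
            new HouseData { Size = 1.9F, Price = 2.3F },
            new HouseData { Size = 2.8F, Price = 3.0F },
            new HouseData { Size = 3.4F, Price = 3.7F },
        };
        IDataView trainingData = mlContext.Data.LoadFromEnumerable(houseData);

        var pipeline = mlContext.Transforms.Concatenate("Features", "Size")
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Price", maximumNumberOfIterations: 100));
        var model = pipeline.Fit(trainingData);

        // LastTransformer is the RegressionPredictionTransformer; its Model
        // property holds the fitted linear parameters.
        LinearRegressionModelParameters parameters = model.LastTransformer.Model;
        return (parameters.Bias, parameters.Weights[0]); // one weight, because Features has one slot
    }

    static void Main()
    {
        var (b, w) = TrainAndGetParameters();
        Console.WriteLine($"b = {b}, w = {w}");
    }
}
```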

You can learn more about model training in How to train your model.

The resulting model object implements the ITransformer interface. That is, the model transforms input data into predictions.
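The estimator/transformer split can be illustrated without ML.NET. In the C# sketch below, an estimator holds no learned state; its `Fit` method computes a parameter (here, the mean of the training values) and returns a transformer that applies it to new data. The interfaces `IMyEstimator` and `IMyTransformer` and the mean-centering example are hypothetical simplifications; ML.NET's real `IEstimator<TTransformer>` and `ITransformer` work on `IDataView` and carry schema information.

```csharp
using System;
using System.Linq;

// Hypothetical, simplified interfaces mirroring the estimator/transformer idea.
// They are NOT ML.NET's IEstimator<TTransformer> / ITransformer.
public interface IMyTransformer
{
    double[] Transform(double[] input);
}

public interface IMyEstimator
{
    IMyTransformer Fit(double[] trainingData);
}

// Estimator: knows how to learn, but holds no learned state.
public sealed class MeanCenteringEstimator : IMyEstimator
{
    // Learning happens here: the mean is computed from the training data.
    public IMyTransformer Fit(double[] trainingData) =>
        new MeanCenteringTransformer(trainingData.Average());
}

// Transformer: holds the learned parameter and applies it to new data.
public sealed class MeanCenteringTransformer : IMyTransformer
{
    public double Mean { get; }

    public MeanCenteringTransformer(double mean) => Mean = mean;

    public double[] Transform(double[] input) => input.Select(v => v - Mean).ToArray();
}

public static class EstimatorDemo
{
    public static void Main()
    {
        IMyEstimator estimator = new MeanCenteringEstimator();          // nothing computed yet
        IMyTransformer model = estimator.Fit(new[] { 1.0, 2.0, 3.0 }); // learns Mean = 2
        double[] centered = model.Transform(new[] { 4.0, 0.0 });        // centered values: 2 and -2
        Console.WriteLine(string.Join(", ", centered));
    }
}
```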

IDataView predictions = model.Transform(inputData);

Use the model

You can transform input data into predictions in bulk, or one input at a time. The house price example did both: in bulk to evaluate the model, and one at a time to make a new prediction. Let's look at making single predictions.

var size = new HouseData() { Size = 2.5F };
var predEngine = mlContext.Model.CreatePredictionEngine<HouseData, Prediction>(model);
var price = predEngine.Predict(size);

The CreatePredictionEngine() method takes two type parameters: an input class and an output class. The field or property names, or attributes such as ColumnName, determine the names of the data columns used during model training and prediction. For more information, see Make predictions with a trained model.
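For bulk scoring, you can skip the prediction engine: put all inputs into one `IDataView`, call `Transform`, and read the `Score` column back with `CreateEnumerable`. The following sketch assumes the Microsoft.ML package and the sample's `HouseData`/`Prediction` shapes; `BulkPredictionSketch` and `PredictAll` are hypothetical names, and the model is retrained inside the method only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

class BulkPredictionSketch
{
    public class HouseData
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    public class Prediction
    {
        [ColumnName("Score")]
        public float Price { get; set; }
    }

    public static List<float> PredictAll(float[] sizes)
    {
        MLContext mlContext = new(seed: 0);

        // Train the same pipeline as the sample (kept here for self-containment).
        IDataView trainingData = mlContext.Data.LoadFromEnumerable(new[]
        {
            new HouseData { Size = 1.1F, Price = 1.2F },
            new HouseData { Size = 1.9F, Price = 2.3F },
            new HouseData { Size = 2.8F, Price = 3.0F },
            new HouseData { Size = 3.4F, Price = 3.7F },
        });
        ITransformer model = mlContext.Transforms.Concatenate("Features", "Size")
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Price", maximumNumberOfIterations: 100))
            .Fit(trainingData);

        // Bulk: wrap all inputs in one IDataView and score them with a single Transform call.
        IDataView inputs = mlContext.Data.LoadFromEnumerable(sizes.Select(s => new HouseData { Size = s }));
        IDataView scored = model.Transform(inputs);

        // Read the Score column back into Prediction objects.
        return mlContext.Data.CreateEnumerable<Prediction>(scored, reuseRowObject: false)
            .Select(p => p.Price)
            .ToList();
    }

    static void Main()
    {
        foreach (float price in PredictAll(new[] { 1.5F, 2.5F, 3.5F }))
            Console.WriteLine(price);
    }
}
```

A PredictionEngine is convenient for single predictions, but it isn't thread-safe, so in multithreaded apps create one per thread or use a pooled approach such as `PredictionEnginePool` from the Microsoft.Extensions.ML package.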

Data models and schema

At the core of an ML.NET machine learning pipeline are DataView objects.

Each transformation in the pipeline has an input schema (the column names, types, and sizes that the transform expects to see on its input) and an output schema (the column names, types, and sizes that the transform produces after the transformation).

If the output schema from one transform in the pipeline doesn't match the input schema of the next transform, ML.NET will throw an exception.

A data view object has columns and rows. Each column has a name, a type, and a length. For example, the input columns in the house price example are Size and Price. They are both of type Single, and they're scalar quantities rather than vector ones.
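You can inspect a data view's schema in code. The sketch below, assuming the Microsoft.ML package, loads one house record, applies the same `Concatenate` transform as the sample, and lists every column's index, name, and type: `Size` and `Price` are scalar `Single` columns, and the new `Features` column is a vector of length 1. `SchemaSketch` and `BuildSchema` are hypothetical names.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

class SchemaSketch
{
    public class HouseData
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    public static DataViewSchema BuildSchema()
    {
        MLContext mlContext = new();
        IDataView data = mlContext.Data.LoadFromEnumerable(new[] { new HouseData { Size = 1.1F, Price = 1.2F } });

        // Fit the Concatenate transform and apply it; the output schema is available
        // without iterating over any rows.
        IDataView withFeatures = mlContext.Transforms.Concatenate("Features", "Size")
            .Fit(data)
            .Transform(data);
        return withFeatures.Schema;
    }

    static void Main()
    {
        foreach (DataViewSchema.Column column in BuildSchema())
            Console.WriteLine($"{column.Index}: {column.Name} ({column.Type})");
    }
}
```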

[Diagram: ML.NET data view example with house price prediction data.]

ML.NET trainers typically look for an input column that's a vector. By default, this vector column is called Features. That's why the house price example concatenated the Size column into a new column called Features.

var pipeline = mlContext.Transforms.Concatenate("Features", new[] { "Size" })

All algorithms also create new columns after they've performed a prediction. The fixed names of these new columns depend on the type of machine learning algorithm. For the regression task, one of the new columns is called Score, as shown by the ColumnName attribute on the Price property of the Prediction class.

public class Prediction
{
    [ColumnName("Score")]
    public float Price { get; set; }
}

You can find out more about output columns of different machine learning tasks in the Machine Learning Tasks guide.

An important property of DataView objects is that they're evaluated lazily. Data views are only loaded and operated on during model training, evaluation, and prediction. While you're writing and testing your ML.NET application, you can use the Visual Studio debugger to take a peek at any data view object by calling the Preview method.

var debug = testPriceDataView.Preview();

You can watch the debug variable in the debugger and examine its contents.
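Lazy evaluation works much like deferred execution in LINQ. The plain C# analogy below (not ML.NET code) counts how many times a transformation runs: defining the query runs nothing, and materializing two rows runs the transformation exactly twice. `LazyDemo` and `RunAndCount` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// LINQ analogy for lazily evaluated data views (not ML.NET code).
public static class LazyDemo
{
    // Counts how many times the transformation has actually run.
    static int evaluations;

    static float Scale(float size)
    {
        evaluations++;
        return size * 1000; // e.g., convert to square feet
    }

    // Returns the evaluation count right after defining the query and
    // right after materializing the first two rows.
    public static (int BeforeEnumeration, int AfterTakingTwo) RunAndCount()
    {
        evaluations = 0;
        float[] sizes = { 1.1F, 1.9F, 2.8F, 3.4F };

        IEnumerable<float> query = sizes.Select(Scale); // defined, not executed
        int before = evaluations;

        _ = query.Take(2).ToList();                     // executes only what is needed
        int after = evaluations;

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = RunAndCount();
        Console.WriteLine($"Before enumeration: {before}, after taking two rows: {after}");
    }
}
```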

Note

Don't use the Preview(IDataView, Int32) method in production code, as it significantly degrades performance.

Model deployment

In real-life applications, your model training and evaluation code will be separate from your prediction code. In fact, these two activities are often performed by separate teams. Your model development team can save the model for use in the prediction application.

mlContext.Model.Save(model, trainingData.Schema, "model.zip");
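The prediction application can then load the file and make predictions without any training code. This sketch assumes the Microsoft.ML package and a `model.zip` produced by the `Save` call above, with the same `HouseData` and `Prediction` shapes used in training; `LoadModelSketch` and `PredictFromFile` are hypothetical names. `Model.Load` returns the trained `ITransformer` and, through its `out` parameter, the input schema that was saved with it.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

class LoadModelSketch
{
    public class HouseData
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    public class Prediction
    {
        [ColumnName("Score")]
        public float Price { get; set; }
    }

    public static float PredictFromFile(string path, float size)
    {
        MLContext mlContext = new();

        // Load the trained model and the input schema that was saved with it.
        ITransformer model = mlContext.Model.Load(path, out DataViewSchema inputSchema);

        // In a real app, create the engine once and reuse it (per thread).
        var engine = mlContext.Model.CreatePredictionEngine<HouseData, Prediction>(model);
        return engine.Predict(new HouseData { Size = size }).Price;
    }

    static void Main()
    {
        Console.WriteLine(PredictFromFile("model.zip", 2.5F));
    }
}
```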

