Tutorial: Detect anomalies in product sales with ML.NET

  • 2021-11-29

Learn how to build an anomaly detection application for product sales data. This tutorial creates a .NET console application using C# in Visual Studio.

In this tutorial, you learn how to:

  • Load the data
  • Create a transform for spike anomaly detection
  • Detect spike anomalies with the transform
  • Create a transform for change point anomaly detection
  • Detect change point anomalies with the transform

You can find the source code for this tutorial at the dotnet/samples repository.

Prerequisites

Note

The data format in product-sales.csv is based on the dataset “Shampoo Sales Over a Three Year Period” originally sourced from DataMarket and provided by Time Series Data Library (TSDL), created by Rob Hyndman. “Shampoo Sales Over a Three Year Period” Dataset Licensed Under the DataMarket Default Open License.

Create a console application

  1. Create a C# Console Application called "ProductSalesAnomalyDetection". Click the Next button.

  2. Choose .NET 8 as the framework to use. Click the Create button.

  3. Create a directory named Data in your project to save your data set files.

  4. Install the Microsoft.ML NuGet Package:

    Note

    This sample uses the latest stable version of the NuGet packages mentioned unless otherwise stated.

    In Solution Explorer, right-click on your project and select Manage NuGet Packages. Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML and select Install. Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed. Repeat these steps for Microsoft.ML.TimeSeries.

  5. Add the following using directives at the top of your Program.cs file:

    using Microsoft.ML;
    using ProductSalesAnomalyDetection;

Download your data

  1. Download the dataset and save it to the Data folder you previously created:

    • Right-click on product-sales.csv and select "Save Link (or Target) As..."

      Make sure you either save the *.csv file to the Data folder, or after you save it elsewhere, move the *.csv file to the Data folder.

  2. In Solution Explorer, right-click the *.csv file and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

The following table is a data preview from your *.csv file:

Month    ProductSales
1-Jan    271
2-Jan    150.9
.....    .....
1-Feb    199.3
.....    .....

Create classes and define paths

Next, define your input and prediction class data structures.

Add a new class to your project:

  1. In Solution Explorer, right-click the project, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to ProductSalesData.cs. Then, select Add.

    The ProductSalesData.cs file opens in the code editor.

  3. Add the following using directive to the top of ProductSalesData.cs:

    using Microsoft.ML.Data;
  4. Remove the existing class definition and add the following code, which has two classes, ProductSalesData and ProductSalesPrediction, to the ProductSalesData.cs file:

    public class ProductSalesData
    {
        [LoadColumn(0)]
        public string? Month;

        [LoadColumn(1)]
        public float numSales;
    }

    public class ProductSalesPrediction
    {
        //vector to hold alert,score,p-value values
        [VectorType(3)]
        public double[]? Prediction { get; set; }
    }

    ProductSalesData specifies an input data class. The LoadColumn attribute specifies which columns (by column index) in the dataset should be loaded.

    ProductSalesPrediction specifies the prediction data class. For anomaly detection, the prediction consists of an alert to indicate whether there is an anomaly, a raw score, and a p-value. The closer the p-value is to 0, the more likely an anomaly has occurred.

  5. Create two global fields to hold the path of the recently downloaded dataset file and the number of records it contains:

    • _dataPath has the path to the dataset used to train the model.
    • _docsize has the number of records in the dataset file. You'll use _docsize to calculate pvalueHistoryLength.
  6. Add the following code to the line right below the using directives to define those fields:

    string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "product-sales.csv");
    //assign the Number of records in dataset file to constant variable
    const int _docsize = 36;
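
The record count in this tutorial is hard-coded as 36. As a minimal sketch, and assuming the file has one header row and one record per non-empty line, the count could instead be derived from the file itself. The CountRecords helper and the temporary sample file are hypothetical and exist only to keep the example self-contained; they are not part of the tutorial's sample code.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the tutorial): counts data rows,
// assuming one header line and one record per non-empty line.
static int CountRecords(string path) =>
    File.ReadLines(path)
        .Skip(1) // skip the header row
        .Count(line => !string.IsNullOrWhiteSpace(line));

// Write a tiny sample file so the sketch runs on its own.
string samplePath = Path.Combine(Path.GetTempPath(), "product-sales-sample.csv");
File.WriteAllLines(samplePath, new[]
{
    "Month,ProductSales",
    "1-Jan,271",
    "2-Jan,150.9",
    "3-Jan,188.1"
});

int docSize = CountRecords(samplePath);
Console.WriteLine($"Records: {docSize}"); // Records: 3

File.Delete(samplePath);
```

If you take this approach in the tutorial project, the count can no longer be declared as a const, because its value is only known at run time.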

Initialize variables

  1. Replace the Console.WriteLine("Hello World!") line with the following code to declare and initialize the mlContext variable:

    MLContext mlContext = new MLContext();

    The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.

Load the data

Data in ML.NET is represented as an IDataView interface. IDataView is a flexible, efficient way of describing tabular data (numeric and text). Data can be loaded from a text file or from other sources (for example, SQL database or log files) to an IDataView object.

  1. Add the following code after creating the mlContext variable:

    IDataView dataView = mlContext.Data.LoadFromTextFile<ProductSalesData>(path: _dataPath, hasHeader: true, separatorChar: ',');

    The LoadFromTextFile() method defines the data schema and reads in the file. It takes in the data path variables and returns an IDataView.
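
To check that the columns were mapped as intended, you can enumerate the loaded rows. The following is a minimal sketch rather than part of the tutorial's sample code: it assumes the Microsoft.ML package is installed, writes a small temporary file (a hypothetical stand-in for product-sales.csv, with values taken from the data preview above), and reads it back through CreateEnumerable().

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Small stand-in file so the sketch is self-contained.
string samplePath = Path.Combine(Path.GetTempPath(), "product-sales-preview.csv");
File.WriteAllLines(samplePath, new[] { "Month,ProductSales", "1-Jan,271", "2-Jan,150.9" });

var mlContext = new MLContext();
IDataView dataView = mlContext.Data.LoadFromTextFile<ProductSalesData>(
    path: samplePath, hasHeader: true, separatorChar: ',');

// The data view is read lazily; ToList() forces the rows to be materialized.
var rows = mlContext.Data
    .CreateEnumerable<ProductSalesData>(dataView, reuseRowObject: false)
    .ToList();

foreach (var row in rows)
{
    Console.WriteLine($"{row.Month}\t{row.numSales}");
}

File.Delete(samplePath);

// Same input class as defined earlier in the tutorial.
public class ProductSalesData
{
    [LoadColumn(0)]
    public string? Month;

    [LoadColumn(1)]
    public float numSales;
}
```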

Time series anomaly detection

Anomaly detection flags unexpected or unusual events or behaviors. It gives clues where to look for problems and helps you answer the question "Is this weird?".

Example of the "Is this weird" anomaly detection.

Anomaly detection is the process of detecting time-series data outliers: points on a given input time-series where the behavior isn't what was expected, or "weird".

Anomaly detection can be useful in lots of ways. For instance:

  • If you have a car, you might want to know: Is this oil gauge reading normal, or do I have a leak?
  • If you're monitoring power consumption, you'd want to know: Is there an outage?

There are two types of time series anomalies that can be detected:

  • Spikes indicate temporary bursts of anomalous behavior in the system.

  • Change points indicate the beginning of persistent changes over time in the system.

In ML.NET, the IID Spike Detection and IID Change Point Detection algorithms are suited for independent and identically distributed datasets. They assume that your input data is a sequence of data points that are independently sampled from one stationary distribution.

Unlike the models in the other tutorials, the time series anomaly detector transforms operate directly on input data. The IEstimator.Fit() method does not need training data to produce the transform. It does need the data schema though, which is provided by a data view generated from an empty list of ProductSalesData.

You'll analyze the same product sales data to detect spikes and change points. The process of building and training the model is the same for spike detection and change point detection; the main difference is the specific detection algorithm used.

Spike detection

The goal of spike detection is to identify sudden yet temporary bursts that significantly differ from the majority of the time series data values. It's important to detect these suspicious rare items, events, or observations in a timely manner so that their impact can be minimized. The following approach can be used to detect a variety of anomalies such as outages, cyber-attacks, or viral web content. The following image is an example of spikes in a time series dataset:

Screenshot that shows two spike detections.

Add the CreateEmptyDataView() method

Add the following method to Program.cs:

IDataView CreateEmptyDataView(MLContext mlContext)
{
    // Create empty DataView. We just need the schema to call Fit() for the time series transforms
    IEnumerable<ProductSalesData> enumerableData = new List<ProductSalesData>();
    return mlContext.Data.LoadFromEnumerable(enumerableData);
}

The CreateEmptyDataView() method produces an empty data view object with the correct schema to be used as input to the IEstimator.Fit() method.
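
To see what "correct schema" means here, the following minimal sketch lists the columns of an empty data view. It assumes the Microsoft.ML package is installed and repeats the ProductSalesData class so that it runs on its own; it is an illustration, not part of the tutorial's sample code.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// No rows are supplied, but the element type still defines the columns.
IDataView emptyView = mlContext.Data.LoadFromEnumerable(new List<ProductSalesData>());

// List the columns that Fit() would see.
foreach (var column in emptyView.Schema)
{
    Console.WriteLine($"{column.Name}: {column.Type}");
}

// Same input class as defined earlier in the tutorial.
public class ProductSalesData
{
    [LoadColumn(0)]
    public string? Month;

    [LoadColumn(1)]
    public float numSales;
}
```

The schema contains one column per public member of ProductSalesData, including the numSales column that the detection transforms read as input.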

Create the DetectSpike() method

The DetectSpike() method:

  • Creates the transform from the estimator.
  • Detects spikes based on historical sales data.
  • Displays the results.
  1. Create the DetectSpike() method at the bottom of the Program.cs file using the following code:

    void DetectSpike(MLContext mlContext, int docSize, IDataView productSales)
    {
    }
  2. Use the IidSpikeEstimator to train the model for spike detection. Add it to the DetectSpike() method with the following code:

    var iidSpikeEstimator = mlContext.Transforms.DetectIidSpike(outputColumnName: nameof(ProductSalesPrediction.Prediction), inputColumnName: nameof(ProductSalesData.numSales), confidence: 95d, pvalueHistoryLength: docSize / 4);
  3. Create the spike detection transform by adding the following as the next line of code in the DetectSpike() method:

    Tip

    The confidence and pvalueHistoryLength parameters impact how spikes are detected. confidence determines how sensitive your model is to spikes. The lower the confidence, the more likely the algorithm is to detect "smaller" spikes. The pvalueHistoryLength parameter defines the number of data points in a sliding window. The value of this parameter is usually a percentage of the entire dataset. The lower the pvalueHistoryLength, the faster the model forgets previous large spikes.

    ITransformer iidSpikeTransform = iidSpikeEstimator.Fit(CreateEmptyDataView(mlContext));
  4. Add the following line of code to transform the productSales data as the next line in the DetectSpike() method:

    IDataView transformedData = iidSpikeTransform.Transform(productSales);

    The previous code uses the Transform() method to make predictions for multiple input rows of a dataset.

  5. Convert your transformedData into a strongly typed IEnumerable for easier display using the CreateEnumerable() method with the following code:

    var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);
  6. Create a display header line using the following Console.WriteLine() code:

    Console.WriteLine("Alert\tScore\tP-Value");

    You'll display the following information in your spike detection results:

    • Alert indicates a spike alert for a given data point.
    • Score is the ProductSales value for a given data point in the dataset.
    • P-Value: the "P" stands for probability. The closer the p-value is to 0, the more likely the data point is an anomaly.
  7. Use the following code to iterate through the predictions IEnumerable and display the results:

    foreach (var p in predictions)
    {
        if (p.Prediction is not null)
        {
            var results = $"{p.Prediction[0]}\t{p.Prediction[1]:f2}\t{p.Prediction[2]:F2}";

            if (p.Prediction[0] == 1)
            {
                results += " <-- Spike detected";
            }

            Console.WriteLine(results);
        }
    }
    Console.WriteLine("");
  8. Add the call to the DetectSpike() method below the call to the LoadFromTextFile() method:

    DetectSpike(mlContext, _docsize, dataView);
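
The Prediction vector is indexed by position, which is easy to misread. As a minimal sketch, the following hypothetical helper (ReadSpikePrediction is not part of ML.NET or the tutorial) gives names to the three slots described above: alert, score, and p-value. The sample vectors are illustrative values in that layout.

```csharp
using System;

// Hypothetical helper: names the three slots of the prediction vector
// in the order alert, score, p-value.
static (bool IsSpike, double Score, double PValue) ReadSpikePrediction(double[] prediction)
{
    if (prediction.Length < 3)
    {
        throw new ArgumentException("Expected alert, score, and p-value.", nameof(prediction));
    }

    return (prediction[0] == 1, prediction[1], prediction[2]);
}

// Illustrative vectors in the same layout as ProductSalesPrediction.Prediction.
double[][] samples =
{
    new[] { 0d, 271.00, 0.50 },
    new[] { 1d, 341.50, 0.00 },
};

foreach (double[] sample in samples)
{
    var (isSpike, score, pValue) = ReadSpikePrediction(sample);
    Console.WriteLine($"score={score:F2} p-value={pValue:F2} spike={isSpike}");
}
```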

Spike detection results

Your results should be similar to the following. During processing, messages are displayed. You may see warnings, or processing messages. Some of the messages have been removed from the following results for clarity.

Detect temporary changes in pattern
=============== Training the model ===============
=============== End of training process ===============
Alert   Score   P-Value
0       271.00  0.50
0       150.90  0.00
0       188.10  0.41
0       124.30  0.13
0       185.30  0.47
0       173.50  0.47
0       236.80  0.19
0       229.50  0.27
0       197.80  0.48
0       127.90  0.13
1       341.50  0.00 <-- Spike detected
0       190.90  0.48
0       199.30  0.48
0       154.50  0.24
0       215.10  0.42
0       278.30  0.19
0       196.40  0.43
0       292.00  0.17
0       231.00  0.45
0       308.60  0.18
0       294.90  0.19
1       426.60  0.00 <-- Spike detected
0       269.50  0.47
0       347.30  0.21
0       344.70  0.27
0       445.40  0.06
0       320.90  0.49
0       444.30  0.12
0       406.30  0.29
0       442.40  0.21
1       580.50  0.00 <-- Spike detected
0       412.60  0.45
1       687.00  0.01 <-- Spike detected
0       480.30  0.40
0       586.30  0.20
0       651.90  0.14

Change point detection

Change points are persistent changes in a time series event stream distribution of values, like level changes and trends. These persistent changes last much longer than spikes and could indicate catastrophic event(s). Change points are not usually visible to the naked eye, but can be detected in your data using approaches such as in the following method. The following image is an example of a change point detection:

Screenshot that shows a change point detection.

Create the DetectChangepoint() method

The DetectChangepoint() method executes the following tasks:

  • Creates the transform from the estimator.
  • Detects change points based on historical sales data.
  • Displays the results.
  1. Create the DetectChangepoint() method, just after the DetectSpike() method declaration, using the following code:

    void DetectChangepoint(MLContext mlContext, int docSize, IDataView productSales)
    {
    }
  2. Create the iidChangePointEstimator in the DetectChangepoint() method with the following code:

    var iidChangePointEstimator = mlContext.Transforms.DetectIidChangePoint(outputColumnName: nameof(ProductSalesPrediction.Prediction), inputColumnName: nameof(ProductSalesData.numSales), confidence: 95d, changeHistoryLength: docSize / 4);
  3. As you did previously, create the transform from the estimator by adding the following line of code in the DetectChangepoint() method:

    Tip

    The detection of change points happens with a slight delay as the model needs to make sure the current deviation is a persistent change and not just some random spikes before creating an alert. The amount of this delay is equal to the changeHistoryLength parameter. By increasing the value of this parameter, change detection alerts on more persistent changes, but the trade-off would be a longer delay.

    var iidChangePointTransform = iidChangePointEstimator.Fit(CreateEmptyDataView(mlContext));
  4. Use the Transform() method to transform the data by adding the following code to DetectChangepoint():

    IDataView transformedData = iidChangePointTransform.Transform(productSales);
  5. As you did previously, convert your transformedData into a strongly typed IEnumerable for easier display using the CreateEnumerable() method with the following code:

    var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);
  6. Create a display header with the following code as the next line in the DetectChangepoint() method:

    Console.WriteLine("Alert\tScore\tP-Value\tMartingale value");

    You'll display the following information in your change point detection results:

    • Alert indicates a change point alert for a given data point.
    • Score is the ProductSales value for a given data point in the dataset.
    • P-Value: the "P" stands for probability. The closer the P-value is to 0, the more likely the data point is an anomaly.
    • Martingale value is used to identify how "weird" a data point is, based on the sequence of P-values.
  7. Iterate through the predictions IEnumerable and display the results with the following code:

    foreach (var p in predictions)
    {
        if (p.Prediction is not null)
        {
            var results = $"{p.Prediction[0]}\t{p.Prediction[1]:f2}\t{p.Prediction[2]:F2}\t{p.Prediction[3]:F2}";

            if (p.Prediction[0] == 1)
            {
                results += " <-- alert is on, predicted changepoint";
            }

            Console.WriteLine(results);
        }
    }
    Console.WriteLine("");
  8. Add the following call to the DetectChangepoint() method after the call to the DetectSpike() method:

    DetectChangepoint(mlContext, _docsize, dataView);
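
As a minimal sketch of two details from the steps above, the following snippet shows what docSize / 4 evaluates to for the 36-record dataset, and how rows with the alert flag set can be picked out of change point prediction vectors. The three vectors are taken from the results section of this tutorial; the variable names are illustrative and not part of the sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The tutorial passes docSize / 4 as changeHistoryLength; with int operands this is integer division.
const int docSize = 36;
int changeHistoryLength = docSize / 4;
Console.WriteLine($"changeHistoryLength = {changeHistoryLength}"); // changeHistoryLength = 9

// Prediction vectors in the order alert, score, p-value, martingale value.
var vectors = new List<double[]>
{
    new[] { 0d, 229.50, 0.27, 42.38 },
    new[] { 1d, 197.80, 0.48, 44.23 },
    new[] { 0d, 127.90, 0.13, 145.25 },
};

// Collect the zero-based positions of rows whose alert flag is set.
List<int> alertRows = vectors
    .Select((vector, index) => (vector, index))
    .Where(item => item.vector[0] == 1)
    .Select(item => item.index)
    .ToList();

Console.WriteLine($"Alert rows: {string.Join(", ", alertRows)}"); // Alert rows: 1
```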

Change point detection results

Your results should be similar to the following. During processing, messages are displayed. You may see warnings, or processing messages. Some messages have been removed from the following results for clarity.

Detect Persistent changes in pattern
=============== Training the model Using Change Point Detection Algorithm===============
=============== End of training process ===============
Alert   Score   P-Value Martingale value
0       271.00  0.50    0.00
0       150.90  0.00    2.33
0       188.10  0.41    2.80
0       124.30  0.13    9.16
0       185.30  0.47    9.77
0       173.50  0.47    10.41
0       236.80  0.19    24.46
0       229.50  0.27    42.38
1       197.80  0.48    44.23 <-- alert is on, predicted changepoint
0       127.90  0.13    145.25
0       341.50  0.00    0.01
0       190.90  0.48    0.01
0       199.30  0.48    0.00
0       154.50  0.24    0.00
0       215.10  0.42    0.00
0       278.30  0.19    0.00
0       196.40  0.43    0.00
0       292.00  0.17    0.01
0       231.00  0.45    0.00
0       308.60  0.18    0.00
0       294.90  0.19    0.00
0       426.60  0.00    0.00
0       269.50  0.47    0.00
0       347.30  0.21    0.00
0       344.70  0.27    0.00
0       445.40  0.06    0.02
0       320.90  0.49    0.01
0       444.30  0.12    0.02
0       406.30  0.29    0.01
0       442.40  0.21    0.01
0       580.50  0.00    0.01
0       412.60  0.45    0.01
0       687.00  0.01    0.12
0       480.30  0.40    0.08
0       586.30  0.20    0.03
0       651.90  0.14    0.09

Congratulations! You've now successfully built machine learning models for detecting spikes and change point anomalies in sales data.

You can find the source code for this tutorial at the dotnet/samples repository.

In this tutorial, you learned how to:

  • Load the data
  • Train the model for spike anomaly detection
  • Detect spike anomalies with the trained model
  • Train the model for change point anomaly detection
  • Detect change point anomalies with the trained model

Next steps

Check out the Machine Learning samples GitHub repository to explore a seasonality data anomaly detection sample.
