Learn how to build an anomaly detection application for product sales data. This tutorial creates a .NET console application using C# in Visual Studio.
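In this tutorial, you learn how to:

- Load the data
- Create a transform for spike anomaly detection
- Detect spike anomalies with the transform
- Create a transform for change point anomaly detection
- Detect change point anomalies with the transform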
You can find the source code for this tutorial at the dotnet/samples repository.
Visual Studio 2022 with the ".NET Desktop Development" workload installed.
Note: The data format in product-sales.csv is based on the dataset “Shampoo Sales Over a Three Year Period”, originally sourced from DataMarket and provided by Time Series Data Library (TSDL), created by Rob Hyndman. “Shampoo Sales Over a Three Year Period” Dataset Licensed Under the DataMarket Default Open License.
Create a C# Console Application called "ProductSalesAnomalyDetection". Click the Next button.
Choose .NET 8 as the framework to use. Click the Create button.
Create a directory named Data in your project to save your data set files.
Install the Microsoft.ML and Microsoft.ML.TimeSeries NuGet packages:

Note: This sample uses the latest stable version of the NuGet packages mentioned unless otherwise stated.

In Solution Explorer, right-click your project and select Manage NuGet Packages. Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML, and select Install. Select the OK button in the Preview Changes dialog, and then select the I Accept button in the License Acceptance dialog if you agree with the license terms for the packages listed. Repeat these steps for Microsoft.ML.TimeSeries.
Add the following using directives at the top of your Program.cs file:

```csharp
using Microsoft.ML;
using ProductSalesAnomalyDetection;
```
Download the dataset and save it to the Data folder you previously created:
Right-click product-sales.csv and select "Save Link (or Target) As..."
Make sure you either save the *.csv file to the Data folder, or, if you save it elsewhere, move the *.csv file to the Data folder.
In Solution Explorer, right-click the *.csv file and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.
The following table is a data preview from your *.csv file:
Month | ProductSales |
---|---|
1-Jan | 271 |
2-Jan | 150.9 |
..... | ..... |
1-Feb | 199.3 |
..... | ..... |
Next, define your input and prediction class data structures.
Add a new class to your project:
In Solution Explorer, right-click the project, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to ProductSalesData.cs. Then, select Add.
The ProductSalesData.cs file opens in the code editor.
Add the following using directive to the top of ProductSalesData.cs:

```csharp
using Microsoft.ML.Data;
```
Remove the existing class definition and add the following code, which has two classes, ProductSalesData and ProductSalesPrediction, to the ProductSalesData.cs file:

```csharp
public class ProductSalesData
{
    [LoadColumn(0)]
    public string? Month;

    [LoadColumn(1)]
    public float numSales;
}

public class ProductSalesPrediction
{
    //vector to hold alert,score,p-value values
    [VectorType(3)]
    public double[]? Prediction { get; set; }
}
```
ProductSalesData specifies an input data class. The LoadColumn attribute specifies which columns (by column index) in the dataset should be loaded.
ProductSalesPrediction specifies the prediction data class. For anomaly detection, the prediction consists of an alert that indicates whether there is an anomaly, a raw score, and a p-value. The closer the p-value is to 0, the more likely it is that an anomaly has occurred.
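If you want to work with the prediction vector by name rather than by index, a small helper can unpack it. The following is a minimal sketch: the Unpack function and its tuple element names are hypothetical (not part of ML.NET), and it assumes the [alert, score, p-value] ordering described above.

```csharp
using System;

// Hypothetical helper: unpack the [alert, score, p-value] vector that the
// spike detector writes into ProductSalesPrediction.Prediction.
static (bool IsAlert, double Score, double PValue) Unpack(double[] prediction)
{
    if (prediction is null || prediction.Length < 3)
        throw new ArgumentException("Expected at least 3 values: alert, score, p-value.", nameof(prediction));

    // Element 0 is the alert flag (1 = anomaly), element 1 the raw score,
    // element 2 the p-value.
    return (prediction[0] == 1, prediction[1], prediction[2]);
}

// Example vector in the shape of one spike-detection result row.
var row = Unpack(new[] { 1d, 341.5, 0.0 });
Console.WriteLine($"alert={row.IsAlert}, score={row.Score}, p={row.PValue}"); // alert=True, score=341.5, p=0
```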
Create two variables: one to hold the path of the recently downloaded dataset file, and one to hold the number of records in that file:

- _dataPath has the path to the dataset used to train the model.
- _docsize has the number of records in the dataset file. You'll use _docsize to calculate pvalueHistoryLength.

Add the following code on the line right below the using directives to specify those values:

```csharp
string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "product-sales.csv");

// Number of records in the dataset file
const int _docsize = 36;
```
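Hard-coding _docsize to 36 works only while the file keeps exactly 36 data rows. Alternatively, you could count the rows at startup. The sketch below assumes the file has a single header row. CountDataRows is a hypothetical helper, and the usage writes a small temporary file instead of reading the real dataset.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical alternative to hard-coding _docsize: count the non-empty
// data rows in the CSV, skipping the first (header) line.
static int CountDataRows(string csvPath) =>
    File.ReadLines(csvPath)
        .Skip(1)                                   // skip the header row
        .Count(line => !string.IsNullOrWhiteSpace(line));

// Usage with a small temporary file (not the real dataset).
string tempPath = Path.GetTempFileName();
File.WriteAllLines(tempPath, new[] { "Month,ProductSales", "1-Jan,271", "2-Jan,150.9", "" });
int rows = CountDataRows(tempPath);
Console.WriteLine(rows); // 2
File.Delete(tempPath);
```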
Replace the Console.WriteLine("Hello, World!") line with the following code to declare and initialize the mlContext variable:

```csharp
MLContext mlContext = new MLContext();
```
The MLContext class is a starting point for all ML.NET operations. Initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.
Data in ML.NET is represented as an IDataView interface. IDataView is a flexible, efficient way of describing tabular data (numeric and text). Data can be loaded from a text file or from other sources (for example, SQL database or log files) to an IDataView object.
Add the following code after creating the mlContext variable:

```csharp
IDataView dataView = mlContext.Data.LoadFromTextFile<ProductSalesData>(path: _dataPath, hasHeader: true, separatorChar: ',');
```

The LoadFromTextFile() method defines the data schema and reads in the file. It takes in the data path variables and returns an IDataView.
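Conceptually, the LoadColumn indexes tell the loader which comma-separated field feeds which member: column 0 feeds Month and column 1 feeds numSales, and hasHeader: true skips the first line. The plain C# sketch below illustrates only that mapping. ParseRows is hypothetical and is not how ML.NET's text loader is implemented. It also ignores quoting, escaping, and missing values.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Simplified illustration (not ML.NET's loader): column 0 -> Month,
// column 1 -> NumSales, mirroring [LoadColumn(0)] and [LoadColumn(1)].
static (string Month, float NumSales)[] ParseRows(string[] lines, bool hasHeader, char separator) =>
    lines.Skip(hasHeader ? 1 : 0)
         .Where(line => !string.IsNullOrWhiteSpace(line))
         .Select(line => line.Split(separator))
         .Select(cols => (Month: cols[0], NumSales: float.Parse(cols[1], CultureInfo.InvariantCulture)))
         .ToArray();

// Two rows taken from the data preview table.
var rows = ParseRows(new[] { "Month,ProductSales", "1-Jan,271", "2-Jan,150.9" }, hasHeader: true, separator: ',');
foreach (var (month, sales) in rows)
    Console.WriteLine($"{month}: {sales}");
```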
Anomaly detection flags unexpected or unusual events or behaviors. It gives clues where to look for problems and helps you answer the question "Is this weird?".
Anomaly detection is the process of detecting time-series data outliers: points on a given input time series where the behavior isn't what was expected, or "weird".
Anomaly detection can be useful in lots of ways. For instance:

- If you have a car, you might want to know: Is this oil gauge reading normal, or do I have a leak?
- If you're monitoring power consumption, you'd want to know: Is there an outage?
There are two types of time series anomalies that can be detected:
Spikes indicate temporary bursts of anomalous behavior in the system.
Change points indicate the beginning of persistent changes over time in the system.
In ML.NET, the IID Spike Detection and IID Change Point Detection algorithms are suited for independent and identically distributed datasets. They assume that your input data is a sequence of data points that are independently sampled from one stationary distribution.
Unlike the models in the other tutorials, the time series anomaly detector transforms operate directly on input data. The IEstimator.Fit() method does not need training data to produce the transform. It does need the data schema, though, which is provided by a data view generated from an empty list of ProductSalesData.
You'll analyze the same product sales data to detect spikes and change points. The process of building and training the model is the same for spike detection and change point detection; the main difference is the specific detection algorithm used.
The goal of spike detection is to identify sudden yet temporary bursts that significantly differ from the majority of the time series data values. It's important to detect these suspicious rare items, events, or observations in a timely manner so that their impact can be minimized. The following approach can be used to detect a variety of anomalies, such as outages, cyber-attacks, or viral web content. The following image is an example of spikes in a time series dataset:

[Image: example of spikes in a time series dataset]
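For some intuition about what "significantly differ from the majority" means, the sketch below flags a point that sits far above the recent average. This is a simple sliding-window z-score rule chosen for illustration. It is not the IID spike detection algorithm ML.NET uses. FindSpikes, the window size, the threshold, and the input series are all made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for spike detection (NOT ML.NET's IID spike detector):
// flag index i when values[i] is more than `threshold` standard deviations
// above the mean of the preceding `window` points.
static List<int> FindSpikes(double[] values, int window, double threshold)
{
    var found = new List<int>();
    for (int i = window; i < values.Length; i++)
    {
        var history = values.Skip(i - window).Take(window).ToArray();
        double mean = history.Average();
        double std = Math.Sqrt(history.Sum(x => (x - mean) * (x - mean)) / history.Length);
        if (std > 0 && (values[i] - mean) / std > threshold)
            found.Add(i);
    }
    return found;
}

// Made-up series: a flat signal with one burst at index 6.
double[] series = { 10, 11, 9, 10, 11, 10, 30, 10, 11, 9 };
var spikes = FindSpikes(series, window: 5, threshold: 3.0);
Console.WriteLine(string.Join(", ", spikes)); // 6
```

The window here loosely plays the role of a history length. After the burst, the value 30 stays in the window for five steps and inflates the standard deviation. A shorter window would forget it sooner.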
Add the following method to Program.cs:

```csharp
IDataView CreateEmptyDataView(MLContext mlContext)
{
    // Create empty DataView. We just need the schema to call Fit() for the time series transforms
    IEnumerable<ProductSalesData> enumerableData = new List<ProductSalesData>();
    return mlContext.Data.LoadFromEnumerable(enumerableData);
}
```

The CreateEmptyDataView() method produces an empty data view object with the correct schema to be used as input to the IEstimator.Fit() method.
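The DetectSpike() method:

- Creates the transform from the estimator.
- Detects spikes based on historical sales data.
- Displays the results.

Create the DetectSpike() method at the bottom of the Program.cs file using the following code:

```csharp
void DetectSpike(MLContext mlContext, int docSize, IDataView productSales)
{
}
```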
Use the IidSpikeEstimator to train the model for spike detection. Add it to the DetectSpike() method with the following code:

```csharp
var iidSpikeEstimator = mlContext.Transforms.DetectIidSpike(
    outputColumnName: nameof(ProductSalesPrediction.Prediction),
    inputColumnName: nameof(ProductSalesData.numSales),
    confidence: 95d,
    pvalueHistoryLength: docSize / 4);
```
Create the spike detection transform by adding the following as the next line of code in the DetectSpike() method:

```csharp
ITransformer iidSpikeTransform = iidSpikeEstimator.Fit(CreateEmptyDataView(mlContext));
```

Tip: The confidence and pvalueHistoryLength parameters affect how spikes are detected. confidence determines how sensitive your model is to spikes: the lower the confidence, the more likely the algorithm is to detect "smaller" spikes. The pvalueHistoryLength parameter defines the number of data points in a sliding window. The value of this parameter is usually a percentage of the entire dataset. The lower the pvalueHistoryLength, the faster the model forgets previous large spikes.
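The pvalueHistoryLength argument is computed as docSize / 4. Both operands are int, so C# performs integer division and truncates the result. With the 36-row dataset, the sliding window therefore covers 9 points, roughly a quarter of the data. The snippet below only spells out that arithmetic. HistoryLength is a hypothetical helper, not an ML.NET API.

```csharp
using System;

// Hypothetical helper mirroring the `docSize / 4` argument used for
// pvalueHistoryLength. Both operands are int, so the division truncates.
static int HistoryLength(int docSize, int divisor) => docSize / divisor;

const int docsize = 36; // the _docsize value: records in product-sales.csv

Console.WriteLine(HistoryLength(docsize, 4)); // 9: a window of 9 points, 25% of the data
Console.WriteLine(HistoryLength(docsize, 8)); // 4: 4.5 truncated; a shorter window forgets past spikes sooner
```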
Add the following line of code as the next line in the DetectSpike() method to transform the productSales data:

```csharp
IDataView transformedData = iidSpikeTransform.Transform(productSales);
```

The previous code uses the Transform() method to make predictions for multiple input rows of a dataset.
Convert your transformedData into a strongly typed IEnumerable for easier display using the CreateEnumerable() method with the following code:

```csharp
var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);
```
Create a display header line using the following Console.WriteLine() code:

```csharp
Console.WriteLine("Alert\tScore\tP-Value");
```
You'll display the following information in your spike detection results:

- Alert indicates a spike alert for a given data point.
- Score is the ProductSales value for a given data point in the dataset.
- P-Value: The "P" stands for probability. The closer the p-value is to 0, the more likely the data point is an anomaly.

Use the following code to iterate through the predictions IEnumerable and display the results:

```csharp
foreach (var p in predictions)
{
    if (p.Prediction is not null)
    {
        var results = $"{p.Prediction[0]}\t{p.Prediction[1]:f2}\t{p.Prediction[2]:F2}";

        if (p.Prediction[0] == 1)
        {
            results += " <-- Spike detected";
        }

        Console.WriteLine(results);
    }
}
Console.WriteLine("");
```
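If you'd rather list only the flagged rows together with their Month labels, you can pair each prediction with its input row. The sketch below is a hypothetical helper in plain C#. It assumes the predictions come back in the same order as the input rows, and the labels and prediction values in the usage are made up. In the tutorial's program, the labels could come from mlContext.Data.CreateEnumerable<ProductSalesData>(productSales, reuseRowObject: false) and the vectors from each p.Prediction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pair each prediction vector with the label of the
// input row at the same position, and keep rows whose alert flag
// (element 0) is 1.
static IEnumerable<string> AlertedMonths(IEnumerable<string> months, IEnumerable<double[]> predictions) =>
    months.Zip(predictions, (month, p) => (month, p))
          .Where(pair => pair.p is not null && pair.p[0] == 1)
          .Select(pair => pair.month);

// Made-up labels and [alert, score, p-value] vectors.
string[] months = { "m1", "m2", "m3" };
var preds = new List<double[]> { new[] { 0d, 271, 0.5 }, new[] { 1d, 341.5, 0.0 }, new[] { 0d, 190.9, 0.48 } };
var flagged = AlertedMonths(months, preds).ToList();
Console.WriteLine(string.Join(", ", flagged)); // m2
```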
Add the call to the DetectSpike() method below the call to the LoadFromTextFile() method:

```csharp
DetectSpike(mlContext, _docsize, dataView);
```
Your results should be similar to the following. Messages are displayed during processing, and you may see warnings or processing messages. Some of the messages have been removed from the following results for clarity.

```
Detect temporary changes in pattern
=============== Training the model ===============
=============== End of training process ===============
Alert   Score   P-Value
0       271.00  0.50
0       150.90  0.00
0       188.10  0.41
0       124.30  0.13
0       185.30  0.47
0       173.50  0.47
0       236.80  0.19
0       229.50  0.27
0       197.80  0.48
0       127.90  0.13
1       341.50  0.00 <-- Spike detected
0       190.90  0.48
0       199.30  0.48
0       154.50  0.24
0       215.10  0.42
0       278.30  0.19
0       196.40  0.43
0       292.00  0.17
0       231.00  0.45
0       308.60  0.18
0       294.90  0.19
1       426.60  0.00 <-- Spike detected
0       269.50  0.47
0       347.30  0.21
0       344.70  0.27
0       445.40  0.06
0       320.90  0.49
0       444.30  0.12
0       406.30  0.29
0       442.40  0.21
1       580.50  0.00 <-- Spike detected
0       412.60  0.45
1       687.00  0.01 <-- Spike detected
0       480.30  0.40
0       586.30  0.20
0       651.90  0.14
```
Change points are persistent changes in the distribution of values in a time series event stream, such as level changes and trends. These persistent changes last much longer than spikes and could indicate catastrophic events. Change points are not usually visible to the naked eye, but they can be detected in your data using approaches such as the one in the following method. The following image is an example of change point detection:

[Image: example of change point detection]
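One simple way to see a persistent level change is to compare the average of the points just before each position with the average of the points just after it. The sketch below uses that two-window mean comparison for illustration only. It is not the IID change point algorithm ML.NET uses, and LargestMeanShift and the sample series are hypothetical.

```csharp
using System;
using System.Linq;

// Simplified stand-in for change point detection (NOT ML.NET's IID change
// point detector): for each index i, compare the mean of the `window` points
// before i with the mean of the `window` points starting at i, and return
// the index with the largest absolute difference.
static (int Index, double Shift) LargestMeanShift(double[] values, int window)
{
    int bestIndex = -1;
    double bestShift = 0;
    for (int i = window; i + window <= values.Length; i++)
    {
        double before = values.Skip(i - window).Take(window).Average();
        double after = values.Skip(i).Take(window).Average();
        double diff = Math.Abs(after - before);
        if (diff > bestShift)
        {
            bestShift = diff;
            bestIndex = i;
        }
    }
    return (bestIndex, bestShift);
}

// Made-up series with a persistent level change starting at index 5.
double[] series = { 10, 11, 9, 10, 10, 20, 21, 19, 20, 20 };
var (index, shift) = LargestMeanShift(series, window: 3);
Console.WriteLine($"index={index}, shift={shift:F2}"); // index=5, shift=10.33
```

Unlike a spike, the shifted level persists. The "after" window stays high, so the difference peaks where the new level begins.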
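The DetectChangepoint() method executes the following tasks:

- Creates the transform from the estimator.
- Detects change points based on historical sales data.
- Displays the results.

Create the DetectChangepoint() method, just after the DetectSpike() method declaration, using the following code:

```csharp
void DetectChangepoint(MLContext mlContext, int docSize, IDataView productSales)
{
}
```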
Create the iidChangePointEstimator in the DetectChangepoint() method with the following code:

```csharp
var iidChangePointEstimator = mlContext.Transforms.DetectIidChangePoint(
    outputColumnName: nameof(ProductSalesPrediction.Prediction),
    inputColumnName: nameof(ProductSalesData.numSales),
    confidence: 95d,
    changeHistoryLength: docSize / 4);
```
As you did previously, create the transform from the estimator by adding the following line of code in the DetectChangepoint() method:

```csharp
var iidChangePointTransform = iidChangePointEstimator.Fit(CreateEmptyDataView(mlContext));
```

Tip: Change points are detected with a slight delay, because the model needs to make sure the current deviation is a persistent change and not just some random spikes before it creates an alert. The amount of this delay is equal to the changeHistoryLength parameter. Increasing the value of this parameter makes change detection alert on more persistent changes, but the trade-off is a longer delay.
Use the Transform() method to transform the data by adding the following code to DetectChangepoint():

```csharp
IDataView transformedData = iidChangePointTransform.Transform(productSales);
```
As you did previously, convert your transformedData into a strongly typed IEnumerable for easier display using the CreateEnumerable() method with the following code:

```csharp
var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);
```
Create a display header with the following code as the next line in the DetectChangepoint() method:

```csharp
Console.WriteLine("Alert\tScore\tP-Value\tMartingale value");
```
You'll display the following information in your change point detection results:

- Alert indicates a change point alert for a given data point.
- Score is the ProductSales value for a given data point in the dataset.
- P-Value: The "P" stands for probability. The closer the p-value is to 0, the more likely the data point is an anomaly.
- Martingale value is used to identify how "weird" a data point is, based on the sequence of p-values.

Iterate through the predictions IEnumerable and display the results with the following code:

```csharp
foreach (var p in predictions)
{
    if (p.Prediction is not null)
    {
        var results = $"{p.Prediction[0]}\t{p.Prediction[1]:f2}\t{p.Prediction[2]:F2}\t{p.Prediction[3]:F2}";

        if (p.Prediction[0] == 1)
        {
            results += " <-- alert is on, predicted changepoint";
        }

        Console.WriteLine(results);
    }
}
Console.WriteLine("");
```
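The Martingale value aggregates the sequence of p-values. Unusual points (small p-values) push it up, and ordinary points pull it down. The sketch below shows a simplified power martingale in which each p-value p contributes a factor eps * p^(eps - 1). PowerMartingale and the eps value of 0.1 are assumptions chosen for illustration. ML.NET's implementation may differ in detail, for example in how it windows or resets the value. The sketch also assumes every p-value is strictly greater than 0.

```csharp
using System;
using System.Linq;

// Simplified power martingale (a sketch, not ML.NET's exact implementation):
// multiply the betting factors eps * p^(eps - 1) over the p-values.
// For eps = 0.1, p-values below roughly 0.077 give factors > 1, so a run of
// unusual points makes the product grow. Assumes 0 < p <= 1.
static double PowerMartingale(double[] pValues, double eps) =>
    pValues.Aggregate(1.0, (m, p) => m * eps * Math.Pow(p, eps - 1));

double calm = PowerMartingale(new[] { 0.5, 0.4, 0.6 }, eps: 0.1);
double unusual = PowerMartingale(new[] { 0.05, 0.02, 0.01 }, eps: 0.1);
Console.WriteLine($"calm={calm:F4}, unusual={unusual:F2}");
```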
Add the following call to the DetectChangepoint() method after the call to the DetectSpike() method:

```csharp
DetectChangepoint(mlContext, _docsize, dataView);
```
Your results should be similar to the following. Messages are displayed during processing, and you may see warnings or processing messages. Some messages have been removed from the following results for clarity.

```
Detect Persistent changes in pattern
=============== Training the model Using Change Point Detection Algorithm===============
=============== End of training process ===============
Alert   Score   P-Value Martingale value
0       271.00  0.50    0.00
0       150.90  0.00    2.33
0       188.10  0.41    2.80
0       124.30  0.13    9.16
0       185.30  0.47    9.77
0       173.50  0.47    10.41
0       236.80  0.19    24.46
0       229.50  0.27    42.38
1       197.80  0.48    44.23 <-- alert is on, predicted changepoint
0       127.90  0.13    145.25
0       341.50  0.00    0.01
0       190.90  0.48    0.01
0       199.30  0.48    0.00
0       154.50  0.24    0.00
0       215.10  0.42    0.00
0       278.30  0.19    0.00
0       196.40  0.43    0.00
0       292.00  0.17    0.01
0       231.00  0.45    0.00
0       308.60  0.18    0.00
0       294.90  0.19    0.00
0       426.60  0.00    0.00
0       269.50  0.47    0.00
0       347.30  0.21    0.00
0       344.70  0.27    0.00
0       445.40  0.06    0.02
0       320.90  0.49    0.01
0       444.30  0.12    0.02
0       406.30  0.29    0.01
0       442.40  0.21    0.01
0       580.50  0.00    0.01
0       412.60  0.45    0.01
0       687.00  0.01    0.12
0       480.30  0.40    0.08
0       586.30  0.20    0.03
0       651.90  0.14    0.09
```
Congratulations! You've now successfully built machine learning models for detecting spikes and change point anomalies in sales data.
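In this tutorial, you learned how to load product sales data, create and apply a transform for spike anomaly detection, and create and apply a transform for change point anomaly detection.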
Check out the Machine Learning samples GitHub repository to explore a seasonality data anomaly detection sample.