Learn how to use a pretrained ONNX model in ML.NET to detect objects in images.
Training an object detection model from scratch requires setting millions of parameters, a large amount of labeled training data and a vast amount of compute resources (hundreds of GPU hours). Using a pretrained model allows you to shortcut the training process.
In this tutorial, you learn how to:
This sample creates a .NET console application that detects objects within an image using a pretrained deep learning ONNX model. The code for this sample can be found in the dotnet/machinelearning-samples repository on GitHub.
Object detection is a computer vision problem. While closely related to image classification, object detection performs image classification at a more granular scale. Object detection both locates and categorizes entities within images. Object detection models are commonly trained using deep learning and neural networks. See Deep learning vs machine learning for more information.
Use object detection when images contain multiple objects of different types.
Some use cases for object detection include:
Deep learning is a subset of machine learning. Training deep learning models requires large quantities of data. Patterns in the data are represented by a series of layers. The relationships in the data are encoded as connections between the layers containing weights. The higher the weight, the stronger the relationship. Collectively, this series of layers and connections is known as an artificial neural network. The more layers in a network, the "deeper" it is, making it a deep neural network.
There are different types of neural networks, the most common being the Multilayer Perceptron (MLP), the Convolutional Neural Network (CNN), and the Recurrent Neural Network (RNN). The most basic is the MLP, which maps a set of inputs to a set of outputs. This neural network is a good fit when the data does not have a spatial or time component. The CNN makes use of convolutional layers to process spatial information contained in the data. A good use case for CNNs is image processing to detect the presence of a feature in a region of an image (for example, is there a nose in the center of an image?). Finally, RNNs allow for the persistence of state or memory to be used as input. RNNs are used for time-series analysis, where the sequential ordering and context of events is important.
Object detection is an image-processing task. Therefore, most deep learning models trained to solve this problem are CNNs. The model used in this tutorial is the Tiny YOLOv2 model, a more compact version of the YOLOv2 model described in the paper "YOLO9000: Better, Faster, Stronger" by Redmon and Farhadi. Tiny YOLOv2 is trained on the Pascal VOC dataset and is made up of 15 layers that can predict 20 different classes of objects. Because Tiny YOLOv2 is a condensed version of the original YOLOv2 model, it trades some accuracy for speed.
The different layers that make up the model can be visualized using tools like Netron. Inspecting the model would yield a mapping of the connections between all the layers that make up the neural network, where each layer would contain the name of the layer along with the dimensions of the respective input / output. The data structures used to describe the inputs and outputs of the model are known as tensors. Tensors can be thought of as containers that store data in N dimensions. In the case of Tiny YOLOv2, the name of the input layer is image and it expects a tensor of dimensions 3 x 416 x 416. The name of the output layer is grid and it generates an output tensor of dimensions 125 x 13 x 13.
The YOLO model takes an image of 3 (RGB) x 416px x 416px. The model takes this input and passes it through the different layers to produce an output. The output divides the input image into a 13 x 13 grid, with each cell in the grid consisting of 125 values.
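As a quick check of these numbers, the following C# sketch derives the grid cell size, the number of values per cell, and the length of the flattened output from the dimensions quoted in this tutorial (a 416 x 416 input, a 13 x 13 grid, 5 boxes per cell, 5 box features, and 20 classes). It's an illustration only; the variable names are not part of the sample.

```csharp
using System;

// Dimensions quoted in this tutorial for Tiny YOLOv2.
const int inputSize = 416;    // input image is 3 x 416 x 416
const int gridSize = 13;      // output grid is 13 x 13
const int boxesPerCell = 5;   // bounding boxes predicted per cell
const int boxFeatures = 5;    // x, y, w, h, objectness
const int classCount = 20;    // Pascal VOC classes

// Each grid cell covers a square patch of the input image.
int cellSize = inputSize / gridSize;                            // 416 / 13 = 32 px

// Each cell stores every box's features plus its class scores.
int valuesPerCell = boxesPerCell * (boxFeatures + classCount);  // 5 * 25 = 125

// The whole 125 x 13 x 13 output tensor, flattened into one float array.
int flattenedLength = valuesPerCell * gridSize * gridSize;      // 21125

Console.WriteLine($"cell size: {cellSize}px, values per cell: {valuesPerCell}, output length: {flattenedLength}");
```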
The Open Neural Network Exchange (ONNX) is an open source format for AI models. ONNX supports interoperability between frameworks. This means you can train a model in one of the many popular machine learning frameworks like PyTorch, convert it into ONNX format and consume the ONNX model in a different framework like ML.NET. To learn more, visit the ONNX website.
The pretrained Tiny YOLOv2 model is stored in ONNX format, a serialized representation of the layers and learned patterns of those layers. In ML.NET, interoperability with ONNX is achieved with the ImageAnalytics and OnnxTransformer NuGet packages. The ImageAnalytics package contains a series of transforms that take an image and encode it into numerical values that can be used as input into a prediction or training pipeline. The OnnxTransformer package leverages the ONNX Runtime to load an ONNX model and use it to make predictions based on input provided.
Now that you have a general understanding of what ONNX is and how Tiny YOLOv2 works, it's time to build the application.
Create a C# Console Application called "ObjectDetection". Click the Next button.
Choose .NET 8 as the framework to use. Click the Create button.
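Install the Microsoft.ML NuGet package. This tutorial also relies on the Microsoft.ML.ImageAnalytics and Microsoft.ML.OnnxTransformer packages described earlier, so install them the same way.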
Note
This sample uses the latest stable version of the NuGet packages mentioned unless otherwise stated.
Download the project assets directory zip file and unzip it.
Copy the assets directory into your ObjectDetection project directory. This directory and its subdirectories contain the image files (except for the Tiny YOLOv2 model, which you'll download and add in the next step) needed for this tutorial.
Download the Tiny YOLOv2 model from the ONNX Model Zoo.
Copy the model.onnx file into your ObjectDetection project assets\Model directory and rename it to TinyYolo2_model.onnx. This directory contains the model needed for this tutorial.
In Solution Explorer, right-click each of the files in the asset directory and subdirectories and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.
Open the Program.cs file and add the following additional using directives to the top of the file:
using System.Drawing;
using System.Drawing.Drawing2D;
using ObjectDetection.YoloParser;
using ObjectDetection.DataStructures;
using ObjectDetection;
using Microsoft.ML;
Next, define the paths of the various assets.
First, create the GetAbsolutePath method at the bottom of the Program.cs file.
string GetAbsolutePath(string relativePath)
{
    FileInfo _dataRoot = new FileInfo(typeof(Program).Assembly.Location);
    string assemblyFolderPath = _dataRoot.Directory.FullName;
    string fullPath = Path.Combine(assemblyFolderPath, relativePath);
    return fullPath;
}
Then, below the using directives, create variables to store the location of your assets.
var assetsRelativePath = @"../../../assets";
string assetsPath = GetAbsolutePath(assetsRelativePath);
var modelFilePath = Path.Combine(assetsPath, "Model", "TinyYolo2_model.onnx");
var imagesFolder = Path.Combine(assetsPath, "images");
var outputFolder = Path.Combine(assetsPath, "images", "output");
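The ../../../assets prefix assumes the app runs from the default .NET SDK build output folder, which sits three levels below the project directory (bin/<configuration>/<target framework>, for example bin/Debug/net8.0). The following sketch uses a hypothetical project folder under the system temp directory to show how that relative path resolves. Path.Combine keeps the .. segments as they are; they're resolved when the path is used, or explicitly with Path.GetFullPath.

```csharp
using System;
using System.IO;

// Hypothetical project location; in the app, the start point comes from the assembly location.
string projectDir = Path.Combine(Path.GetTempPath(), "ObjectDetection");

// Assumed default SDK output layout: <project>/bin/<Configuration>/<TargetFramework>.
string assemblyFolder = Path.Combine(projectDir, "bin", "Debug", "net8.0");

// Same relative path the tutorial uses.
string combined = Path.Combine(assemblyFolder, @"../../../assets");

// Path.Combine keeps the ".." segments; GetFullPath collapses them.
string resolved = Path.GetFullPath(combined);

Console.WriteLine(combined);
Console.WriteLine(resolved);
```

If the app runs from a different output folder, the number of .. segments would need to change accordingly.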
Add a new directory to your project to store your input data and prediction classes.
In Solution Explorer, right-click the project, and then select Add > New Folder. When the new folder appears in the Solution Explorer, name it "DataStructures".
Create your input data class in the newly created DataStructures directory.
In Solution Explorer, right-click the DataStructures directory, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to ImageNetData.cs. Then, select Add.
The ImageNetData.cs file opens in the code editor. Add the following using directives to the top of ImageNetData.cs:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML.Data;
Remove the existing class definition and add the following code for the ImageNetData class to the ImageNetData.cs file:
public class ImageNetData
{
    [LoadColumn(0)]
    public string ImagePath;
    [LoadColumn(1)]
    public string Label;
    public static IEnumerable<ImageNetData> ReadFromFile(string imageFolder)
    {
        return Directory
            .GetFiles(imageFolder)
            .Where(filePath => Path.GetExtension(filePath) != ".md")
            .Select(filePath => new ImageNetData { ImagePath = filePath, Label = Path.GetFileName(filePath) });
    }
}
ImageNetData is the input image data class and has the following String fields:
ImagePath contains the path where the image is stored.
Label contains the name of the file.
Additionally, ImageNetData contains a method ReadFromFile that loads multiple image files stored in the specified imageFolder path, skipping any files with the .md extension, and returns them as a collection of ImageNetData objects.
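To see what ReadFromFile returns, the following sketch applies the same Directory.GetFiles and Path.GetExtension filter to a throwaway folder containing two images and a README.md (the file names are made up). It uses a tuple in place of ImageNetData so it runs without ML.NET.

```csharp
using System;
using System.IO;
using System.Linq;

// Throwaway folder with two "images" and a markdown file (hypothetical names).
string folder = Path.Combine(Path.GetTempPath(), "imagenet-data-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(folder);
foreach (var name in new[] { "dog.jpg", "cars.png", "README.md" })
    File.WriteAllText(Path.Combine(folder, name), "");

// Same filter as ImageNetData.ReadFromFile: every file except .md files,
// with the file name used as the label.
var items = Directory
    .GetFiles(folder)
    .Where(filePath => Path.GetExtension(filePath) != ".md")
    .Select(filePath => (ImagePath: filePath, Label: Path.GetFileName(filePath)))
    .ToList();   // materialize before the folder is deleted

foreach (var item in items)
    Console.WriteLine($"{item.Label} -> {item.ImagePath}");

Directory.Delete(folder, recursive: true);
```

Because the extension check is an ordinal, case-sensitive string comparison, a file named NOTES.MD would still be returned; if that matters, you could compare with string.Equals(..., StringComparison.OrdinalIgnoreCase) instead.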
Create your prediction class in the DataStructures directory.
In Solution Explorer, right-click the DataStructures directory, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to ImageNetPrediction.cs. Then, select Add.
The ImageNetPrediction.cs file opens in the code editor. Add the following using directive to the top of ImageNetPrediction.cs:
using Microsoft.ML.Data;
Remove the existing class definition and add the following code for the ImageNetPrediction class to the ImageNetPrediction.cs file:
public class ImageNetPrediction
{
    [ColumnName("grid")]
    public float[] PredictedLabels;
}
ImageNetPrediction is the prediction data class and has the following float[] field:
PredictedLabels contains the raw dimensions, objectness score, and class scores for each of the bounding boxes predicted for an image.
The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.
Initialize the mlContext variable with a new instance of MLContext by adding the following line below the outputFolder variable.
MLContext mlContext = new MLContext();
The model segments an image into a 13 x 13 grid, where each grid cell is 32px x 32px. Each grid cell contains 5 potential object bounding boxes. A bounding box has 25 elements:
x: the x position of the bounding box center relative to the grid cell it's associated with.
y: the y position of the bounding box center relative to the grid cell it's associated with.
w: the width of the bounding box.
h: the height of the bounding box.
o: the confidence value that an object exists within the bounding box, also known as the objectness score.
p1-p20: class probabilities for each of the 20 classes predicted by the model.
In total, the 25 elements describing each of the 5 bounding boxes make up the 125 elements contained in each grid cell.
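Because each cell stores its five boxes one after another, the channel that holds any element follows directly from the box number and the element's position in the 25-element record. The sketch below assumes the element order listed above (x, y, w, h, o, then p1-p20), which is also the order the parser code later in this tutorial reads; ChannelFor and ClassChannel are hypothetical helper names.

```csharp
using System;

// Layout of one 25-element bounding-box record, as described above.
const int boxInfoFeatureCount = 5;                          // x, y, w, h, o
const int classCount = 20;                                  // p1-p20
const int valuesPerBox = boxInfoFeatureCount + classCount;  // 25
const int boxesPerCell = 5;

// Channel (0-124) holding element `element` (0-24) of bounding box `box` (0-4).
int ChannelFor(int box, int element) => box * valuesPerBox + element;

// Channel holding the score for class `classIndex` (0-19) of bounding box `box`.
int ClassChannel(int box, int classIndex) => ChannelFor(box, boxInfoFeatureCount + classIndex);

for (int b = 0; b < boxesPerCell; b++)
{
    Console.WriteLine(
        $"box {b}: x={ChannelFor(b, 0)}, y={ChannelFor(b, 1)}, w={ChannelFor(b, 2)}, " +
        $"h={ChannelFor(b, 3)}, o={ChannelFor(b, 4)}, p1-p20={ClassChannel(b, 0)}-{ClassChannel(b, classCount - 1)}");
}
```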
The output generated by the pretrained ONNX model is a float array of length 21125, representing the elements of a tensor with dimensions 125 x 13 x 13. Turning this flat array into bounding boxes requires some post-processing work. To do so, create a set of classes to help parse the output.
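Concretely, the flattened array stores the tensor channel by channel: all 169 values of channel 0 (one per grid cell) come first, then all 169 values of channel 1, and so on, with the column index x varying fastest within each channel. The following sketch converts between a (channel, y, x) position and an index into the 21125-element array using the same formula as the GetOffset helper defined later, and checks that every position maps to a distinct index. ToOffset and FromOffset are illustrative names.

```csharp
using System;

const int rowCount = 13;
const int colCount = 13;
const int channelCount = 125;
const int channelStride = rowCount * colCount;   // 169 values per channel

// (channel, y, x) -> index into the flattened output; x (the column) varies fastest.
int ToOffset(int channel, int y, int x) => channel * channelStride + y * colCount + x;

// Index into the flattened output -> (channel, y, x); the inverse of ToOffset.
(int Channel, int Y, int X) FromOffset(int offset) =>
    (offset / channelStride, offset % channelStride / colCount, offset % colCount);

// Visit every tensor position and check that each maps to a distinct index.
var seen = new bool[channelCount * channelStride];   // 21125 entries
for (int ch = 0; ch < channelCount; ch++)
    for (int row = 0; row < rowCount; row++)
        for (int col = 0; col < colCount; col++)
        {
            int index = ToOffset(ch, row, col);
            if (seen[index]) throw new InvalidOperationException($"duplicate index {index}");
            seen[index] = true;
        }

Console.WriteLine($"channel 1 of cell (y=0, x=0) is at index {ToOffset(1, 0, 0)}");
Console.WriteLine($"index 21124 is {FromOffset(21124)}");
```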
Add a new directory to your project to organize the set of parser classes, and name it "YoloParser".
The data output by the model contains coordinates and dimensions of the bounding boxes of objects within the image. Create a base class for dimensions.
In Solution Explorer, right-click the YoloParser directory, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to DimensionsBase.cs. Then, select Add.
The DimensionsBase.cs file opens in the code editor. Remove all using directives and the existing class definition.
Add the following code for the DimensionsBase class to the DimensionsBase.cs file:
public class DimensionsBase
{
    public float X { get; set; }
    public float Y { get; set; }
    public float Height { get; set; }
    public float Width { get; set; }
}
DimensionsBase has the following float properties:
X contains the position of the object along the x-axis.
Y contains the position of the object along the y-axis.
Height contains the height of the object.
Width contains the width of the object.
Next, create a class for your bounding boxes.
In Solution Explorer, right-click the YoloParser directory, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to YoloBoundingBox.cs. Then, select Add.
The YoloBoundingBox.cs file opens in the code editor. Add the following using directive to the top of YoloBoundingBox.cs:
using System.Drawing;
Just above the existing class definition, add a new class definition called BoundingBoxDimensions that inherits from the DimensionsBase class to contain the dimensions of the respective bounding box.
public class BoundingBoxDimensions : DimensionsBase { }
Remove the existing YoloBoundingBox class definition and add the following code for the YoloBoundingBox class to the YoloBoundingBox.cs file:
public class YoloBoundingBox
{
    public BoundingBoxDimensions Dimensions { get; set; }
    public string Label { get; set; }
    public float Confidence { get; set; }
    public RectangleF Rect
    {
        get { return new RectangleF(Dimensions.X, Dimensions.Y, Dimensions.Width, Dimensions.Height); }
    }
    public Color BoxColor { get; set; }
}
YoloBoundingBox has the following properties:
Dimensions contains dimensions of the bounding box.
Label contains the class of object detected within the bounding box.
Confidence contains the confidence of the class.
Rect contains the rectangle representation of the bounding box's dimensions.
BoxColor contains the color associated with the respective class used to draw on the image.
Now that the classes for dimensions and bounding boxes are created, it's time to create the parser.
In Solution Explorer, right-click the YoloParser directory, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to YoloOutputParser.cs. Then, select Add.
The YoloOutputParser.cs file opens in the code editor. Add the following using directives to the top of YoloOutputParser.cs:
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
Inside the existing YoloOutputParser class definition, add a nested class that contains the dimensions of each of the cells in the image. Add the following code for the CellDimensions class that inherits from the DimensionsBase class at the top of the YoloOutputParser class definition.
class CellDimensions : DimensionsBase { }
Inside the YoloOutputParser class definition, add the following constants and field.
public const int ROW_COUNT = 13;
public const int COL_COUNT = 13;
public const int CHANNEL_COUNT = 125;
public const int BOXES_PER_CELL = 5;
public const int BOX_INFO_FEATURE_COUNT = 5;
public const int CLASS_COUNT = 20;
public const float CELL_WIDTH = 32;
public const float CELL_HEIGHT = 32;
private int channelStride = ROW_COUNT * COL_COUNT;
ROW_COUNT is the number of rows in the grid the image is divided into.
COL_COUNT is the number of columns in the grid the image is divided into.
CHANNEL_COUNT is the total number of values contained in one cell of the grid.
BOXES_PER_CELL is the number of bounding boxes in a cell.
BOX_INFO_FEATURE_COUNT is the number of features contained within a box (x, y, width, height, confidence).
CLASS_COUNT is the number of class predictions contained in each bounding box.
CELL_WIDTH is the width of one cell in the image grid.
CELL_HEIGHT is the height of one cell in the image grid.
channelStride is the number of values in each channel of the output (13 x 13 = 169), that is, the distance in the flattened output between consecutive channels of the same cell.
When the model makes a prediction, also known as scoring, it divides the 416px x 416px input image into a grid of cells the size of 13 x 13. Each cell is 32px x 32px. Within each cell, there are 5 bounding boxes, each containing 5 features (x, y, width, height, confidence). In addition, each bounding box contains the probability of each of the classes, which in this case is 20. Therefore, each cell contains 125 pieces of information (5 boxes x (5 features + 20 class probabilities)).
Create a list of anchors below channelStride for all 5 bounding boxes:
private float[] anchors = new float[]
{
    1.08F, 1.19F, 3.42F, 4.41F, 6.63F, 11.38F, 9.42F, 5.11F, 16.62F, 10.52F
};
Anchors are predefined width and height ratios of bounding boxes. Each pair of values in the list is the width and height of one of the 5 anchor boxes, in grid-cell units. Most objects or classes detected by a model have similar ratios, which is valuable when it comes to creating bounding boxes. Instead of predicting the bounding boxes from scratch, the model predicts offsets from these predefined dimensions, which simplifies the prediction. Typically these anchor ratios are calculated based on the dataset used. In this case, because the dataset is known and the values have been precomputed, the anchors can be hard-coded.
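To make the role of the anchors concrete, the following sketch decodes one box using the same formulas as the MapBoundingBoxToCell helper defined later in this tutorial: the center is the cell index plus the sigmoid of the predicted offset, multiplied by the 32-pixel cell size, and the width and height are the exponential of the predicted value multiplied by the box's anchor pair and the cell size. The raw inputs (tx, ty, tw, th) are made-up values, and Sigmoid is written as 1 / (1 + e^-x), which is algebraically the same function as the helper's e^x / (1 + e^x).

```csharp
using System;

const float cellSize = 32f;
float[] anchors = { 1.08F, 1.19F, 3.42F, 4.41F, 6.63F, 11.38F, 9.42F, 5.11F, 16.62F, 10.52F };

float Sigmoid(float v) => 1f / (1f + MathF.Exp(-v));

// Decode raw outputs (tx, ty, tw, th) of box `box` in grid cell (cx, cy) into pixels.
(float CenterX, float CenterY, float Width, float Height) Decode(
    int cx, int cy, int box, float tx, float ty, float tw, float th) =>
    ((cx + Sigmoid(tx)) * cellSize,
     (cy + Sigmoid(ty)) * cellSize,
     MathF.Exp(tw) * cellSize * anchors[box * 2],
     MathF.Exp(th) * cellSize * anchors[box * 2 + 1]);

// With all raw values at 0, the box sits at the cell center and takes its anchor's shape.
var small = Decode(6, 6, 0, 0f, 0f, 0f, 0f);   // anchor 1.08 x 1.19 cells
var large = Decode(6, 6, 4, 0f, 0f, 0f, 0f);   // anchor 16.62 x 10.52 cells

Console.WriteLine($"box 0: center=({small.CenterX}, {small.CenterY}) size={small.Width}x{small.Height}");
Console.WriteLine($"box 4: center=({large.CenterX}, {large.CenterY}) size={large.Width}x{large.Height}");
```

With raw width and height outputs of 0, each box takes exactly its anchor's shape, which illustrates how the anchors act as starting shapes that the model's outputs only adjust.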
Next, define the labels or classes that the model will predict. This model predicts 20 classes, which is a subset of the total number of classes predicted by the original YOLOv2 model.
Add your list of labels below the anchors.
private string[] labels = new string[]
{
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"
};
There are colors associated with each of the classes. Assign your class colors below your labels:
private static Color[] classColors = new Color[]
{
    Color.Khaki, Color.Fuchsia, Color.Silver, Color.RoyalBlue,
    Color.Green, Color.DarkOrange, Color.Purple, Color.Gold,
    Color.Red, Color.Aquamarine, Color.Lime, Color.AliceBlue,
    Color.Sienna, Color.Orchid, Color.Tan, Color.LightPink,
    Color.Yellow, Color.HotPink, Color.OliveDrab, Color.SandyBrown,
    Color.DarkTurquoise
};
There are a series of steps involved in the post-processing phase. To help with that, several helper methods can be employed.
The helper methods used by the parser are:
Sigmoid applies the sigmoid function that outputs a number between 0 and 1.
Softmax normalizes an input vector into a probability distribution.
GetOffset maps a position in the 125 x 13 x 13 tensor to the corresponding index in the one-dimensional model output.
ExtractBoundingBoxDimensions extracts the bounding box dimensions from the model output using the GetOffset method.
GetConfidence extracts the confidence value that states how sure the model is that it has detected an object and uses the Sigmoid function to turn it into a value between 0 and 1.
MapBoundingBoxToCell converts the raw bounding box dimensions into the pixel position of the box center and the box size, using its cell within the image and its anchor.
ExtractClasses extracts the class predictions for the bounding box from the model output using the GetOffset method and turns them into a probability distribution using the Softmax method.
GetTopResult selects the class from the list of predicted classes with the highest probability.
IntersectionOverUnion computes the ratio of the overlapping area of two bounding boxes to the area they cover together; it's used to filter out overlapping bounding boxes with lower probabilities.
Add the code for all the helper methods below your list of classColors.
private float Sigmoid(float value)
{
    var k = (float)Math.Exp(value);
    return k / (1.0f + k);
}

private float[] Softmax(float[] values)
{
    var maxVal = values.Max();
    var exp = values.Select(v => Math.Exp(v - maxVal));
    var sumExp = exp.Sum();
    return exp.Select(v => (float)(v / sumExp)).ToArray();
}

private int GetOffset(int x, int y, int channel)
{
    // YOLO outputs a tensor that has a shape of 125x13x13, which
    // is flattened into a 1D array. To access a specific channel
    // for a given (x,y) cell position, we need to calculate an offset
    // into the array.
    return (channel * this.channelStride) + (y * COL_COUNT) + x;
}

private BoundingBoxDimensions ExtractBoundingBoxDimensions(float[] modelOutput, int x, int y, int channel)
{
    return new BoundingBoxDimensions
    {
        X = modelOutput[GetOffset(x, y, channel)],
        Y = modelOutput[GetOffset(x, y, channel + 1)],
        Width = modelOutput[GetOffset(x, y, channel + 2)],
        Height = modelOutput[GetOffset(x, y, channel + 3)]
    };
}

private float GetConfidence(float[] modelOutput, int x, int y, int channel)
{
    return Sigmoid(modelOutput[GetOffset(x, y, channel + 4)]);
}

private CellDimensions MapBoundingBoxToCell(int x, int y, int box, BoundingBoxDimensions boxDimensions)
{
    return new CellDimensions
    {
        X = ((float)x + Sigmoid(boxDimensions.X)) * CELL_WIDTH,
        Y = ((float)y + Sigmoid(boxDimensions.Y)) * CELL_HEIGHT,
        Width = (float)Math.Exp(boxDimensions.Width) * CELL_WIDTH * anchors[box * 2],
        Height = (float)Math.Exp(boxDimensions.Height) * CELL_HEIGHT * anchors[box * 2 + 1],
    };
}

public float[] ExtractClasses(float[] modelOutput, int x, int y, int channel)
{
    float[] predictedClasses = new float[CLASS_COUNT];
    int predictedClassOffset = channel + BOX_INFO_FEATURE_COUNT;
    for (int predictedClass = 0; predictedClass < CLASS_COUNT; predictedClass++)
    {
        predictedClasses[predictedClass] = modelOutput[GetOffset(x, y, predictedClass + predictedClassOffset)];
    }
    return Softmax(predictedClasses);
}

private ValueTuple<int, float> GetTopResult(float[] predictedClasses)
{
    return predictedClasses
        .Select((predictedClass, index) => (Index: index, Value: predictedClass))
        .OrderByDescending(result => result.Value)
        .First();
}

private float IntersectionOverUnion(RectangleF boundingBoxA, RectangleF boundingBoxB)
{
    var areaA = boundingBoxA.Width * boundingBoxA.Height;
    if (areaA <= 0)
        return 0;
    var areaB = boundingBoxB.Width * boundingBoxB.Height;
    if (areaB <= 0)
        return 0;
    var minX = Math.Max(boundingBoxA.Left, boundingBoxB.Left);
    var minY = Math.Max(boundingBoxA.Top, boundingBoxB.Top);
    var maxX = Math.Min(boundingBoxA.Right, boundingBoxB.Right);
    var maxY = Math.Min(boundingBoxA.Bottom, boundingBoxB.Bottom);
    var intersectionArea = Math.Max(maxY - minY, 0) * Math.Max(maxX - minX, 0);
    return intersectionArea / (areaA + areaB - intersectionArea);
}
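The Softmax helper subtracts the largest value before exponentiating. The shift cancels out in the ratio, so the result is unchanged, but it keeps Math.Exp from overflowing on large inputs. The following sketch contrasts a naive softmax with the shifted one. It also shows that the Sigmoid helper's k / (1 + k) form returns NaN once Math.Exp(value) no longer fits in a float (inputs above roughly 88.7), while the equivalent 1 / (1 + e^-x) form stays finite. Outputs that large may be rare in practice; the function names here are illustrative.

```csharp
using System;
using System.Linq;

// Naive softmax: exponentiates the raw values directly.
double[] NaiveSoftmax(double[] values)
{
    var exp = values.Select(v => Math.Exp(v)).ToArray();
    double sum = exp.Sum();
    return exp.Select(e => e / sum).ToArray();
}

// Shifted softmax, the approach the Softmax helper takes: subtract the max first.
double[] StableSoftmax(double[] values)
{
    double max = values.Max();
    var exp = values.Select(v => Math.Exp(v - max)).ToArray();
    double sum = exp.Sum();
    return exp.Select(e => e / sum).ToArray();
}

// The Sigmoid helper's form: k / (1 + k) with k = e^x cast to float.
float SigmoidRatio(float value)
{
    var k = (float)Math.Exp(value);
    return k / (1.0f + k);
}

// Algebraically equivalent form that stays finite for large positive inputs.
float SigmoidStable(float value) => 1f / (1f + MathF.Exp(-value));

double[] logits = { 1000, 1000, 999 };
Console.WriteLine(string.Join(", ", NaiveSoftmax(logits)));   // every entry is NaN: e^1000 overflows to infinity
Console.WriteLine(string.Join(", ", StableSoftmax(logits)));  // finite values that sum to 1
Console.WriteLine($"{SigmoidRatio(100f)} vs {SigmoidStable(100f)}");
```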
Once you have defined all of the helper methods, it's time to use them to process the model output.
Below the IntersectionOverUnion method, create the ParseOutputs method to process the output generated by the model.
public IList<YoloBoundingBox> ParseOutputs(float[] yoloModelOutputs, float threshold = .3F)
{
}
Create a list to store your bounding boxes and define variables inside the ParseOutputs method.
var boxes = new List<YoloBoundingBox>();
Each image is divided into a grid of 13 x 13 cells. Each cell contains five bounding boxes. Below the boxes variable, add code to process all of the boxes in each of the cells.
for (int row = 0; row < ROW_COUNT; row++)
{
    for (int column = 0; column < COL_COUNT; column++)
    {
        for (int box = 0; box < BOXES_PER_CELL; box++)
        {
        }
    }
}
Inside the inner-most loop, calculate the starting position of the current box within the one-dimensional model output.
var channel = (box * (CLASS_COUNT + BOX_INFO_FEATURE_COUNT));
Directly below that, use the ExtractBoundingBoxDimensions method to get the dimensions of the current bounding box.
BoundingBoxDimensions boundingBoxDimensions = ExtractBoundingBoxDimensions(yoloModelOutputs, row, column, channel);
Then, use the GetConfidence method to get the confidence for the current bounding box.
float confidence = GetConfidence(yoloModelOutputs, row, column, channel);
After that, use the MapBoundingBoxToCell method to map the current bounding box to the current cell being processed.
CellDimensions mappedBoundingBox = MapBoundingBoxToCell(row, column, box, boundingBoxDimensions);
Before doing any further processing, check whether your confidence value is greater than the threshold provided. If not, process the next bounding box.
if (confidence < threshold) continue;
Otherwise, continue processing the output. The next step is to get the probability distribution of the predicted classes for the current bounding box using the ExtractClasses method.
float[] predictedClasses = ExtractClasses(yoloModelOutputs, row, column, channel);
Then, use the GetTopResult method to get the value and index of the class with the highest probability for the current box and compute its score.
var (topResultIndex, topResultScore) = GetTopResult(predictedClasses);
var topScore = topResultScore * confidence;
Use the topScore to once again keep only those bounding boxes that are above the specified threshold.
if (topScore < threshold) continue;
Finally, if the current bounding box passes both threshold checks, create a new YoloBoundingBox object and add it to the boxes list. Because MapBoundingBoxToCell returns the center of the box, subtract half of the width and height to store the box's top-left corner.
boxes.Add(new YoloBoundingBox()
{
    Dimensions = new BoundingBoxDimensions
    {
        X = (mappedBoundingBox.X - mappedBoundingBox.Width / 2),
        Y = (mappedBoundingBox.Y - mappedBoundingBox.Height / 2),
        Width = mappedBoundingBox.Width,
        Height = mappedBoundingBox.Height,
    },
    Confidence = topScore,
    Label = labels[topResultIndex],
    BoxColor = classColors[topResultIndex]
});
Once all cells in the image have been processed, return the boxes list. Add the following return statement below the outer-most for-loop in the ParseOutputs method.
return boxes;
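To check the decoding logic without running the model, you can feed it a synthetic output array. The following self-contained sketch condenses the same steps (offset calculation, sigmoid, anchors, softmax, both threshold checks, and the center-to-corner conversion) into local functions instead of the YoloOutputParser class, plants one confident detection in an otherwise zero-filled 21125-element array, and decodes it. The planted values and helper names are hypothetical. Unlike ParseOutputs, which passes its row and column loop variables as the x and y arguments, the sketch names its loop variables x and y directly; both loops cover the same 13 x 13 range, so the same cells are visited.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Constants mirroring YoloOutputParser (13 x 13 grid, 5 boxes, 5 box features, 20 classes).
const int Grid = 13, BoxesPerCell = 5, BoxInfo = 5, Classes = 20, Stride = Grid * Grid;
const float Cell = 32f;
float[] anchors = { 1.08F, 1.19F, 3.42F, 4.41F, 6.63F, 11.38F, 9.42F, 5.11F, 16.62F, 10.52F };

int Offset(int x, int y, int channel) => channel * Stride + y * Grid + x;
float Sigmoid(float v) => 1f / (1f + MathF.Exp(-v));
float[] Softmax(float[] v)
{
    float max = v.Max();
    var exp = v.Select(e => MathF.Exp(e - max)).ToArray();
    float sum = exp.Sum();
    return exp.Select(e => e / sum).ToArray();
}

// Decode every box; keep those whose objectness and class score clear the threshold.
List<(float X, float Y, float W, float H, int Class, float Score)> Decode(float[] modelOutput, float threshold)
{
    var found = new List<(float X, float Y, float W, float H, int Class, float Score)>();
    for (int y = 0; y < Grid; y++)
        for (int x = 0; x < Grid; x++)
            for (int box = 0; box < BoxesPerCell; box++)
            {
                int channel = box * (BoxInfo + Classes);
                float confidence = Sigmoid(modelOutput[Offset(x, y, channel + 4)]);
                if (confidence < threshold) continue;

                var classScores = new float[Classes];
                for (int c = 0; c < Classes; c++)
                    classScores[c] = modelOutput[Offset(x, y, channel + BoxInfo + c)];
                float[] probs = Softmax(classScores);
                int best = Array.IndexOf(probs, probs.Max());
                float score = probs[best] * confidence;
                if (score < threshold) continue;

                // Box center and size in pixels of the 416 x 416 input.
                float cx = (x + Sigmoid(modelOutput[Offset(x, y, channel)])) * Cell;
                float cy = (y + Sigmoid(modelOutput[Offset(x, y, channel + 1)])) * Cell;
                float w = MathF.Exp(modelOutput[Offset(x, y, channel + 2)]) * Cell * anchors[box * 2];
                float h = MathF.Exp(modelOutput[Offset(x, y, channel + 3)]) * Cell * anchors[box * 2 + 1];

                // Store the top-left corner, as ParseOutputs does.
                found.Add((cx - w / 2, cy - h / 2, w, h, best, score));
            }
    return found;
}

// Synthetic output: all zeros, plus one planted detection for box 2 of cell (x=6, y=6).
var output = new float[125 * Stride];
int plantedChannel = 2 * (BoxInfo + Classes);
output[Offset(6, 6, plantedChannel + 4)] = 10f;             // strong objectness
output[Offset(6, 6, plantedChannel + BoxInfo + 14)] = 10f;  // class 14, "person" in the labels list

var detections = Decode(output, 0.3f);
foreach (var d in detections)
    Console.WriteLine($"class {d.Class}, score {d.Score:F3}, top-left ({d.X:F2}, {d.Y:F2}), size {d.W:F2} x {d.H:F2}");
```

The zero-filled boxes pass the objectness check (sigmoid(0) = 0.5) but fail the second check, because a uniform softmax over 20 classes gives a class score of only 0.05 x 0.5 = 0.025.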
Now that all of the highly confident bounding boxes have been extracted from the model output, additional filtering needs to be done to remove overlapping bounding boxes. Add a method called FilterBoundingBoxes below the ParseOutputs method:
public IList<YoloBoundingBox> FilterBoundingBoxes(IList<YoloBoundingBox> boxes, int limit, float threshold)
{
}
Inside the FilterBoundingBoxes method, start off by creating an array equal to the size of detected boxes and marking all slots as active or ready for processing.
var activeCount = boxes.Count;
var isActiveBoxes = new bool[boxes.Count];
for (int i = 0; i < isActiveBoxes.Length; i++)
    isActiveBoxes[i] = true;
Then, sort the list containing your bounding boxes in descending order based on confidence.
var sortedBoxes = boxes.Select((b, i) => new { Box = b, Index = i })
                    .OrderByDescending(b => b.Box.Confidence)
                    .ToList();
After that, create a list to hold the filtered results.
var results = new List<YoloBoundingBox>();
Begin processing the bounding boxes by iterating over them in sorted order, from most to least confident.
for (int i = 0; i < boxes.Count; i++)
{
}
Inside of this for-loop, check whether the current bounding box can be processed.
if (isActiveBoxes[i])
{
}
If so, add the bounding box to the list of results. If the number of results reaches the specified limit of boxes to be extracted, break out of the loop. Add the following code inside the if-statement.
var boxA = sortedBoxes[i].Box;
results.Add(boxA);
if (results.Count >= limit)
    break;
Then, compare the current box with the remaining, lower-confidence bounding boxes. Add the following code below the box limit check.
for (var j = i + 1; j < boxes.Count; j++)
{
}
If one of those boxes is still active, use the IntersectionOverUnion method to check whether its overlap with the current box exceeds the specified threshold. If it does, mark it as inactive so it isn't added to the results. Add the following code to your innermost for-loop.
if (isActiveBoxes[j])
{
    var boxB = sortedBoxes[j].Box;
    if (IntersectionOverUnion(boxA.Rect, boxB.Rect) > threshold)
    {
        isActiveBoxes[j] = false;
        activeCount--;
        if (activeCount <= 0)
            break;
    }
}
Outside of the inner-most for-loop that checks adjacent bounding boxes, see whether there are any remaining bounding boxes to be processed. If not, break out of the outer for-loop.
if (activeCount <= 0) break;
Finally, outside of the initial for-loop of the FilterBoundingBoxes method, return the results:
return results;
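FilterBoundingBoxes is a form of greedy non-maximum suppression: it visits boxes from most to least confident, keeps each box that hasn't been suppressed, and suppresses every later box whose intersection over union (IoU) with it exceeds the threshold. The following sketch expresses the same idea with (x, y, width, height, confidence) tuples instead of YoloBoundingBox and RectangleF, so it runs without the rest of the project; the sample boxes and the Iou and Suppress names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Intersection over union of two axis-aligned boxes given as (left, top, width, height).
float Iou((float X, float Y, float W, float H) a, (float X, float Y, float W, float H) b)
{
    float areaA = a.W * a.H, areaB = b.W * b.H;
    if (areaA <= 0 || areaB <= 0) return 0;
    float left = Math.Max(a.X, b.X), top = Math.Max(a.Y, b.Y);
    float right = Math.Min(a.X + a.W, b.X + b.W), bottom = Math.Min(a.Y + a.H, b.Y + b.H);
    float intersection = Math.Max(right - left, 0) * Math.Max(bottom - top, 0);
    return intersection / (areaA + areaB - intersection);
}

// Greedy non-maximum suppression, keeping at most `limit` boxes.
List<(float X, float Y, float W, float H, float Confidence)> Suppress(
    IList<(float X, float Y, float W, float H, float Confidence)> boxes, int limit, float threshold)
{
    var sorted = boxes.OrderByDescending(b => b.Confidence).ToList();
    var active = Enumerable.Repeat(true, sorted.Count).ToArray();
    var results = new List<(float X, float Y, float W, float H, float Confidence)>();

    for (int i = 0; i < sorted.Count; i++)
    {
        if (!active[i]) continue;
        results.Add(sorted[i]);
        if (results.Count >= limit) break;

        // Suppress lower-confidence boxes that overlap the kept box too much.
        for (int j = i + 1; j < sorted.Count; j++)
            if (active[j] && Iou((sorted[i].X, sorted[i].Y, sorted[i].W, sorted[i].H),
                                 (sorted[j].X, sorted[j].Y, sorted[j].W, sorted[j].H)) > threshold)
                active[j] = false;
    }
    return results;
}

var candidates = new List<(float X, float Y, float W, float H, float Confidence)>
{
    (100, 100, 50, 50, 0.9f),   // strongest box
    (105, 105, 50, 50, 0.8f),   // heavy overlap with the first box
    (300, 300, 40, 40, 0.7f),   // separate object
};

var kept = Suppress(candidates, limit: 5, threshold: 0.5f);
Console.WriteLine($"kept {kept.Count} of {candidates.Count} boxes");
```

The first two boxes overlap by 45 x 45 pixels, giving an IoU of 2025 / 2975, about 0.68, so the second box is suppressed; the third box doesn't overlap either of them and is kept.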
Great! Now it's time to use this code along with the model for scoring.
Just like post-processing, scoring involves a few steps. To help with this, add a class that will contain the scoring logic to your project.
In Solution Explorer, right-click the project, and then select Add > New Item.
In the Add New Item dialog box, select Class and change the Name field to OnnxModelScorer.cs. Then, select Add.
The OnnxModelScorer.cs file opens in the code editor. Add the following using directives to the top of OnnxModelScorer.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using ObjectDetection.DataStructures;
using ObjectDetection.YoloParser;
Inside the OnnxModelScorer class definition, add the following variables.
private readonly string imagesFolder;
private readonly string modelLocation;
private readonly MLContext mlContext;
private IList<YoloBoundingBox> _boundingBoxes = new List<YoloBoundingBox>();
Directly below that, create a constructor for the OnnxModelScorer class that will initialize the previously defined variables.
public OnnxModelScorer(string imagesFolder, string modelLocation, MLContext mlContext)
{
    this.imagesFolder = imagesFolder;
    this.modelLocation = modelLocation;
    this.mlContext = mlContext;
}
Once you have created the constructor, define a couple of structs that contain variables related to the image and model settings. Create a struct called ImageNetSettings to contain the height and width expected as input for the model.
public struct ImageNetSettings
{
    public const int imageHeight = 416;
    public const int imageWidth = 416;
}
After that, create another struct called TinyYoloModelSettings that contains the names of the input and output layers of the model. To visualize the name of the input and output layers of the model, you can use a tool like Netron.
public struct TinyYoloModelSettings
{
    // for checking Tiny yolo2 Model input and output parameter names,
    // you can use tools like Netron,
    // which is installed by Visual Studio AI Tools

    // input tensor name
    public const string ModelInput = "image";

    // output tensor name
    public const string ModelOutput = "grid";
}
Next, create the first set of methods used for scoring. Create the LoadModel method inside of your OnnxModelScorer class.
private ITransformer LoadModel(string modelLocation)
{
}
Inside the LoadModel method, add the following code for logging.
Console.WriteLine("Read model");
Console.WriteLine($"Model location: {modelLocation}");
Console.WriteLine($"Default parameters: image size=({ImageNetSettings.imageWidth},{ImageNetSettings.imageHeight})");
ML.NET pipelines need to know the data schema to operate on when the Fit method is called. In this case, a process similar to training will be used. However, because no actual training is happening, it is acceptable to use an empty IDataView. Create a new IDataView for the pipeline from an empty list.
var data = mlContext.Data.LoadFromEnumerable(new List<ImageNetData>());
Below that, define the pipeline. The pipeline will consist of four transforms:
LoadImages loads the image as a Bitmap.
ResizeImages rescales the image to the size specified (in this case, 416 x 416).
ExtractPixels changes the pixel representation of the image from a Bitmap to a numerical vector.
ApplyOnnxModel loads the ONNX model and uses it to score on the data provided.
Define your pipeline in the LoadModel method below the data variable.
var pipeline = mlContext.Transforms.LoadImages(outputColumnName: "image", imageFolder: "", inputColumnName: nameof(ImageNetData.ImagePath))
                .Append(mlContext.Transforms.ResizeImages(outputColumnName: "image", imageWidth: ImageNetSettings.imageWidth, imageHeight: ImageNetSettings.imageHeight, inputColumnName: "image"))
                .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "image"))
                .Append(mlContext.Transforms.ApplyOnnxModel(modelFile: modelLocation, outputColumnNames: new[] { TinyYoloModelSettings.ModelOutput }, inputColumnNames: new[] { TinyYoloModelSettings.ModelInput }));
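ExtractPixels turns each resized image into the flat float vector that the model's image input receives. The sketch below illustrates the layout that vector is assumed to have under the transform's default settings (colors not interleaved, no scaling or offset): the whole red plane first, then the green plane, then the blue plane, which matches the 3 x 416 x 416 channels x height x width input. It uses a hypothetical 2 x 2 interleaved RGB buffer and is a simplified model of the transform, not ML.NET's implementation.

```csharp
using System;

// Hypothetical 2 x 2 image, stored interleaved (R, G, B per pixel, row by row).
const int width = 2, height = 2, channels = 3;
byte[] interleaved =
{
    255, 0, 0,    0, 255, 0,    // row 0: red pixel, green pixel
    0, 0, 255,    10, 20, 30,   // row 1: blue pixel, dark pixel
};

// Planar (channels x height x width) float vector: all R, then all G, then all B.
var planar = new float[channels * height * width];
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        for (int c = 0; c < channels; c++)
        {
            int source = (y * width + x) * channels + c;       // interleaved position
            int target = c * height * width + y * width + x;   // planar position
            planar[target] = interleaved[source];              // left unscaled (0-255)
        }

Console.WriteLine(string.Join(" ", planar));
```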
Now it's time to instantiate the model for scoring. Call the Fit method on the pipeline and return it for further processing.
var model = pipeline.Fit(data);
return model;
Once the model is loaded, it can then be used to make predictions. To facilitate that process, create a method called PredictDataUsingModel below the LoadModel method.
private IEnumerable<float[]> PredictDataUsingModel(IDataView testData, ITransformer model)
{
}
Inside the PredictDataUsingModel method, add the following code for logging.
Console.WriteLine($"Images location: {imagesFolder}");
Console.WriteLine("");
Console.WriteLine("=====Identify the objects in the images=====");
Console.WriteLine("");
Then, use the Transform method to score the data.
IDataView scoredData = model.Transform(testData);
Extract the predicted probabilities and return them for additional processing.
IEnumerable<float[]> probabilities = scoredData.GetColumn<float[]>(TinyYoloModelSettings.ModelOutput);
return probabilities;
Now that both steps are set up, combine them into a single method. Below the PredictDataUsingModel method, add a new method called Score.
public IEnumerable<float[]> Score(IDataView data)
{
    var model = LoadModel(modelLocation);
    return PredictDataUsingModel(data, model);
}
Almost there! Now it's time to put it all to use.
Now that all of the setup is complete, it's time to detect some objects.
Below the creation of the mlContext variable, add a try-catch statement.
try
{
}
catch (Exception ex)
{
    Console.WriteLine(ex.ToString());
}
Inside the try block, start implementing the object detection logic. First, load the data into an IDataView.
IEnumerable<ImageNetData> images = ImageNetData.ReadFromFile(imagesFolder);
IDataView imageDataView = mlContext.Data.LoadFromEnumerable(images);
Then, create an instance of OnnxModelScorer and use it to score the loaded data.
// Create instance of model scorer
var modelScorer = new OnnxModelScorer(imagesFolder, modelFilePath, mlContext);

// Use model to score data
IEnumerable<float[]> probabilities = modelScorer.Score(imageDataView);
Now it's time for the post-processing step. Create an instance of YoloOutputParser and use it to process the model output.
YoloOutputParser parser = new YoloOutputParser();

// Parse each image's raw output into boxes, then keep at most 5 boxes per image,
// using 0.5 as the overlap (intersection-over-union) threshold.
var boundingBoxes =
    probabilities
    .Select(probability => parser.ParseOutputs(probability))
    .Select(boxes => parser.FilterBoundingBoxes(boxes, 5, .5F));
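FilterBoundingBoxes(boxes, 5, .5F) reduces the many overlapping candidate boxes to a few. Below is a minimal sketch of greedy non-maximum suppression, assuming the second argument is the maximum number of boxes to keep and the third is the intersection-over-union (IoU) threshold above which a less confident, overlapping box is discarded. Box and Nms are hypothetical types for illustration, not the tutorial's YoloBoundingBox or YoloOutputParser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal box type for this sketch (not the tutorial's YoloBoundingBox).
public record Box(float X, float Y, float Width, float Height, float Confidence);

public static class Nms
{
    // Intersection-over-union of two axis-aligned boxes: overlap area / combined area.
    public static float IoU(Box a, Box b)
    {
        float left = Math.Max(a.X, b.X);
        float top = Math.Max(a.Y, b.Y);
        float right = Math.Min(a.X + a.Width, b.X + b.Width);
        float bottom = Math.Min(a.Y + a.Height, b.Y + b.Height);

        float intersection = Math.Max(0, right - left) * Math.Max(0, bottom - top);
        float union = a.Width * a.Height + b.Width * b.Height - intersection;
        return union <= 0 ? 0 : intersection / union;
    }

    // Greedy non-maximum suppression: visit boxes from most to least confident,
    // keep a box only if its IoU with every already-kept box is at most `threshold`,
    // and stop once `limit` boxes have been kept.
    public static List<Box> Filter(IEnumerable<Box> boxes, int limit, float threshold)
    {
        var kept = new List<Box>();
        foreach (var candidate in boxes.OrderByDescending(b => b.Confidence))
        {
            if (kept.Count >= limit) break;
            if (kept.All(k => IoU(k, candidate) <= threshold))
                kept.Add(candidate);
        }
        return kept;
    }
}
```

For example, two 10 x 10 boxes offset by one pixel have an IoU of 81 / 119 (about 0.68), so with a threshold of 0.5 only the more confident of the two is kept.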
Once the model output has been processed, it's time to draw the bounding boxes on the images.
To draw the bounding boxes on the image, add a method called DrawBoundingBox below the GetAbsolutePath method in Program.cs.
void DrawBoundingBox(string inputImageLocation, string outputImageLocation, string imageName, IList<YoloBoundingBox> filteredBoundingBoxes)
{
}
First, in the DrawBoundingBox method, load the image and get its height and width.
Image image = Image.FromFile(Path.Combine(inputImageLocation, imageName));

var originalImageHeight = image.Height;
var originalImageWidth = image.Width;
Then, create a foreach loop to iterate over each of the bounding boxes detected by the model.
foreach (var box in filteredBoundingBoxes)
{
}
Inside the foreach loop, get the dimensions of the bounding box.
var x = (uint)Math.Max(box.Dimensions.X, 0);
var y = (uint)Math.Max(box.Dimensions.Y, 0);
var width = (uint)Math.Min(originalImageWidth - x, box.Dimensions.Width);
var height = (uint)Math.Min(originalImageHeight - y, box.Dimensions.Height);
Because the dimensions of the bounding box correspond to the model input of 416 x 416, scale the bounding box dimensions to match the actual size of the image.
x = (uint)originalImageWidth * x / OnnxModelScorer.ImageNetSettings.imageWidth;
y = (uint)originalImageHeight * y / OnnxModelScorer.ImageNetSettings.imageHeight;
width = (uint)originalImageWidth * width / OnnxModelScorer.ImageNetSettings.imageWidth;
height = (uint)originalImageHeight * height / OnnxModelScorer.ImageNetSettings.imageHeight;
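The scaling step maps coordinates from the 416 x 416 model input to the original image: x and width are multiplied by originalImageWidth / 416, and y and height by originalImageHeight / 416. The sketch below, using the hypothetical names BoxScaling and ToImageSpace, scales first and then clamps the box to the image bounds, so the clamp happens in the image's own pixel coordinates. This differs from the tutorial's code, which clamps in model-input coordinates before scaling. The sketch also uses floating-point arithmetic rather than uint so that no precision is lost before scaling.

```csharp
using System;

// Hypothetical helper for illustration only, assuming a 416 x 416 model input.
public static class BoxScaling
{
    public const int ModelWidth = 416;
    public const int ModelHeight = 416;

    // Maps a box from model-input coordinates to original-image pixels,
    // then clamps its edges so the box stays inside the image.
    public static (int X, int Y, int Width, int Height) ToImageSpace(
        float x, float y, float width, float height, int imageWidth, int imageHeight)
    {
        float scaleX = (float)imageWidth / ModelWidth;
        float scaleY = (float)imageHeight / ModelHeight;

        // Scale each edge, then clamp it to [0, imageWidth] or [0, imageHeight].
        int left = (int)Math.Max(0, x * scaleX);
        int top = (int)Math.Max(0, y * scaleY);
        int right = (int)Math.Min(imageWidth, (x + width) * scaleX);
        int bottom = (int)Math.Min(imageHeight, (y + height) * scaleY);

        return (left, top, Math.Max(0, right - left), Math.Max(0, bottom - top));
    }
}
```

For an 832 x 416 image, a model-space box at (100, 50) sized 50 x 20 becomes (200, 50) sized 100 x 20. A box at (400, -10) sized 50 x 30 is clamped to (800, 0) sized 32 x 20.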
Then, define a template for text that will appear above each bounding box. The text will contain the class of the object inside of the respective bounding box as well as the confidence.
string text = $"{box.Label} ({(box.Confidence * 100).ToString("0")}%)";
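To see what the template produces, here it is wrapped in a hypothetical Caption helper. Passing CultureInfo.InvariantCulture is an addition in this sketch so the result does not depend on the current culture. With the first confidence score from the sample output later in this tutorial (0.9697262 for car), the caption reads "car (97%)", because the "0" format rounds to a whole number.

```csharp
using System.Globalization;

// Hypothetical helper for illustration only: same "label (NN%)" template as above.
public static class LabelText
{
    public static string Caption(string label, float confidence) =>
        $"{label} ({(confidence * 100).ToString("0", CultureInfo.InvariantCulture)}%)";
}
```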
To draw on the image, create a Graphics object from it.
using (Graphics thumbnailGraphic = Graphics.FromImage(image))
{
}
Inside the using block, configure the rendering quality settings of the Graphics object.
thumbnailGraphic.CompositingQuality = CompositingQuality.HighQuality;
thumbnailGraphic.SmoothingMode = SmoothingMode.HighQuality;
thumbnailGraphic.InterpolationMode = InterpolationMode.HighQualityBicubic;
Below that, set the font and color options for the text and bounding box.
// Define Text Options
Font drawFont = new Font("Arial", 12, FontStyle.Bold);
SizeF size = thumbnailGraphic.MeasureString(text, drawFont);
SolidBrush fontBrush = new SolidBrush(Color.Black);
Point atPoint = new Point((int)x, (int)y - (int)size.Height - 1);

// Define BoundingBox options
Pen pen = new Pen(box.BoxColor, 3.2f);
SolidBrush colorBrush = new SolidBrush(box.BoxColor);
Use the FillRectangle method to create and fill a rectangle above the bounding box to hold the text. The filled background contrasts with the text and improves readability.
thumbnailGraphic.FillRectangle(colorBrush, (int)x, (int)(y - size.Height - 1), (int)size.Width, (int)size.Height);
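Both atPoint and the filled rectangle sit size.Height + 1 pixels above the top edge of the box. For a box that touches the top of the image, that position is negative and the label is clipped. A small hypothetical helper that keeps the same offset but falls back to placing the label just inside the box in that case is sketched below. It is an optional variation, not part of the tutorial's code.

```csharp
// Hypothetical helper for illustration only.
public static class LabelPlacement
{
    // Top-left corner for a label of height `labelHeight` drawn just above a box whose
    // top edge is at boxY, using the same (boxY - labelHeight - 1) offset as above.
    // If that would start above the image (negative y), the label is placed at the
    // box's top edge instead, so it stays visible.
    public static (int X, int Y) Above(int boxX, int boxY, int labelHeight)
    {
        int y = boxY - labelHeight - 1;
        return (boxX, y < 0 ? boxY : y);
    }
}
```

With a 15-pixel label, a box whose top edge is at y = 50 gets its label at y = 34, while a box at y = 5 keeps the label at y = 5.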
Then, draw the text and bounding box on the image using the DrawString and DrawRectangle methods.
thumbnailGraphic.DrawString(text, drawFont, fontBrush, atPoint);

// Draw bounding box on image
thumbnailGraphic.DrawRectangle(pen, x, y, width, height);
Outside the foreach loop, add code to save the image to the output folder (outputImageLocation), creating the folder if it doesn't exist.
if (!Directory.Exists(outputImageLocation))
{
    Directory.CreateDirectory(outputImageLocation);
}

image.Save(Path.Combine(outputImageLocation, imageName));
To get additional feedback at run time that the application is making predictions as expected, add a method called LogDetectedObjects below the DrawBoundingBox method in Program.cs to output the detected objects to the console.
void LogDetectedObjects(string imageName, IList<YoloBoundingBox> boundingBoxes)
{
    Console.WriteLine($".....The objects in the image {imageName} are detected as below....");

    foreach (var box in boundingBoxes)
    {
        Console.WriteLine($"{box.Label} and its Confidence score: {box.Confidence}");
    }

    Console.WriteLine("");
}
Now that you have helper methods to create visual feedback from the predictions, add a for loop in the try block, below the boundingBoxes variable, to iterate over each of the scored images.
for (var i = 0; i < images.Count(); i++)
{
}
Inside the for loop, get the name of the image file and the bounding boxes associated with it.
string imageFileName = images.ElementAt(i).Label;
IList<YoloBoundingBox> detectedObjects = boundingBoxes.ElementAt(i);
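The loop reads boundingBoxes by index with ElementAt. Because boundingBoxes is a deferred LINQ query built with Select, each ElementAt(i) call may enumerate it again from the start. That re-runs ParseOutputs and FilterBoundingBoxes and, depending on how GetColumn reads the scored IDataView, possibly the scoring as well. The sketch below uses a hypothetical counting source (Source, DeferredDemo) in place of the real data to show the effect, and shows how materializing the sequence once with ToList() avoids it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo for illustration only; Source stands in for an expensive lazy sequence.
public static class DeferredDemo
{
    // Yields 0..count-1 and increments starts[0] each time a new enumeration begins.
    private static IEnumerable<int> Source(int count, int[] starts)
    {
        starts[0]++;
        for (int i = 0; i < count; i++)
            yield return i;
    }

    // Reads every element by index from a deferred query, like the for loop above.
    public static int EnumerationsWithElementAt(int count)
    {
        var starts = new int[1];
        IEnumerable<int> query = Source(count, starts).Select(i => i * 2);
        for (int i = 0; i < count; i++)
            _ = query.ElementAt(i);   // each call starts a new enumeration of Source
        return starts[0];
    }

    // Same loop, but the query is materialized once with ToList() first.
    public static int EnumerationsWithToList(int count)
    {
        var starts = new int[1];
        List<int> list = Source(count, starts).Select(i => i * 2).ToList();
        for (int i = 0; i < count; i++)
            _ = list[i];              // indexed access on a List does not re-enumerate
        return starts[0];
    }
}
```

For four items, the ElementAt version enumerates the source four times and the ToList version enumerates it once. In the tutorial's code, a corresponding change would be to call ToList() on boundingBoxes before the loop.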
Below that, use the DrawBoundingBox method to draw the bounding boxes on the image.
DrawBoundingBox(imagesFolder, outputFolder, imageFileName, detectedObjects);
Lastly, use the LogDetectedObjects method to output predictions to the console.
LogDetectedObjects(imageFileName, detectedObjects);
After the try-catch statement, add additional logic to indicate the process is done running.
Console.WriteLine("========= End of Process..Hit any Key ========");
That's it!
After following the previous steps, run your console app (Ctrl + F5). Your results should be similar to the following output. You may see warnings or processing messages, but these messages have been removed from the following results for clarity.
=====Identify the objects in the images=====

.....The objects in the image image1.jpg are detected as below....
car and its Confidence score: 0.9697262
car and its Confidence score: 0.6674225
person and its Confidence score: 0.5226039
car and its Confidence score: 0.5224892
car and its Confidence score: 0.4675332

.....The objects in the image image2.jpg are detected as below....
cat and its Confidence score: 0.6461141
cat and its Confidence score: 0.6400049

.....The objects in the image image3.jpg are detected as below....
chair and its Confidence score: 0.840578
chair and its Confidence score: 0.796363
diningtable and its Confidence score: 0.6056048
diningtable and its Confidence score: 0.3737402

.....The objects in the image image4.jpg are detected as below....
dog and its Confidence score: 0.7608147
person and its Confidence score: 0.6321323
dog and its Confidence score: 0.5967442
person and its Confidence score: 0.5730394
person and its Confidence score: 0.5551759

========= End of Process..Hit any Key ========
To see the images with bounding boxes, navigate to the assets/images/output/ directory. Below is a sample from one of the processed images.
Congratulations! You've now successfully built a machine learning model for object detection by reusing a pretrained ONNX model in ML.NET.
You can find the source code for this tutorial at the dotnet/machinelearning-samples repository.
In this tutorial, you learned how to:
Check out the Machine Learning samples GitHub repository to explore an expanded object detection sample.