Image Vector Representation for Machine Learning Using OpenCV

By Stefania Cristina on January 30, 2024 in OpenCV

One of the pre-processing steps that is often carried out on images before feeding them into a machine learning algorithm is to convert them into a feature vector. As we will see in this tutorial, there are several advantages to converting an image into a feature vector that make the latter more efficient.

Among the different techniques for converting an image into a feature vector, two of the most popular techniques used in conjunction with different machine learning algorithms are the Histogram of Oriented Gradients and the Bag-of-Words techniques.

In this tutorial, you will discover the Histogram of Oriented Gradients (HOG) and the Bag-of-Words (BoW) techniques for image vector representation.

After completing this tutorial, you will know:

- What the advantages are of using the Histogram of Oriented Gradients and the Bag-of-Words techniques for image vector representation.
- How to use the Histogram of Oriented Gradients technique in OpenCV.
- How to use the Bag-of-Words technique in OpenCV.

Let’s get started.

Image Vector Representation for Machine Learning Using OpenCV
Photo by John Fowler, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

What are the Advantages of Using HOG or BoW for Image Vector Representation?
The Histogram of Oriented Gradients Technique
The Bag-of-Words Technique
Putting the Techniques to Test

What are the Advantages of Using HOG or BoW for Image Vector Representation?

When working with machine learning algorithms, the image data typically undergoes a pre-processing step that structures it so that the machine learning algorithms can work with it.

In OpenCV, for instance, the ml module requires that the image data is fed into the machine learning algorithms in the form of feature vectors of equal length.

Each training sample is a vector of values (in Computer Vision it’s sometimes referred to as feature vector). Usually all the vectors have the same number of components (features); OpenCV ml module assumes that.
–OpenCV, 2023.

One way of structuring the image data is to flatten it out into a one-dimensional vector, where the vector’s length would equal the number of pixels in the image. For example, a $20\times 20$ pixel image would result in a one-dimensional vector of length 400. This one-dimensional vector serves as the feature set fed into the machine learning algorithm, where the intensity value of each pixel is one feature.
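As a minimal sketch of this flattening step, the snippet below uses NumPy on a made-up $20\times 20$ grayscale image. The random image and the variable names are assumptions for illustration only and are not part of the datasets used later in this tutorial.

```python
import numpy as np

# Hypothetical 20x20 grayscale image filled with random intensities
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 20), dtype=np.uint8)

# Flatten the image row by row into a one-dimensional feature vector
feature_vector = image.flatten()

# One feature per pixel: 20 * 20 = 400 features
print(feature_vector.shape)
```

Every pixel intensity becomes one input feature, so the number of features grows with the image area.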

However, while this is the simplest feature set we can create, it is not the most effective one, especially when working with larger images that will result in too many input features to be processed effectively by a machine learning algorithm.

This can dramatically impact the performance of machine learning algorithms fit on data with many input features, generally referred to as the “curse of dimensionality.”
–Introduction to Dimensionality Reduction for Machine Learning, 2020.

Rather, we want to reduce the number of input features that represent each image so that, in turn, the machine learning algorithm can generalize better to the input data. In more technical terms, it is desirable to perform dimensionality reduction that transforms the image data from a high-dimensional space to a lower-dimensional one.

One way of doing so is to apply feature extraction and representation techniques, such as the Histogram of Oriented Gradients (HOG) or the Bag-of-Words (BoW), to represent an image in a more compact manner and, in turn, reduce the redundancy in the feature set and the computational requirements to process it.

Another advantage to converting the image data into a feature vector using the aforementioned techniques is that the vector representation of the image becomes more robust to variations in illumination, scale, or viewpoint.

In the following sections, we will explore using the HOG and BoW techniques for image vector representation.

The Histogram of Oriented Gradients Technique

The HOG is a feature extraction technique that aims to represent the local shape and appearance of objects inside the image space by a distribution of their edge directions.

In a nutshell, the HOG technique performs the following steps when applied to an image:

1. Computes the image gradients in horizontal and vertical directions using, for example, a Prewitt operator. The magnitude and direction of the gradient are then computed for every pixel in the image.

2. Divides the image into non-overlapping cells of fixed size and computes a histogram of gradients for each cell. This histogram representation of every image cell is more compact and more robust to noise. The cell size is typically set according to the size of the image features we want to capture.

3. Concatenates the histograms over blocks of cells into one-dimensional feature vectors and normalizes them. This makes the descriptor more robust to lighting variations.

4. Finally, concatenates all normalized feature vectors representing the blocks of cells to obtain a final feature vector representation of the entire image.
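The first three steps can be sketched in plain NumPy as follows. This is a simplified illustration and not OpenCV's implementation: it uses `np.gradient` (central differences) rather than a Prewitt operator, assigns each pixel to a single orientation bin without interpolation, and the function names `cell_histogram` and `normalize` are hypothetical.

```python
import numpy as np

def cell_histogram(cell, nbins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = cell.astype(np.float64)
    # Gradients along the rows (vertical) and the columns (horizontal)
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into the range of 0 to 180 degrees
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Each pixel votes for its orientation bin with its gradient magnitude
    hist, _ = np.histogram(direction, bins=nbins, range=(0.0, 180.0), weights=magnitude)
    return hist

def normalize(hist, eps=1e-6):
    """L2-normalize a histogram; eps (an assumed small constant) avoids division by zero."""
    return hist / np.sqrt(np.sum(hist ** 2) + eps ** 2)

# Hypothetical 10x10 cell containing a vertical edge
cell = np.zeros((10, 10))
cell[:, 5:] = 100.0

hist = cell_histogram(cell)
brighter_hist = cell_histogram(2.0 * cell)

# After normalization, doubling the brightness leaves the histogram unchanged
print(np.allclose(normalize(hist), normalize(brighter_hist), atol=1e-6))
```

The final comparison illustrates why normalization helps with lighting variations: scaling the pixel intensities scales the gradient magnitudes, and the normalization removes that scale.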

The HOG implementation in OpenCV takes several input arguments that correspond to the aforementioned steps, including:

- The window size (winSize) that corresponds to the minimum object size to be detected.
- The cell size (cellSize) typically captures the size of the image features of interest.
- The block size (blockSize) tackles the problem of variation in illumination.
- The block stride (blockStride) controls how much neighboring blocks overlap.
- The number of histogram bins (nbins) to capture gradients between 0 and 180 degrees.
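These parameters together determine the length of the resulting feature vector. The helper below is a small sketch of that arithmetic; `hog_descriptor_length` is a hypothetical name and not an OpenCV function. It assumes that the block size is a multiple of the cell size and that the blocks tile the window exactly at the given stride.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of values in the HOG descriptor of a single window."""
    # Number of block positions that fit along each axis of the window
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Number of cells contained in each block
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    # Every cell contributes one histogram of nbins values
    return blocks_x * blocks_y * cells_per_block * nbins

# Parameter values used in this tutorial for 20x20 digit images
print(hog_descriptor_length((20, 20), (10, 10), (5, 5), (10, 10), 9))
```

With a $20\times 20$ window, $10\times 10$ blocks at a stride of 5, one cell per block, and 9 bins, this gives $3 \times 3 \times 1 \times 9 = 81$ values per image.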

Let’s create a function, hog_descriptors(), that computes feature vectors for a set of images using the HOG technique:

Python

def hog_descriptors(imgs):
    # Create a list to store the HOG feature vectors
    hog_features = []

    # Set parameter values for the HOG descriptor based on the image data in use
    winSize = (20, 20)
    blockSize = (10, 10)
    blockStride = (5, 5)
    cellSize = (10, 10)
    nbins = 9

    # Set the remaining parameters to their default values
    derivAperture = 1
    winSigma = -1.
    histogramNormType = 0
    L2HysThreshold = 0.2
    gammaCorrection = False
    nlevels = 64

    # Create a HOG descriptor
    hog = HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold, gammaCorrection, nlevels)

    # Compute HOG descriptors for the input images and append the feature vectors to the list
    for img in imgs:
        hist = hog.compute(img.reshape(20, 20).astype(uint8))
        hog_features.append(hist)

    return array(hog_features)

Note: The way the images are reshaped here corresponds to the image dataset that will be used later in this tutorial. If you use a different dataset, do not forget to tweak this part of the code accordingly.

The Bag-of-Words Technique

The BoW technique has been introduced in a separate tutorial as applied to modeling text with machine learning algorithms.

Nonetheless, this technique can also be applied to computer vision, where local image features are treated as visual words. For this reason, when applied to computer vision, the BoW technique is often called the Bag-of-Visual-Words technique.

In a nutshell, the BoW technique performs the following steps when applied to an image:

1. Extracts feature descriptors from an image using algorithms such as the Scale-Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF). Ideally, the extracted features should be invariant to intensity, scale, rotation, and affine variations.

2. Generates codewords from the feature descriptors where each codeword is representative of similar image patches. One way of generating these codewords is to use k-means clustering to aggregate similar descriptors into clusters, where the centers of the clusters would then represent the visual words, while the number of clusters represents the vocabulary size.

3. Maps the feature descriptors to the nearest cluster in the vocabulary, essentially assigning a codeword to each feature descriptor.

4. Bins the codewords into a histogram and uses this histogram as a feature vector representation of the image.
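The last two steps can be sketched in plain NumPy, assuming that the vocabulary (the cluster centers) has already been computed. The function name `bow_histogram`, the two-dimensional toy descriptors, and the choice to normalize the histogram so that it sums to one are assumptions made for illustration; real SIFT descriptors have 128 dimensions.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Histogram of nearest-codeword assignments for the descriptors of one image."""
    # Pairwise Euclidean distances, shape (num_descriptors, vocabulary_size)
    distances = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    # Assign each descriptor to its nearest codeword
    nearest = np.argmin(distances, axis=1)
    # Count how often each codeword occurs, then normalize the counts
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float64)
    return hist / hist.sum()

# Toy vocabulary of three codewords and four descriptors from a hypothetical image
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 0.5], [9.0, 9.5], [0.5, 1.0], [1.0, 9.0]])

hist = bow_histogram(descriptors, vocabulary)
print(hist)
```

The length of the resulting vector equals the vocabulary size regardless of how many descriptors were found in the image, which is what gives every image a feature vector of equal length. This sketch does not handle an image with no descriptors, which would lead to a division by zero.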

Let’s create a function, bow_descriptors(), that applies the BoW technique using SIFT to a set of images:

Python

def bow_descriptors(imgs):
    # Create a SIFT descriptor
    sift = SIFT_create()

    # Create a BoW descriptor
    # The number of clusters equal to 50 (analogous to the vocabulary size) has been chosen empirically
    bow_trainer = BOWKMeansTrainer(50)
    bow_extractor = BOWImgDescriptorExtractor(sift, BFMatcher(NORM_L2))

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Extract the SIFT descriptors
        _, descriptors = sift.detectAndCompute(img, None)

        # Add the SIFT descriptors to the BoW vocabulary trainer
        if descriptors is not None:
            bow_trainer.add(descriptors)

    # Perform k-means clustering and return the vocabulary
    voc = bow_trainer.cluster()

    # Assign the vocabulary to the BoW descriptor extractor
    bow_extractor.setVocabulary(voc)

    # Create a list to store the BoW feature vectors
    bow_features = []

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Compute the BoW feature vector
        hist = bow_extractor.compute(img, sift.detect(img))

        # Append the feature vectors to the list
        if hist is not None:
            bow_features.append(hist[0])

    return array(bow_features)

Putting the Techniques to Test

There isn’t necessarily a single best technique for all cases, and the choice of technique for the image data you are working with often requires controlled experiments.

In this tutorial, as an example, we will apply the HOG technique to the digits dataset that comes with OpenCV, and the BoW technique to images from the CIFAR-10 dataset. For this tutorial, we will only be considering a subset of images from these two datasets to reduce the required processing time. Nonetheless, the same code can be easily extended to the full datasets.

We will start by loading the datasets we will be working with. Recall that we have seen how to extract the images from each dataset in an earlier tutorial. The digits_dataset and the cifar_dataset are Python scripts that I have created, which contain the code for loading the digits and the CIFAR-10 datasets, respectively:

Python

from digits_dataset import split_images, split_data
from cifar_dataset import load_images

# Load the digits image
img, sub_imgs = split_images('Images/digits.png', 20)

# Obtain a dataset from the digits image
digits_imgs, _, _, _ = split_data(20, sub_imgs, 0.8)

# Load a batch of images from the CIFAR dataset
cifar_imgs = load_images('Images/cifar-10-batches-py/data_batch_1')

# Consider only a subset of images
digits_subset = digits_imgs[0:100, :]
cifar_subset = cifar_imgs[0:100, :]

We may then proceed to pass the datasets to the hog_descriptors() and the bow_descriptors() functions that we created earlier in this tutorial:

Python

digits_hog = hog_descriptors(digits_subset)
print('Size of HOG feature vectors:', digits_hog.shape)

cifar_bow = bow_descriptors(cifar_subset)
print('Size of BoW feature vectors:', cifar_bow.shape)

The complete code listing looks as follows:

Python

from cv2 import (imshow, waitKey, HOGDescriptor, SIFT_create, BOWKMeansTrainer,
                 BOWImgDescriptorExtractor, BFMatcher, NORM_L2, cvtColor, COLOR_RGB2GRAY)
from digits_dataset import split_images, split_data
from cifar_dataset import load_images
from numpy import uint8, array, reshape

# Load the digits image
img, sub_imgs = split_images('Images/digits.png', 20)

# Obtain a dataset from the digits image
digits_imgs, _, _, _ = split_data(20, sub_imgs, 0.8)

# Load a batch of images from the CIFAR dataset
cifar_imgs = load_images('Images/cifar-10-batches-py/data_batch_1')

# Consider only a subset of images
digits_subset = digits_imgs[0:100, :]
cifar_subset = cifar_imgs[0:100, :]


def hog_descriptors(imgs):
    # Create a list to store the HOG feature vectors
    hog_features = []

    # Set parameter values for the HOG descriptor based on the image data in use
    winSize = (20, 20)
    blockSize = (10, 10)
    blockStride = (5, 5)
    cellSize = (10, 10)
    nbins = 9

    # Set the remaining parameters to their default values
    derivAperture = 1
    winSigma = -1.
    histogramNormType = 0
    L2HysThreshold = 0.2
    gammaCorrection = False
    nlevels = 64

    # Create a HOG descriptor
    hog = HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold, gammaCorrection, nlevels)

    # Compute HOG descriptors for the input images and append the feature vectors to the list
    for img in imgs:
        hist = hog.compute(img.reshape(20, 20).astype(uint8))
        hog_features.append(hist)

    return array(hog_features)


def bow_descriptors(imgs):
    # Create a SIFT descriptor
    sift = SIFT_create()

    # Create a BoW descriptor
    # The number of clusters equal to 50 (analogous to the vocabulary size) has been chosen empirically
    bow_trainer = BOWKMeansTrainer(50)
    bow_extractor = BOWImgDescriptorExtractor(sift, BFMatcher(NORM_L2))

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Extract the SIFT descriptors
        _, descriptors = sift.detectAndCompute(img, None)

        # Add the SIFT descriptors to the BoW vocabulary trainer
        if descriptors is not None:
            bow_trainer.add(descriptors)

    # Perform k-means clustering and return the vocabulary
    voc = bow_trainer.cluster()

    # Assign the vocabulary to the BoW descriptor extractor
    bow_extractor.setVocabulary(voc)

    # Create a list to store the BoW feature vectors
    bow_features = []

    for img in imgs:
        # Reshape each RGB image and convert it to grayscale
        img = reshape(img, (32, 32, 3), 'F')
        img = cvtColor(img, COLOR_RGB2GRAY).transpose()

        # Compute the BoW feature vector
        hist = bow_extractor.compute(img, sift.detect(img))

        # Append the feature vectors to the list
        if hist is not None:
            bow_features.append(hist[0])

    return array(bow_features)


digits_hog = hog_descriptors(digits_subset)
print('Size of HOG feature vectors:', digits_hog.shape)

cifar_bow = bow_descriptors(cifar_subset)
print('Size of BoW feature vectors:', cifar_bow.shape)

The code above returns the following output:

Python

Size of HOG feature vectors: (100, 81)
Size of BoW feature vectors: (100, 50)

Based on our choice of parameter values, we may see that the HOG technique returns feature vectors of size $1\times 81$ for each image. This means each image is now represented by a point in an 81-dimensional space. The BoW technique, on the other hand, returns vectors of size $1\times 50$ for each image, where the vector length has been determined by the chosen number of k-means clusters, which is analogous to the vocabulary size.

Hence, we may see that, instead of simply flattening out each image into a one-dimensional vector, we have managed to represent each image more compactly by applying the HOG and BoW techniques.

Our next step will be to see how we can exploit this data using different machine learning algorithms.

Summary

In this tutorial, you discovered the Histogram of Oriented Gradients and the Bag-of-Words techniques for image vector representation.

Specifically, you learned:

- What the advantages are of using the Histogram of Oriented Gradients and the Bag-of-Words techniques for image vector representation.
- How to use the Histogram of Oriented Gradients technique in OpenCV.
- How to use the Bag-of-Words technique in OpenCV.

Do you have any questions?
Ask your questions in the comments below, and I will do my best to answer.
