Integral Channel Features (ICF), also known as ChnFtrs, is a method for object detection in computer vision. It uses integral images to extract features such as local sums, histograms and Haar-like features from multiple registered image channels. The method was developed by Dollár et al. for pedestrian detection and was first described at the BMVC in 2009.[1]
Typically, a "channel" refers to a certain component that defines pixel values in a digital image. A color image, for example, is an aggregate of three channels (red, green and blue): the color data of the image is stored in three arrays of values. While this definition of a "channel" is widely accepted across various domains, there exists a broader definition in computer vision, which allows one to exploit features of an image beyond the color information. Under this broader definition, a channel is a registered map of the original image whose output pixels are related to the input pixels by some linear or non-linear transformation.[1] According to this notion of a channel, the color channels of an image can be redefined as output images obtained by extracting one color component from the input image at a time. Similarly, the grayscale channel of a grayscale input image is simply the input image itself. The simple MATLAB implementation below shows how color channels and the grayscale channel can be extracted from an input image.
I = imread('I_RGB.png');   % input color image
% Output image = color_channel(I),
% where the color channel could be red, green or blue. The three output
% images are extracted from the input image as follows
red_channel   = I(:,:,1);
green_channel = I(:,:,2);
blue_channel  = I(:,:,3);
% Output image = grayscale_image(I).
% Note: if the input image I was already a grayscale image, the grayscale
% channel would simply have been equal to the input image, i.e., gray_channel = I
gray_channel = rgb2gray(I);
It is clear from the above examples that a channel can be generated either by simply extracting specific information from the original image or by manipulating the input image in some form to obtain the desired channel. Dollár et al. defined a channel generation function Ω, which relates a channel C (that is, an output image) to the original input image I as follows:[1]

C = Ω(I)
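As a concrete illustration, a channel generation function Ω that produces a gradient-magnitude channel can be sketched in Python (used here for illustration only; the function name `omega_gradient_magnitude` is hypothetical and not from the paper or its toolbox):

```python
import numpy as np

def omega_gradient_magnitude(image):
    """Channel generation function: maps a grayscale image to its
    gradient-magnitude channel (an output image of the same size)."""
    gy, gx = np.gradient(image.astype(float))  # per-pixel partial derivatives
    return np.hypot(gx, gy)                    # gradient magnitude per pixel

# toy grayscale "image": a linear ramp
I = np.arange(25, dtype=float).reshape(5, 5)
C = omega_gradient_magnitude(I)   # the channel C = Omega(I)
```

For a linear ramp like the toy input above, the gradient is constant, so every pixel of the resulting channel holds the same magnitude.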
The next section discusses other, relatively complex channel types mentioned in the original paper by Dollár et al. A MATLAB implementation is given for some of the channels.
% Output image = DoG(I)
% Difference of Gaussians applied to the input image
H1 = fspecial('gaussian', 25, 0.5);   % create a Gaussian with sigma 0.5
H2 = fspecial('gaussian', 25, 3);     % create a Gaussian with sigma 3
DoG_filter = H1 - H2;                 % create a DoG filter
image = double(rgb2gray(imread('RGB_1.jpg')));
DoG_channel = conv2(image, DoG_filter, 'same');   % convolve DoG filter with input image
Note that these channels can be used alone or in combination with each other.
Once channels are obtained from an input image, various features can be extracted from them. These features are called channel features and can be categorized into two main types:[1] first-order features, which are sums of pixel values over local rectangular regions of a single channel, and second-order features, such as Haar-like features, which are computed as weighted combinations of two or more first-order sums.
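Both feature types can be computed in constant time from an integral image of the channel. A minimal Python sketch (the helper names `integral_image` and `rect_sum` are illustrative, not from the paper):

```python
import numpy as np

def integral_image(channel):
    """Summed-area table with a zero row/column prepended, so that
    rect_sum works with half-open rectangle coordinates."""
    return np.pad(channel, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def rect_sum(ii, r0, c0, r1, c1):
    """First-order feature: sum of channel values over rows r0..r1-1 and
    cols c0..c1-1, in O(1) from the integral image ii."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# toy "channel"
channel = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(channel)

# first-order feature: local sum over the top-left 2x2 block
first_order = rect_sum(ii, 0, 0, 2, 2)

# second-order (Haar-like) feature: difference of two adjacent rectangle sums
second_order = rect_sum(ii, 0, 0, 2, 2) - rect_sum(ii, 0, 2, 2, 4)
```

Because each rectangle sum needs only four lookups in the integral image, large pools of such features can be evaluated cheaply.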
The ChnFtrs method allows one to pool features that capture rich information from diverse channels. Dollár et al. based their experimental results on first-order features, since second-order features added little value.[1] The channels are recomputed at multiple scales to extract a pool of channel features that can represent the entire scale space. There is a MATLAB toolbox that can be used as guidance for implementing the ChnFtrs method. Further, OpenCV has a complete implementation of ChnFtrs.[2][3]
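To illustrate what pooling first-order features from several channels of a detection window might look like, here is a rough Python sketch (the random-rectangle sampling scheme and all helper names are illustrative assumptions, not the toolbox's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

def integral_image(channel):
    # summed-area table with a zero row/column prepended
    return np.pad(channel, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def rect_sum(ii, r0, c0, r1, c1):
    # O(1) sum over a half-open rectangle of the channel
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def random_feature_pool(shape, n_channels, n_candidates):
    """Sample (channel index, rectangle) candidates; degenerate
    zero-area rectangles are discarded."""
    h, w = shape
    pool = []
    for _ in range(n_candidates):
        c = int(rng.integers(n_channels))
        r0, r1 = sorted(rng.integers(0, h + 1, 2))
        c0, c1 = sorted(rng.integers(0, w + 1, 2))
        if r1 > r0 and c1 > c0:
            pool.append((c, r0, c0, r1, c1))
    return pool

def extract_features(channels, pool):
    iis = [integral_image(ch) for ch in channels]
    return np.array([rect_sum(iis[c], r0, c0, r1, c1)
                     for (c, r0, c0, r1, c1) in pool])

# toy example: two channels of an 8x8 detection "window"
channels = [rng.random((8, 8)) for _ in range(2)]
pool = random_feature_pool((8, 8), n_channels=2, n_candidates=100)
fv = extract_features(channels, pool)   # one feature vector per window
```

The resulting feature vector is what a boosted classifier would consume, one vector per candidate detection window.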
To study the performance of ChnFtrs, Dollár et al. first evaluated the effectiveness of various channels used individually. The channels studied were histogram of oriented gradients (HOG), gradient histogram (Hist), gradient magnitude (Grad), color channels (RGB, HSV, LUV) and grayscale. Performance was evaluated in terms of pedestrian detection rate at the reference point of 10⁻⁴ fppw (false positives per window). HOG turned out to be the most informative single channel, with a detection rate of 89%. Among the color channels (RGB, HSV and LUV), LUV had the best detection rate, at 55.8%, while the grayscale channel was the least informative, with a detection rate of only 30.7%. Next, they evaluated the performance of various channel combinations, which is their proposed method. The combination of LUV, Hist and Grad channels had the highest detection rate, 91.9%, and was used in their subsequent experiments on the INRIA and Caltech datasets.
About 30,000 first-order features were used to train an AdaBoost classifier. The ChnFtrs + AdaBoost detector was tested on full images from the INRIA and Caltech datasets, and its performance was compared with 12 other detectors, including HOG, the most popular method. ChnFtrs outperformed all of them except LatSvm, with a detection rate of 86% on the INRIA dataset and 60% on the more challenging Caltech dataset.
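The boosting stage can be sketched with a minimal from-scratch AdaBoost over decision stumps in Python (a toy illustration on a few synthetic feature vectors, not the actual 30,000-feature training setup):

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost with decision stumps over feature columns.
    X: (n_samples, n_features) channel-feature matrix; y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None                   # exhaustive search for the best stump
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # stump weight
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)            # re-weight samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1)
                for (a, j, t, s) in ensemble)
    return np.sign(score)

# toy data: two separable clusters of "channel feature" vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([-1] * 20 + [1] * 20)
model = train_adaboost(X, y, n_rounds=10)
acc = (predict(model, X) == y).mean()
```

Each boosting round picks the single rectangle-sum feature and threshold that best separates pedestrians from background under the current sample weights, which is why large feature pools pair naturally with AdaBoost.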
The ICF method (ChnFtrs) has been widely used by computer vision researchers since the work was first published by Dollár et al. In fact, it is now used as a baseline detector due to its proven efficiency and reasonable performance. Several authors have obtained even better performance by extending the feature pool in various ways or by carefully choosing the classifier and training it with a larger dataset. Work by Zhang et al. also exploited integral channel features in developing the Informed Haar detector for pedestrian detection.[4] They used the same combination of channels as Dollár et al. but were able to achieve approximately 20% higher performance than the baseline ChnFtrs method, the gain coming from the better prior knowledge provided to their detector.[4] It is also important to note that they used informed Haar-like features, which are second-order features in the terminology of [1], whereas Dollár et al. demonstrated their results using first-order channel features only, as their analysis showed that second-order features added barely a 0.6% increase to the detection rate. Further, Benenson et al. were able to increase the detection speed of the baseline ChnFtrs method by avoiding the need to resize the input image.[5]
ChnFtrs is a versatile method that extracts features from multiple channels, capturing diverse information from a single input image. The performance of the base detector developed by Dollár et al. has been shown to improve when better prior knowledge is supplied and the detector is trained on a larger dataset.