CN105005768B

Info

Publication number: CN105005768B
Application number: CN201510391152.3A
Authority: CN
Inventors: 李东新; 张鸿鹏
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2015-07-06
Filing date: 2015-07-06
Publication date: 2018-09-14
Anticipated expiration: 2035-07-06
Also published as: CN105005768A

Abstract

The invention discloses an AdaBoost face detection algorithm with dynamic percentage sample clipping. At the start of each iteration, the percentage f of samples to be clipped is determined; in each round, the samples with the smallest weights are clipped according to f and the remaining samples are used for training. When the error rate of the best weak classifier obtained in the current iteration exceeds the error rate of random guessing, the clipping constant f is reduced to enlarge the sample set, and the current iteration is retrained. If the error rate still exceeds 0.5 when all samples are used for training, the iteration stops. The invention is suited to cases where too many samples participate in training: by selecting the subset of samples that contributes most to performance, it saves training time.

Description

AdaBoost face detection method with dynamic percentage sample clipping

Technical field

The invention relates to an AdaBoost face detection method with dynamic percentage sample clipping, and belongs to the technical field of pattern recognition.

Background

Biometric identification verifies identity or distinguishes individuals through the physiological and behavioral characteristics unique to each person. The face, as one kind of biometric feature, is easy to capture and offers a friendly interface. Compared with commonly used means such as passwords, credit cards, and identity cards, it cannot be copied, is convenient to carry, and is highly discriminative. It therefore has broad prospects in fields such as video surveillance, smart homes, and criminal investigation. As the computing power of embedded devices grows, intelligent algorithms are increasingly applied in embedded development to implement a variety of functions. Face detection, as the basis of face recognition, has become a research hotspot in artificial intelligence.

The core of the AdaBoost algorithm is to iteratively select, from a large number of Haar features, the features with the best classification performance as weak classifiers; the final strong classifier is composed of a large number of weak classifiers. AdaBoost is practical and simple. Face detection based on AdaBoost offers both very high detection accuracy and fast detection speed on single face images, so face recognition technology based on this algorithm has been widely applied.

When the numbers of training samples, sample features, and weak classifiers are large, training a classifier with the AdaBoost algorithm consumes a great deal of time. The number of features determines the number of iterations of the algorithm: each iteration computes the error rate of the corresponding feature on the training sample set, and the best weak classifier is finally obtained by comparing the error rates. Each time a best weak classifier has been trained, the weights of the training samples change accordingly, so obtaining more weak classifiers requires repeating the above steps the corresponding number of times. It follows that when the number of training samples, the number of sample features, and the number of weak classifiers all increase, the training time grows on a cubic order.
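The cubic growth can be illustrated with a rough cost model. This is a sketch under an assumption not stated in the text: that the dominant cost is evaluating every feature on every sample once per weak classifier. The function name `training_cost` is hypothetical.

```python
def training_cost(num_samples: int, num_features: int, num_weak: int) -> int:
    """Rough count of feature evaluations (assumed cost model):
    every feature is scored on every sample for each weak classifier."""
    return num_samples * num_features * num_weak


base = training_cost(1_000, 10_000, 100)
doubled = training_cost(2_000, 20_000, 200)

# Doubling all three factors multiplies the estimated cost by 2**3.
print(doubled // base)  # prints 8
```

Under this model, clipping a fraction f of the samples scales the per-round cost by roughly (1 - f).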

Summary of the invention

The purpose of the invention is to overcome the deficiencies of the prior art by providing an AdaBoost face detection method with dynamic percentage sample clipping, solving the technical problem that training a classifier with the AdaBoost algorithm in the prior art consumes a great deal of training time.

To solve this technical problem, the invention adopts the following technical solution: an AdaBoost face detection method with dynamic percentage sample clipping in which, at the start of each iteration, the percentage f of samples to be clipped is first determined; in each round, the samples with the smallest weights are clipped according to f and the remaining samples are used for training;

When the error rate of the best weak classifier obtained in the current iteration exceeds the error rate of random guessing, the clipping constant f is reduced to enlarge the sample set, and the current iteration is retrained;

If the error rate still exceeds 0.5 when all samples are used for training, the iteration stops;

The specific algorithm includes the following steps:

Step 1: Let the total number of input training samples be N, of which m are negative samples and n are positive samples. The training sample set is S = {(x₁, y₁), ..., (x_n, y_n)}, where x_i denotes the i-th sample and y_i ∈ {1, 0} marks positive and negative samples respectively;

Step 2: Initialize the sample weights:
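The initialization formula does not appear in this text. A minimal sketch, assuming the standard Viola-Jones initialization in which each class shares half of the total weight (1/(2n) per positive sample, 1/(2m) per negative sample); the function name `init_weights` is hypothetical:

```python
def init_weights(labels):
    """Assumed Viola-Jones initialization, not confirmed by the text:
    positives (y = 1) get 1/(2n), negatives (y = 0) get 1/(2m)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    return [1.0 / (2 * n_pos) if y == 1 else 1.0 / (2 * n_neg) for y in labels]


labels = [1, 1, 0, 0, 0, 0]      # 2 positive samples, 4 negative samples
weights = init_weights(labels)   # positives: 0.25 each, negatives: 0.125 each
```

With this choice the weights already sum to 1 before the first normalization.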

Step 3: Let f be the percentage of samples discarded in each round; the number of samples participating in training in each round is then N×(1-f), with iteration index t = 1, 2, …, T;

Step 4: Obtain the optimal weak classifier and compute the weighting coefficient α_t of the weak classifier h_t in the strong classifier, as follows:

Step 401: Normalize the sample weights:
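The normalization formula does not appear in this text. A minimal sketch, assuming the usual form in which each weight is divided by the sum of all weights so that they form a distribution; the function name `normalize` is hypothetical:

```python
def normalize(weights):
    """Scale the weights so they sum to 1 (assumed form of Step 401)."""
    total = sum(weights)
    return [w / total for w in weights]


normalized = normalize([1.0, 1.0, 2.0])  # [0.25, 0.25, 0.5]
```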

Step 402: For each feature j, train a simple weak classifier h_j(x, f_j, p_j, θ_j):

where f_j(x) is the feature value, p_j indicates the direction of the inequality, and θ_j is the weak classifier threshold;
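The definition of h_j does not appear in this text. A minimal sketch, assuming the standard Viola-Jones decision stump, which outputs 1 when p_j·f_j(x) < p_j·θ_j and 0 otherwise; the function name `stump_predict` and its argument names are hypothetical:

```python
def stump_predict(feature_value: float, polarity: int, threshold: float) -> int:
    """Assumed decision stump: 1 (face) when polarity * value < polarity * threshold.

    polarity is +1 or -1 and flips the direction of the inequality.
    """
    return 1 if polarity * feature_value < polarity * threshold else 0


below = stump_predict(0.2, +1, 0.5)    # 0.2 < 0.5, so the output is 1
above = stump_predict(0.8, +1, 0.5)    # 0.8 >= 0.5, so the output is 0
flipped = stump_predict(0.8, -1, 0.5)  # -0.8 < -0.5, so the output is 1
```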

Step 403: Select the weak classifier h_t(x) with the minimum error rate, where the minimum error rate is defined as:
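The error-rate formula does not appear in this text. A minimal sketch, assuming the standard weighted error ε = Σ_i w_i·|h_j(x_i) − y_i| minimized over the features j; the names `weighted_error`, `pick_best`, and the `candidates` mapping are hypothetical:

```python
def weighted_error(predictions, labels, weights):
    """Sum of the weights of the samples whose 0/1 prediction differs from the label."""
    return sum(w * abs(h - y) for h, y, w in zip(predictions, labels, weights))


def pick_best(candidates, labels, weights):
    """candidates maps a feature index j to that stump's 0/1 predictions.

    Returns (best feature index, its weighted error).
    """
    errors = {j: weighted_error(p, labels, weights) for j, p in candidates.items()}
    best_j = min(errors, key=errors.get)
    return best_j, errors[best_j]


labels = [1, 1, 0, 0]
weights = [0.25, 0.25, 0.25, 0.25]
candidates = {0: [1, 0, 0, 1],   # two mistakes, error 0.5
              1: [1, 1, 0, 1]}   # one mistake, error 0.25
best_j, eps = pick_best(candidates, labels, weights)
```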

Step 404: If ε_t = 0, or if ε_t ≥ 0.5 occurs in the first round of training, set T = t-1 and go to Step 6. If ε_t ≥ 0.5 and it is not the first round, set T = t-1 and check whether f is greater than 2/3: if so, set f = 2×f-1; otherwise set f = f/2. Then go to Step 5;
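The rule for reducing f in Step 404 can be sketched as follows; the function name `shrink_clip_fraction` is hypothetical, and the surrounding control flow (stopping, retraining) is left out:

```python
def shrink_clip_fraction(f: float) -> float:
    """Step 404 rule: if f > 2/3 use 2*f - 1, otherwise halve f."""
    return 2 * f - 1 if f > 2 / 3 else f / 2


# Repeated failures keep enlarging the training set:
# f goes roughly 0.9 -> 0.8 -> 0.6 -> 0.3 -> 0.15
f = 0.9
history = [f]
for _ in range(4):
    f = shrink_clip_fraction(f)
    history.append(f)
```

Both branches lower f, so each retry trains on more samples than the one before.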

Step 405: Update the sample weights:

where e_i = 0 when sample x_i is misclassified and e_i = 1 otherwise.
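The update formula does not appear in this text. A minimal sketch, assuming the Viola-Jones factor β_t = ε_t/(1 − ε_t) and, to match the e_i convention stated here (e_i = 1 for a correctly classified sample), the update w ← w·β_t^{e_i}, which shrinks the weights of correctly classified samples; the function name `update_weights` is hypothetical:

```python
def update_weights(weights, predictions, labels, eps):
    """Assumed update: multiply correctly classified samples by beta = eps / (1 - eps)."""
    beta = eps / (1.0 - eps)       # assumed definition, not given in the text
    updated = []
    for w, h, y in zip(weights, predictions, labels):
        e = 0 if h != y else 1     # e_i = 0 when misclassified, 1 otherwise (per the text)
        updated.append(w * beta ** e)
    return updated


new_w = update_weights([0.25] * 4, [1, 1, 0, 1], [1, 1, 0, 0], eps=0.25)
# The last sample is misclassified and keeps weight 0.25;
# the other three are scaled by beta = 1/3.
```

After the next normalization, the misclassified sample carries a larger share of the total weight.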

Step 406: Compute the weighting coefficient of the weak classifier h_t in the strong classifier:
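The formula for α_t does not appear in this text. A minimal sketch, assuming the standard Viola-Jones choice α_t = log(1/β_t) with β_t = ε_t/(1 − ε_t); the function name `classifier_weight` is hypothetical, and ε_t = 0 is excluded because Step 404 stops the iteration in that case:

```python
import math


def classifier_weight(eps: float) -> float:
    """Assumed coefficient alpha = log(1 / beta), beta = eps / (1 - eps); requires 0 < eps < 1."""
    beta = eps / (1.0 - eps)
    return math.log(1.0 / beta)


alpha_good = classifier_weight(0.25)   # log(3): a low error earns a large vote
alpha_chance = classifier_weight(0.5)  # 0.0: a coin-flip classifier earns no vote
```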

Step 5: Sort the samples in the training set by weight in ascending order and, according to the clipping percentage f, clip the first n×f samples, i.e. those with the smallest weights;
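The clipping in Step 5 can be sketched as follows. Rounding the number of clipped samples down to an integer is an assumption, and the function name `clip_samples` is hypothetical:

```python
def clip_samples(weights, f):
    """Return the indices of the samples kept after dropping the lowest-weight fraction f."""
    n_drop = int(len(weights) * f)                                  # assumed rounding: floor
    order = sorted(range(len(weights)), key=lambda i: weights[i])   # ascending by weight
    return sorted(order[n_drop:])                                   # survivors, original order


weights = [0.05, 0.30, 0.10, 0.25, 0.20, 0.10]
kept = clip_samples(weights, 0.5)   # drops the three lightest samples
```

Returning indices rather than copies lets the full sample set be restored when f is later reduced.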

Step 6: Output the strong classifier:
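The strong classifier formula does not appear in this text. A minimal sketch, assuming the standard Viola-Jones vote, in which the output is 1 when Σ_t α_t·h_t(x) ≥ ½·Σ_t α_t and 0 otherwise; the function name `strong_classify` is hypothetical:

```python
def strong_classify(alphas, weak_outputs):
    """Assumed weighted majority vote over the 0/1 outputs of the weak classifiers."""
    score = sum(a * h for a, h in zip(alphas, weak_outputs))
    return 1 if score >= 0.5 * sum(alphas) else 0


alphas = [1.0, 0.5, 0.2]                         # half of the total weight is 0.85
accepted = strong_classify(alphas, [1, 0, 0])    # score 1.0 reaches the threshold
rejected = strong_classify(alphas, [0, 1, 1])    # score 0.7 falls short
```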

Compared with the prior art, the beneficial effect of the invention is that, when too many samples participate in training, training time is saved by selecting the subset of samples that contributes most to performance.

Description of the drawings

Figure 1 is a flowchart of the method of the invention.

Figure 2 is a flowchart of obtaining the optimal weak classifier.

Detailed description

The invention is further described below with reference to the drawings.

The functions shown in the drawings have the following meanings:

Function cvGetTickCount(): returns the number of milliseconds elapsed since the operating system started; the time spent on training is obtained from the difference between two return values.

Function Single_Classifier(int i): generates a strong classifier; the argument is the number of weak classifiers that make up the strong classifier.

Function Generate_AllFeatures(int count): generates all Haar-like features; count is the number of feature types used. The invention uses 5 common feature templates, so count is 5.

Function Input_Samples(): reads the positive and negative samples from a specified directory.

Function Select_WeakClassifier(): obtains the optimal weak classifier.

Function Output_WeakClassifier(): outputs the generated weak classifier.

Function Cal_HaarValue(j,k): computes the j-th feature of the k-th sample.

Function qsort(): sorts the samples by feature value.

As shown in Figure 1, in the AdaBoost face detection method with dynamic percentage sample clipping, the percentage f of samples to be clipped is first determined at the start of each iteration; in each round, the samples with the smallest weights are clipped according to f and the remaining samples are used for training;

Step 2: Initialize the sample weights:

Step 4: Obtain the optimal weak classifier and compute the weighting coefficient α_t of the weak classifier h_t in the strong classifier, as shown in Figure 2, as follows:

Step 405: Update the sample weights:

Step 6: Output the strong classifier:

The above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.