CN105894050A

Info

Publication number: CN105894050A
Application number: CN201610383915.4A
Authority: CN
Inventors: 袁家政; 赵新超
Original assignee: Beijing Union University
Current assignee: Beijing Union University
Priority date: 2016-06-01
Filing date: 2016-06-01
Publication date: 2016-08-24

Abstract

A method for recognizing race and gender from face images based on multi-task learning. The method draws on digital image processing, pattern recognition, computer vision, and physiology, and addresses race and gender recognition from face images, whether still images or video frames, in a variety of settings. Multi-task learning improves learning performance by learning related tasks together: it can model the differences between tasks while sharing the features they have in common, and it uses this relatedness to improve learning ability and to mitigate overfitting on high-dimensional, small-sample data. The invention introduces multi-task learning into face image race and gender recognition, treats different semantics as different tasks, and proposes a semantics-based multi-task feature selection that, applied to race and gender recognition, can significantly improve the generalization ability and recognition performance of the learning system.

Description

A face image race and gender recognition method based on multi-task learning

Technical field

The invention is a face image race and gender recognition method based on multi-task learning. The method relates to technical fields such as digital image processing, pattern recognition, computer vision, and physiology.

Background art

Face images contain a wealth of information. From the perspective of pattern recognition, they can be used for race recognition, gender recognition, identity recognition, and so on.

Principal component analysis (PCA): the basic idea is to extract the main features of the samples through the Karhunen-Loeve (K-L) transform. The expansion basis is obtained by solving for the eigenvectors of the covariance matrix of the training samples, and sorting them by descending eigenvalue ranks the principal components by importance. In 1990, Kirby and Sirovich (Kirby, M., Sirovich, L., Application of the Karhunen-Loeve procedure for the characterization of human faces [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, Vol. 12, No. 1, pp. 103-108) applied PCA to the face recognition problem, and Turk and Pentland (Turk, M., Pentland, A., Eigenfaces for Recognition [J]. Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, pp. 71-86) developed it into Eigenfaces for frontal face recognition. Since then, face subspace methods have attracted researchers' attention.
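The PCA procedure described above can be sketched in a few lines. This is a minimal illustration only, assuming NumPy and random data in place of face images; the function name `pca_basis` and the data shapes are hypothetical and do not come from the patent.

```python
import numpy as np

def pca_basis(X, num_components):
    """Return the mean, the top principal axes and their variances.

    X holds one sample per row (for example, one flattened face image).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centre the samples
    cov = Xc.T @ Xc / (X.shape[0] - 1)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]  # largest eigenvalues first
    return mean, eigvecs[:, order], eigvals[order]

# Synthetic stand-in for a training set: 50 samples of dimension 16.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 16)) * np.linspace(3.0, 0.1, 16)
mean, basis, variances = pca_basis(X, 4)
projected = (X - mean) @ basis                     # coordinates in the subspace
```

Each column of `basis` plays the role of one eigenface, and `projected` holds the low-dimensional features that a classifier would consume.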

The support vector machine (SVM) (Cortes, C., Vapnik, V., Support-vector networks [J]. Machine Learning, Vol. 20, 1995, pp. 273-297) was proposed by Cortes and Vapnik in 1995 to solve handwriting recognition problems. Given limited sample information, it seeks the best trade-off between model complexity and learning ability in order to obtain the best generalization ability.

The final decision function of an SVM is determined by only a small number of support vectors, and the computational complexity depends on the number of support vectors rather than on the dimensionality of the sample space. This helps to capture the key samples and discard a large number of redundant ones, and the method is simple and fairly robust, mainly in that:

1. Adding or deleting non-support-vector samples has no effect on the model;

2. The set of support vector samples has a certain robustness;

3. It is relatively insensitive to the choice of kernel.

An SVM is essentially a two-class classifier. For multi-class problems, schemes such as one-versus-rest, one-versus-one, and SVM decision trees can be constructed. An SVM decision tree combines SVMs with a binary decision tree to form a multi-level classifier.

Multi-task learning improves learning performance by learning related tasks together. It can model the differences between tasks while sharing the features they have in common, and it uses this relatedness to improve learning ability and to mitigate overfitting on high-dimensional, small-sample data.

Summary of the invention

The rich semantics contained in face images make it possible to introduce multi-task learning. We treat different semantics as different tasks and propose a semantics-based multi-task feature selection algorithm. For a given training set of face images, feature extraction yields a data set organized by task. The optimal sparse coefficient matrix W for the c tasks is obtained iteratively; each column of W holds the sparse coefficients of a single task. The larger the absolute value of a coefficient, the more the corresponding feature dimension contributes to recognition for that task, so sorting by contribution gives a feature selection for each task.

The method consists mainly of a training part and a testing part. The training part uses the training set and the learning strategy to obtain the feature selection labels for gender and race. During testing, the class labels of the different tasks are selected, the recognition result is obtained from the classifier, and the recognition accuracy of multi-task feature selection is obtained over repeated measurements.

A face image race and gender recognition method based on multi-task learning, the method comprising the following steps:

S1 Image preprocessing

Image preprocessing is an important step in the face recognition pipeline. Its main purpose is to weaken or even eliminate information in the face image that is irrelevant to recognition, filter out noise, and increase the proportion of useful information, so that more effective features can later be extracted for classification. The image preprocessing used in the invention mainly includes:

(1) Histogram equalization: this method applies a transformation to the image so that the probability density of its gray levels becomes uniform, which weakens the effect of uneven illumination and enhances the overall contrast of the image. Histogram equalization has three steps:

a) Compute the histogram of the original image;

b) Transform the gray-level histogram of the original image using the cumulative distribution function to generate a new gray-level histogram:

$S_k = T(r_k) = \sum_{j=0}^{k} P_r(r_j)$

where S_k is the gray level of the transformed image; r is the gray level of the original image; k is the gray level index; T(r_k) is the transformation function; and P_r(r) is the probability density function of the image gray levels.

c) Replace the original histogram with the new gray-level histogram, so that each pixel value in the image occurs with similar probability.
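Steps a) to c) can be sketched as follows. This is an illustrative NumPy version, assuming an 8-bit grayscale image; the function name `equalize_histogram` and the rescaling of the cumulative sum to the range 0..255 are assumptions, not details given in the patent.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Histogram-equalize an integer-valued grayscale image."""
    hist = np.bincount(image.ravel(), minlength=levels)  # step a: histogram
    pdf = hist / image.size                              # P_r(r_j)
    cdf = np.cumsum(pdf)                                 # S_k = sum over j <= k of P_r(r_j)
    # step b: rescale S_k from [0, 1] to the available gray levels
    mapping = np.round(cdf * (levels - 1)).astype(np.uint8)
    return mapping[image]                                # step c: remap every pixel

# A synthetic dark image whose gray values only span 0..63.
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(32, 32), dtype=np.uint8)
equalized = equalize_histogram(dark)
```

After equalization the gray values of the dark test image are spread over nearly the full 8-bit range, which is the contrast enhancement the step aims for.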

(2) Gray-level stretching: one of the most basic gray-level transformations. A piecewise linear function maps any chosen range of gray values in the original image to a specified interval, which increases the contrast of the image within that range. Gray-level stretching has two steps:

a) Compute the histogram of the original image and determine the breakpoints of the stretch from its distribution.

b) Use the resulting piecewise linear function to map each original gray value to its new value, which replaces the original pixel value.

Mapping a specified range of the face image's pixel values in this way enhances the local contrast of the gray values.
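A piecewise linear stretch with two breakpoints can be sketched as below. The breakpoints `(r1, s1)` and `(r2, s2)` are fixed by hand here purely for illustration; the patent derives them from the histogram, and the function name `stretch_gray` is hypothetical.

```python
import numpy as np

def stretch_gray(image, r1, s1, r2, s2, max_level=255):
    """Piecewise-linear stretch through the breakpoints (r1, s1) and (r2, s2)."""
    xs = [0, r1, r2, max_level]      # input gray levels (must be increasing)
    ys = [0, s1, s2, max_level]      # output gray levels at those inputs
    out = np.interp(image.astype(np.float64), xs, ys)
    return np.round(out).astype(np.uint8)

image = np.array([[0, 50, 100], [125, 150, 255]], dtype=np.uint8)
# Expand the input range 100..150 to the output range 50..200.
stretched = stretch_gray(image, r1=100, s1=50, r2=150, s2=200)
```

In this example the middle segment has slope 3, so gray values between 100 and 150 gain contrast, while the two outer segments are compressed.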

(3) Image normalization: face image normalization has two aspects:

a) Geometric normalization: the face images are scaled or rotated into a batch of images of the same size in which the facial feature points are in roughly the same positions. This compensates for physical differences during capture, such as different shooting distances or shooting angles.

b) Gray-level normalization: this reduces the large differences in pixel values caused by illumination changes during capture. Normalizing the gray values of the image weakens the differences caused by different lighting or shooting angles. Normalization can greatly improve the performance of a face recognition system and raise the recognition rate of the recognition algorithm, so it is an important step.
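The two aspects can be illustrated with a rough sketch. The patent does not say which resizing or gray-level normalization scheme it uses, so the nearest-neighbour resize and the zero-mean, unit-variance scaling below are assumptions chosen for brevity; rotation and landmark alignment are omitted, and both function names are hypothetical.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Geometric normalization (scaling only): nearest-neighbour resize."""
    rows = np.arange(out_h) * image.shape[0] // out_h   # source row per output row
    cols = np.arange(out_w) * image.shape[1] // out_w   # source column per output column
    return image[np.ix_(rows, cols)]

def normalize_gray(image, eps=1e-8):
    """Gray-level normalization: shift to zero mean and scale to unit variance."""
    x = image.astype(np.float64)
    return (x - x.mean()) / (x.std() + eps)

# Synthetic stand-in for a 60x48 face crop.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(60, 48)).astype(np.uint8)
sample = normalize_gray(resize_nearest(face, 32, 32))
```

Every image processed this way has the same size and comparable gray-level statistics, so a global brightness offset or gain in the input does not carry over into the features.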

S2 Multi-task labeling

(1) Select M face images from the training sample library as the training sample set and preprocess them.

(2) For the given c related learning tasks, the training set is denoted {(X₁,y₁),…,(X_c,y_c)}, where:

X_i: the training samples of the i-th task;

y_i: the class labels of the i-th task;

n: the number of training samples of the i-th task;

d: the dimensionality of the training samples.

The c tasks are learned jointly in order to obtain the weight matrix $W = [w_1, \ldots, w_c] \in \mathbb{R}^{d \times c}$, where w_i is the weight coefficient vector of the i-th task.

S3 Model training

Let J_i(w_i, X_i, y_i) = |w_i X_i - y_i| be the loss function of the i-th task. Commonly used loss functions include the log-likelihood loss, the exponential loss, and the hinge loss.

For the i-th task, minimizing the empirical error together with an l₁-norm penalty gives the following optimization problem:

$\underset{w_i}{\min} \; J_i(w_i, X_i, y_i) + \lambda \|w_i\|_1$

For a single task, this problem is usually called LASSO (least absolute shrinkage and selection operator), that is, the sparse representation for classification (SRC) method. LASSO is a convex optimization problem that no longer has a closed-form solution, but solving it drives many entries of w_i to 0. The solution is therefore sparse, and the l₁ penalty is a good approximation of l₀ regularization (which is NP-hard).
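The sparsity effect of the l₁ penalty can be seen in a small single-task sketch. It assumes a squared-error loss in place of the patent's J_i and uses the iterative soft-thresholding algorithm (ISTA), which is one common LASSO solver and not necessarily the one the inventors used; all names and data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink every entry towards zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, num_iters=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal gradient."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(num_iters):
        grad = X.T @ (X @ w - y)                # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic task: only features 2, 7 and 11 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = X @ w_true + 0.01 * rng.normal(size=80)
w_hat = lasso_ista(X, y, lam=5.0)
```

The soft-thresholding step sets small coefficients exactly to zero, which is why most entries of `w_hat` vanish while the informative ones survive with a slight shrinkage.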

Solving the c tasks separately for their optimal solutions is equivalent to minimizing the global objective function of the joint task, stated as follows:

$\underset{W}{\min} \; \sum_{i=1}^{c} J_i(w_i, X_i, y_i) + \lambda \sum_{i=1}^{c} \|w_i\|_1$

$W = \begin{pmatrix} w_1^1 & w_2^1 & \cdots & w_c^1 \\ w_1^2 & w_2^2 & \cdots & w_c^2 \\ \vdots & \vdots & \ddots & \vdots \\ w_1^d & w_2^d & \cdots & w_c^d \end{pmatrix}$

As can be seen, under the l₁-norm constraint the tasks are independent of one another. To perform feature selection over the global features, the objective is changed as follows, giving the form under the l₂-norm constraint:

$\underset{W}{\min} \; \sum_{i=1}^{c} J_i(w_i, X_i, y_i) + \lambda \sum_{k=1}^{d} \|w^k\|_2$

Here w^k denotes the k-th row of W, that is, the coefficients of the k-th feature across all c tasks. Intuitively, the l₂ norm shares features across the tasks. Taking W as a whole, the 2-norm of each row w^k is computed first and the results are then summed; this is called the l₁/l₂ norm.

The optimal sparse coefficient matrix W for the c tasks is obtained iteratively. Each column of W holds the sparse coefficients of a single task. The larger the absolute value of a coefficient, the more the corresponding feature dimension contributes to recognition for that task. Sorting by contribution gives a feature selection for each task, and finally the optimal model.
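The joint objective and the subsequent ranking step can be sketched as follows. As before, this assumes a squared-error loss for each J_i and a proximal-gradient iteration, neither of which is specified in the patent; the function names and the two synthetic tasks are hypothetical.

```python
import numpy as np

def row_soft_threshold(W, t):
    """Proximal operator of t * sum_k ||w^k||_2: shrink each row of W as a group."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale

def multitask_l21(tasks, lam, num_iters=500):
    """Minimize sum_i 0.5 * ||X_i w_i - y_i||^2 + lam * sum_k ||w^k||_2."""
    d = tasks[0][0].shape[1]
    W = np.zeros((d, len(tasks)))               # one column per task
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X, _ in tasks)
    for _ in range(num_iters):
        grad = np.column_stack([X.T @ (X @ W[:, i] - y)
                                for i, (X, y) in enumerate(tasks)])
        W = row_soft_threshold(W - step * grad, step * lam)
    return W

def rank_features(W, task):
    """Feature indices for one task, largest absolute coefficient first."""
    return np.argsort(-np.abs(W[:, task]))

# Two synthetic tasks that depend on the same three features.
rng = np.random.default_rng(0)
d, shared = 20, [1, 4, 9]
tasks = []
for coeffs in ([2.0, -1.5, 1.0], [-1.0, 2.5, 1.5]):
    X = rng.normal(size=(80, d))
    w = np.zeros(d)
    w[shared] = coeffs
    tasks.append((X, X @ w + 0.01 * rng.normal(size=80)))

W = multitask_l21(tasks, lam=5.0)
top3_task0 = rank_features(W, 0)[:3]
```

Because the penalty acts on whole rows, a feature is kept or discarded for all tasks together, while the per-column ranking in `rank_features` still orders the surviving features separately for each task.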

S4 Model testing

(1) Select N face images from the test sample library as the test sample set and preprocess them.

(2) Feed the test images one by one into the trained model and solve both tasks simultaneously.

(3) Sort the resulting class scores and take the class with the highest score as the final prediction.

(4) Once the race and gender classes of the face image have been obtained, the label text can be read from the corresponding class label file by index and output as a description.
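Steps (3) and (4) amount to an argmax per task followed by a label lookup. The sketch below assumes the class scores are already available as arrays; the label text and the in-memory dictionary standing in for the class label files are hypothetical placeholders.

```python
import numpy as np

def predict_labels(scores_by_task, label_names):
    """Pick the highest-scoring class of each task and look up its label text."""
    result = {}
    for task, scores in scores_by_task.items():
        index = int(np.argmax(scores))          # step (3): class with the maximum score
        result[task] = label_names[task][index] # step (4): index into the label text
    return result

# Hypothetical label text, standing in for the class label files.
label_names = {"gender": ["female", "male"],
               "race": ["class_0", "class_1", "class_2"]}
# Hypothetical class scores for one test image.
scores = {"gender": np.array([0.2, 0.8]),
          "race": np.array([0.1, 0.6, 0.3])}
prediction = predict_labels(scores, label_names)
```

Both tasks are resolved in one pass over the same test image, matching the simultaneous solution described in step (2).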

Currently popular machine learning algorithms mostly learn only one task at a time under a single model: a complex problem is first decomposed into theoretically independent subproblems, and in each subproblem the samples in the training set reflect the information of only a single task. Face images carry information such as race, gender, and identity. The recognition tasks corresponding to these kinds of information are correlated, and the tasks share certain relevant information during learning. Introducing multi-task learning into face image race and gender recognition, treating different semantics as different tasks, and applying the proposed semantics-based multi-task feature selection to race and gender recognition can significantly improve the generalization ability and recognition performance of the learning system.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a schematic diagram of the model training flow of the method of the invention;

Fig. 2 is a schematic diagram of the model testing flow of the method of the invention;

Fig. 3 is a set of original training sample images according to an exemplary embodiment;

Fig. 4 is a set of sample images after preprocessing according to an exemplary embodiment;

Fig. 5 is a schematic diagram of a set of test results according to an exemplary embodiment.

Detailed description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings.

When the following description refers to the accompanying drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated.

The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of methods consistent with some aspects of the invention as detailed in the appended claims.

The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention.

As used in the present invention and the appended claims, the singular forms "a", "said", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "and" and "or" as used in the present invention refer to and include any and all possible combinations of one or more of the associated listed items.

The terms "first", "second", "third", etc. may be used in the present invention to describe various kinds of information, but the information should not be limited by these terms. The terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present invention, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to".

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The embodiment carries out the four steps set out above in the same order and with the same formulas: S1 image preprocessing, S2 multi-task labeling, S3 model training, and S4 model testing.