

Technical Field
The present invention relates to automatic face age estimation, and in particular to an automatic face age estimation method based on weak supervision.
Background
Automatic face age estimation means taking a picture and automatically determining the age of the person in it. The technology has many applications, such as video surveillance, product recommendation, human-computer interaction, market analysis, user profiling, and prediction of age-related changes. In recent years, with the development of deep learning and convolutional neural networks in particular, the performance of face age estimation algorithms has improved rapidly. Estimating age algorithmically nevertheless remains difficult: facial age is related to skull shape, the position of facial features, wrinkles, and so on, and is affected by illumination, pose, and expression. Even human observers estimate age with large deviations, so manually labelled ages are themselves imprecise. Most current machine-learning age estimation methods rely on manually labelled exact ages and let the algorithm learn the correspondence between images and the labelled ages. Because the labels are imprecise, the learned models carry errors. Manual labelling is also very time-consuming, and in many cases face data with accurately labelled ages simply cannot be obtained.
Summary of the Invention
The purpose of the present invention is to provide a face age estimation method based on weak supervision. Here, weak supervision (or weak labels) means that when a person annotates an image with an age, no exact age is required; only a coarse age group such as child, teenager, youth, prime-aged, or elderly is marked. This, first, reduces the time needed for manual age annotation and, second, reduces the algorithm's dependence on the accuracy of the manual labels.
The specific technical scheme of the present invention is an automatic face age estimation method based on weak supervision, comprising the following steps:
Step 1. Preprocessing: uniformly scale and crop the face image to a set size, and normalize the image using the image's own mean and variance.
Step 2. Convolutional neural network: any convolutional neural network model may be used. Taking the classic residual network as an example, it contains several convolutional layers and a final fully connected classification layer, with the layers connected by identity (direct) mappings.
Step 3. Extract the feature vector and category: the final output of the network is the class score of the image, and the input to the last fully connected classification layer is taken as the feature vector of the image.
Step 4. Compute the center point of the feature vectors of each category: take all images in the training set, group them by category, and compute the center point of the set of feature vectors in each group.
Step 5. Compute the distance between each image and the center point of each category: for an image to be tested, extract its feature vector and compute the distance between that feature vector and the center point of each category using the following formula:
d_{j,i} = (f_j - C_i) · (f_j - C_i)
where f_j denotes the feature vector of the j-th test image, C_i denotes the center point of the i-th category, and d_{j,i} denotes the distance between the test image and C_i;
Step 6. Estimate the age from the distance between the image and each category center: for an image p to be tested, first compute the distance between the center points of each pair of categories using the following formula:
C_{ij} = (C_i - C_j) · (C_i - C_j)
where C_i and C_j denote the center points of the i-th and j-th categories, and C_{ij} denotes the distance between the two category centers;
The age estimation formula is as follows:
y_p = Y_{c_i} + d_{p,i} × (C_{i,i+1} > d_{p,i}) × (G_i × 0.5)
where G_i denotes the span of the i-th age group; Y_{c_i} denotes the center age of the i-th age group, the center age being the average of the preset span (for example, the preset span of the children group is 0-12 years, so the center age is 6); d_{p,i} denotes the distance between the predicted image p and its predicted category i; C_{i,i+1} denotes the distance between category i and the adjacent category; and y_p is the final estimated age.
Further, in step 1 above, scaling and cropping to the set size means scaling the image to 124x124 and then randomly cropping it to 112x112.
Further, the above convolutional neural network model is a residual network, which contains several convolutional layers and a final fully connected classification layer, with the layers connected by identity (direct) mappings.
Commonly used age estimation algorithms suffer from two general problems: first, they rely heavily on accurate manual annotation, which costs a great deal of time and effort; second, the models are computationally complex and hard to deploy in practice. The age estimation algorithm based on imprecise labels proposed by the present invention, i.e. the weakly supervised automatic face age estimation method, can automatically train a strong-classification (specific age) model from weak labels (age groups). The trained model estimates a specific age for an unlabelled image, which reduces the reliance on accurate manual labelling. Moreover, the computation needed to go from the weak classification (age group) to the strong classification (specific age) is very simple; compared with the neural network itself, the additional cost is negligible.
In an evaluation using the mean absolute error (MAE) as the criterion, with the same network (e.g. res18) and the IMDB-WIKI dataset as an example, the error is 3.45. This result is comparable to methods that require accurate labels, and achieving the same accuracy without accurate labels demonstrates the effectiveness of the method.
Description of the Drawings
FIG. 1 is a flowchart of the weakly supervised face age estimation method of the present invention.
FIG. 2 is a schematic diagram of extracting the feature vector and category in Step 3.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
A face age estimation method based on weak supervision, as shown in FIG. 1, comprises the following steps:
Step 1. Data preprocessing: use MTCNN or any existing face detection and alignment method to crop the face region and align the facial landmarks. The image data are then preprocessed before training the model: first, the size is uniformly resized to 144x144; the image is then randomly cropped to 122x122, randomly rotated (for example within ±20°), and its pixel brightness randomly adjusted; finally, the image is normalized by computing its own mean and variance and, for every pixel, subtracting the mean and dividing by the variance.
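As a concrete illustration of this preprocessing step, the following Python sketch uses PIL and NumPy. The function name, the 0.8-1.2 brightness-jitter range, and the use of the standard deviation as the conventional reading of "divide by the variance" are assumptions made for illustration only.

```python
import random

import numpy as np
from PIL import Image

def preprocess(path, train=True):
    """Step 1 preprocessing: resize to 144x144, random 122x122 crop,
    random rotation and brightness jitter, then per-image normalization."""
    img = Image.open(path).convert("RGB").resize((144, 144))
    if train:
        img = img.rotate(random.uniform(-20.0, 20.0))   # random rotation within +/-20 degrees
        arr = np.asarray(img, dtype=np.float32)
        arr = arr * random.uniform(0.8, 1.2)            # brightness jitter (range is an assumption)
        x = random.randint(0, 144 - 122)
        y = random.randint(0, 144 - 122)
        arr = arr[y:y + 122, x:x + 122, :]              # random 122x122 crop
    else:
        arr = np.asarray(img, dtype=np.float32)
        off = (144 - 122) // 2
        arr = arr[off:off + 122, off:off + 122, :]      # center crop at test time
    # per-image normalization: subtract the image's own mean and divide by its spread
    return (arr - arr.mean()) / (arr.std() + 1e-8)
```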
Step 2. Convolutional neural network: the proposed algorithm does not depend on one fixed network structure. To describe the implementation in more detail, the residual network resnet18 is taken as an example; its structure is shown in FIG. 2. It contains 17 convolutional layers and one fully connected layer. Except for the first layer, all convolutional layers use 3x3 kernels. To let gradients propagate properly, the convolutional layers are connected by identity (direct) mappings in addition to the ordinary mappings. The final fully connected layer outputs the class scores.
Step 3. Extract the feature vector and category: as shown in FIG. 2, the features immediately before the fully connected layer fc1000 are taken as the feature vector, and the class with the maximum predicted score is taken as the corresponding category. In this embodiment, the dimension of the feature vector is 256.
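A minimal sketch of Steps 2-3, assuming PyTorch and torchvision for the resnet18 backbone. The 512-to-256 projection layer is an assumption added so that the classifier input matches the 256-dimensional feature vector described in this embodiment; class and variable names are illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # child, teenager, youth, prime-aged, elderly

class AgeGroupNet(nn.Module):
    """resnet18 backbone, a 256-d feature layer, and a 5-way age-group classifier."""

    def __init__(self, num_classes=NUM_CLASSES, feat_dim=256):
        super().__init__()
        backbone = models.resnet18(weights=None)                          # torchvision >= 0.13
        self.features = nn.Sequential(*list(backbone.children())[:-1])    # conv layers + global pooling
        self.project = nn.Linear(backbone.fc.in_features, feat_dim)       # 512 -> 256 (assumed projection)
        self.classifier = nn.Linear(feat_dim, num_classes)                # outputs the class scores

    def forward(self, x):
        f = self.features(x).flatten(1)   # pooled backbone features
        f = self.project(f)               # 256-d feature vector, the input of the classification layer
        return f, self.classifier(f)

model = AgeGroupNet().eval()
x = torch.randn(1, 3, 122, 122)             # one preprocessed face image
with torch.no_grad():
    feat, scores = model(x)
pred_class = scores.argmax(dim=1).item()    # category with the maximum score
```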
Step 4. Compute the center point of the feature vectors of each category: take all training images, group them by category, and for each category compute the mean of the feature vectors extracted from all images in that category; this mean is the center point of the category.
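A NumPy sketch of this center-point computation; the names are illustrative, and it is assumed that features holds the 256-dimensional vectors extracted by the network for all training images and labels holds their weak age-group labels.

```python
import numpy as np

def class_centers(features, labels, num_classes=5):
    """Step 4: the center of each category is the mean of the feature vectors
    extracted from all training images of that category."""
    features = np.asarray(features)   # shape (N, 256)
    labels = np.asarray(labels)       # weak labels in {0, ..., num_classes - 1}
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
```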
Step 5. Compute the distance between the image to be predicted and the class centers: the image to be predicted is preprocessed, fed into the network, and its feature vector and predicted category are extracted. The distance between the feature vector and each class center is then computed as follows:
d_{j,i} = (f_j - C_i) · (f_j - C_i)
where f_j denotes the feature vector of the j-th test image, C_i denotes the center point of the i-th category, and d_{j,i} denotes the distance between the test image and C_i.
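Under the same assumptions, this distance (the squared Euclidean distance to every class center) can be computed as in the following sketch:

```python
import numpy as np

def distances_to_centers(f_j, centers):
    """Step 5: d_{j,i} = (f_j - C_i) . (f_j - C_i), the squared Euclidean
    distance between the feature vector and every class center."""
    diff = np.asarray(centers) - np.asarray(f_j)   # shape (num_classes, 256)
    return np.sum(diff * diff, axis=1)             # one distance per category
```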
Step 6. Estimate the age from the distance between the image and each category center: after extracting the feature vector and the predicted category of the image to be predicted, the estimate starts from the center age Y_{c_i} of the predicted category's age group and uses the distance d_{p,i} between the feature vector and the corresponding class center. The larger d_{p,i} is, the larger the age offset; the smaller d_{p,i} is, the smaller the offset. The direction of the offset is determined by d_{p,i} and the distance C_{i,i+1} between the predicted category's center and the adjacent category's center: if d_{p,i} is greater than C_{i,i+1}, the direction is negative; otherwise it is positive.
For an image p to be tested, first compute the distance between the center points of each pair of classes:
C_{ij} = (C_i - C_j) · (C_i - C_j)
where C_i and C_j denote the center points of the i-th and j-th categories, and C_{ij} denotes the distance between the two category centers;
The final age estimation formula is as follows:
y_p = Y_{c_i} + d_{p,i} × (C_{i,i+1} > d_{p,i}) × (G_i × 0.5)
where G_i denotes the span of the i-th age group; Y_{c_i} denotes the center age of the i-th age group, the center age being the average of the preset span (for example, the preset span of the children group is 0-12 years, so the center age is 6); d_{p,i} denotes the distance between the predicted image p and its predicted category i; C_{i,i+1} denotes the distance between category i and the adjacent category; and y_p is the final estimated age.
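The sketch below is one possible reading of this formula, using the age ranges given in the labelling example further below. The comparison term (C_{i,i+1} > d_{p,i}) is interpreted as a ±1 sign so that the result matches the offset-direction prose of Step 6, and the fallback choice of the adjacent category for the last class is an assumption; all names are illustrative.

```python
import numpy as np

# Age-group ranges of the embodiment: (lower, upper) in years
RANGES = [(0, 12), (13, 19), (20, 30), (31, 49), (50, 100)]
CENTERS = [(lo + hi) / 2.0 for lo, hi in RANGES]   # Y_{c_i}: 6, 16, 25, 40, 75
SPANS = [float(hi - lo) for lo, hi in RANGES]      # G_i, taken as the width of each range

def estimate_age(f_p, class_centers_arr, pred_class):
    """Step 6: y_p = Y_{c_i} + d_{p,i} * sign * (G_i * 0.5), with the sign following the
    prose of Step 6 (negative when d_{p,i} exceeds C_{i,i+1}, positive otherwise)."""
    i = pred_class
    diff = np.asarray(f_p) - class_centers_arr[i]
    d_p_i = float(np.dot(diff, diff))                        # distance to the predicted class center
    j = i + 1 if i + 1 < len(class_centers_arr) else i - 1   # adjacent category (fallback for the last class)
    cdiff = class_centers_arr[i] - class_centers_arr[j]
    c_i_next = float(np.dot(cdiff, cdiff))                   # distance between the two class centers
    sign = 1.0 if c_i_next > d_p_i else -1.0
    return CENTERS[i] + d_p_i * sign * (SPANS[i] * 0.5)
```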
In the data labelling stage, before training and estimation, the data are annotated as follows:
Take labelling into the five age groups child, teenager, youth, prime-aged, and elderly as an example, corresponding roughly to the age ranges 0-12, 13-19, 20-30, 31-49, and 50-100 years. These five age groups are the five categories, and their center ages are 6, 16, 25, 40, and 75 respectively. In the implementation of Step 6 above, the calculation uses the center ages and age ranges of these five categories.
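For reference, a small helper that maps a coarse human age judgement to one of these five weak labels might look like the following sketch; the names and the clamping of ages above 100 to the last group are assumptions.

```python
GROUP_NAMES = ["child", "teenager", "youth", "prime-aged", "elderly"]
GROUP_RANGES = [(0, 12), (13, 19), (20, 30), (31, 49), (50, 100)]

def weak_label(rough_age):
    """Map a rough (approximate) age judgement to one of the five weak labels (indices 0-4)."""
    for idx, (lo, hi) in enumerate(GROUP_RANGES):
        if lo <= rough_age <= hi:
            return idx
    return len(GROUP_RANGES) - 1   # ages above 100 fall into the last group (an assumption)
```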
In an evaluation using the mean absolute error (MAE) as the criterion, with the same network (e.g. res18) and the IMDB-WIKI dataset as an example, the error is 3.45. This result is comparable to methods that require accurate labels, and achieving the same accuracy without accurate labels demonstrates the effectiveness of the method.
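The MAE reported above is the standard mean absolute error between estimated and ground-truth ages; a minimal sketch of its computation:

```python
import numpy as np

def mean_absolute_error(pred_ages, true_ages):
    """Mean absolute error (MAE) between estimated and ground-truth ages, in years."""
    pred_ages = np.asarray(pred_ages, dtype=np.float64)
    true_ages = np.asarray(true_ages, dtype=np.float64)
    return float(np.mean(np.abs(pred_ages - true_ages)))
```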