Technical Field
The present invention relates to the technical field of image processing, and in particular to an expression recognition method and device.
Background
With the development of artificial intelligence, computers increasingly need to understand human emotions in fields such as human-computer interaction, real-time monitoring, autonomous driving, and social networking. Facial expressions are one of the most important ways in which humans convey their inner emotions; expression recognition technology is therefore particularly important for enabling computers to understand those emotions.
Existing expression recognition schemes typically involve acquiring a large number of sample images, iteratively training a preset neural network on those samples, and using the trained network to recognize the expressions in an image to be recognized.
However, in such schemes the sample images are single static images, and the trained network mainly performs expression recognition on static images. When continuous dynamic images are analyzed, for example a segment of video or an animated image, a network of this kind cannot take into account how expressions change within the dynamic image, so the recognition becomes inaccurate.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide an expression recognition method and device that improve the accuracy of expression recognition.
The specific technical solutions are as follows.
An embodiment of the present invention provides an expression recognition method, the method comprising:
acquiring a dynamic image, and detecting a face region in the dynamic image;
cropping the face region as a first image group;
performing optical flow processing on the first image group to obtain an optical flow image group of the face region as a second image group;
inputting the first image group and the second image group into a pre-trained two-stream convolutional neural network for processing, to obtain an expression recognition result.
Optionally, cropping the face region as the first image group includes:
cropping the face region out of the dynamic image;
normalizing the cropped face region;
converting the normalized face region into grayscale images to obtain the first image group.
Optionally, inputting the first image group and the second image group into the pre-trained two-stream convolutional neural network for processing to obtain the expression recognition result includes:
inputting the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network to extract feature values of the first image group;
inputting the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network to extract feature values of the second image group;
performing weighted fusion of the feature values of the first image group and the feature values of the second image group to obtain a fusion result, and classifying the fusion result to obtain the expression recognition result.
Optionally, inputting the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network to extract the feature values of the first image group includes:
inputting the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network, performing convolution with a convolution kernel of a preset size, and pooling the convolution result to obtain the feature values of the first image group;
and inputting the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network to extract the feature values of the second image group includes:
inputting the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network, performing convolution with a convolution kernel of a preset size, and applying max pooling to the convolution result to obtain the feature values of the second image group.
Optionally, acquiring the dynamic image includes:
acquiring a video to be processed;
extracting video frames from the video to be processed at a preset interval as the dynamic image.
Optionally, the expression recognition result is the recognition probabilities of the face region in the image to be processed corresponding to different expressions; after inputting the first image group and the second image group into the pre-trained two-stream convolutional neural network for processing and obtaining the expression recognition result, the method further includes:
determining, according to the recognition probabilities included in the expression recognition result, the identifier of an expression that satisfies a preset probability condition, and annotating the identifier of the determined expression in the dynamic image;
or annotating, in the dynamic image, the recognition probabilities of the face region corresponding to the different expressions.
An embodiment of the present invention further provides an expression recognition device, the device comprising:
a face detection module, configured to acquire a dynamic image and detect a face region in the dynamic image;
an image cropping module, configured to crop the face region as a first image group;
an optical flow processing module, configured to perform optical flow processing on the first image group to obtain an optical flow image group of the face region as a second image group;
an expression recognition module, configured to input the first image group and the second image group into a pre-trained two-stream convolutional neural network for processing to obtain an expression recognition result.
Optionally, the image cropping module is specifically configured to crop the face region out of the dynamic image, normalize the cropped face region, and convert the normalized face region into grayscale images to obtain the first image group.
Optionally, the expression recognition module is specifically configured to input the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network to extract the feature values of the first image group; input the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network to extract the feature values of the second image group; and perform weighted fusion of the feature values of the first image group and the feature values of the second image group to obtain a fusion result, and classify the fusion result to obtain the expression recognition result.
Optionally, the expression recognition module is specifically configured to input the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network, perform convolution with a convolution kernel of a preset size, and pool the convolution result to obtain the feature values of the first image group; and to input the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network, perform convolution with a convolution kernel of a preset size, and apply max pooling to the convolution result to obtain the feature values of the second image group.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the expression recognition methods described above.
The expression recognition method and device provided by the embodiments of the present invention detect the face region in the image to be processed, perform optical flow processing on the face region, and input the resulting face region together with the optical flow images of the corresponding face region into a pre-trained two-stream convolutional neural network for processing, obtaining an expression recognition result. Because the two-stream convolutional neural network can process the face region and the corresponding optical flow images simultaneously, and the optical flow images can carry information about expression changes in the dynamic image, expression recognition with this scheme takes the expression changes in the dynamic image into account and improves recognition accuracy. Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages at the same time.
Brief Description of the Drawings
In order to describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of an expression recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of an expression recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a rectangular-feature cascade classifier model;
Fig. 4 is a network model architecture diagram of the expression recognition algorithm provided by this embodiment;
Fig. 5 is a schematic structural diagram of an expression recognition device provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the prior art, when continuous dynamic images are analyzed, for example a segment of video or an animated image, a neural network trained on single static sample images cannot take the expression changes in the dynamic image into account, which makes the expression recognition inaccurate. To solve this problem, the present invention proposes an expression recognition method and device.
The expression recognition method provided by the embodiments of the present invention is first described in general terms below.
In one implementation of the present invention, the expression recognition method includes:
acquiring a dynamic image, and detecting a face region in the dynamic image;
cropping the face region as a first image group;
performing optical flow processing on the first image group to obtain an optical flow image group of the face region as a second image group;
inputting the first image group and the second image group into a pre-trained two-stream convolutional neural network for processing, to obtain an expression recognition result.
Fig. 1 shows the architecture of an expression recognition method provided by an embodiment of the present invention. First, face detection is performed on the acquired dynamic image to identify the face region in it; the identified face region is then cropped, expression recognition is performed on the face region, and the expression recognition result is finally obtained.
As can be seen from the above, the expression recognition method and device provided by the embodiments of the present invention detect the face region in the image to be processed, perform optical flow processing on the face region, and input the resulting face region together with the optical flow images of the corresponding face region into a pre-trained two-stream convolutional neural network for processing, obtaining an expression recognition result. Because the two-stream convolutional neural network can process the face region and the corresponding optical flow images simultaneously, and the optical flow images can carry information about expression changes in the dynamic image, expression recognition with this scheme takes the expression changes in the dynamic image into account and improves recognition accuracy.
The expression recognition method provided by the embodiments of the present invention is described in detail below through specific embodiments.
Fig. 2 is a schematic flowchart of an expression recognition method provided by an embodiment of the present invention, which includes the following steps:
Step S201: acquire a dynamic image and detect the face region in the dynamic image.
For example, the dynamic image may be a segment of video, in which case the face region is detected in the video; alternatively, the dynamic image may be a set of animated frames, such as a GIF (Graphics Interchange Format) image, in which case the face region is detected in the animation.
In one implementation, step S201 may include: acquiring a video to be processed, and extracting video frames from the video to be processed at a preset interval as the dynamic image.
For example, key frames may be extracted from the video to be processed, so that the resulting dynamic image better reflects the key motions and expression changes of the face. Alternatively, one video frame may be extracted every preset number of frames, or one video frame may be extracted every preset time period, and so on. The extracted video frames are combined into a continuous frame sequence, which serves as the dynamic image.
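As a rough illustration of this interval-based sampling, the following sketch assumes OpenCV is available (the embodiments do not name a library) and pulls every N-th frame from a video file; the interval value is illustrative:

```python
import cv2

def extract_frames(video_path, interval=5):
    """Sample one frame every `interval` frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if index % interval == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames  # the sampled sequence serves as the dynamic image
```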
After the dynamic image has been acquired, the face region in it can be detected, and different algorithms may be used for this.
For example, a rectangular-feature cascade classifier may be used to perform face detection on each frame of the dynamic image. Fig. 3 is a schematic diagram of the rectangular-feature cascade classifier model. Specifically, for each frame of the dynamic image, a preset rectangular feature template is slid across the image; at each position the template reaches, the regional features of that position can be computed and the key features extracted from them; finally, the preset cascaded strong classifier processes the extracted key features iteratively, yielding the face region in the dynamic image.
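A hedged sketch of this kind of detection is shown below; it substitutes OpenCV's pretrained Haar (rectangular-feature) cascade for a cascade trained as described above, and the cascade file and detection parameters are assumptions:

```python
import cv2

# Assumption: use the pretrained frontal-face cascade shipped with OpenCV
# instead of training a cascade as the embodiment describes.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Returns (x, y, w, h) rectangles for the detected face regions.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```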
Alternatively, a face detection algorithm based on coarse histogram segmentation and singular value features may be used to detect the face region in the dynamic image. Specifically, a histogram is first drawn according to the structural distribution characteristics of each frame of the dynamic image, and the histogram is then smoothed with a Gaussian function; the image is coarsely segmented according to the smoothed histogram, the eyes are located within a certain grayscale range, and the face region is then determined based on singular value features.
Alternatively, the eigenface method may be used to detect the face region in the dynamic image. Specifically, the eigenvectors of the covariance matrices of sample sets for facial parts such as the eyes, cheeks, and jaw can be called eigen-eyes, eigen-jaw, and eigen-lips, collectively referred to as eigen sub-faces. The eigen sub-faces span a subspace of the corresponding image space, called the sub-face space. Each frame of the dynamic image is divided into multiple test image windows, and the projection distance of each test window onto the sub-face space is computed; if the projection distance satisfies a threshold condition, the corresponding test window is judged to be a face region.
Alternatively, the face region in the dynamic image may be detected using a face detection algorithm based on the binary wavelet transform, an elastic-model-based method, a neural network method, a facial iso-density line analysis and matching method, and so on; the embodiments of the present invention do not limit this.
Step S202: crop the face region as the first image group.
After the face region in the dynamic image has been detected, it can be cropped out of the dynamic image to obtain the first image group, which can be input into the pre-trained two-stream convolutional neural network for expression recognition.
Specifically, after the face region is cropped out of the dynamic image, the resulting face images usually differ in size, depending on the image content of the dynamic image.
In this step, the cropped face images can be further normalized: the cropped face images of different sizes are normalized to the size expected by the pre-trained two-stream convolutional neural network, that is, they are resized to the same size as the image set used to train the network.
For example, if the images used to train the two-stream convolutional neural network are 48*48 pixels, the cropped face images of different sizes are normalized to 48*48 pixels.
In addition, the dynamic image is usually an RGB (red-green-blue, true color) image. After the cropped face images have been normalized, the normalized face regions can be converted into grayscale images, and the resulting grayscale images are used as the first image group. This reduces the amount of computation the first image group requires in the pre-trained two-stream convolutional neural network and thereby improves the efficiency of expression recognition.
The normalization result can be converted into a grayscale image using the following formula:
Gray = R*0.299 + G*0.587 + B*0.114
An RGB image produces its range of colors by varying the red (R), green (G), and blue (B) color channels and superimposing them on one another. In the above formula, R denotes the color value of the red channel, G the color value of the green channel, B the color value of the blue channel, and Gray the gray value of the grayscale image in the first image group.
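A minimal sketch of the normalization and grayscale conversion described in this step, assuming OpenCV for resizing and RGB channel order for the input (the 48*48 size follows the example above):

```python
import cv2  # assumption: OpenCV is used for resizing
import numpy as np

def to_gray_48(face_rgb):
    """Resize a cropped face to 48*48 and convert it to grayscale using
    Gray = R*0.299 + G*0.587 + B*0.114. RGB channel order is assumed."""
    face = cv2.resize(face_rgb, (48, 48))
    r, g, b = face[..., 0], face[..., 1], face[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return gray.astype(np.uint8)
```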
After the detected face region has been normalized and converted into grayscale images, the resulting first image group better matches the input format of the pre-trained two-stream convolutional neural network, which facilitates the subsequent expression recognition.
Step S203: perform optical flow processing on the first image group to obtain the optical flow image group of the face region as the second image group.
Optical flow is a simple and practical representation of image motion. It uses the temporal variation and correlation of pixel gray values in a video to determine the "motion" of each pixel position; that is, it uses the changes of image gray values over time to express the dynamic information of the objects in the image. The optical flow image group effectively represents the trajectory of the dynamic facial changes in the first image group.
There are various ways to perform optical flow processing on the first image group. For example, the optical flow can be computed from features, by continuously locating and tracking the main features of the face; or from regions, by first locating similar regions in the first image group and then computing the flow from the displacement of those regions; or in the frequency domain, using banks of velocity-tunable filters to output frequency or phase information; or from gradients, using the spatio-temporal derivatives of image brightness to compute the optical flow at every pixel; and so on.
In this step, performing optical flow processing on the first image group yields the optical flow image group of the face region, that is, the second image group; the second image group has the same size as the first image group.
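As one possible realization of the gradient-based option above, the following sketch computes dense optical flow between consecutive grayscale face crops with OpenCV's Farneback method; the parameter values are illustrative assumptions:

```python
import cv2

def optical_flow_group(gray_frames):
    """Dense optical flow between consecutive face crops. Farneback's
    gradient-based method is one of the options listed above; the
    embodiments do not fix the algorithm."""
    flows = []
    for prev, nxt in zip(gray_frames, gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)  # same H x W as the input, 2 channels (dx, dy)
    return flows
```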
Step S204: input the first image group and the second image group into the pre-trained two-stream convolutional neural network for processing to obtain the expression recognition result.
The pre-trained two-stream convolutional neural network consists of two parts, a spatial-domain convolutional network and a temporal-domain convolutional network. The first image group can be input into the spatial-domain convolutional network and the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network, and the feature values of the first and second image groups are extracted respectively.
Specifically, after the first image group is input into the spatial-domain convolutional network, it is first convolved with a convolution kernel of a preset size to obtain a convolutional layer. For example, convolving a first image group of 48*48 pixels with a 3*3 kernel yields a convolutional layer of 46*46 feature values.
The convolutional layer can then be pooled to obtain the feature values of the first image group. The pooling may be max pooling or average pooling. For 3*3 max pooling, for example, the convolution result is divided into windows of size 3*3; within each window the largest feature value is selected as that window's feature value, and the matrix formed by the feature values of all the windows constitutes the feature values of the first image group.
Similarly, after the second image group is input into the temporal-domain convolutional network, it goes through the same processing: first convolution with a kernel of a preset size, then pooling of the convolution result, which yields the feature values of the second image group.
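The per-stream convolution and pooling can be sketched as follows, assuming PyTorch; the 48*48 input and the 3*3 kernel and pooling sizes follow the example above, while the 16-channel count is an assumption:

```python
import torch
import torch.nn as nn

# Conv + max-pool feature extraction as described above: a 3*3 kernel
# over a 48*48 grayscale input gives 46*46 activations, which are then
# max-pooled over non-overlapping 3*3 windows.
stream = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3),   # 48*48 -> 46*46
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3),       # 46*46 -> 15*15 (3*3 windows)
)

x = torch.randn(1, 1, 48, 48)  # one grayscale face crop
features = stream(x)           # shape: (1, 16, 15, 15)
```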
After the feature values of the first image group and the second image group have been extracted, they can be fused with weights to obtain a fusion result. The fusion weights of the feature values of the two image groups can be chosen for different dynamic images: for example, the two weights may be preset to be equal, in which case the feature values of the first and second image groups are simply added; or, if the dynamic changes in the image are particularly pronounced, a larger weight may be preset for the feature values of the second image group; and so on. The embodiments of the present invention do not limit this.
After the weighted fusion result of the feature values of the first and second image groups has been obtained, the fusion result can be classified to produce the expression recognition result. For example, the fusion result can be classified using a support vector machine algorithm, a neural network algorithm, or the softmax algorithm, among other approaches.
Specifically, the pre-trained two-stream convolutional neural network covers multiple expression classes, for example anger, disgust, fear, happiness, sadness, surprise, and neutral. Each expression has its corresponding features, and the recognition probabilities of the fusion result for the different expressions can be computed, so that the fusion result is classified and the expression recognition result obtained.
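A minimal sketch of the weighted fusion and softmax classification, assuming PyTorch, equal preset weights, and a linear classification layer (the embodiments leave the classifier unspecified):

```python
import torch
import torch.nn.functional as F

spatial_feat = torch.randn(1, 128)    # from the spatial-domain network
temporal_feat = torch.randn(1, 128)   # from the temporal-domain network

w_spatial, w_temporal = 0.5, 0.5      # preset fusion weights (assumed equal)
fused = w_spatial * spatial_feat + w_temporal * temporal_feat

# Seven example classes: anger, disgust, fear, happiness, sadness,
# surprise, neutral.
classifier = torch.nn.Linear(128, 7)
probs = F.softmax(classifier(fused), dim=1)  # recognition probability per expression
```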
In one implementation, after the expression recognition result has been obtained, it can be annotated and displayed in the original dynamic image. For example, the identifier of the expression that satisfies a preset probability condition can first be determined from the recognition probabilities in the expression recognition result, and the identifier of the determined expression is then annotated in the dynamic image; alternatively, the recognition probabilities of the face region for the different expressions can be annotated directly in the original dynamic image. The embodiments of the present invention do not limit this.
In addition, when the expression recognition result is displayed in the original dynamic image, the detected face region can also be marked for the user's reference.
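One possible way to mark the face region and the determined expression in a frame, assuming OpenCV drawing calls and a simple probability threshold as the preset condition (both assumptions for illustration):

```python
import cv2

LABELS = ["anger", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def annotate(frame, box, probs, threshold=0.5):
    """Draw the detected face box; if the top recognition probability
    meets the assumed threshold condition, write the label beside it.
    `probs` is a 1-D array of per-expression probabilities."""
    x, y, w, h = box
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    top = int(probs.argmax())
    if probs[top] >= threshold:
        text = "%s: %.2f" % (LABELS[top], probs[top])
        cv2.putText(frame, text, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```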
In this embodiment, the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network is trained on a set of RGB images, and the temporal-domain convolutional network is trained on a set of optical flow images. The optical flow image set can be obtained by applying optical flow processing to the RGB image set, and the two sets have the same size. The RGB image set can come from a dataset in a preset expression database, for example the CK+ dataset (The Extended Cohn-Kanade Dataset) or the JAFFE dataset (The Japanese Female Facial Expression Database).
For example, during training, the RGB image data in the CK+ dataset are converted into optical flow images, and the RGB images and the optical flow images are then fed into the two-stream convolutional neural network respectively, training the spatial-domain convolutional network and the temporal-domain convolutional network.
As can be seen from the above, the expression recognition method and device provided by the embodiments of the present invention detect the face region in the image to be processed, perform optical flow processing on the face region, and input the resulting face region together with the optical flow images of the corresponding face region into a pre-trained two-stream convolutional neural network for processing, obtaining an expression recognition result. Because the two-stream convolutional neural network can process the face region and the corresponding optical flow images simultaneously, and the optical flow images can carry information about expression changes in the dynamic image, expression recognition with this scheme takes the expression changes in the dynamic image into account and improves recognition accuracy.
Fig. 4 shows a network model architecture of the expression recognition algorithm provided by this embodiment. In the figure, the video to be processed is first acquired and key frames are extracted from it as the dynamic image; the face region in the dynamic image is detected and cropped to obtain the first image group, and optical flow processing on the first image group yields the second image group. The first image group is then input into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network, and the second image group into the temporal-domain convolutional network. Each stream goes through two rounds of 3*3*16 convolution and 3*3 max pooling, producing 128 feature values for the first image group and 128 for the second. The feature values of the two image groups are fused with weights, the fusion result is classified with the softmax algorithm, and the expression recognition result is finally obtained.
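Putting the pieces together, a hedged PyTorch sketch of the Fig. 4 architecture might read as follows; where the figure is not explicit (exact layer ordering, the fully connected layer used to reach 128 features), the choices below are assumptions:

```python
import torch
import torch.nn as nn

class TwoStreamNet(nn.Module):
    """Sketch of the Fig. 4 description: per stream, two rounds of
    3*3*16 convolution and 3*3 max pooling down to a 128-dim feature
    vector, then weighted fusion and softmax classification."""

    def __init__(self, num_classes=7, w_spatial=0.5, w_temporal=0.5):
        super().__init__()

        def make_stream(in_channels):
            return nn.Sequential(
                nn.Conv2d(in_channels, 16, 3), nn.ReLU(), nn.MaxPool2d(3),
                nn.Conv2d(16, 16, 3), nn.ReLU(), nn.MaxPool2d(3),
                nn.Flatten(),
                # On a 48*48 input: 48 -> 46 -> 15 -> 13 -> 4, so the
                # flattened size is 16*4*4 = 256 (FC layer is an assumption).
                nn.Linear(16 * 4 * 4, 128), nn.ReLU(),
            )

        self.spatial = make_stream(1)   # grayscale face crops
        self.temporal = make_stream(2)  # (dx, dy) optical flow images
        self.weights = (w_spatial, w_temporal)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, faces, flows):
        fused = (self.weights[0] * self.spatial(faces)
                 + self.weights[1] * self.temporal(flows))
        return torch.softmax(self.classifier(fused), dim=1)
```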
Expression recognition by the above method can take the expression changes in the dynamic image into account and can improve the accuracy of expression recognition.
For example, suppose that the expression of a person in a dynamic image is pleasant surprise, that is, it changes from surprise to happiness. If the dynamic image is split into single static images for expression recognition, the surprised expression may be recognized as fear or some other expression, making the recognition inaccurate.
With the above method, however, the two-stream convolutional neural network is pre-trained using RGB images and the corresponding optical flow images, yielding the spatial-domain convolutional network and the temporal-domain convolutional network respectively. Therefore, when the first and second image groups are input into the two-stream convolutional neural network for expression recognition, the expression changes in the dynamic image are taken into account: the network can recognize that the expression in the dynamic image changes from surprise to happiness instead of recognizing an individual frame as fear or some other expression, which reduces inaccurate expression recognition.
As shown in Fig. 5, an embodiment of the present invention further provides a schematic structural diagram of an expression recognition device, the device comprising:
a face detection module 510, configured to acquire a dynamic image and detect a face region in the dynamic image;
an image cropping module 520, configured to crop the face region as a first image group;
an optical flow processing module 530, configured to perform optical flow processing on the first image group to obtain an optical flow image group of the face region as a second image group;
an expression recognition module 540, configured to input the first image group and the second image group into a pre-trained two-stream convolutional neural network for processing to obtain an expression recognition result.
In one implementation, the image cropping module 520 is specifically configured to crop the face region out of the dynamic image, normalize the cropped face region, and convert the normalized face region into grayscale images to obtain the first image group.
In one implementation, the expression recognition module 540 is specifically configured to input the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network to extract the feature values of the first image group; input the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network to extract the feature values of the second image group; and perform weighted fusion of the feature values of the first image group and the feature values of the second image group to obtain a fusion result, and classify the fusion result to obtain the expression recognition result.
In one implementation, the expression recognition module 540 is specifically configured to input the first image group into the spatial-domain convolutional network of the pre-trained two-stream convolutional neural network, perform convolution with a convolution kernel of a preset size, and pool the convolution result to obtain the feature values of the first image group; and to input the second image group into the temporal-domain convolutional network of the pre-trained two-stream convolutional neural network, perform convolution with a convolution kernel of a preset size, and apply max pooling to the convolution result to obtain the feature values of the second image group.
As can be seen from the above, the expression recognition method and device provided by the embodiments of the present invention detect the face region in the image to be processed, perform optical flow processing on the face region, and input the resulting face region together with the optical flow images of the corresponding face region into a pre-trained two-stream convolutional neural network for processing, obtaining an expression recognition result. Because the two-stream convolutional neural network can process the face region and the corresponding optical flow images simultaneously, and the optical flow images can carry information about expression changes in the dynamic image, expression recognition with this scheme takes the expression changes in the dynamic image into account and improves recognition accuracy.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 6, comprising a processor 601, a communication interface 602, a memory 603, and a communication bus 604, wherein the processor 601, the communication interface 602, and the memory 603 communicate with one another through the communication bus 604;
the memory 603 is configured to store a computer program;
the processor 601 is configured to implement the following steps when executing the program stored in the memory 603:
acquiring a dynamic image, and detecting a face region in the dynamic image;
cropping the face region as a first image group;
performing optical flow processing on the first image group to obtain an optical flow image group of the face region as a second image group;
inputting the first image group and the second image group into a pre-trained two-stream convolutional neural network for processing, to obtain an expression recognition result.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, among others. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located away from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As can be seen from the above, the expression recognition method and device provided by the embodiments of the present invention detect the face region in the image to be processed, perform optical flow processing on the face region, and input the resulting face region together with the optical flow images of the corresponding face region into a pre-trained two-stream convolutional neural network for processing, obtaining an expression recognition result. Because the two-stream convolutional neural network can process the face region and the corresponding optical flow images simultaneously, and the optical flow images can carry information about expression changes in the dynamic image, expression recognition with this scheme takes the expression changes in the dynamic image into account and improves recognition accuracy.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that comprises the element.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant parts, refer to the description of the method embodiment.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.