CN113901875A

Movatterモバイル変換

Info

Publication number: CN113901875A
Application number: CN202111059813.4A
Authority: CN
Inventors: 刘琛; 耿艳磊; 李晗; 安晓博
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2021-09-10
Filing date: 2021-09-10
Publication date: 2022-01-07

Abstract

The invention discloses a face snapshot method, a system and a storage medium, belonging to the technical field of computer vision, and aiming at solving the technical problem of how to snapshot a face with the best quality and the most positive angle in the face detection and tracking process, the adopted technical scheme is as follows: the method comprises the following specific steps: acquiring image data of a current frame of a video; detecting a face in a video through a face detection-face key point detection model to obtain a face image, and detecting the face key point of the detected face image; matching continuous inter-frame face coordinates through a tracking matching algorithm to obtain a plurality of target sequences; calculating the quality score of the face image according to the face coordinates and the key point information; and screening out the optimal face image of which each target sequence meets the requirement according to the face image quality score, and outputting related information corresponding to the optimal face image.

Description

Face snapshot method, system and storage medium

Technical Field

The invention relates to the technical field of computer vision, in particular to a face snapshot method, a face snapshot system and a storage medium.

Background

The face snapshot is considered to be a biological feature recognition technology which can be widely used due to the advantages of non-contact, operation concealment, no need of characteristic matching and the like. The face snapshot technology is widely applied to many fields, such as public security monitoring, traffic access monitoring, railway station face-train ticket identification and the like, but the face quality is low in the face snapshot process and is influenced by various factors, such as face posture, expression, blur, brightness, shielding and the like. Therefore, how to snap a face with the best quality and the most correct angle in the face detection and tracking process is a technical problem to be solved urgently at present.

Patent document No. CN106446851A discloses a method and system for optimizing human face based on visible light, which combines evaluation indexes of human face three-dimensional rotation angle, human face sunlight exposure area, human face definition and human face shielding area to adapt to quality evaluation of human face in different application scenes, and firstly directly eliminates images with too low quality through preprocessing to improve the calculation efficiency in the scoring process. However, although the technical scheme combines a plurality of evaluation indexes to perform weighted summation to improve the defect of a single factor to a certain extent, the scheme has a large number of over-parameters and has great limitation.

Disclosure of Invention

The technical task of the invention is to provide a face snapshot method, a face snapshot system and a storage medium, so as to solve the problem of how to snapshot the face with the best quality and the most correct angle in the face detection and tracking process.

The technical task of the invention is realized in the following way, and the face snapshot method comprises the following steps:

acquiring image data of a current frame of a video;

detecting a face in a video through a face detection-face key point detection model to obtain a face image, and detecting the face key point of the detected face image;

matching continuous inter-frame face coordinates through a tracking matching algorithm to obtain a plurality of target sequences;

calculating the quality score of the face image according to the face coordinates and the key point information;

and screening out the optimal face image of which each target sequence meets the requirement according to the face image quality score, and outputting related information corresponding to the optimal face image.

Preferably, the face detection-face key point detection model adopts MTCNN, Retinaface or Openface;

the tracking matching algorithm adopts sort, deepsort, KCF or JDE.

Preferably, the face image quality score calculation formula is as follows:

the face image quality score is the face image definition value, the face key point distance and the face rotation process value;

performing DCT cosine transformation on the face image according to the face image definition value, and then performing normalization processing;

the key point distance is the shortest distance from the key point of the nose tip to a quadrangle formed by pupils of the two eyes and the two mouth corners;

the human face rotation degree value is the absolute value of the cosine value of the included angle between the mean value of the direction vectors of the two eye pupil connecting lines and the direction vectors of the two mouth angle connecting lines and the horizontal axis.

Preferably, the related information corresponding to the face image includes duration information, belonging video frame information, coordinate information and key point coordinate information.

Preferably, the optimal face image which meets the requirement of each target sequence is screened out according to the face image quality score as follows:

judging whether the existence time of the target face exceeds a threshold value according to the face image quality score:

if yes, judging whether the size of the face area image is larger than a set threshold value:

if yes, outputting the face region image;

if not, no output is obtained.

A face snapshot system, the system comprising,

the face detection-key point detection module is used for detecting the face and key points by using the face detection-key point detection network model to acquire the coordinate information of the face and the key points of the face;

the tracking matching module is used for correlating and matching the face sequences of the unified targets by utilizing a tracking matching algorithm so as to obtain related information of each target in the tracking process;

and the face quality evaluation module is used for calculating face image quality scores according to the face coordinates and the key point information, screening out the optimal face image with each target sequence meeting the requirements according to the face image quality scores, and outputting the related information corresponding to the optimal face image.

Preferably, the working process of the face detection-key point detection module is as follows:

(1) initializing relevant parameters;

(2) acquiring a video current frame image, inputting a pre-trained face detection-key point detection model, and outputting the face image of the video current frame and information of key point coordinates thereof through network forward reasoning;

the working process of the tracking matching module is as follows:

(1) inputting a face region image and coordinate information;

(2) matching the faces of the targets in the previous frame and the next frame by using a tracking matching algorithm, and associating the faces belonging to the same target;

(3) and outputting the matched face information.

Preferably, the working process of the face quality evaluation module is as follows:

(1) inputting face coordinates and face information of key point coordinates of the face coordinates;

(2) calculating the score of the face quality, wherein the formula is as follows:

the human face rotation degree value is the absolute value of the cosine value of the included angle between the mean value of the direction vectors of the two eye pupil connecting lines and the direction vectors of the two mouth angle connecting lines and the horizontal axis;

(3) updating the optimal score of the face and the related information corresponding to the face image; the relevant information corresponding to the face image comprises duration information, belonging video frame information, coordinate information and key point coordinate information;

(4) inputting the quality score of the face image, and judging whether the existence time of the target face exceeds a threshold value:

if yes, executing the step (5);

(5) judging whether the size of the face region image is larger than a set threshold value:

if the image is larger than the preset image, outputting the face region image;

and ② if not, no output is obtained.

the tracking matching algorithm adopts sort, deepsort, KCF or JDE.

A computer-readable storage medium having stored thereon computer-executable instructions, which, when executed by a processor, implement a face capture method as described above.

The face snapshot method, the face snapshot system and the storage medium have the following advantages:

the invention aims to reduce the excessive resource consumption caused by the AI model to the face quality score and the interpretability of face snapshot, so that the resource consumption is less, the system efficiency is higher, and the interpretability of the face quality evaluation is stronger;

the invention reduces the excessive number of the hyperparameters set when the traditional image processing method evaluates the face quality, reduces the use of priori knowledge, reduces the resource consumption on the premise of ensuring the face snapshot precision in a word, and has stronger applicability and generalization;

the invention adopts the mode of face definition value, face key point distance and face rotation degree value to evaluate the face quality, is compatible with factors such as image definition, face size, face posture and the like, not only effectively reduces resource consumption, but also has less over-parameter quantity, more excellent face quality score and higher face snapshot efficiency.

Drawings

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a flow chart diagram of a face capture method;

FIG. 2 is a block diagram of a face capture system;

FIG. 3 is a diagram showing the results of face detection-key point detection;

FIG. 4 is a diagram showing the degree of face rotation;

FIG. 5 is a diagram showing the face quality evaluation score results;

where dist represents the face quality score.

Detailed Description

The face capture method, system, and storage medium of the present invention are described in detail below with reference to the drawings and the detailed description of the invention.

Example 1:

as shown in fig. 1, the face snapshot method of the present invention specifically includes:

s1, acquiring the image data of the current frame of the video;

s2, detecting the face in the video through a face detection-face key point detection model to obtain a face image, and detecting the face key point of the detected face image;

s3, matching the face coordinates between continuous frames through a tracking matching algorithm to obtain a plurality of target sequences;

s4, calculating the quality score of the face image according to the face coordinates and the key point information;

and S5, screening out the optimal face image with each target sequence meeting the requirement according to the face image quality score, and outputting the relevant information corresponding to the optimal face image.

In this embodiment, the face detection-face key point detection model in step S2 adopts MTCNN, Retinaface, or OpenFace;

the tracking matching algorithm of step S3 in this embodiment uses sort, depsort, KCF, or JDE.

In this embodiment, the formula for calculating the face image quality score in step S4 is:

In this embodiment, the step S5 of screening out the optimal face image meeting the requirement of each target sequence according to the face image quality score specifically includes the following steps:

if yes, outputting the face region image;

if not, no output is obtained.

Example 2:

as shown in fig. 2, the face snapshot system of the present invention comprises,

The working process of the face detection-key point detection module in the embodiment is specifically as follows:

(1) initializing relevant parameters;

the working process of the tracking matching module in this embodiment is specifically as follows:

(1) inputting a face region image and coordinate information;

(3) and outputting the matched face information.

The working process of the face quality evaluation module in the embodiment is as follows:

if yes, executing the step (5);

if the image is larger than the preset image, outputting the face region image;

and ② if not, no output is obtained.

Example 3:

the face snapshot method comprises the following specific processes:

acquiring image data of a current frame of a video;

selecting a pre-trained face detection-face key point detection model (such as MTCNN, Retinaface, Openface and the like), inputting the result of the step (I) into the face detection-face key point detection model, performing network forward reasoning, and outputting the face image of the current video frame and the information of key point coordinates thereof, as shown in the attached figure 3;

selecting a tracking matching algorithm (such as sort, depsort, KCF, JDE and the like), inputting the result of the step (two), performing correlation matching with the results of the front and rear frames of the video, and outputting information such as the face of the matched target;

cutting the face image matched in the image in the step (I) according to the result (face coordinate information and the like) in the step (III), and selecting an image definition calculation formula (such as DCT (discrete cosine transform) and the like) to calculate the face image definition, wherein the value range is [0,1 ];

fifthly, calculating the distance of key points of the human face according to the result of the step (three); here, 5 points (left eye, right eye, nose tip, left mouth corner, right mouth corner) of the human face are taken as an example, but not limited thereto. Calculating the shortest distance from a key point of a nose tip to a quadrangle formed by eyes and a mouth corner, wherein the shortest distance is used as the distance of the key point, the distance of the point in the quadrangle is positive, and the distance of the point in the quadrangle is negative;

calculating the rotation degree of the face according to the result of the step (III), as shown in fig. 4, wherein 5 points of the face are taken as an example, but not limited to the example; calculating the absolute value of the cosine value of the average value of the eye direction vector and the mouth angle direction vector;

(VII) according to the results of the step (IV), the step (V) and the step (VI), calculating a face quality score according to a formula of face image definition, key point distance and face rotation degree, and updating the face optimal score and corresponding related information (coordinates, duration, belonging video frames and the like) thereof, as shown in figure 5;

taking the output of the step (seven) as input, judging whether the existence time of the target face exceeds a threshold value, if so, judging whether the size of the face area image is larger than a set threshold value; if yes, outputting; otherwise, the information is not output (the output mode is not limited to this, and only the best 1 piece of face information can be output at the end of the tracking process, or the best face information at each time end can be output according to the time period, etc.).

Example 4:

the embodiment of the invention also provides a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are loaded by the processor, so that the processor executes the face snapshot method, the face snapshot system and the storage medium in any embodiment of the invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.

In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.

Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.

Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.

Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

Translated fromChinese

1.一种人脸抓拍方法，其特征在于，该方法具体如下：1. a face capture method, is characterized in that, the method is specifically as follows:

获取视频当前帧图像数据；Get the image data of the current frame of the video;

通过人脸检测-人脸关键点检测模型对视频中人脸检测获取人脸图像，并对检测到的人脸图像进行人脸关键点检测；Through the face detection-face key point detection model, the face image in the video is detected, and the face key point detection is performed on the detected face image;

通过跟踪匹配算法对连续帧间人脸坐标匹配获得数个目标序列；Several target sequences are obtained by matching face coordinates between consecutive frames through the tracking matching algorithm;

根据人脸坐标和关键点信息计算人脸图像质量得分；Calculate the face image quality score according to the face coordinates and key point information;

根据人脸图像质量得分筛选出每个目标序列符合要求的最佳人脸图像，并输出最佳人脸图像对应的相关信息。According to the face image quality score, the best face image that meets the requirements of each target sequence is screened, and the relevant information corresponding to the best face image is output.

2.根据权利要求1所述的人脸抓拍方法，其特征在于，所述人脸检测-人脸关键点检测模型采用MTCNN、Retinaface或OpenFace；2. human face capture method according to claim 1, is characterized in that, described human face detection-human face key point detection model adopts MTCNN, Retinaface or OpenFace;

所述跟踪匹配算法采用sort、deepsort、KCF或JDE。The tracking matching algorithm adopts sort, deepsort, KCF or JDE.

3.根据权利要求1所述的人脸抓拍方法，其特征在于，人脸图像质量得分计算公式为：3. human face capture method according to claim 1, is characterized in that, human face image quality score calculation formula is:

人脸图像质量得分＝人脸图像清晰度值*人脸关键点距离*人脸旋转程度值；Face image quality score = face image clarity value * face key point distance * face rotation degree value;

其中，人脸图像清晰度值对人脸图像进行DCT余弦变换后再进行归一化处理；Among them, the face image sharpness value is subjected to DCT cosine transformation on the face image and then normalized;

关键点距离是指鼻尖关键点到由两个眼睛的瞳孔与两个嘴角组成的四边形的最短距离；The key point distance refers to the shortest distance from the key point of the nose tip to the quadrilateral formed by the pupils of the two eyes and the two corners of the mouth;

人脸旋转程度值是指两个眼睛瞳孔连线的方向向量和两个嘴角连线的方向向量的均值与水平轴夹角的余弦值的绝对值。The face rotation degree value refers to the absolute value of the cosine value of the mean value of the direction vector connecting the two eye pupils and the direction vector connecting the two mouth corners and the angle between the horizontal axis.

4.根据权利要求1-3中任一所述的人脸抓拍方法，其特征在于，人脸图像对应的相关信息包括持续时间信息、所属视频帧信息、坐标信息和关键点坐标信息。4 . The face capture method according to claim 1 , wherein the relevant information corresponding to the face image includes duration information, video frame information, coordinate information and key point coordinate information. 5 .

5.根据权利要求4中所述的人脸抓拍方法，其特征在于，根据人脸图像质量得分筛选出每个目标序列符合要求的最佳人脸图像具体如下：5. according to the human face capture method described in claim 4, it is characterized in that, according to the human face image quality score, filter out the best human face image that each target sequence meets the requirements and is specifically as follows:

根据人脸图像质量得分判断目标人脸存在时间是否超出阈值：Determine whether the target face existence time exceeds the threshold according to the face image quality score:

若是，则判断人脸区域图像尺寸是否大于设定阈值：If so, judge whether the image size of the face area is larger than the set threshold:

若是，则输出该人脸区域图像；If so, output the image of the face area;

若否，则无输出。If not, there is no output.

6.一种人脸抓拍系统，其特征在于，该系统包括，6. a face capture system, is characterized in that, this system comprises,

人脸检测-关键点检测模块，用于利用人脸检测-关键点检测网络模型对人脸及关键点检测，获取人脸及人脸关键点坐标信息；The face detection-key point detection module is used to detect the face and key points by using the face detection-key point detection network model, and obtain the coordinate information of the face and the key points of the face;

跟踪匹配模块，用于利用跟踪匹配算法将统一目标的人脸序列关联匹配起来，进而获得每个目标在跟踪过程中的相关信息；The tracking matching module is used to correlate and match the face sequences of the unified target by using the tracking matching algorithm, and then obtain the relevant information of each target in the tracking process;

人脸质量评价模块，用于根据人脸坐标和关键点信息计算人脸图像质量得分，再根据人脸图像质量得分筛选出每个目标序列符合要求的最佳人脸图像，并输出最佳人脸图像对应的相关信息。The face quality evaluation module is used to calculate the face image quality score according to the face coordinates and key point information, and then filter out the best face image that meets the requirements for each target sequence according to the face image quality score, and output the best face image. The relevant information corresponding to the face image.

7.根据权利要求6所述的人脸抓拍系统，其特征在于，所述人脸检测-关键点检测模块的工作过程具体如下：7. human face capture system according to claim 6, is characterized in that, the working process of described human face detection-key point detection module is specifically as follows:

(2)、获取视频当前帧图像输入预训练好的人脸检测-关键点检测模型，经过网络前向推理，输出视频当前帧的人脸图像及其关键点坐标的信息；(2), obtain the current frame image of the video and input the pre-trained face detection-keypoint detection model, and through the network forward inference, output the face image of the current frame of the video and the information of the key point coordinates;

跟踪匹配模块的工作过程具体如下：The working process of the tracking matching module is as follows:

(1)、输入人脸区域图像和坐标信息；(1), input face area image and coordinate information;

(2)、使用跟踪匹配算法匹配前后帧中目标的人脸，将属于同一目标的人脸关联；(2), use the tracking matching algorithm to match the faces of the targets in the frames before and after, and associate the faces belonging to the same target;

(3)、输出匹配后的人脸信息。(3), output the matched face information.

8.根据权利要求6所述的人脸抓拍系统，其特征在于，所述人脸质量评价模块工作过程具体如下：8. human face capture system according to claim 6, is characterized in that, described human face quality evaluation module working process is specifically as follows:

(1)输入人脸坐标及其关键点坐标的人脸信息；(1) Input face information of face coordinates and key point coordinates;

(2)、计算人脸质量的得分，公式如下：(2), calculate the score of face quality, the formula is as follows:

人脸旋转程度值是指两个眼睛瞳孔连线的方向向量和两个嘴角连线的方向向量的均值与水平轴夹角的余弦值的绝对值；The face rotation degree value refers to the absolute value of the mean value of the direction vector connecting the two eye pupils and the direction vector connecting the two mouth corners and the cosine value of the angle between the horizontal axis;

(3)、更新人脸最优得分及人脸图像对应的相关信息；其中，人脸图像对应的相关信息包括持续时间信息、所属视频帧信息、坐标信息和关键点坐标信息；(3), update the relevant information corresponding to the optimal score of the face and the face image; wherein, the relevant information corresponding to the face image includes duration information, video frame information, coordinate information and key point coordinate information;

(4)、输入人脸图像质量得分，判断目标人脸存在时间是否超出阈值：(4), enter the face image quality score, and judge whether the target face existence time exceeds the threshold:

若是，则执行步骤(5)；If so, execute step (5);

(5)、判断人脸区域图像尺寸是否大于设定阈值：(5), determine whether the image size of the face area is larger than the set threshold:

①、若大于，则输出该人脸区域图像；①, if it is greater than, output the image of the face area;

②、若否，则无输出。②, if not, there is no output.

9.根据权利要求6-8中任一所述的人脸抓拍系统，其特征在于，所述人脸检测-人脸关键点检测模型采用MTCNN、Retinaface或OpenFace；9. The human face capture system according to any one of claims 6-8, wherein the human face detection-human face key point detection model adopts MTCNN, Retinaface or OpenFace;

10.一种计算机可读存储介质，其特征在于，所述计算机可读存储介质中存储有计算机执行指令，当处理器执行所述计算机执行时，实现如权利要求1至5中任一所述的人脸抓拍方法。10. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer execution, any one of claims 1 to 5 is implemented. face capture method.