Disclosure of Invention
The invention aims to provide a quick and efficient automatic face auditing method, system and device, and a readable storage medium.
To solve the above technical problems, the technical solution of the invention is as follows:
In a first aspect, the present invention provides an automatic face auditing method, including the steps of:
face detection: detecting the coordinates of the face bounding box and of the face key points through a cascaded neural network algorithm;
face quality evaluation and screening: evaluating the quality attributes of a number of face pictures and screening out high-quality pictures;
liveness detection: detecting whether the pictures show a real person using a two-stream convolutional neural network, and filtering out pictures judged not to show a real person;
face comparison authentication: extracting the face feature vector of each picture, comparing its similarity with the face feature vector of a standard picture, and filtering out pictures with low similarity.
Preferably, before face detection, the method further comprises filtering out pictures whose resolution falls outside a threshold range.
Preferably, after the face detection, the method further comprises:
filtering out pictures in which the ratio of the face bounding box size to the picture size is below a threshold;
filtering out pictures in which the distance between the two eyes of the face is below a threshold.
Preferably, after the face detection, the method further comprises:
face alignment: calculating a transformation matrix between the face key point coordinates of a picture and the key point coordinates of a pre-stored standard face, and applying the transformation matrix to the picture to obtain an aligned face image.
Preferably, the face comparison authentication process is as follows:
extracting the high-quality picture and outputting a 512-dimensional floating-point vector with a 50-layer ResNet neural network, recorded as the face feature vector;
comparing the similarity between the face feature vector of the current picture and that of the standard picture by the cosine similarity:
sim(S_i, S_j) = (S_i · S_j) / (||S_i|| * ||S_j||)
where S_i is the face feature vector of the current picture and S_j is the face feature vector of the standard picture;
if the similarity is below a threshold, the person is judged not to match the ID photo; if the similarity is above the threshold, the person is judged to match the ID photo.
Preferably, the quality attributes used for face quality evaluation comprise the face pose, eye state, mouth state, makeup state, overall brightness, left-right face brightness difference, blurriness and occlusion;
the face pose, eye state, mouth state, makeup state, blurriness and occlusion are handled by a multi-task convolutional neural network built on a MobileFaceNet backbone, whose multiple task outputs correspond to the respective face quality attributes.
Eye state, mouth state, makeup state and face occlusion are treated as classification tasks with a softmax loss function as the objective;
the face pose, image illumination and image blurriness are treated as regression tasks with a Euclidean loss function as the objective;
the total objective function of network training combines several softmax loss functions with Euclidean loss functions: when multiple tasks are learned jointly, it is a linear combination of the individual losses.
Preferably, the liveness detection process is:
acquiring a depth image and normalizing the face region in the image to obtain a processed face depth map;
inputting the RGB face images of a preset number of frames for a face ID, together with the face depth maps, into a deep learning network for detection, obtaining a live/spoof decision for each frame;
voting over all decisions for the face ID: if more frames are judged live, the subject is judged to be a live person; if more frames are judged as attacks, the subject is judged not to be a live person.
Preferably, the liveness detection process alternatively is:
cropping the face from the original image, converting the RGB channels into the HSV and YCbCr color spaces, and stacking the converted HSV and YCbCr images to obtain a stacked map; extracting Sobel features from the face region with the Sobel operator to obtain a Sobel feature map;
feeding the Sobel feature maps and stacked maps of a preset number of frames for a face ID into the two input channels of a two-stream neural network, obtaining a live/spoof decision for each frame;
voting over all decisions for the face ID: if more frames are judged live, the subject is judged to be a live person; if more frames are judged as attacks, the subject is judged not to be a live person.
In a second aspect, the present invention further provides an automatic face auditing system, including:
a face detection module, configured to detect the face bounding box and face key points through a cascaded neural network algorithm;
a face quality evaluation module, configured to evaluate the quality attributes of a number of face pictures and screen out high-quality pictures;
a liveness detection module, configured to detect whether the pictures show a real person using a two-stream convolutional neural network and to filter out pictures judged not to show a real person;
and a face comparison module, configured to extract the face feature vector of each picture, compare its similarity with the face feature vector of a pre-stored ID photo, and filter out pictures with low similarity.
In a third aspect, the present invention further provides an automatic face auditing apparatus, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the automatic face auditing method described above when executing the program.
In a fourth aspect, the present invention further provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described automatic face auditing method.
The technical solution of the invention is an automatic face auditing method combining face detection, face quality analysis, face liveness detection and face recognition. The method examines whether the quality of the personal photo uploaded by the user is compliant, whether it shows a real person, and whether the person matches the ID. Quality compliance mainly considers whether the face in the picture is of high quality and easy to recognize; liveness detection mainly considers whether the photo is recaptured or forged; person-ID matching mainly considers whether the ID photo and the personal photo show the same person. The invention fully automates picture auditing, requires no manual operation, reduces labor cost, runs on a stable algorithm, and reduces human error. In addition, the invention uses a cascaded picture-filtering scheme, which makes it faster.
Detailed Description
The following describes embodiments of the present invention further with reference to the drawings. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit it. In addition, the technical features of the embodiments described below may be combined with one another as long as they do not conflict.
Referring to fig. 1, the invention provides an automatic face auditing method, which comprises the following steps:
Pictures whose resolution falls outside the threshold range are filtered out first: a picture is treated as failing the resolution requirement if its vertical resolution is below one threshold or its horizontal resolution is below another. In this embodiment, photos with a vertical resolution below 640 or a horizontal resolution below 480 are filtered out and deleted.
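As a minimal sketch of this resolution filter (the helper name is ours; the 640/480 thresholds are the embodiment's example values):

```python
import cv2

MIN_HEIGHT = 640   # vertical-resolution threshold from this embodiment
MIN_WIDTH = 480    # horizontal-resolution threshold from this embodiment

def resolution_ok(image_path: str) -> bool:
    """Return False for pictures whose resolution falls below the thresholds."""
    img = cv2.imread(image_path)
    if img is None:          # unreadable files fail the check as well
        return False
    height, width = img.shape[:2]
    return height >= MIN_HEIGHT and width >= MIN_WIDTH
```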
S10: and detecting the coordinates of the rectangular frame of the human face and the coordinates of key points of the human face through a cascade neural network algorithm.
The cascaded neural network algorithm predicts the face frame coordinates and the face key point coordinates in the image. The face frame coordinates define the rectangular box containing the face region; the face key point coordinates are the positions of 106 key points covering the eyebrows, eyes, nose, mouth and facial contour.
The face frame size is calculated from the face frame coordinates. When the face ratio exceeds the threshold, the requirement is not met; in this embodiment, a face ratio above 0.4 means the face occupies too large a proportion of the image.
face ratio = face frame size / image size
The inter-pupil distance, i.e. the number of pixels between the centers of the two eyes, is calculated from the face key points. When the inter-eye distance is below the threshold, the requirement is not met; for example, a distance of fewer than 40 pixels between the left and right eyes is too small.
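A minimal sketch of these two geometric checks, assuming the face box and pupil centers have already been produced by the detector (the data layout is hypothetical; the 0.4 and 40-pixel thresholds are the embodiment's example values):

```python
import math

def geometry_ok(face_box, landmarks, img_w, img_h,
                max_face_ratio=0.4, min_eye_dist=40.0) -> bool:
    """Apply the face-ratio and inter-eye-distance checks described above.

    face_box: (x, y, w, h) face bounding box.
    landmarks: dict with 'left_eye' and 'right_eye' pupil centers (x, y);
               a hypothetical layout standing in for the 106 key points.
    """
    x, y, w, h = face_box
    face_ratio = (w * h) / float(img_w * img_h)   # face ratio = frame size / image size
    if face_ratio > max_face_ratio:               # face occupies too much of the image
        return False
    (lx, ly), (rx, ry) = landmarks['left_eye'], landmarks['right_eye']
    eye_dist = math.hypot(rx - lx, ry - ly)       # pixels between the pupil centers
    return eye_dist >= min_eye_dist
```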
Face alignment: for each face, a transformation matrix between the extracted face key point coordinates and the standard face key point coordinates is computed and applied to the initial face image to obtain an aligned face image. The aligned key point coordinates are distributed more consistently, and the face is rectified.
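A minimal alignment sketch using OpenCV's similarity-transform estimator; the patent does not name a specific solver, so this is one reasonable choice:

```python
import cv2
import numpy as np

def align_face(image, keypoints, std_keypoints, out_size=(112, 112)):
    """Estimate a transform from detected to standard key points and warp
    the image with it (a sketch; output size is an assumed value)."""
    src = np.asarray(keypoints, dtype=np.float32)       # detected key points, shape (N, 2)
    dst = np.asarray(std_keypoints, dtype=np.float32)   # pre-stored standard key points
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)   # 2x3 rotation+scale+translation
    return cv2.warpAffine(image, matrix, out_size)
```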
S20: face quality evaluation and screening, namely performing face quality evaluation on quality attributes of a plurality of face pictures, and screening high-quality pictures.
The face quality evaluation algorithm combines deep learning with traditional image analysis. From the face image it derives quality attributes including face brightness, left-right face brightness difference, face angle about the y-axis (yaw), face angle about the x-axis (pitch), face angle about the z-axis (roll), expression class, glasses class, mask class, eye state, mouth state, makeup state, face realness (distinguishing statue images, CG faces and real faces), face blurriness and face occlusion degree.
Face brightness and the left-right face brightness difference use a traditional algorithm: the three RGB channels of the face image are converted into a gray image in fixed proportions, the face regions are delimited according to the 106 face key points, face brightness is computed from the mean gray value of the face region, and the left-right brightness difference from the mean gray values of the left and right face halves.
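A sketch of this traditional brightness computation, assuming the left/right face masks have already been derived from the 106 key points; the gray-conversion weights are the standard BT.601 values, which the patent only calls "a certain proportion":

```python
import numpy as np

def brightness_features(face_rgb, left_mask, right_mask):
    """Compute face brightness and the left-right brightness difference from
    gray-level means. left_mask/right_mask are boolean arrays marking the two
    face halves (assumed precomputed from the key points)."""
    r, g, b = face_rgb[..., 0], face_rgb[..., 1], face_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b           # RGB-to-gray in fixed proportions
    face_mask = left_mask | right_mask
    brightness = gray[face_mask].mean() / 255.0         # face brightness, scaled to 0-1
    diff = abs(gray[left_mask].mean() - gray[right_mask].mean()) / 255.0
    return brightness, diff
```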
The remaining attributes are obtained with deep learning: a multi-task convolutional neural network is built on a lightweight MobileFaceNet backbone, with multiple task outputs corresponding to the respective face quality attributes. Eye state, mouth state, makeup state, face occlusion, mask classification and the like are classification tasks using a softmax loss function as the objective; the face pose angles, image blurriness and the like are regression tasks using a Euclidean loss function as the objective. The total objective function of network training combines several softmax loss functions with Euclidean loss functions: when multiple tasks are learned jointly, it is a linear combination of the individual losses.
Softmax loss: L = -log(p_i), where p_i = e^{x_i} / (Σ_{j=1..N} e^{x_j}) is the normalized probability computed for each attribute class, x_i is the output of the i-th neuron, and N is the total number of classes.
Euclidean loss: L = (1/2) Σ_i (ŷ_i − y_i)², where y_i is the true label value and ŷ_i is the predicted value of the regressor.
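To illustrate the linear combination of softmax and Euclidean losses in joint multi-task training, a PyTorch sketch (task names and weights are illustrative, not values from the patent):

```python
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Linear combination of softmax (classification) and Euclidean
    (regression) losses, as described above."""
    def __init__(self, cls_weight=1.0, reg_weight=1.0):
        super().__init__()
        self.cls_loss = nn.CrossEntropyLoss()   # softmax loss: -log(p_i)
        self.reg_loss = nn.MSELoss()            # Euclidean loss on regressed values
        self.cls_weight, self.reg_weight = cls_weight, reg_weight

    def forward(self, cls_outputs, cls_targets, reg_outputs, reg_targets):
        # cls_*: lists over classification tasks (eye, mouth, makeup, occlusion, ...)
        # reg_*: lists over regression tasks (pose angles, blurriness, ...)
        total = self.cls_weight * sum(self.cls_loss(o, t)
                                      for o, t in zip(cls_outputs, cls_targets))
        total = total + self.reg_weight * sum(self.reg_loss(o, t)
                                              for o, t in zip(reg_outputs, reg_targets))
        return total
```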
After face quality evaluation, face quality screening is performed. The screening criteria include the following:
face ratio: and calculating the size of the face frame according to the face frame coordinates, wherein when the face ratio is larger than a threshold value, the face ratio does not meet the requirement. For example: when the face ratio is greater than 0.4, the proportion of the face to the whole image is too large.
Face brightness: the brightness of the human face should be within a reasonable range. For example: the human face brightness value is between 0 and 1, and the reasonable human face brightness is more than 0.3 and less than 0.8.
Left-right face brightness difference: the left-right face luminance difference should be less than the threshold. For example: when the difference of the left and right face brightness is between 0 and 1, the reasonable difference of the left and right face brightness is less than 0.4.
Face pose: the face angle (yaw) around the y-axis, the face angle (pitch) around the x-axis, and the face angle (roll) around the z-axis should be within a reasonable range. For example, within + -10 deg..
Ambiguity: the ambiguity should be less than a threshold. For example: when the ambiguity is between 0 and 1, the face ambiguity should be less than 0.6.
Shielding: if the face map is judged to have shielding of the five sense organs and the outlines, including a sunglasses or a mask, filtering is carried out.
Expression: if the face image is judged to be an exaggerated expression, the eyes are closed, and the mouth is enlarged, filtering is performed.
Degree of realism: the degree of realism should be greater than a threshold, if the degree of realism is small, this may indicate that the face is a statue face/cartoon face, etc. For example: when the value of the reality is between 0 and 1, the reality of the human face is larger than 0.6.
Pictures that do not meet these quality requirements are filtered out, as the following sketch illustrates.
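The sketch collects the example thresholds above into one check (the key names and data layout are ours, not the patent's):

```python
def passes_quality_screen(q, max_face_ratio=0.4):
    """Apply the screening thresholds listed above to a dict of quality
    attributes (all values are this embodiment's examples)."""
    checks = [
        q['face_ratio'] <= max_face_ratio,            # face not too large in frame
        0.3 < q['brightness'] < 0.8,                  # face brightness in range
        q['lr_brightness_diff'] < 0.4,                # left/right faces evenly lit
        all(abs(a) <= 10.0
            for a in (q['yaw'], q['pitch'], q['roll'])),  # pose within ±10°
        q['blurriness'] < 0.6,                        # sharp enough
        not q['occluded'],                            # no sunglasses/mask occlusion
        not q['exaggerated_expression'],              # eyes open, mouth not wide open
        q['realness'] > 0.6,                          # real face, not statue/cartoon
    ]
    return all(checks)
```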
S30: and detecting living bodies, namely detecting whether the pictures are true persons or not by using a double-flow convolutional neural network, and filtering and judging the pictures to be non-true persons.
The following two methods can be used in performing the biopsy:
the first living body detection process is as follows:
acquiring a depth image, and carrying out normalization processing on a face region in the image to obtain a processed face depth image;
and inputting the RGB face image of the picture and the face depth image into a deep learning network for detection, and obtaining a living body judgment result of the picture.
Specifically, the deep learning network that makes the live/spoof decision uses ResNet as its base network and has two input channels, one for the face image and one for the face depth map. After each input branch extracts features, the features from the two branches are selectively excited and fused by an SE module, and several further convolution layers extract features from the fused result to produce the decision.
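A compact PyTorch sketch of this dual-input, SE-fused architecture; layer sizes and depths are illustrative stand-ins for the patent's ResNet configuration:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: channel-wise reweighting of the fused features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * w                       # selectively excite channels

class TwoStreamLiveness(nn.Module):
    """Separate stems for the RGB image and the depth map, SE-based fusion,
    then shared convolutions and a live/spoof head."""
    def __init__(self):
        super().__init__()
        def stem(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1),
                nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.rgb_stem, self.depth_stem = stem(3), stem(1)
        self.se = SEBlock(64)              # fuse the concatenated 32+32 channels
        self.trunk = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2))

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        return self.trunk(self.se(fused))  # logits: [live, attack]
```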
Specifically, the objective function of the deep learning network is the focal loss.
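For reference, a standard focal-loss implementation; the gamma and alpha values are the usual defaults, not values stated in the patent:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t),
    down-weighting easy examples relative to plain cross-entropy."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()
```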
Specifically, the actual depths at the eye and mouth-corner key points are taken and averaged; the normalization upper limit is set to this mean plus a fixed value and the lower limit to the mean minus the fixed value, and the depth of the face region is normalized into a gray-scale map with pixel values in the interval 0-255. Gray values for actual depths above the upper limit or below the lower limit are set to 0.
The normalization formula is:
V = 255 * (D_real − D_min) / (D_max − D_min)
where V is the gray value after depth normalization (range 0-255), D_real is the actual depth of the face region, D_max is the upper limit of the actual face depth, and D_min is the lower limit.
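A sketch of this depth normalization, with the "fixed value" margin left as an assumed parameter:

```python
import numpy as np

def normalize_face_depth(depth, key_depths, margin=100.0):
    """Normalize the face-region depth to a 0-255 gray map as described above.

    depth: float array of actual depths over the face region.
    key_depths: actual depths at the eye and mouth-corner key points.
    margin: the 'fixed value' around the mean (an assumed placeholder).
    """
    mean_d = float(np.mean(key_depths))
    d_min, d_max = mean_d - margin, mean_d + margin
    v = 255.0 * (depth - d_min) / (d_max - d_min)   # V = 255*(D_real-D_min)/(D_max-D_min)
    v[(depth > d_max) | (depth < d_min)] = 0        # out-of-range depths set to 0
    return v.astype(np.uint8)
```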
The second liveness detection process is as follows:
the face is cropped from the original image, the RGB channels are converted into the HSV and YCbCr color spaces, and the converted HSV and YCbCr images are stacked to obtain a stacked map; Sobel features are extracted from the face region with the Sobel operator to obtain a Sobel feature map;
the Sobel feature map and the stacked map of the picture are fed into the two input channels of the two-stream neural network, yielding a live/spoof decision for the picture.
Specifically, for each input image A, the kernels Gx and Gy are convolved with A to obtain AGx and AGy respectively, and the output image AG has at each pixel the value
AG = sqrt(AGx² + AGy²)
where Gx is the convolution kernel in the x direction and Gy is the convolution kernel in the y direction.
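A sketch of both network inputs using OpenCV (the channel stacking order and the Sobel kernel size are our assumptions):

```python
import cv2
import numpy as np

def liveness_inputs(face_bgr):
    """Build the stacked HSV+YCbCr map and the Sobel gradient-magnitude map
    described above from a cropped face image."""
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    ycbcr = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)   # OpenCV's YCrCb ordering
    stacked = np.concatenate([hsv, ycbcr], axis=2)        # 6-channel stacked map

    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    agx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)      # convolution with Gx
    agy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)      # convolution with Gy
    sobel_map = np.sqrt(agx ** 2 + agy ** 2)              # AG = sqrt(AGx² + AGy²)
    return stacked, sobel_map
```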
S40: and (3) face comparison authentication, namely extracting face feature vectors of the pictures, comparing the similarity degree between the face feature vectors of the pictures and the face feature vectors of the standard pictures, and outputting a comparison result.
The high-quality picture is passed through a 50-layer ResNet neural network, which outputs a 512-dimensional floating-point vector recorded as the face feature vector.
The similarity between the face feature vector of the current picture and that of the standard picture is computed by the cosine similarity:
sim(S_i, S_j) = (S_i · S_j) / (||S_i|| * ||S_j||)
where S_i is the face feature vector of the current picture and S_j is the face feature vector of the standard picture.
If the similarity is below the threshold, the person is judged not to match the ID photo; if the similarity is above the threshold, the person is judged to match the ID photo.
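A minimal sketch of the comparison; the similarity threshold is a placeholder, since the patent leaves its value open:

```python
import numpy as np

SIM_THRESHOLD = 0.5   # assumed placeholder; the patent does not fix this value

def same_person(feat_current: np.ndarray, feat_standard: np.ndarray) -> bool:
    """Cosine similarity between the 512-d face feature vectors,
    compared against a threshold as described above."""
    sim = float(np.dot(feat_current, feat_standard) /
                (np.linalg.norm(feat_current) * np.linalg.norm(feat_standard)))
    return sim >= SIM_THRESHOLD
```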
The technical solution of the invention is thus a face auditing method combining face detection, face quality analysis, face liveness detection and face recognition, used to check whether the quality of the personal photo uploaded by the user is compliant, whether it shows a real person, and whether the person matches the ID. Quality compliance mainly considers whether the face in the picture is of high quality and easy to recognize; liveness detection mainly considers whether the photo is recaptured or forged; person-ID matching mainly considers whether the ID photo and the personal photo show the same person. The invention fully automates picture auditing, requires no manual operation, reduces labor cost, runs on a stable algorithm, and reduces human error. In addition, the invention uses a cascaded picture-filtering scheme, which makes it faster.
In another aspect, the invention further provides an automatic face auditing system, comprising:
a face detection module, configured to detect the face bounding box and face key points through a cascaded neural network algorithm;
a face quality evaluation module, configured to evaluate the quality attributes of a number of face pictures and screen out high-quality pictures;
a liveness detection module, configured to detect whether the pictures show a real person using a two-stream convolutional neural network and to filter out pictures judged not to show a real person;
and a face comparison module, configured to extract the face feature vector of each picture, compare its similarity with the face feature vector of a pre-stored ID photo, and filter out pictures with low similarity.
In still another aspect, the present invention further provides an automatic face auditing apparatus, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the automatic face auditing method described above when executing the program.
In yet another aspect, the present invention further provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described automatic face auditing method.
For websites and applications that require uploaded images to meet certain standards, the invention provides an automatic face auditing method and system based on deep neural networks. The method can be used effectively for identity-information verification, achieving fast face filtering and person-ID comparison.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and yet fall within the scope of the invention.