Disclosure of Invention
The invention aims to provide a quick and efficient automatic face auditing method, system and device, and a readable storage medium.
To solve the above technical problems, the technical solution of the invention is as follows:
In a first aspect, the present invention provides an automatic face auditing method, including the steps of:
face detection: detecting the coordinates of the face bounding box and of the face key points through a cascaded neural network algorithm;
face quality evaluation and screening: evaluating the quality attributes of a number of face pictures and screening out high-quality pictures;
liveness detection: detecting whether the pictures show a real person using a two-stream convolutional neural network, and filtering out pictures judged not to show a real person;
face comparison authentication: extracting the face feature vector of each picture, comparing its similarity with the face feature vector of a standard picture, and filtering out pictures with low similarity.
Preferably, before face detection, the method further comprises filtering out pictures whose resolution falls outside a threshold range.
Preferably, after the face detection, the method further comprises:
filtering out pictures in which the ratio of the face bounding box size to the picture size is below a threshold;
filtering out pictures in which the distance between the two eyes of the face is below a threshold.
Preferably, after the face detection, the method further comprises:
face alignment: calculating a transformation matrix between the face key point coordinates of a picture and the key point coordinates of a pre-stored standard face, and applying the transformation matrix to the picture to obtain an aligned face image.
Preferably, the face comparison authentication process is as follows:
extracting the high-quality picture and outputting a 512-dimensional floating-point vector with a 50-layer ResNet neural network, recorded as the face feature vector;
comparing the similarity between the face feature vector of the current picture and that of the standard picture by the cosine similarity:
sim(S_i, S_j) = (S_i · S_j) / (||S_i|| * ||S_j||)
where S_i is the face feature vector of the current picture and S_j is the face feature vector of the standard picture;
if the similarity is below a threshold, the person is judged not to match the ID photo; if the similarity is above the threshold, the person is judged to match the ID photo.
Preferably, the quality attributes used for face quality evaluation comprise the face pose, eye state, mouth state, makeup state, overall brightness, left-right face brightness difference, blurriness and occlusion;
the face pose, eye state, mouth state, makeup state, blurriness and occlusion are handled by a multi-task convolutional neural network built on a MobileFaceNet backbone, whose multiple task outputs correspond to the respective face quality attributes.
Eye state, mouth state, makeup state and face occlusion are treated as classification tasks with a softmax loss function as the objective;
the face pose, image illumination and image blurriness are treated as regression tasks with a Euclidean loss function as the objective;
the total objective function of network training combines several softmax loss functions with Euclidean loss functions: when multiple tasks are learned jointly, it is a linear combination of the individual losses.
Preferably, the liveness detection process is:
acquiring a depth image and normalizing the face region in the image to obtain a processed face depth map;
inputting the RGB face images of a preset number of frames for a face ID, together with the face depth maps, into a deep learning network for detection, obtaining a live/spoof decision for each frame;
voting over all decisions for the face ID: if more frames are judged live, the subject is judged to be a live person; if more frames are judged as attacks, the subject is judged not to be a live person.
Preferably, the liveness detection process alternatively is:
cropping the face from the original image, converting the RGB channels into the HSV and YCbCr color spaces, and stacking the converted HSV and YCbCr images to obtain a stacked map; extracting Sobel features from the face region with the Sobel operator to obtain a Sobel feature map;
feeding the Sobel feature maps and stacked maps of a preset number of frames for a face ID into the two input channels of a two-stream neural network, obtaining a live/spoof decision for each frame;
voting over all decisions for the face ID: if more frames are judged live, the subject is judged to be a live person; if more frames are judged as attacks, the subject is judged not to be a live person.
In a second aspect, the present invention further provides an automatic face auditing system, including:
a face detection module, configured to detect the face bounding box and face key points through a cascaded neural network algorithm;
a face quality evaluation module, configured to evaluate the quality attributes of a number of face pictures and screen out high-quality pictures;
a liveness detection module, configured to detect whether the pictures show a real person using a two-stream convolutional neural network and to filter out pictures judged not to show a real person;
and a face comparison module, configured to extract the face feature vector of each picture, compare its similarity with the face feature vector of a pre-stored ID photo, and filter out pictures with low similarity.
In a third aspect, the present invention further provides an automatic face auditing apparatus, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the automatic face auditing method described above when executing the program.
In a fourth aspect, the present invention further provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described automatic face auditing method.
The technical solution of the invention is an automatic face auditing method combining face detection, face quality analysis, face liveness detection and face recognition. The method examines whether the quality of the personal photo uploaded by the user is compliant, whether it shows a real person, and whether the person matches the ID. Quality compliance mainly considers whether the face in the picture is of high quality and easy to recognize; liveness detection mainly considers whether the photo is recaptured or forged; person-ID matching mainly considers whether the ID photo and the personal photo show the same person. The invention fully automates picture auditing, requires no manual operation, reduces labor cost, runs on a stable algorithm, and reduces human error. In addition, the invention uses a cascaded picture-filtering scheme, which makes it faster.
Detailed Description
The following describes embodiments of the present invention further with reference to the drawings. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit it. In addition, the technical features of the embodiments described below may be combined with one another as long as they do not conflict.
Referring to fig. 1, the invention provides an automatic face auditing method, which comprises the following steps:
Pictures whose resolution falls outside the threshold range are filtered out first: a picture is treated as failing the resolution requirement if its vertical resolution is below one threshold or its horizontal resolution is below another. In this embodiment, photos with a vertical resolution below 640 or a horizontal resolution below 480 are filtered out and deleted.
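As a minimal sketch of this resolution filter (the helper name is ours; the 640/480 thresholds are the embodiment's example values):

```python
import cv2

MIN_HEIGHT = 640   # vertical-resolution threshold from this embodiment
MIN_WIDTH = 480    # horizontal-resolution threshold from this embodiment

def resolution_ok(image_path: str) -> bool:
    """Return False for pictures whose resolution falls below the thresholds."""
    img = cv2.imread(image_path)
    if img is None:          # unreadable files fail the check as well
        return False
    height, width = img.shape[:2]
    return height >= MIN_HEIGHT and width >= MIN_WIDTH
```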
S10: and detecting the coordinates of the rectangular frame of the human face and the coordinates of key points of the human face through a cascade neural network algorithm.
The cascaded neural network algorithm predicts the face frame coordinates and the face key point coordinates in the image. The face frame coordinates define the rectangular box containing the face region; the face key point coordinates are the positions of 106 key points covering the eyebrows, eyes, nose, mouth and facial contour.
The face frame size is calculated from the face frame coordinates. When the face ratio exceeds the threshold, the requirement is not met; in this embodiment, a face ratio above 0.4 means the face occupies too large a proportion of the image.
face ratio = face frame size / image size
The inter-pupil distance, i.e. the number of pixels between the centers of the two eyes, is calculated from the face key points. When the inter-eye distance is below the threshold, the requirement is not met; for example, a distance of fewer than 40 pixels between the left and right eyes is too small.
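A minimal sketch of these two geometric checks, assuming the face box and pupil centers have already been produced by the detector (the data layout is hypothetical; the 0.4 and 40-pixel thresholds are the embodiment's example values):

```python
import math

def geometry_ok(face_box, landmarks, img_w, img_h,
                max_face_ratio=0.4, min_eye_dist=40.0) -> bool:
    """Apply the face-ratio and inter-eye-distance checks described above.

    face_box: (x, y, w, h) face bounding box.
    landmarks: dict with 'left_eye' and 'right_eye' pupil centers (x, y);
               a hypothetical layout standing in for the 106 key points.
    """
    x, y, w, h = face_box
    face_ratio = (w * h) / float(img_w * img_h)   # face ratio = frame size / image size
    if face_ratio > max_face_ratio:               # face occupies too much of the image
        return False
    (lx, ly), (rx, ry) = landmarks['left_eye'], landmarks['right_eye']
    eye_dist = math.hypot(rx - lx, ry - ly)       # pixels between the pupil centers
    return eye_dist >= min_eye_dist
```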
Face alignment: for each face, a transformation matrix between the extracted face key point coordinates and the standard face key point coordinates is computed and applied to the initial face image to obtain an aligned face image. The aligned key point coordinates are distributed more consistently, and the face is rectified.
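A minimal alignment sketch using OpenCV's similarity-transform estimator; the patent does not name a specific solver, so this is one reasonable choice:

```python
import cv2
import numpy as np

def align_face(image, keypoints, std_keypoints, out_size=(112, 112)):
    """Estimate a transform from detected to standard key points and warp
    the image with it (a sketch; output size is an assumed value)."""
    src = np.asarray(keypoints, dtype=np.float32)       # detected key points, shape (N, 2)
    dst = np.asarray(std_keypoints, dtype=np.float32)   # pre-stored standard key points
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)   # 2x3 rotation+scale+translation
    return cv2.warpAffine(image, matrix, out_size)
```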
S20: face quality evaluation and screening, namely performing face quality evaluation on quality attributes of a plurality of face pictures, and screening high-quality pictures.
The face quality evaluation algorithm combines deep learning with traditional image analysis. From the face image it derives quality attributes including face brightness, left-right face brightness difference, face angle about the y-axis (yaw), face angle about the x-axis (pitch), face angle about the z-axis (roll), expression class, glasses class, mask class, eye state, mouth state, makeup state, face realness (distinguishing statue images, CG faces and real faces), face blurriness and face occlusion degree.
Face brightness and the left-right face brightness difference use a traditional algorithm: the three RGB channels of the face image are converted into a gray image in fixed proportions, the face regions are delimited according to the 106 face key points, face brightness is computed from the mean gray value of the face region, and the left-right brightness difference from the mean gray values of the left and right face halves.
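A sketch of this traditional brightness computation, assuming the left/right face masks have already been derived from the 106 key points; the gray-conversion weights are the standard BT.601 values, which the patent only calls "a certain proportion":

```python
import numpy as np

def brightness_features(face_rgb, left_mask, right_mask):
    """Compute face brightness and the left-right brightness difference from
    gray-level means. left_mask/right_mask are boolean arrays marking the two
    face halves (assumed precomputed from the key points)."""
    r, g, b = face_rgb[..., 0], face_rgb[..., 1], face_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b           # RGB-to-gray in fixed proportions
    face_mask = left_mask | right_mask
    brightness = gray[face_mask].mean() / 255.0         # face brightness, scaled to 0-1
    diff = abs(gray[left_mask].mean() - gray[right_mask].mean()) / 255.0
    return brightness, diff
```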
The remaining attributes are obtained with deep learning: a multi-task convolutional neural network is built on a lightweight MobileFaceNet backbone, with multiple task outputs corresponding to the respective face quality attributes. Eye state, mouth state, makeup state, face occlusion, mask classification and the like are classification tasks using a softmax loss function as the objective; the face pose angles, image blurriness and the like are regression tasks using a Euclidean loss function as the objective. The total objective function of network training combines several softmax loss functions with Euclidean loss functions: when multiple tasks are learned jointly, it is a linear combination of the individual losses.
Softmax loss: L = -log(p_i), where p_i = e^{x_i} / (Σ_{j=1..N} e^{x_j}) is the normalized probability computed for each attribute class, x_i is the output of the i-th neuron, and N is the total number of classes.
Euclidean loss: L = (1/2) Σ_i (ŷ_i − y_i)², where y_i is the true label value and ŷ_i is the predicted value of the regressor.
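To illustrate the linear combination of softmax and Euclidean losses in joint multi-task training, a PyTorch sketch (task names and weights are illustrative, not values from the patent):

```python
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Linear combination of softmax (classification) and Euclidean
    (regression) losses, as described above."""
    def __init__(self, cls_weight=1.0, reg_weight=1.0):
        super().__init__()
        self.cls_loss = nn.CrossEntropyLoss()   # softmax loss: -log(p_i)
        self.reg_loss = nn.MSELoss()            # Euclidean loss on regressed values
        self.cls_weight, self.reg_weight = cls_weight, reg_weight

    def forward(self, cls_outputs, cls_targets, reg_outputs, reg_targets):
        # cls_*: lists over classification tasks (eye, mouth, makeup, occlusion, ...)
        # reg_*: lists over regression tasks (pose angles, blurriness, ...)
        total = self.cls_weight * sum(self.cls_loss(o, t)
                                      for o, t in zip(cls_outputs, cls_targets))
        total = total + self.reg_weight * sum(self.reg_loss(o, t)
                                              for o, t in zip(reg_outputs, reg_targets))
        return total
```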
After face quality evaluation, face quality screening is performed. The screening criteria include the following:
face ratio: and calculating the size of the face frame according to the face frame coordinates, wherein when the face ratio is larger than a threshold value, the face ratio does not meet the requirement. For example: when the face ratio is greater than 0.4, the proportion of the face to the whole image is too large.
Face brightness: the brightness of the human face should be within a reasonable range. For example: the human face brightness value is between 0 and 1, and the reasonable human face brightness is more than 0.3 and less than 0.8.
Left-right face brightness difference: the left-right face luminance difference should be less than the threshold. For example: when the difference of the left and right face brightness is between 0 and 1, the reasonable difference of the left and right face brightness is less than 0.4.
Face pose: the face angle (yaw) around the y-axis, the face angle (pitch) around the x-axis, and the face angle (roll) around the z-axis should be within a reasonable range. For example, within + -10 deg..
Ambiguity: the ambiguity should be less than a threshold. For example: when the ambiguity is between 0 and 1, the face ambiguity should be less than 0.6.
Shielding: if the face map is judged to have shielding of the five sense organs and the outlines, including a sunglasses or a mask, filtering is carried out.
Expression: if the face image is judged to be an exaggerated expression, the eyes are closed, and the mouth is enlarged, filtering is performed.
Degree of realism: the degree of realism should be greater than a threshold, if the degree of realism is small, this may indicate that the face is a statue face/cartoon face, etc. For example: when the value of the reality is between 0 and 1, the reality of the human face is larger than 0.6.
Pictures that do not meet these quality requirements are filtered out, as the following sketch illustrates.
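The sketch collects the example thresholds above into one check (the key names and data layout are ours, not the patent's):

```python
def passes_quality_screen(q, max_face_ratio=0.4):
    """Apply the screening thresholds listed above to a dict of quality
    attributes (all values are this embodiment's examples)."""
    checks = [
        q['face_ratio'] <= max_face_ratio,            # face not too large in frame
        0.3 < q['brightness'] < 0.8,                  # face brightness in range
        q['lr_brightness_diff'] < 0.4,                # left/right faces evenly lit
        all(abs(a) <= 10.0
            for a in (q['yaw'], q['pitch'], q['roll'])),  # pose within ±10°
        q['blurriness'] < 0.6,                        # sharp enough
        not q['occluded'],                            # no sunglasses/mask occlusion
        not q['exaggerated_expression'],              # eyes open, mouth not wide open
        q['realness'] > 0.6,                          # real face, not statue/cartoon
    ]
    return all(checks)
```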
S30: and detecting living bodies, namely detecting whether the pictures are true persons or not by using a double-flow convolutional neural network, and filtering and judging the pictures to be non-true persons.
The following two methods can be used in performing the biopsy:
the first living body detection process is as follows:
acquiring a depth image, and carrying out normalization processing on a face region in the image to obtain a processed face depth image;
and inputting the RGB face image of the picture and the face depth image into a deep learning network for detection, and obtaining a living body judgment result of the picture.
Specifically, the deep learning network that makes the live/spoof decision uses ResNet as its base network and has two input channels, one for the face image and one for the face depth map. After each input branch extracts features, the features from the two branches are selectively excited and fused by an SE module, and several further convolution layers extract features from the fused result to produce the decision.
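A compact PyTorch sketch of this dual-input, SE-fused architecture; layer sizes and depths are illustrative stand-ins for the patent's ResNet configuration:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: channel-wise reweighting of the fused features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * w                       # selectively excite channels

class TwoStreamLiveness(nn.Module):
    """Separate stems for the RGB image and the depth map, SE-based fusion,
    then shared convolutions and a live/spoof head."""
    def __init__(self):
        super().__init__()
        def stem(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1),
                nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.rgb_stem, self.depth_stem = stem(3), stem(1)
        self.se = SEBlock(64)              # fuse the concatenated 32+32 channels
        self.trunk = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2))

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        return self.trunk(self.se(fused))  # logits: [live, attack]
```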
Specifically, the objective function of the deep learning network is the focal loss.
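For reference, a standard focal-loss implementation; the gamma and alpha values are the usual defaults, not values stated in the patent:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t),
    down-weighting easy examples relative to plain cross-entropy."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()
```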
Specifically, the actual depths at the eye and mouth-corner key points are taken and averaged; the normalization upper limit is set to this mean plus a fixed value and the lower limit to the mean minus the fixed value, and the depth of the face region is normalized into a gray-scale map with pixel values in the interval 0-255. Gray values for actual depths above the upper limit or below the lower limit are set to 0.
The normalization formula is:
V = 255 * (D_real − D_min) / (D_max − D_min)
where V is the gray value after depth normalization (range 0-255), D_real is the actual depth of the face region, D_max is the upper limit of the actual face depth, and D_min is the lower limit.
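A sketch of this depth normalization, with the "fixed value" margin left as an assumed parameter:

```python
import numpy as np

def normalize_face_depth(depth, key_depths, margin=100.0):
    """Normalize the face-region depth to a 0-255 gray map as described above.

    depth: float array of actual depths over the face region.
    key_depths: actual depths at the eye and mouth-corner key points.
    margin: the 'fixed value' around the mean (an assumed placeholder).
    """
    mean_d = float(np.mean(key_depths))
    d_min, d_max = mean_d - margin, mean_d + margin
    v = 255.0 * (depth - d_min) / (d_max - d_min)   # V = 255*(D_real-D_min)/(D_max-D_min)
    v[(depth > d_max) | (depth < d_min)] = 0        # out-of-range depths set to 0
    return v.astype(np.uint8)
```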
The second liveness detection process is as follows:
the face is cropped from the original image, the RGB channels are converted into the HSV and YCbCr color spaces, and the converted HSV and YCbCr images are stacked to obtain a stacked map; Sobel features are extracted from the face region with the Sobel operator to obtain a Sobel feature map;
the Sobel feature map and the stacked map of the picture are fed into the two input channels of the two-stream neural network, yielding a live/spoof decision for the picture.
Specifically, for each input image A, the kernels Gx and Gy are convolved with A to obtain AGx and AGy respectively, and the output image AG has at each pixel the value
AG = sqrt(AGx² + AGy²)
where Gx is the convolution kernel in the x direction and Gy is the convolution kernel in the y direction.
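A sketch of both network inputs using OpenCV (the channel stacking order and the Sobel kernel size are our assumptions):

```python
import cv2
import numpy as np

def liveness_inputs(face_bgr):
    """Build the stacked HSV+YCbCr map and the Sobel gradient-magnitude map
    described above from a cropped face image."""
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    ycbcr = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)   # OpenCV's YCrCb ordering
    stacked = np.concatenate([hsv, ycbcr], axis=2)        # 6-channel stacked map

    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    agx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)      # convolution with Gx
    agy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)      # convolution with Gy
    sobel_map = np.sqrt(agx ** 2 + agy ** 2)              # AG = sqrt(AGx² + AGy²)
    return stacked, sobel_map
```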
S40: and (3) face comparison authentication, namely extracting face feature vectors of the pictures, comparing the similarity degree between the face feature vectors of the pictures and the face feature vectors of the standard pictures, and outputting a comparison result.
The high-quality picture is passed through a 50-layer ResNet neural network, which outputs a 512-dimensional floating-point vector recorded as the face feature vector.
The similarity between the face feature vector of the current picture and that of the standard picture is computed by the cosine similarity:
sim(S_i, S_j) = (S_i · S_j) / (||S_i|| * ||S_j||)
where S_i is the face feature vector of the current picture and S_j is the face feature vector of the standard picture.
If the similarity is below the threshold, the person is judged not to match the ID photo; if the similarity is above the threshold, the person is judged to match the ID photo.
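A minimal sketch of the comparison; the similarity threshold is a placeholder, since the patent leaves its value open:

```python
import numpy as np

SIM_THRESHOLD = 0.5   # assumed placeholder; the patent does not fix this value

def same_person(feat_current: np.ndarray, feat_standard: np.ndarray) -> bool:
    """Cosine similarity between the 512-d face feature vectors,
    compared against a threshold as described above."""
    sim = float(np.dot(feat_current, feat_standard) /
                (np.linalg.norm(feat_current) * np.linalg.norm(feat_standard)))
    return sim >= SIM_THRESHOLD
```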
The technical solution of the invention is thus a face auditing method combining face detection, face quality analysis, face liveness detection and face recognition, used to check whether the quality of the personal photo uploaded by the user is compliant, whether it shows a real person, and whether the person matches the ID. Quality compliance mainly considers whether the face in the picture is of high quality and easy to recognize; liveness detection mainly considers whether the photo is recaptured or forged; person-ID matching mainly considers whether the ID photo and the personal photo show the same person. The invention fully automates picture auditing, requires no manual operation, reduces labor cost, runs on a stable algorithm, and reduces human error. In addition, the invention uses a cascaded picture-filtering scheme, which makes it faster.
In another aspect, the invention further provides an automatic face auditing system, comprising:
a face detection module, configured to detect the face bounding box and face key points through a cascaded neural network algorithm;
a face quality evaluation module, configured to evaluate the quality attributes of a number of face pictures and screen out high-quality pictures;
a liveness detection module, configured to detect whether the pictures show a real person using a two-stream convolutional neural network and to filter out pictures judged not to show a real person;
and a face comparison module, configured to extract the face feature vector of each picture, compare its similarity with the face feature vector of a pre-stored ID photo, and filter out pictures with low similarity.
In still another aspect, the present invention further provides an automatic face auditing apparatus, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the automatic face auditing method described above when executing the program.
In yet another aspect, the present invention further provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described automatic face auditing method.
For websites and applications that require uploaded images to meet certain standards, the invention provides an automatic face auditing method and system based on deep neural networks. The method can be used effectively for identity-information verification, achieving fast face filtering and person-ID comparison.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and yet fall within the scope of the invention.