Disclosure of Invention
The invention aims to provide a fast and efficient automatic face auditing method, system, device, and readable storage medium.
To solve the above technical problems, the technical solution of the invention is as follows:
In a first aspect, the present invention provides an automatic face auditing method, comprising the following steps:
face detection: detecting the coordinates of a rectangular face frame and the coordinates of face key points through a cascaded neural network algorithm;
face quality evaluation and screening: evaluating face quality according to multiple quality attributes of the face pictures and screening out high-quality pictures;
liveness detection: detecting whether a picture shows a real person using a dual-stream convolutional neural network, and filtering out pictures judged not to show a real person;
and face comparison authentication: extracting the face feature vector of a picture, comparing its similarity with the face feature vector of a standard picture, and filtering out pictures with low similarity.
Preferably, before the face detection, the method further comprises filtering out pictures whose resolution falls outside a threshold range.
Preferably, after the face detection, the method further comprises:
filtering out pictures in which the ratio of the face frame size to the picture size is below a threshold;
and filtering out pictures in which the distance between the two eyes of the face is below a threshold.
Preferably, after the face detection, the method further comprises:
face alignment: calculating a transformation matrix between the face key point coordinates of a picture and the key point coordinates of a pre-stored standard face, and applying the transformation matrix to the picture to obtain an aligned face image.
Preferably, the process of face comparison authentication is as follows:
for the high-quality picture, outputting a 512-dimensional floating-point vector using a 50-layer ResNet neural network and recording it as the face feature vector;
computing the similarity between the face feature vector of the current picture and that of the standard picture as the cosine similarity:
sim(S_i, S_j) = (S_i · S_j) / (‖S_i‖ ‖S_j‖),
where S_i is the face feature vector of the current picture and S_j is the face feature vector of the standard picture;
if the similarity is below a threshold, the person is judged not to match the certificate; if the similarity is above the threshold, the person is judged to match the certificate.
Preferably, the quality attributes used for face quality evaluation include: face pose, eye state, mouth state, makeup state, overall brightness, left-right face brightness difference, blurriness, and occlusion;
the face pose, eye state, mouth state, makeup state, blurriness, and occlusion are evaluated by a multi-task convolutional neural network built with a MobileFaceNet structure as the backbone, whose multiple task outputs correspond to the respective face quality attributes.
Eye state, mouth state, makeup state, and face occlusion are classification tasks and adopt a softmax loss function as the objective function;
face pose, image illuminance, and image blurriness are regression tasks and adopt a Euclidean loss function as the objective function;
the total objective function of network training combines multiple softmax loss functions and Euclidean loss functions; when the tasks are jointly learned, the total objective function is a linear combination of the individual loss functions.
Preferably, the process of liveness detection is:
acquiring a depth image and normalizing the face region in the picture to obtain a processed face depth image;
inputting the RGB face images of a preset number of frames for a face ID, together with the corresponding face depth images, into a deep learning network for detection to obtain a liveness judgment result for each frame;
voting on all liveness judgment results of the face ID: when live-frame judgments are in the majority, the object is determined to be live; when attack-frame judgments are in the majority, the object is determined to be non-live.
Preferably, the process of liveness detection is:
cropping the face from the original image, converting the RGB channels into the HSV and YCbCr color spaces, and stacking the converted HSV and YCbCr images to obtain a stacked image; extracting Sobel features from the face region with a Sobel operator to obtain a Sobel feature map;
inputting the Sobel feature maps and stacked images of a preset number of frames for a face ID into the two input channels of the dual-stream neural network to obtain a liveness judgment result for each frame;
voting on all liveness judgment results of the face ID: when live-frame judgments are in the majority, the object is determined to be live; when attack-frame judgments are in the majority, the object is determined to be non-live.
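As an illustration of the voting step shared by both liveness variants, a minimal sketch follows (function and label names are ours, not from the invention):

```python
from collections import Counter

def vote_liveness(frame_results):
    """Majority vote over per-frame liveness judgments for one face ID.

    frame_results: list of per-frame labels, e.g. ["live", "attack", "live"].
    Returns "live" when live judgments are in the majority; ties and
    attack majorities are treated as non-live (fail-safe choice).
    """
    counts = Counter(frame_results)
    return "live" if counts["live"] > counts["attack"] else "attack"

# Example: 3 of 4 frames judged live -> the face ID is judged live.
print(vote_liveness(["live", "live", "attack", "live"]))  # live
```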
In a second aspect, the present invention further provides an automatic face auditing system, comprising:
a face detection module, which detects a rectangular face frame and face key points through a cascaded neural network algorithm;
a face quality evaluation module, which evaluates face quality according to the quality attributes of the face pictures and screens out high-quality pictures;
a liveness detection module, which detects whether a picture shows a real person using a dual-stream convolutional neural network and filters out pictures judged not to show a real person;
and a face comparison module, which extracts the face feature vector of a picture, compares its similarity with the face feature vector of a pre-stored certificate photo, and filters out pictures with low similarity.
In a third aspect, the present invention further provides an automatic face auditing device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the steps of the automatic face auditing method when executing the program.
In a fourth aspect, the present invention further provides a readable storage medium for automatic face auditing, which stores a computer program that, when executed by a processor, implements the steps of the automatic face auditing method.
The technical solution of the invention is an automatic face auditing method that combines face detection, face quality analysis, face liveness detection, and face recognition. The method checks whether the personal photo uploaded by a user is of compliant quality, whether it is genuine, and whether it matches the certificate photo. Photo quality compliance mainly concerns whether the face in the picture is of high quality and easy to recognize; liveness detection mainly concerns whether the photo is a recapture or a forgery; face comparison mainly concerns whether the certificate photo and the personal photo show the same person. The invention fully automates picture auditing: no manual operation is needed, labor costs are reduced, the algorithm is stable, and human error is reduced. In addition, the invention uses a cascaded picture-filtering scheme, which is fast.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. Note that the description of the embodiments is intended to aid understanding of the invention, to which it is not limited. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
Referring to fig. 1, the invention provides an automatic face auditing method, comprising the following steps:
and filtering pictures with resolution outside the threshold range, wherein the pictures with the vertical resolution lower than the threshold or the horizontal resolution lower than the threshold are not satisfied with the image resolution. In the embodiment of the invention, the photos with the vertical direction resolution lower than 640 or the horizontal direction resolution lower than 480 are filtered and deleted.
S10: face detection: detect the coordinates of a rectangular face frame and the coordinates of face key points through a cascaded neural network algorithm.
The cascaded neural network algorithm predicts the face frame coordinates and the face key point coordinates in the image. The face frame coordinates describe a rectangular frame containing the face region; the face key point coordinates are the positions of 106 key points in the face region, covering the eyebrows, eyes, nose, mouth, and facial contour.
The face frame size is calculated from the face frame coordinates. When the face ratio is greater than the threshold, the face ratio does not meet the requirement. In this embodiment, when the face ratio is greater than 0.4, the face is judged to occupy too large a proportion of the whole image:
Face ratio = face frame size / image size
The inter-pupil distance, i.e., the number of pixels between the centers of the two eyes, is calculated from the face key points. When the inter-eye distance is smaller than the threshold, it does not meet the requirement. For example, when the distance between the left and right eye centers is less than 40 pixels, the inter-eye distance is too small.
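A minimal sketch of the face-ratio and inter-eye-distance checks, assuming the detector has already returned a face box and the two eye centers; treating the face ratio as an area ratio is our assumption:

```python
import math

def face_ratio(box, img_w, img_h):
    """Face ratio = face frame size / image size (assumed area ratio)."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (img_w * img_h)

def eye_distance(left_eye, right_eye):
    """Pixel distance between the two eye centers."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])

def prescreen(box, left_eye, right_eye, img_w, img_h,
              max_ratio=0.4, min_eye_dist=40):
    """Apply the two geometric checks from this embodiment."""
    return (face_ratio(box, img_w, img_h) <= max_ratio
            and eye_distance(left_eye, right_eye) >= min_eye_dist)
```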
Face alignment: for each face, a transformation matrix between the extracted face key point coordinates and the standard face key point coordinates is calculated and applied to the original face image to obtain an aligned face image. The coordinate distribution of the aligned face key points is more consistent, and the face pose is rectified.
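A minimal sketch of the alignment step using OpenCV; the invention does not name a particular solver, so the similarity-transform estimator cv2.estimateAffinePartial2D is our choice:

```python
import cv2
import numpy as np

def align_face(image, keypoints, std_keypoints, out_size=(112, 112)):
    """Warp `image` so its key points match the standard face layout.

    keypoints, std_keypoints: (106, 2) float arrays of (x, y) coordinates.
    out_size: (width, height) of the aligned output, an assumed value.
    """
    src = np.asarray(keypoints, dtype=np.float32)
    dst = np.asarray(std_keypoints, dtype=np.float32)
    # Estimate a rotation + scale + translation (similarity) transform.
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    # Apply the transform to the original face image.
    return cv2.warpAffine(image, matrix, out_size)
```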
S20: face quality evaluation and screening: evaluate the face quality of a picture according to multiple quality attributes and screen out high-quality pictures.
The face quality evaluation algorithm combines deep learning with traditional image analysis. From the facial features of a face image it estimates quality attributes such as face brightness, left-right face brightness difference, face angle around the y-axis (yaw), face angle around the x-axis (pitch), face angle around the z-axis (roll), expression class, glasses class, mask class, eye state class, mouth state class, makeup state class, face realness (classified as stone statue, CG face, or real face), face blurriness, and face occlusion degree.
Traditional algorithms are used for face brightness and left-right face brightness difference. Specifically, the RGB channels of the face image are converted to a grayscale image with fixed weights; the face regions are partitioned according to the 106 face key points; the face brightness is computed as the mean gray value of the face region, and the left-right brightness difference is computed from the mean gray values of the left and right half-faces.
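A minimal sketch of these brightness measures; the fixed weights for gray conversion are assumed to be the standard BT.601 coefficients, and the construction of region masks from the 106 key points is omitted:

```python
import numpy as np

def face_brightness(face_rgb, face_mask, left_mask, right_mask):
    """Mean gray value of the face region and left/right brightness difference.

    face_rgb: HxWx3 uint8 image; *_mask: HxW boolean region masks derived
    from the 106 key points (mask construction omitted here).
    """
    r, g, b = face_rgb[..., 0], face_rgb[..., 1], face_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # assumed BT.601 weights
    brightness = gray[face_mask].mean() / 255.0          # overall, in [0, 1]
    diff = abs(gray[left_mask].mean() - gray[right_mask].mean()) / 255.0
    return brightness, diff
```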
The other attributes are implemented with deep learning: a multi-task convolutional neural network is built with the lightweight MobileFaceNet structure as the backbone, and the multiple task outputs correspond to the respective face quality attributes. Quality judgments such as eye state, mouth state, makeup state, face occlusion, and mask classification are classification tasks and use a softmax loss function as the objective function; face pose angles, image blurriness, and the like are regression tasks and use a Euclidean loss function as the objective function. The total objective function of network training combines multiple softmax and Euclidean losses; when the tasks are jointly learned, the total objective is a linear combination of the individual losses.
Compute the softmax loss L:
L = -log(p_i),
where p_i is the normalized probability calculated for the ground-truth attribute class, i.e.
p_i = exp(x_i) / Σ_{j=1..N} exp(x_j),
x_i representing the ith neuron output and N representing the total number of classes.
Compute the Euclidean loss L:
L = ½ Σ_i (y_i − ŷ_i)²,
where y_i is the true label value and ŷ_i is the predicted value of the regressor.
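A minimal sketch in PyTorch of this joint objective: cross-entropy (softmax loss) for the classification heads, squared error (Euclidean loss) for the regression heads, combined linearly; the task names and weights are illustrative, not fixed by the invention:

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_labels, reg_preds, reg_targets, weights):
    """Linear combination of softmax losses and Euclidean losses.

    cls_logits / cls_labels: dicts keyed by classification task name.
    reg_preds / reg_targets: dicts keyed by regression task name.
    weights: dict of per-task scalar weights (illustrative values).
    """
    loss = torch.zeros(())
    for task, logits in cls_logits.items():   # e.g. "eye_state", "mask"
        loss = loss + weights[task] * F.cross_entropy(logits, cls_labels[task])
    for task, pred in reg_preds.items():      # e.g. "yaw", "blurriness"
        # 0.5 * mean squared error as the Euclidean loss term.
        loss = loss + weights[task] * 0.5 * F.mse_loss(pred, reg_targets[task])
    return loss
```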
After the face quality evaluation, face quality screening is performed. The screening criteria include the following:
face ratio: and calculating the size of the face frame according to the face frame coordinates, wherein when the face ratio is greater than a threshold value, the face ratio does not meet the requirement. For example: when the face ratio is larger than 0.4, the proportion of the face in the whole image is too large.
Face brightness: the face brightness should be within a reasonable range. For example: the face brightness value is between 0 and 1, and the reasonable face brightness is more than 0.3 and less than 0.8.
Difference in left and right face brightness: the left and right face brightness difference should be less than a threshold. For example: when the left and right face brightness difference is between 0 and 1, the reasonable left and right face brightness difference should be less than 0.4.
Face pose: the face angle (yaw) around the y-axis, the face angle (pitch) around the x-axis, and the face angle (roll) around the z-axis should be within reasonable ranges. For example, within ± 10 °.
Ambiguity: the ambiguity should be less than a threshold. For example: when the ambiguity value is between 0 and 1, the face ambiguity should be less than 0.6.
Shielding: and if the face image is judged to have occlusion of five sense organs and outlines, including wearing sunglasses or a mask, filtering.
Expression: if the face image is judged to be an exaggerated expression, closed eyes and large mouth, filtering is carried out.
The degree of truth: the degree of reality should be greater than the threshold, if the degree of reality is slightly less, it indicates that the face may be a statue face/cartoon face, etc. For example: when the truth value is between 0 and 1, the human face truth value is more than 0.6.
Pictures that do not meet the quality requirements are filtered out accordingly, as in the sketch below.
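A minimal sketch of this screening step, applying the example thresholds above to the attribute scores produced by the evaluation network (the attribute names are illustrative):

```python
def passes_quality_screen(attrs):
    """Return True if a picture meets the example thresholds above.

    attrs: dict of attribute scores, each in [0, 1] except the pose angles,
    which are in degrees; keys are illustrative names for the attributes.
    """
    return (attrs["face_ratio"] <= 0.4
            and 0.3 < attrs["brightness"] < 0.8
            and attrs["lr_brightness_diff"] < 0.4
            and all(abs(attrs[a]) <= 10 for a in ("yaw", "pitch", "roll"))
            and attrs["blurriness"] < 0.6
            and not attrs["occluded"]          # sunglasses, mask, etc.
            and not attrs["exaggerated_expr"]  # closed eyes, wide-open mouth
            and attrs["realness"] > 0.6)
```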
S30: liveness detection: use a dual-stream convolutional neural network to detect whether the picture shows a real person, and filter out pictures judged not to show a real person.
Either of the following two methods can be used for liveness detection.
The first liveness detection process is as follows:
Acquire a depth image and normalize the face region in the picture to obtain the processed face depth image;
input the RGB face image of the picture and the face depth image into a deep learning network for detection to obtain the liveness judgment result of the picture.
Specifically, the deep learning network for liveness judgment uses ResNet as the base network and has two input channels, one for the face image and one for the face depth image. After features are extracted separately on the two input branches, the features from the two branches are fused through an SE module, and the fused features pass through several convolution layers to produce the liveness judgment result.
Specifically, the objective function of the deep learning network is the focal loss function.
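A minimal sketch of the focal loss for the binary live-vs-attack decision; the α and γ values are conventional defaults (Lin et al.), not specified by the invention:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss over live-vs-attack logits.

    logits: (N,) raw scores; targets: (N,) floats in {0, 1} (1 = live).
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)           # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # Down-weight easy examples by (1 - p_t)^gamma.
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```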
Specifically, the actual depths of the eye points and mouth-corner points among the face key points are taken and their mean is calculated; the normalization upper limit is the mean plus a fixed value and the lower limit is the mean minus that fixed value; the depth of the face region is then normalized to a grayscale image with pixel values in the range 0-255.
Positions whose actual depth is greater than the upper limit or less than the lower limit are assigned a gray value of 0.
The normalization formula is:
V = 255 × (Dreal − Dmin) / (Dmax − Dmin),
where V is the gray value after depth normalization, in the range 0-255; Dreal is the actual depth of the face region; Dmax is the upper limit of the actual face depth; and Dmin is the lower limit of the actual face depth.
The second liveness detection process is as follows:
Crop the face from the original image, convert the RGB channels into the HSV and YCbCr color spaces, and stack the converted HSV and YCbCr images to obtain a stacked image; extract Sobel features from the face region with a Sobel operator to obtain a Sobel feature map;
input the Sobel feature map and the stacked image of the picture into the two input channels of the dual-stream neural network to obtain the liveness judgment result of the picture.
Specifically, for each input image A, the convolution kernels Gx and Gy are each convolved with A to obtain AGx and AGy, and an image AG is output in which the value of each pixel is:
AG = √(AGx² + AGy²),
where Gx denotes the convolution kernel in the x-direction and Gy denotes the convolution kernel in the y-direction.
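A minimal sketch of the input preparation for this second method using OpenCV (note that OpenCV's conversion codes assume BGR channel order; function names are ours):

```python
import cv2
import numpy as np

def liveness_inputs(face_bgr):
    """Build the stacked HSV+YCbCr image and the Sobel feature map."""
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    ycbcr = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)
    stacked = np.concatenate([hsv, ycbcr], axis=2)   # HxWx6 stacked image

    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    agx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)          # convolution with Gx
    agy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)          # convolution with Gy
    sobel_map = np.sqrt(agx ** 2 + agy ** 2)         # AG = sqrt(AGx^2 + AGy^2)
    return stacked, sobel_map
```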
S40: face comparison authentication: extract the face feature vector of the picture, compare its similarity with the face feature vector of the standard picture, and output the comparison result.
For the high-quality picture, a 50-layer ResNet neural network outputs a 512-dimensional floating-point vector, which is recorded as the face feature vector;
the similarity between the face feature vector of the current picture and that of the standard picture is computed as the cosine similarity:
sim(S_i, S_j) = (S_i · S_j) / (‖S_i‖ ‖S_j‖),
where S_i is the face feature vector of the current picture and S_j is the face feature vector of the standard picture;
if the similarity is below a threshold, the person is judged not to match the certificate; if the similarity is above the threshold, the person is judged to match the certificate.
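A minimal sketch of the comparison step over the 512-dimensional vectors; the threshold value is illustrative:

```python
import numpy as np

def same_person(feat_cur, feat_std, threshold=0.5):
    """Cosine similarity between current and standard feature vectors.

    feat_cur, feat_std: 512-d float vectors from the 50-layer ResNet.
    Returns (similarity, match?); the 0.5 threshold is illustrative only.
    """
    sim = float(np.dot(feat_cur, feat_std) /
                (np.linalg.norm(feat_cur) * np.linalg.norm(feat_std)))
    return sim, sim >= threshold
```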
The technical solution of the invention is thus a face auditing method combining face detection, face quality analysis, face liveness detection, and face recognition, used to check whether the personal photo uploaded by a user is of compliant quality, whether it is genuine, and whether it matches the certificate photo. Photo quality compliance mainly concerns whether the face in the picture is of high quality and easy to recognize; liveness detection mainly concerns whether the photo is a recapture or a forgery; face comparison mainly concerns whether the certificate photo and the personal photo show the same person. The invention fully automates picture auditing: no manual operation is needed, labor costs are reduced, the algorithm is stable, and human error is reduced. In addition, the invention uses a cascaded picture-filtering scheme, which is fast.
On the other hand, the invention also provides an automatic face auditing system, comprising:
a face detection module, which detects a rectangular face frame and face key points through a cascaded neural network algorithm;
a face quality evaluation module, which evaluates face quality according to the quality attributes of the face pictures and screens out high-quality pictures;
a liveness detection module, which detects whether a picture shows a real person using a dual-stream convolutional neural network and filters out pictures judged not to show a real person;
and a face comparison module, which extracts the face feature vector of a picture, compares its similarity with the face feature vector of a pre-stored certificate photo, and filters out pictures with low similarity.
In another aspect, the present invention further provides an automatic face auditing device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the steps of the automatic face auditing method when executing the computer program.
In another aspect, the present invention further provides a readable storage medium for automatic face auditing, wherein the readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of the automatic face auditing method.
The invention provides an automatic face auditing method and system based on deep neural networks for websites and applications that require uploaded images to meet certain standards. The method can be used effectively for information verification, achieving rapid face filtering and person-certificate comparison.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variants still fall within the scope of protection of the invention.