Disclosure of Invention
The technical task of the invention is to provide a face snapshot method, a face snapshot system and a storage medium, so as to solve the problem of how to capture, during face detection and tracking, the face image with the best quality and the most frontal angle.
The technical task of the invention is achieved in the following way. The face snapshot method comprises the following steps:
acquiring image data of a current frame of a video;
detecting faces in the video through a face detection-face key point detection model to obtain face images, and detecting the face key points of each detected face image;
matching face coordinates across consecutive frames through a tracking matching algorithm to obtain a plurality of target sequences;
calculating the quality score of the face image according to the face coordinates and the key point information;
and screening out, for each target sequence, the optimal face image that meets the requirements according to the face image quality score, and outputting the related information corresponding to the optimal face image.
Preferably, the face detection-face key point detection model adopts MTCNN, RetinaFace or OpenFace;
the tracking matching algorithm adopts SORT, DeepSORT, KCF or JDE.
Preferably, the face image quality score calculation formula is as follows:
face image quality score = face image sharpness value × face key point distance × face rotation degree value;
the face image sharpness value is obtained by applying a DCT (discrete cosine transform) to the face image and then normalizing the result;
the face key point distance is the shortest distance from the nose-tip key point to the quadrilateral formed by the two pupils and the two mouth corners;
the face rotation degree value is the absolute value of the cosine of the angle between the horizontal axis and the mean of the direction vectors of the line connecting the two pupils and the line connecting the two mouth corners.
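The specification lists the three quality terms without an explicit operator; assuming they combine multiplicatively (one natural reading, since the key point distance is signed and then drives the score negative for strongly turned faces), a minimal Python sketch of the scoring function might look like this (function and parameter names are illustrative):

```python
def quality_score(sharpness, keypoint_dist, rotation_value):
    """Combine the three quality terms (multiplicative combination assumed).

    sharpness      -- normalised DCT-based sharpness value, in [0, 1]
    keypoint_dist  -- signed nose-tip-to-quadrilateral distance in pixels;
                      negative when the nose tip lies outside the
                      eye/mouth quadrilateral (a strongly rotated face)
    rotation_value -- |cos| of the face-axis angle to horizontal, in [0, 1]
    """
    return sharpness * keypoint_dist * rotation_value
```

Because the key point distance carries a sign, a profile face whose nose tip falls outside the eye/mouth quadrilateral automatically scores below any frontal candidate.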
Preferably, the related information corresponding to the face image includes duration information, information on the video frame to which the image belongs, coordinate information and key point coordinate information.
Preferably, the optimal face image that meets the requirements is screened out of each target sequence according to the face image quality score as follows:
judging whether the existence time of the target face exceeds a threshold value:
if yes, judging whether the size of the face region image is larger than a set threshold value:
if yes, outputting the face region image;
if not, outputting nothing.
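The screening decision above can be sketched as follows; the two threshold values are illustrative placeholders, not values fixed by the specification:

```python
def screen_face(duration_frames, face_w, face_h,
                duration_thresh=25, size_thresh=64):
    """Return True when the tracked face should be output.

    A face is output only if the target has existed longer than the
    duration threshold AND its region image exceeds the size threshold;
    otherwise nothing is output.
    """
    if duration_frames <= duration_thresh:
        return False
    return face_w > size_thresh and face_h > size_thresh
```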
A face snapshot system, the system comprising,
the face detection-key point detection module is used for detecting faces and key points with the face detection-key point detection network model, so as to acquire the coordinate information of the faces and their key points;
the tracking matching module is used for associating and matching the face sequences belonging to the same target through a tracking matching algorithm, so as to obtain the related information of each target during tracking;
and the face quality evaluation module is used for calculating the face image quality score according to the face coordinates and the key point information, screening out, for each target sequence, the optimal face image that meets the requirements according to the face image quality score, and outputting the related information corresponding to the optimal face image.
Preferably, the working process of the face detection-key point detection module is as follows:
(1) initializing relevant parameters;
(2) acquiring a video current frame image, inputting a pre-trained face detection-key point detection model, and outputting the face image of the video current frame and information of key point coordinates thereof through network forward reasoning;
the working process of the tracking matching module is as follows:
(1) inputting a face region image and coordinate information;
(2) matching the faces of the targets in the previous frame and the next frame by using a tracking matching algorithm, and associating the faces belonging to the same target;
(3) outputting the matched face information.
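The frame-to-frame association in step (2) can be illustrated with a greedy IoU matcher; note that the tracking matching algorithms the specification actually names (SORT, DeepSORT, KCF, JDE) additionally use motion prediction, appearance features and Hungarian assignment, so this is a deliberately simplified sketch:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_faces(prev_boxes, curr_boxes, iou_thresh=0.3):
    # Greedily associate each previous-frame face with the unused
    # current-frame face of highest IoU above the threshold.
    matches, used = [], set()
    for i, p in enumerate(prev_boxes):
        best_j, best_iou = -1, iou_thresh
        for j, c in enumerate(curr_boxes):
            if j in used:
                continue
            s = iou(p, c)
            if s > best_iou:
                best_j, best_iou = j, s
        if best_j >= 0:
            matches.append((i, best_j))
            used.add(best_j)
    return matches
```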
Preferably, the working process of the face quality evaluation module is as follows:
(1) inputting the face coordinates and the corresponding key point coordinate information;
(2) calculating the face quality score, wherein the formula is as follows:
face image quality score = face image sharpness value × face key point distance × face rotation degree value;
the face image sharpness value is obtained by applying a DCT (discrete cosine transform) to the face image and then normalizing the result;
the face key point distance is the shortest distance from the nose-tip key point to the quadrilateral formed by the two pupils and the two mouth corners;
the face rotation degree value is the absolute value of the cosine of the angle between the horizontal axis and the mean of the direction vectors of the line connecting the two pupils and the line connecting the two mouth corners;
(3) updating the optimal face score and the related information corresponding to the face image; the related information corresponding to the face image includes duration information, information on the video frame to which the image belongs, coordinate information and key point coordinate information;
(4) inputting the quality score of the face image, and judging whether the existence time of the target face exceeds a threshold value:
if yes, executing the step (5);
(5) judging whether the size of the face region image is larger than a set threshold value:
① if yes, outputting the face region image;
② if not, outputting nothing.
Preferably, the face detection-face key point detection model adopts MTCNN, RetinaFace or OpenFace;
the tracking matching algorithm adopts SORT, DeepSORT, KCF or JDE.
A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the face snapshot method described above.
The face snapshot method, the face snapshot system and the storage medium have the following advantages:
the invention avoids the excessive resource consumption and the poor interpretability caused by using an AI model for face quality scoring, so that resource consumption is lower, system efficiency is higher, and the face quality evaluation is more interpretable;
the invention reduces the excessive number of hyperparameters that traditional image processing methods require for face quality evaluation and relies less on prior knowledge; in short, it reduces resource consumption while maintaining face snapshot accuracy, and has stronger applicability and generalization;
the invention evaluates face quality through the face sharpness value, the face key point distance and the face rotation degree value, which together account for factors such as image sharpness, face size and face pose; this not only effectively reduces resource consumption, but also involves fewer hyperparameters, yields a better face quality score and achieves higher face snapshot efficiency.
Detailed Description
The face snapshot method, system, and storage medium of the present invention are described in detail below with reference to the drawings and specific embodiments.
Example 1:
As shown in fig. 1, the face snapshot method of the present invention specifically comprises:
S1, acquiring image data of a current frame of a video;
S2, detecting faces in the video through a face detection-face key point detection model to obtain face images, and detecting the face key points of each detected face image;
S3, matching face coordinates across consecutive frames through a tracking matching algorithm to obtain a plurality of target sequences;
S4, calculating the quality score of the face image according to the face coordinates and the key point information;
and S5, screening out, for each target sequence, the optimal face image that meets the requirements according to the face image quality score, and outputting the related information corresponding to the optimal face image.
In this embodiment, the face detection-face key point detection model in step S2 adopts MTCNN, RetinaFace or OpenFace;
the tracking matching algorithm of step S3 in this embodiment adopts SORT, DeepSORT, KCF or JDE.
In this embodiment, the formula for calculating the face image quality score in step S4 is:
face image quality score = face image sharpness value × face key point distance × face rotation degree value;
the face image sharpness value is obtained by applying a DCT (discrete cosine transform) to the face image and then normalizing the result;
the face key point distance is the shortest distance from the nose-tip key point to the quadrilateral formed by the two pupils and the two mouth corners;
the face rotation degree value is the absolute value of the cosine of the angle between the horizontal axis and the mean of the direction vectors of the line connecting the two pupils and the line connecting the two mouth corners.
The related information corresponding to the face image in step S5 in this embodiment includes duration information, information on the video frame to which the image belongs, coordinate information and key point coordinate information.
In this embodiment, step S5 of screening out, for each target sequence, the optimal face image that meets the requirements according to the face image quality score specifically comprises:
judging whether the existence time of the target face exceeds a threshold value:
if yes, judging whether the size of the face region image is larger than a set threshold value:
if yes, outputting the face region image;
if not, outputting nothing.
Example 2:
As shown in fig. 2, the face snapshot system of the present invention comprises:
the face detection-key point detection module is used for detecting faces and key points with the face detection-key point detection network model, so as to acquire the coordinate information of the faces and their key points;
the tracking matching module is used for associating and matching the face sequences belonging to the same target through a tracking matching algorithm, so as to obtain the related information of each target during tracking;
and the face quality evaluation module is used for calculating the face image quality score according to the face coordinates and the key point information, screening out, for each target sequence, the optimal face image that meets the requirements according to the face image quality score, and outputting the related information corresponding to the optimal face image.
The working process of the face detection-key point detection module in the embodiment is specifically as follows:
(1) initializing relevant parameters;
(2) acquiring a video current frame image, inputting a pre-trained face detection-key point detection model, and outputting the face image of the video current frame and information of key point coordinates thereof through network forward reasoning;
the working process of the tracking matching module in this embodiment is specifically as follows:
(1) inputting a face region image and coordinate information;
(2) matching the faces of the targets in the previous frame and the next frame by using a tracking matching algorithm, and associating the faces belonging to the same target;
(3) outputting the matched face information.
The working process of the face quality evaluation module in the embodiment is as follows:
(1) inputting the face coordinates and the corresponding key point coordinate information;
(2) calculating the face quality score, wherein the formula is as follows:
face image quality score = face image sharpness value × face key point distance × face rotation degree value;
the face image sharpness value is obtained by applying a DCT (discrete cosine transform) to the face image and then normalizing the result;
the face key point distance is the shortest distance from the nose-tip key point to the quadrilateral formed by the two pupils and the two mouth corners;
the face rotation degree value is the absolute value of the cosine of the angle between the horizontal axis and the mean of the direction vectors of the line connecting the two pupils and the line connecting the two mouth corners;
(3) updating the optimal face score and the related information corresponding to the face image; the related information corresponding to the face image includes duration information, information on the video frame to which the image belongs, coordinate information and key point coordinate information;
(4) inputting the quality score of the face image, and judging whether the existence time of the target face exceeds a threshold value:
if yes, executing the step (5);
(5) judging whether the size of the face region image is larger than a set threshold value:
① if yes, outputting the face region image;
② if not, outputting nothing.
Example 3:
The face snapshot method comprises the following specific process:
(1) acquiring image data of a current frame of a video;
(2) selecting a pre-trained face detection-face key point detection model (such as MTCNN, RetinaFace or OpenFace), inputting the result of step (1) into the model, performing network forward inference, and outputting the face images of the current video frame together with their key point coordinate information, as shown in fig. 3;
(3) selecting a tracking matching algorithm (such as SORT, DeepSORT, KCF or JDE), inputting the result of step (2), performing association matching against the results of the preceding and following video frames, and outputting the matched target face information;
(4) cropping the matched face image out of the image of step (1) according to the result of step (3) (face coordinate information and the like), and selecting an image sharpness measure (such as a DCT-based one) to calculate the face image sharpness, whose value range is [0, 1];
(5) calculating the face key point distance according to the result of step (3); here five face key points (left eye, right eye, nose tip, left mouth corner, right mouth corner) are taken as an example, without limitation thereto; the shortest distance from the nose-tip key point to the quadrilateral formed by the eyes and mouth corners is calculated and used as the key point distance, the distance being positive when the nose tip lies inside the quadrilateral and negative when it lies outside;
(6) calculating the face rotation degree according to the result of step (3), as shown in fig. 4, again taking five face key points as an example without limitation thereto; the absolute value of the cosine of the angle between the horizontal axis and the mean of the eye-line direction vector and the mouth-corner-line direction vector is calculated;
(7) according to the results of steps (4), (5) and (6), calculating the face quality score by the formula face image quality score = face image sharpness value × face key point distance × face rotation degree value, and updating the optimal face score and its corresponding related information (coordinates, duration, video frame to which it belongs, and the like), as shown in fig. 5;
(8) taking the output of step (7) as input, judging whether the existence time of the target face exceeds a threshold value; if yes, judging whether the size of the face region image is larger than a set threshold value; if yes, outputting the face region image; otherwise, outputting nothing (the output mode is not limited to this: for example, only the single best face image may be output at the end of the tracking process, or the best face image of each time period may be output per period, etc.).
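One concrete way to realise the DCT-based sharpness value used in the flow above is to take the fraction of DCT spectral energy outside the DC coefficient, which is naturally bounded to [0, 1]; the specification does not fix the exact normalisation, so this is only one assumed reading:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (so Parseval's theorem holds).
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def dct_sharpness(gray):
    # Sharpness = share of spectral energy outside the DC term:
    # 0 for a perfectly flat crop, approaching 1 for high-frequency detail.
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    coeffs = dct_matrix(h) @ gray @ dct_matrix(w).T
    energy = (coeffs ** 2).sum()
    if energy == 0:
        return 0.0
    return float(1.0 - coeffs[0, 0] ** 2 / energy)
```

A uniform crop scores 0, while a 0/1 checkerboard (half its energy in the DC term, half in high frequencies) scores 0.5.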
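The signed key point distance described above (positive when the nose tip is inside the eye/mouth quadrilateral, negative outside) can be computed with point-to-segment distances plus a point-in-polygon test; all helper names are illustrative:

```python
import math

def _seg_dist(p, a, b):
    # Euclidean distance from point p to segment ab.
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def _inside(p, poly):
    # Even-odd ray-casting point-in-polygon test.
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside

def keypoint_distance(nose, left_eye, right_eye, left_mouth, right_mouth):
    # Shortest distance from the nose tip to the quadrilateral formed by
    # the pupils and mouth corners; positive inside, negative outside.
    quad = [left_eye, right_eye, right_mouth, left_mouth]  # vertex order matters
    d = min(_seg_dist(nose, quad[i], quad[(i + 1) % 4]) for i in range(4))
    return d if _inside(nose, quad) else -d
```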
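The face rotation degree value described above follows directly from the two landmark pairs:

```python
import math

def rotation_value(left_eye, right_eye, left_mouth, right_mouth):
    # Mean of the pupil-line and mouth-corner-line direction vectors, then
    # |cos| of the angle between that mean vector and the horizontal axis:
    # 1.0 for an upright face, approaching 0 as the head rolls towards 90°.
    ex, ey = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    mx, my = right_mouth[0] - left_mouth[0], right_mouth[1] - left_mouth[1]
    vx, vy = (ex + mx) / 2.0, (ey + my) / 2.0
    n = math.hypot(vx, vy)
    return abs(vx) / n if n else 0.0
```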
Example 4:
The embodiment of the invention also provides a computer-readable storage medium in which a plurality of instructions are stored; the instructions are loaded by a processor, so that the processor executes the face snapshot method in any of the above embodiments of the invention. Specifically, a system or an apparatus may be provided that is equipped with a storage medium on which software program code realizing the functions of any of the above-described embodiments is stored, and a computer (or a CPU or MPU) of the system or apparatus is caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium may be written to a memory provided in an expansion board inserted into the computer or in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.