CN111079686A - Single-stage face detection and key point positioning method and system

Info

Publication number: CN111079686A
Application number: CN201911358998.1A
Authority: CN (China)
Prior art keywords: face, key point, frame, face detection, loss function
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111079686B (granted publication)
Inventors: 黄明飞, 姚宏贵, 王普
Assignee (original and current): Open Intelligent Machine Shanghai Co., Ltd.
Priority/filing date: 2019-12-25
Publication date: 2020-04-28 (CN111079686A); grant published 2023-05-23 (CN111079686B)


Abstract

The invention provides a single-stage face detection and key point positioning method and system, relating to the technical field of face detection and key point positioning. The method comprises: labeling face images to obtain annotated images; training a face detection and key point positioning fusion model from the annotated images; inputting the current frame of a face picture to be detected into the fusion model to obtain the current-frame face detection frame and current-frame face key point positions; performing key point anti-shake processing on the next frame of the face picture to be detected according to the current-frame key point positions to obtain the next-frame face detection frame and key point positions; and, while the total number of anti-shake passes does not exceed a preset threshold, continuing face detection and key point positioning via key point anti-shake processing, otherwise reverting to the fusion model. The method effectively improves the accuracy of face detection and key point positioning, alleviates key point jitter, and suits edge computing devices that support only single-model deployment.

Description

Single-stage face detection and key point positioning method and system
Technical Field
The invention relates to the technical field of face detection and key point positioning, in particular to a single-stage face detection and key point positioning method and system.
Background
Face detection is a technique for automatically locating the position and size of every face in an arbitrary input image, and key point positioning is the process of accurately locating the positions of key points within a given face frame. In the face-related field, face detection and key point positioning are front-end steps for many algorithms, such as face recognition, face beautification, and face swapping, so both are of central importance in the face field.
At present, most face and key point detection methods work in two separate stages: face detection is performed first, then key point detection. The face detection algorithm is responsible only for face detection, and the face key point algorithm only for key point positioning; the two algorithms are independent and unrelated, which ignores the internal connection between the two tasks and limits overall detection efficiency. The prior art therefore has the following problems in face detection and key point positioning. First, accuracy: because the two algorithms are trained independently, they cannot complement and promote each other, so accuracy is mediocre. Second, key point jitter: current key point positioning algorithms suffer from point jitter across frames. Third, deployment: some current edge computing devices only support single-model inference (for example, some HiSilicon development boards); if two or more models are loaded, inference speed drops sharply, the speed requirement cannot be met, and deployment becomes very difficult.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a single-stage face detection and key point positioning method, which specifically comprises the following steps:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the annotation image to obtain a face detection and key point positioning fusion model;
step S3, inputting the current frame of the face picture to be detected in the video into the face detection and key point positioning fusion model, and obtaining and outputting the current-frame face detection frame and current-frame face key point positions corresponding to that frame;
step S4, performing key point anti-shaking processing on the next frame of the face picture to be detected according to the key point position of the current frame of the face, and recording the total times of performing the key point anti-shaking processing;
step S5, comparing the recorded total times with a preset time threshold:
if the total number of times is not greater than the number threshold, then go to step S6;
if the total number of times is greater than the number threshold, clearing the total number of times, and then returning to the step S3;
step S6, directly obtaining, from the key point anti-shake processing result, the next-frame face detection frame and next-frame face key point positions corresponding to the next frame of the face picture to be detected, outputting them as the current-frame face detection frame and current-frame face key point positions, and then returning to step S4;
The above process is executed continuously until all frames of the video have been processed.
Preferably, the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by its last three convolution layers are fused through a feature pyramid network (FPN) structure.
Preferably, during training of the face detection and key point positioning fusion model, anchor boxes (anchors) with a preset aspect ratio are used for the regression prediction of the face detection frame and the prediction of the face key point positions.
Preferably, during training of the face detection and key point positioning fusion model, the receptive field of each pixel of a feature map generated by convolution, measured in the corresponding face image, is twice the size of the anchor box.
Preferably, the preset ratio is 1:1.
Preferably, the step S2 specifically includes:
step S21, inputting the annotation image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
step S23, performing weighted summation on the first loss function, the second loss function, the third loss function, and the fourth loss function to obtain a total loss function, and comparing the total loss function with a preset loss function threshold:
if the total loss function is not less than the loss function threshold, then go to step S24;
if the total loss function is less than the loss function threshold, then go to step S25;
step S24, adjusting the training parameters in the initial fusion model according to a preset learning rate, and then returning to the step S21 to continue a new training process;
and step S25, taking the initial fusion model as a face detection and key point positioning fusion model and outputting the model.
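As an illustration of steps S21 to S25, here is a minimal PyTorch-style training-loop sketch; the threshold-based stopping check and all names are illustrative, not the patent's exact procedure:

def train_fusion_model(model, optimizer, data_loader, loss_fn, threshold):
    """Sketch of steps S21-S25: train until the total loss drops below the threshold."""
    while True:
        for image, target in data_loader:
            pred = model(image)               # S21: predict on an annotated image
            loss = loss_fn(pred, target)      # S22-S23: weighted total loss
            if loss.item() < threshold:       # S23/S25: output the trained model
                return model
            optimizer.zero_grad()             # S24: adjust training parameters
            loss.backward()                   #       at the preset learning rate
            optimizer.step()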
Preferably, in step S4, the key point anti-shake processing specifically includes:
step A1, according to the face key point positions, enlarging the region covered by the corresponding key points in the next frame of the face picture to be detected by a preset factor to obtain a face region picture;
step A2, verifying the face region picture with a pre-generated face verification model, and judging from the verification result whether it contains a face:
if yes, go to step A3;
if not, exit;
step A3, tracking the face with a tracking algorithm to obtain the face detection frame and face key point positions corresponding to the next frame of the face picture to be detected.
A single-stage face detection and key point positioning system applies any one of the above single-stage face detection and key point positioning methods and specifically comprises:
the data annotation module, used for acquiring a plurality of face images and annotating each face image to obtain annotated images with real face frames and real key point positions;
the data training module, connected with the data annotation module and used for obtaining the face detection and key point positioning fusion model by training on the annotated images;
the model prediction module, connected with the data training module and used for inputting the current frame of the face picture to be detected into the fusion model to obtain and output the face detection frame and face key point positions corresponding to that frame;
the anti-shake processing module, connected with the model prediction module and used for performing key point anti-shake processing on the next frame of the face picture to be detected according to the face detection frame and face key point positions, and for recording the total number of anti-shake passes;
the data comparison module, connected with the anti-shake processing module and used for comparing the recorded total number with a preset threshold, generating a first comparison result when the total number does not exceed the threshold and a second comparison result when it does;
the first processing module, connected with the data comparison module and used for directly obtaining, according to the first comparison result and the key point anti-shake processing result, the next-frame face detection frame and key point positions corresponding to the next frame of the face picture to be detected, and outputting them as the current-frame face detection frame and key point positions;
and the second processing module, connected with the data comparison module and used for clearing the total number according to the second comparison result.
Preferably, the data training module specifically includes:
the data prediction unit is used for inputting the marked image into a pre-generated initial fusion model to obtain a corresponding human face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
the first processing unit is connected with the data prediction unit and is used for respectively calculating a first loss function between the face classification prediction result and a real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and a real face area contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
the second processing unit is connected with the first processing unit and used for carrying out weighted summation on the first loss function, the second loss function, the third loss function and the fourth loss function to obtain a total loss function;
the data comparison unit is connected with the second processing unit and used for comparing the total loss function with a preset loss function threshold value, generating a first comparison result when the total loss function is not smaller than the loss function threshold value, and generating a second comparison result when the total loss function is smaller than the loss function threshold value;
the third processing unit is connected with the data comparison unit and used for adjusting the training parameters in the initial fusion model according to the first comparison result and a preset learning rate so as to continue a new training process;
and the fourth processing unit is connected with the data comparison unit and used for taking the initial fusion model as a face detection and key point positioning fusion model according to the second comparison result and outputting the face detection and key point positioning fusion model.
Preferably, the anti-shake processing module specifically includes:
the image processing unit, used for enlarging, according to the face key point positions, the region covered by the corresponding key points in the next frame of the face picture to be detected by a preset factor to obtain a face region picture;
the face verification unit, connected with the image processing unit and used for verifying the face region picture with a pre-generated face verification model and outputting a corresponding face verification result when the result shows that the picture is a face;
and the face tracking unit, connected with the face verification unit and used for tracking the face with a tracking algorithm to obtain the face detection frame and face key point positions corresponding to the next frame of the face picture to be detected.
The technical scheme has the following advantages or beneficial effects:
1) face detection and key point positioning are fused and trained end to end, so the two tasks promote each other and the accuracy of both is effectively improved;
2) the key point jitter problem is effectively alleviated by combining the face verification model and the tracking model;
3) face detection and key point positioning are fused into a single model, which effectively improves inference speed and suits edge computing devices that support only single-model deployment.
Drawings
FIG. 1 is a schematic flow chart of a single-stage face detection and keypoint location method according to a preferred embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a training process of a face detection and keypoint localization fusion model according to a preferred embodiment of the present invention;
FIG. 3 is a flow chart illustrating a key point anti-shaking process according to a preferred embodiment of the present invention;
fig. 4 is a schematic structural diagram of a single-stage face detection and keypoint localization system according to a preferred embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The invention is not limited to these embodiments; other embodiments fall within its scope as long as they satisfy its gist.
In a preferred embodiment of the present invention, based on the above problems in the prior art, a single-stage face detection and key point location method is provided, as shown in fig. 1, and specifically includes:
step S1, acquiring a plurality of face images, and labeling each face image to obtain a labeled image with a real face frame and a real key point position;
step S2, training according to the annotation image to obtain a face detection and key point positioning fusion model;
step S3, inputting the current frame of the face picture to be detected in the video into the face detection and key point positioning fusion model, and obtaining and outputting the current-frame face detection frame and current-frame face key point positions corresponding to that frame;
step S4, performing key point anti-shake processing on the next frame of face picture to be detected according to the key point position of the current frame of face, and recording the total times of performing key point anti-shake processing;
step S5, comparing the recorded total times with a preset time threshold:
if the total number of times is not greater than the number threshold, go to step S6;
if the total number of times is larger than the number threshold, clearing the total number of times, and then returning to the step S3;
step S6, directly obtaining, from the key point anti-shake processing result, the next-frame face detection frame and next-frame face key point positions corresponding to the next frame of the face picture to be detected, outputting them as the current-frame face detection frame and current-frame face key point positions, and then returning to step S4;
the above process is continuously executed until all the frame video images are processed.
The method fuses face detection and key point positioning and adopts end-to-end training, so the two tasks promote each other and the accuracy of both is effectively improved. Meanwhile, the key point jitter problem is effectively alleviated by combining the face verification model and the tracking model. Furthermore, fusing face detection and key point positioning into a single model effectively improves inference speed, making the method suitable for edge computing devices that support only single-model deployment and avoiding the sharp slowdown such devices suffer when more than one model is loaded.
Further specifically, the technical scheme of the invention comprises a training process of a face detection and key point positioning fusion model:
firstly, training data are prepared, namely, a plurality of acquired face images are labeled to obtain labeled images with real face frames and real key point positions. In this embodiment, it is preferable that the annotation image is stored in a format of a text document. Specifically, a text document named "in.txt" is newly created as a training set, where each line in the "in.txt" represents a piece of annotation image data. Each piece of marked image data preferably comprises 6 points and a picture path of a face frame (box) and a face key point (landmark), and the specific storage format is as follows:
Path/xxx.jpg x1,y1,x2,y2,ptx1,pty1,ptx2,pty2……ptx6,pty6
where Path is the storage path, xxx.jpg is the name of the annotated image, x1, y1, x2, y2 are the face frame data, and ptx1, pty1 through ptx6, pty6 are the 6 face key points. The fields x1, y1 through ptx6, pty6 describe one face in the image; if a picture contains several faces, these fields are repeated for each additional face. Fusing the face detection and key point positioning training data into a single annotation file makes data processing and reading convenient.
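A minimal parsing sketch for this format (a hypothetical helper, assuming one face per 16 numeric fields as described above; not part of the patent):

def parse_annotation_line(line):
    """Parse one line of "in.txt": an image path followed by comma-separated numbers."""
    path, fields = line.strip().split(maxsplit=1)
    values = [float(v) for v in fields.replace(",", " ").split()]
    faces = []
    # Each face uses 4 box values + 6 key points * 2 coordinates = 16 values.
    for i in range(0, len(values), 16):
        face = values[i:i + 16]
        box = face[0:4]                                # x1, y1, x2, y2
        landmarks = list(zip(face[4::2], face[5::2]))  # 6 (ptx, pty) pairs
        faces.append({"box": box, "landmarks": landmarks})
    return path, faces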
Second, the network framework of the face detection and key point positioning fusion model is preset. A single-stage framework is preferred, specifically a RetinaNet network structure, i.e. a single-stage detection network with a feature pyramid network (FPN). Owing to the particular shape of human faces, the aspect ratio of the anchor boxes (anchors) used during training is preferably set to 1:1, which effectively avoids detected face frames that are overly long or wide and out of proportion with a face. To further improve the recall and precision of face detection and key point positioning, the receptive field of each pixel of a convolutional feature map, measured in the corresponding face image, is preferably set to twice the anchor box size, avoiding the loss of detection precision caused by using the feature map's default value. To give the fusion model good performance on small faces, the feature maps of the last three layers of the backbone network (backbone) are preferably upsampled and fused through the FPN, which effectively improves the recall rate on small faces. A rough sketch of this top-down fusion follows.
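The following PyTorch-style sketch projects the last three backbone feature maps to a common width, upsamples the coarser levels, and adds them in, FPN-style; the channel counts, layer names, and nearest-neighbor upsampling are assumptions, not the patent's exact configuration:

import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """FPN-style top-down fusion of the last three backbone feature maps (sketch)."""
    def __init__(self, in_channels=(128, 256, 512), out_channels=64):
        super().__init__()
        # 1x1 convolutions project each backbone level to a common channel width.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convolutions smooth each fused map.
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, c3, c4, c5):
        # Top-down path: upsample the coarser level and add it to the finer one.
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return [conv(p) for conv, p in zip(self.smooth, (p3, p4, p5))]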
More preferably, during training of the face detection and key point positioning fusion model, after each training iteration the positive and negative samples are rebalanced according to the current predictions, and the resulting positive and negative samples are fed into the next iteration. In this embodiment, for the face detection task, an anchor box (anchor) is preferably treated as a positive sample when its intersection-over-union (IoU) with the real face frame (gt) is greater than 0.5, and as a negative sample when the IoU is less than 0.3. For the key point positioning task, the loss between the predicted and real key point positions is preferably computed only when the anchor's IoU with the real face frame exceeds 0.7; this avoids the convergence difficulties and inaccurate key point positioning that arise when the IoU between anchor and ground truth is too small. A sketch of this assignment rule follows.
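A minimal sketch of the IoU-based anchor assignment (the 0.5/0.3/0.7 thresholds come from the text; the box format and helper names are illustrative):

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def assign_anchor(anchor, gt_box):
    """Label one anchor against one real face frame (thresholds from the text)."""
    v = iou(anchor, gt_box)
    label = 1 if v > 0.5 else (0 if v < 0.3 else -1)  # -1: ignored in the loss
    use_landmark_loss = v > 0.7  # key point loss only for well-matched anchors
    return label, use_landmark_loss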
Further, during training of the face detection and key point positioning fusion model, four loss functions are preferably set as the return values of the training process to ensure the effectiveness of the network: a first loss function between the face classification prediction and the real face classification contained in the real face frame, a second loss function between the face frame regression prediction and the real face region contained in the real face frame, a third loss function between the predicted face frame proportion and the preset proportion, and a fourth loss function between the predicted and real key point positions. The first loss function preferably adopts a softmax function, the second a smooth L1 function, and the third an MSE function; the third loss function is what constrains the predicted face frame proportion to 1:1.
Preferably, the first, second, third and fourth loss functions are summed with weights to obtain the total loss function. The weight of the first loss function is preferably 1, that of the second 1, that of the third 0.5, and that of the fourth 0.1. The proportion-related weights are kept small so that enforcing the face frame proportion does not affect the network's overall face detection and key point positioning performance. During training, the preset learning rate is preferably 0.0001; once training finishes, the face detection and key point positioning fusion model is obtained.
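The weighted total loss might be sketched as follows in PyTorch; the softmax, smooth L1, and MSE choices and the weights 1, 1, 0.5, 0.1 come from the text, while the use of smooth L1 for the key point term is an assumption, since the text leaves it unspecified:

import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_target,   # face classification head
               box_pred, box_target,     # face frame regression head
               ratio_pred,               # predicted face frame proportion
               lmk_pred, lmk_target):    # face key point head
    l_cls = F.cross_entropy(cls_logits, cls_target)       # softmax loss, weight 1
    l_box = F.smooth_l1_loss(box_pred, box_target)        # smooth L1 loss, weight 1
    # MSE against the preset 1:1 proportion, weight 0.5.
    l_ratio = F.mse_loss(ratio_pred, torch.ones_like(ratio_pred))
    # Key point loss, weight 0.1 (smooth L1 assumed; the text does not specify).
    l_lmk = F.smooth_l1_loss(lmk_pred, lmk_target)
    return 1.0 * l_cls + 1.0 * l_box + 0.5 * l_ratio + 0.1 * l_lmk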
Furthermore, the fusion model is used to perform face detection and key point positioning on the to-be-detected pictures of a video. To eliminate the key point jitter caused by the subjectivity of data labeling and by face detection frame jitter, the position of the face detection frame is fixed from the perspective of face tracking: if the face does not move, the face detection frame does not move, so the input frame for key point positioning does not move, which reduces the jitter amplitude of the key points.
Specifically, when performing face detection and key point positioning on consecutive frames of a video, the current frame is preferably taken as the detection starting node: the fusion model predicts the current frame to obtain its face detection frame and key point positions, and the subsequent consecutive frames obtain their face detection frames and key point positions through key point anti-shake processing. Preferably, one fusion-model prediction is followed by key point anti-shake processing for the next ten frames, then another fusion-model prediction, and so on; a sketch of this alternation follows.
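A minimal sketch of this alternation, assuming hypothetical detect and track callables that wrap the fusion model and the key point anti-shake step respectively:

FRAMES_PER_DETECTION = 10  # anti-shake frames between model predictions (from the text)

def process_video(frames, detect, track):
    """Alternate one fusion-model prediction with ten tracked frames (sketch)."""
    box = landmarks = None
    counter = FRAMES_PER_DETECTION  # forces a detection on the first frame
    for frame in frames:
        if counter >= FRAMES_PER_DETECTION:
            box, landmarks = detect(frame)          # fusion model prediction
            counter = 0
        else:
            result = track(frame, landmarks)        # key point anti-shake step
            if result is None:                      # verification failed: re-detect
                box, landmarks = detect(frame)
                counter = 0
            else:
                box, landmarks = result
                counter += 1
        yield box, landmarks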
In a preferred embodiment of the present invention, the face detection and key point positioning fusion model adopts a RetinaNet network structure, and the feature maps output by its last three convolution layers are fused through a feature pyramid network (FPN) structure.
In the preferred embodiment of the present invention, during training of the face detection and key point positioning fusion model, anchor boxes (anchors) with a preset aspect ratio are used for the regression prediction of the face detection frame and the prediction of the face key point positions.
In a preferred embodiment of the present invention, during training of the face detection and key point positioning fusion model, the receptive field of each pixel of a feature map generated by convolution, measured in the corresponding face image, is twice the size of the anchor box.
In a preferred embodiment of the present invention, the predetermined ratio is 1:1.
In a preferred embodiment of the present invention, as shown in fig. 2, step S2 specifically includes:
step S21, inputting the annotation image into a pre-generated initial fusion model to obtain a corresponding face detection prediction result and a key point prediction result;
the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
step S22, respectively calculating a first loss function between the face classification prediction result and the real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and the real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point position;
step S23, performing weighted summation on the first loss function, the second loss function, the third loss function, and the fourth loss function to obtain a total loss function, and comparing the total loss function with a preset loss function threshold:
if the total loss function is not less than the loss function threshold, go to step S24;
if the total loss function is less than the loss function threshold, go to step S25;
step S24, adjusting the training parameters in the initial fusion model according to the preset learning rate, and then returning to step S21 to continue a new training process;
and step S25, taking the initial fusion model as a face detection and key point positioning fusion model and outputting the model.
In a preferred embodiment of the present invention, as shown in fig. 3, the key point anti-shake processing in step S4 specifically includes:
step A1, according to the face key point positions, enlarging the region covered by the corresponding key points in the next frame of the face picture to be detected by a preset factor to obtain a face region picture;
step A2, verifying the face region picture with the pre-generated face verification model, and judging from the verification result whether it contains a face:
if yes, go to step A3;
if not, exit;
step A3, tracking the face with a tracking algorithm to obtain the face detection frame and face key point positions corresponding to the next frame of the face picture to be detected.
Specifically, in this embodiment, the key point anti-shake processing can greatly reduce the jitter amplitude of the key points. It consists of two main parts. The first is a face verification module that judges whether the region is a face; it preferably adopts a simple binary classification network that can be implemented with just a few convolution layers. The second is a tracking algorithm, preferably KCF, which tracks the face to obtain the face detection frame once the verification module confirms the region is a face. The face region is enlarged by a factor of 1.5 before being sent to the detection algorithm, which ensures that the output picture does not change drastically and preserves the accuracy of the key point regression. A sketch of these two parts follows.
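A hedged sketch of these two parts using OpenCV's KCF tracker (exposed as cv2.TrackerKCF_create in opencv-contrib builds, or cv2.legacy.TrackerKCF_create in newer 4.x releases); verify_face and locate_landmarks stand in for the small binary classification network and the key point regressor, and are hypothetical:

import cv2

def keypoint_antishake(prev_frame, prev_landmarks, frame,
                       verify_face, locate_landmarks, scale=1.5):
    """Sketch of steps A1-A3: enlarge the key point region, verify, then track."""
    xs = [p[0] for p in prev_landmarks]
    ys = [p[1] for p in prev_landmarks]
    w = (max(xs) - min(xs)) * scale                 # step A1: enlarge by 1.5x
    h = (max(ys) - min(ys)) * scale
    x = int((min(xs) + max(xs)) / 2 - w / 2)
    y = int((min(ys) + max(ys)) / 2 - h / 2)
    box = (x, y, int(w), int(h))
    roi = frame[box[1]:box[1] + box[3], box[0]:box[0] + box[2]]
    if not verify_face(roi):                        # step A2: small binary classifier
        return None                                 # not a face: exit, caller re-detects
    tracker = cv2.TrackerKCF_create()               # step A3: KCF tracking
    tracker.init(prev_frame, box)
    ok, new_box = tracker.update(frame)
    if not ok:
        return None
    return new_box, locate_landmarks(frame, new_box)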
A single-stage face detection and key point positioning system, which applies any one of the above single-stage face detection and key point positioning methods, as shown in fig. 4, specifically includes:
the data annotation module 1, used for acquiring a plurality of face images and annotating each face image to obtain annotated images with real face frames and real key point positions;
the data training module 2, connected with the data annotation module 1 and used for obtaining the face detection and key point positioning fusion model by training on the annotated images;
the model prediction module 3, connected with the data training module 2 and used for inputting the current frame of the face picture to be detected into the fusion model to obtain and output the face detection frame and face key point positions corresponding to that frame;
the anti-shake processing module 4, connected with the model prediction module 3 and used for performing key point anti-shake processing on the next frame of the face picture to be detected according to the face detection frame and face key point positions, and for recording the total number of anti-shake passes;
the data comparison module 5, connected with the anti-shake processing module 4 and used for comparing the recorded total number with a preset threshold, generating a first comparison result when the total number does not exceed the threshold and a second comparison result when it does;
the first processing module 6, connected with the data comparison module 5 and used for directly obtaining, according to the first comparison result and the key point anti-shake processing result, the next-frame face detection frame and key point positions corresponding to the next frame of the face picture to be detected, and outputting them as the current-frame face detection frame and key point positions;
and the second processing module 7, connected with the data comparison module 5 and used for clearing the total number according to the second comparison result.
In a preferred embodiment of the present invention, the data training module 2 specifically includes:
the data prediction unit 21, used for inputting the annotated images into a pre-generated initial fusion model to obtain the corresponding face detection prediction result and key point prediction result, where the face detection prediction result comprises a face classification prediction result, a face frame regression prediction result and a face frame proportion prediction result;
the first processing unit 22, connected with the data prediction unit 21 and used for respectively calculating a first loss function between the face classification prediction result and the real face classification result contained in the real face frame, a second loss function between the face frame regression prediction result and the real face region contained in the real face frame, a third loss function between the face frame proportion prediction result and the preset proportion, and a fourth loss function between the key point prediction result and the real key point positions;
the second processing unit 23, connected with the first processing unit 22 and used for performing weighted summation of the first, second, third and fourth loss functions to obtain the total loss function;
the data comparison unit 24, connected with the second processing unit 23 and used for comparing the total loss function with a preset loss function threshold, generating a first comparison result when the total loss function is not smaller than the threshold and a second comparison result when it is smaller;
the third processing unit 25, connected with the data comparison unit 24 and used for adjusting the training parameters of the initial fusion model according to the first comparison result and the preset learning rate so as to continue a new round of training;
and the fourth processing unit 26, connected with the data comparison unit 24 and used for taking the initial fusion model as the face detection and key point positioning fusion model according to the second comparison result and outputting it.
In a preferred embodiment of the present invention, the anti-shake processing module 4 specifically includes:
the image processing unit 41, used for enlarging, according to the face key point positions, the region covered by the corresponding key points in the next frame of the face picture to be detected by a preset factor to obtain a face region picture;
the face verification unit 42, connected with the image processing unit 41 and used for verifying the face region picture with the pre-generated face verification model and outputting a corresponding face verification result when the result shows that the picture is a face;
the face tracking unit 43, connected with the face verification unit 42 and used for tracking the face with a tracking algorithm to obtain the key point anti-shake processing result, i.e. the face detection frame and face key point positions corresponding to the next frame of the face picture to be detected;
and the data recording unit 44, connected with the face tracking unit 43 and used for recording the total number of key point anti-shake passes according to the processing result.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (10)






Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
