CN112668385A - Method for marking human face - Google Patents

Method for marking human face

Info

Publication number
CN112668385A
Authority
CN
China
Prior art keywords
training
face
shape
node
regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010825775.8A
Other languages
Chinese (zh)
Inventor
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unicloud Technology Co Ltd
Original Assignee
Unicloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unicloud Technology Co Ltd
Priority to CN202010825775.8A
Publication of CN112668385A
Legal status: Pending

Abstract

The invention provides a method for marking human faces, comprising a training process and a prediction process. The training process comprises the following steps: A1, inputting training samples and preprocessing them; A2, performing cascade regression training; A3, generating and storing a model file after training. The prediction process comprises the following steps: B1, denoising and filtering the input face image; B2, detecting the face in the image; B3, performing prediction calculation on the processed face image with the model; and B4, analyzing and calculating the positions of the feature points through the trained model file to obtain the shape of the human face. The method effectively solves the problem of jitter of the mark points that arises when existing face marking algorithms are applied to video sequences, so that the key points of the face in a video sequence are marked more stably. This provides a stable basis for subsequent analysis and processing steps based on the mark points, and especially benefits applications with high requirements on the positional stability and accuracy of the mark points.

Description

Method for marking human face
Technical Field
The invention belongs to the field of face marking, and particularly relates to a face marking method.
Background
With the rise of machine vision research, applications based on technologies such as face detection, marking and analysis have become current hot spots, including face identification, face special effects, and expression analysis and recognition. All of these applications rely on face detection and marking techniques. Face marking means that, given a face image, the positions of the facial contour and the edges of the five sense organs are located within the face region; from these positions the shape of the face contour and the positions and shapes of the facial features can be obtained, so that feature analysis, attribute analysis and the like can be performed further.
Traditional face marking algorithms are based on single-frame images. For a video image sequence, due to the influence of the camera and the external illumination environment, adjacent frames of a video differ noticeably even when there is no motion, which causes jitter and positional instability of the feature points; the accuracy of face detection and marking on video image sequences with traditional algorithms is therefore low.
Disclosure of Invention
In view of this, the present invention aims to provide a method for marking a face, which effectively solves the problem of jitter of mark points existing when the current face marking algorithm is used in a video sequence, so that key points of the face in the video sequence are marked more stably, and a stable effect is provided for subsequent analysis and processing links based on the mark points, especially for applications with high requirements on the position stability and accuracy of the mark points, such as expression analysis and recognition.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
A method of face labeling comprising a training process and a prediction process, the training process comprising the steps of: A1, inputting training samples and preprocessing them; A2, performing cascade regression training; A3, generating and storing a model file after training;
the prediction process comprising the steps of: B1, denoising and filtering the input face image; B2, detecting the face in the image; B3, performing prediction calculation on the processed face image with the model; and B4, analyzing and calculating the positions of the feature points through the trained model file to obtain the shape of the human face.
Further, the training samples input in step A1 are images in which the face region and key points have been labeled, denoted Li (i ≤ N), where N is the total number of training samples; the feature point shape is the overall contour formed by the positions of all the feature points;
the true shape of the feature points of a sample is denoted Si and comprises the coordinate information of all its feature points.
Further, the preprocessing of the training samples in step A1 includes normalization. The normalization method is as follows: the feature points of each sample image are subjected to uniform (isomorphic) scaling and translation so that the face shape of each sample image is mapped into a unified standard matrix, and the shape information of each sample image in the standard matrix, including the coordinates corresponding to each feature point, is obtained.
Furthermore, normalization is performed on every sample image in the sample image set so that the matrices in which the feature points of all sample images lie are consistent, which facilitates training and the construction of the cascade regression learning model.
Further, in step A2, the cascade regression training is performed with random forests.
Further, the model file generated in step A3 includes the number of levels of the cascade of random forests, the number of regression trees in each level's forest, the depth of each regression tree, and the node information of each node of each regression tree.
Further, the node information of each node of a regression tree includes the pixel pair position and threshold of the node, the probability information of the left and right branches, and the error estimation values of the leaf nodes.
Further, in step B1, the input face image is denoised by nonlinear median filtering: for each pixel in the image, the median of the N × N window (N odd) centred on the pixel is computed statistically, and the pixel value at that position is replaced by the median.
Further, in step B3, the prediction calculation on the processed face image is performed by parsing the model file generated by training, reconstructing the random forest model generated during training, and iteratively evaluating each regression tree in each level of the random forest regression model, finally obtaining the detected face shape.
Further, the method for calculating the prediction of the processed face image model comprises the following steps:
c1, analyzing the model file to obtain the average shape S, the pixel pair position and threshold (u, v, th) of the node, and the error estimation value of each leaf node;
c2, entering a first regression tree of the first-level forest, starting from the root node, judging the pixel intensity difference value of the image at the position (u, v) according to the (u, v, th) of the root node, and calculating the left branch probability and the right branch probability;
c3, processing the next level depth node of the tree, and respectively calculating the probability of the left branch and the right branch;
C4, continuing until the branch probabilities of all leaf nodes have been calculated; the probability of each leaf node is the product of the branch probabilities along its path, the sum over all leaf nodes of the product of error estimate and leaf probability is the shape error estimate of the tree, and the shape estimate is updated accordingly;
repeating the steps C2-C4 on the other trees to obtain the shape estimate of the first-level forest;
and taking the obtained shape estimate as the initial shape for the next regression stage and repeating steps C2-C4 for each stage, until the last regression tree in the random forest model is reached; its estimated shape is the detected face shape.
Compared with the prior art, the method for marking the human face has the following advantages:
the method for marking the human face effectively solves the problem of jitter of the marking points existing when the existing human face marking algorithm is used in a video sequence, enables the key points of the human face in the video sequence to be marked more stably, provides a stable effect for subsequent analysis and processing links based on the marking points, and particularly provides applications with high requirements on the position stability and accuracy of the marking points, such as expression analysis and recognition.
Detailed Description
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; as a direct connection or an indirect connection through an intermediate medium; or as internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific situation.
The present invention will be described in detail with reference to examples.
A method of face labeling comprising a training process and a prediction process, the training process comprising the steps of: A1, inputting training samples and preprocessing them; A2, performing cascade regression training; A3, generating and storing a model file after training;
the prediction process comprising the steps of: B1, denoising and filtering the input face image; B2, detecting the face in the image; B3, performing prediction calculation on the processed face image with the model; and B4, analyzing and calculating the positions of the feature points through the trained model file to obtain the shape of the human face.
The training samples input in step A1 are images in which the face region and key points have been labeled, denoted Li (i ≤ N), where N is the total number of training samples; the feature point shape is the overall contour formed by the positions of all the feature points;
the true shape of the feature points of a sample is denoted Si and comprises the coordinate information of all its feature points.
The preprocessing of the training samples in step A1 includes normalization. The normalization method is as follows: the feature points of each sample image are subjected to uniform (isomorphic) scaling and translation so that the face shape of each sample image is mapped into a unified standard matrix, and the shape information of each sample image in the standard matrix, including the coordinates corresponding to each feature point, is obtained.
Normalization is performed on every sample image in the sample image set so that the matrices in which the feature points of all sample images lie are consistent, which facilitates training and the construction of the cascade regression learning model.
In step A2, the cascade regression training is performed with random forests.
The model file generated in step A3 includes the number of levels of the cascade of random forests, the number of regression trees in each level's forest, the depth of each regression tree, and the node information of each node of each regression tree.
The node information of each node of a regression tree includes the pixel pair position and threshold of the node, the probability information of the left and right branches, and the error estimation values of the leaf nodes.
In step B1, the input face image is denoised by nonlinear median filtering: for each pixel in the image, the median of the N × N window (N odd) centred on the pixel is computed statistically, and the pixel value at that position is replaced by the median.
In step B3, the method for the prediction calculation on the processed face image is as follows: the model file generated by training is parsed, the random forest model generated during training is reconstructed, and each regression tree in each level of the random forest regression model is evaluated iteratively, finally obtaining the detected face shape.
The method for predicting and calculating the processed face image model comprises the following steps:
c1, analyzing the model file to obtain the average shape S, the pixel pair position and threshold (u, v, th) of the node, and the error estimation value of each leaf node;
c2, entering a first regression tree of the first-level forest, starting from the root node, judging the pixel intensity difference value of the image at the position (u, v) according to the (u, v, th) of the root node, and calculating the left branch probability and the right branch probability;
c3, processing the next level depth node of the tree, and respectively calculating the probability of the left branch and the right branch;
C4, continuing until the branch probabilities of all leaf nodes have been calculated; the probability of each leaf node is the product of the branch probabilities along its path, the sum over all leaf nodes of the product of error estimate and leaf probability is the shape error estimate of the tree, and the shape estimate is updated accordingly;
repeating the steps C2-C4 on the other trees to obtain the shape estimate of the first-level forest;
and taking the obtained shape estimate as the initial shape for the next regression stage and repeating steps C2-C4 for each stage, until the last regression tree in the random forest model is reached; its estimated shape is the detected face shape.
In the specific implementation, the following examples are proposed:
1.1 image preprocessing:
the training samples are all images marked on the face region and the key points, and are marked as Li (i < ═ N), N represents the total number of the training sets, and the feature point shape is an integral contour formed by the positions of all the feature points.
The true shape of the feature point is denoted as Si (coordinate information of all feature points).
Because the resolution, pose and so on differ between sample images, normalization must be performed on every sample image in the sample set: the feature points of each sample image are subjected to uniform (isomorphic) scaling and translation so that the face shape of each sample image is mapped into a unified standard matrix, and the shape information of each sample image in the standard matrix, including the coordinates corresponding to each feature point, is obtained.
This normalization makes the matrices in which the feature points of all sample images lie consistent, which facilitates training and the construction of the cascade regression learning model.
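As a concrete illustration of this normalization step, the following is a minimal sketch, assuming 2-D landmark coordinates and taking the "unified standard matrix" to be the unit square; the function name and target frame are assumptions for illustration, not the patent's specification:

```python
import numpy as np

def normalize_shape(points):
    """Map 2-D feature points into a unified standard frame by uniform
    (isomorphic) scaling plus translation; the target frame is assumed
    to be the unit square [0, 1] x [0, 1]."""
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    scale = (points.max(axis=0) - mins).max()  # one scale for both axes
    return (points - mins) / scale

# three landmarks; translated or rescaled copies map to the same shape
shape = normalize_shape([[10.0, 20.0], [30.0, 20.0], [20.0, 40.0]])
```

Because a single scale factor is applied to both axes, the aspect ratio of the face shape is preserved, which is what uniform (isomorphic) scaling requires.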
1.2 probability cascade regression learning:
the training is the process of regression learning and is realized through random forests, the training process is the process of generating a cascade of random forests, and the random forests realize the mapping between the image characteristics and the shapes. The shape prediction error is gradually approximated through multistage random forests, and the method is a process for gradually approximating the prediction error from coarse to fine. The number of the random forests is T, and the number of the regression trees in each random forest is M. The image features adopt intensity difference information (gray difference) of pixel pairs, and the features have the characteristics of low calculation complexity and good robustness to posture, illumination change and the like.
The initial prediction of the shape of each sample's feature points is the average shape S̄ computed during preprocessing, and the prediction error before the first stage of regression is: ΔS = S(I_i) − S̄, where S(I_i) is the true feature point shape of sample I_i. Each stage of the random forest regressor further approximates the error remaining after the previous stage, where I is the image feature information and S^(t−1) is the shape estimate after the previous stage; after T stages of regression, the accumulated error estimate of all the regressors approaches ΔS, achieving the goal of approximating the true shape:
S^t = S^(t−1) + R^t(I, S^(t−1))    formula (1)
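The iterative structure of formula (1) can be sketched as follows. This is a toy illustration of the cascade only: `regressors` stands in for the trained per-stage random forests, whose interface here is an assumption.

```python
import numpy as np

def cascade_predict(image, mean_shape, regressors):
    """Cascaded shape regression: each stage refines the previous
    estimate by adding its estimate of the remaining shape error,
    S^t = S^(t-1) + R^t(I, S^(t-1))."""
    shape = mean_shape.copy()
    for stage in regressors:
        shape = shape + stage(image, shape)
    return shape

# toy demo: each stage estimates half of the remaining error, so the
# estimate approaches the true shape coarse-to-fine
true_shape = np.array([[1.0, 1.0]])
stages = [lambda img, s: 0.5 * (true_shape - s)] * 3
pred = cascade_predict(None, np.zeros((1, 2)), stages)
```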
Each stage's regressor R^t is a random forest containing K regression trees {r_1, r_2, …, r_K}; the depth of each tree is l, with 2^n (0 ≤ n ≤ l) nodes at layer n. When entering each stage's forest, N pairs of pixel points are randomly generated in the face region; each pixel pair must satisfy the following formula (2), with its value P smaller than the threshold T_P:
P = e^(−λ|u−v|)    formula (2)
The essence of formula (2) is to constrain the randomly selected pixel pairs: each randomly selected pair is filtered by the value computed from the formula, requiring P < T_P. Experiments show that random points selected under this exponential-function form have a better distribution within the face region. λ is simply the coefficient of the general exponential form and has no specific physical meaning. The purpose of the formula is to place the randomly selected points as close as possible to the edges and contours of the facial features: near an edge the pixel difference has a large gradient (difference value), so the gradient must exceed a certain range, which corresponds to requiring the value P to be smaller than a certain threshold.
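The selection rule of formula (2) can be sketched as follows; the values of λ and T_P are illustrative assumptions (the patent gives no numbers), and candidate points are drawn uniformly from a square face region for simplicity:

```python
import math
import random

def sample_pixel_pairs(region_size, n_pairs, lam=0.05, t_p=0.2, seed=0):
    """Randomly generate pixel pairs (u, v) in the face region, keeping
    only pairs that satisfy formula (2): P = exp(-lam * |u - v|) < t_p.
    A small P means the two pixels are separated enough for their
    intensity difference to carry a usable gradient."""
    rng = random.Random(seed)
    pairs = []
    attempts = 0
    while len(pairs) < n_pairs and attempts < 100000:
        attempts += 1
        u = (rng.uniform(0, region_size), rng.uniform(0, region_size))
        v = (rng.uniform(0, region_size), rng.uniform(0, region_size))
        if math.exp(-lam * math.dist(u, v)) < t_p:
            pairs.append((u, v))
    return pairs

pairs = sample_pixel_pairs(region_size=100, n_pairs=20)
```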
To solve the jitter problem, a probability model is added: when each node splits into a left subtree and a right subtree, each subtree is assigned a probability, the right subtree with probability p and the left subtree with probability 1 − p, according to formula (3) below, where α is a constant, g is the actual gray-level difference of the sample at the (u, v) position, and th_m is the split threshold at the node. It is easily proved mathematically that the probability of the subtree into which the sample falls is greater than that of the other subtree:
p = 1/(1 + e^(α(g − th_m)))    formula (3)
After the tree is generated, the probability of each leaf node is the product of the probabilities of all the branches traversed from the root node to it. The reason the probability model resists jitter is clear: each regression tree is in fact a partition (or classification) of the training samples, and a linear combination of these partitions is used to estimate and approximate the shape error. Without the probability model, the shape error estimate of a sample is determined only by the samples in the single leaf node into which it falls; with the probability model it is determined by the weighted sum over all leaf nodes, with the other leaf nodes carrying lower weights. The result is clearly more stable, while the process of gradually approaching the true error is preserved. The predicted shape error of the samples falling into a subtree is the mean, over all samples in that subtree, of the difference between their true shapes and the shapes estimated at the previous stage. The selection of the pixel pair and split at each node is therefore a one-step minimization of the overall prediction error.
1.3 generating a model file:
necessary information of training end in the process is stored to generate a model file. The information contained in the model file mainly comprises the series number of random forests, the number of regression trees in each forest, the depth of each regression tree, and the node of each node of the regression tree
Information such as pixel pair location and threshold for each node, probability information for left and right branches, error estimates for leaf nodes.
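The stored quantities listed above can be pictured as a nested structure; the field names below are assumptions for illustration, not the patent's actual file format:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    """One node of a regression tree as stored in the model file."""
    pixel_pair: Tuple[Tuple[int, int], Tuple[int, int]]  # (u, v) positions
    threshold: float                  # split threshold th of the node
    left_prob: float = 0.5            # probability information of branches
    right_prob: float = 0.5
    leaf_error: Optional[List[float]] = None  # error estimate (leaves only)

@dataclass
class RegressionTree:
    depth: int
    nodes: List[Node] = field(default_factory=list)

@dataclass
class ModelFile:
    """Cascade of random forests: stages -> trees -> nodes."""
    num_stages: int
    forests: List[List[RegressionTree]] = field(default_factory=list)

model = ModelFile(num_stages=1, forests=[[RegressionTree(depth=2)]])
```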
2.1 denoising and filtering the image to be detected
Due to changes of the hardware and the ambient illumination, adjacent frames of the image sequence acquired by the camera differ considerably even when the face is still, and this difference makes the feature point positions unstable. To reduce the effect of such noise, the image to be detected can be median-filtered before prediction: by nonlinear median filtering, the median of the N × N window (N odd) centred on each pixel is computed statistically, and the pixel value at that position is replaced by the median.
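The median step can be sketched directly as below; in practice a library routine such as OpenCV's `cv2.medianBlur` would normally be used, and the border handling by edge replication here is an assumption:

```python
import numpy as np

def median_filter(image, n=3):
    """Nonlinear median filtering: replace each pixel by the median of
    the n x n window centred on it (n odd); borders are handled by
    replicating the edge pixels."""
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    img = np.asarray(image, dtype=float)
    pad = n // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + n, x:x + n])
    return out

# an isolated noise pixel is removed entirely by the 3 x 3 median
noisy = np.zeros((5, 5))
noisy[2, 2] = 255.0
clean = median_filter(noisy, 3)
```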
2.2 model prediction computation Using model files
The model file generated by training is parsed, the random forest model generated during training is reconstructed, and each regression tree in each level of the random forest regression model is evaluated iteratively, finally obtaining the detected face shape. Specifically, the feature point shape of the image is predicted according to the following steps:
1) analyzing the model file to obtain an average shape S, pixel pair positions and threshold values (u, v, th) of the nodes, and an error estimation value of each leaf node;
2) entering the first regression tree of the first-level forest: starting from the root node, compute the pixel intensity difference of the image at the node's (u, v) positions and, from this difference value, the node's threshold th and formula (3), calculate the left and right branch probabilities;
3) processing the nodes at the next depth level of the tree in the same way as step 2), calculating the left and right branch probabilities of each;
4) continuing until the branch probabilities of all leaf nodes have been calculated; the probability of each leaf node is the product of the branch probabilities along its path, the sum over all leaf nodes of the product of error estimate and leaf probability is the shape error estimate of the tree, and the shape estimate is updated accordingly. Repeat steps 2) to 4) on the other trees to obtain the shape estimate of the first-level forest; then take the obtained shape estimate as the initial shape for the next regression stage and repeat steps 2) to 4) for each stage, until the last regression tree in the random forest model is reached; its estimated shape is the detected face shape.
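Steps 1) to 4) for a single tree can be sketched as follows. The level-by-level array layout of the tree (node i has children 2i+1 and 2i+2) and the scalar leaf error are simplifying assumptions for illustration:

```python
import math

def evaluate_tree(pixel_diff_at, nodes, alpha=0.1):
    """Soft evaluation of one regression tree: accumulate, over every
    leaf, (product of branch probabilities along the path) times (the
    leaf's error estimate).  Internal nodes carry (u, v, th); leaves
    carry an error estimate."""
    total = 0.0
    stack = [(0, 1.0)]  # (node index, probability of reaching it)
    while stack:
        i, prob = stack.pop()
        node = nodes[i]
        if "error" in node:            # leaf: weight its error estimate
            total += prob * node["error"]
        else:                          # internal node: soft split, formula (3)
            g = pixel_diff_at(node["u"], node["v"])
            p = 1.0 / (1.0 + math.exp(alpha * (g - node["th"])))
            stack.append((2 * i + 1, prob * (1.0 - p)))  # left branch
            stack.append((2 * i + 2, prob * p))          # right branch
    return total

# toy tree: one split and two leaves; g == th gives 0.5 / 0.5 weights
nodes = [{"u": 0, "v": 1, "th": 0.0}, {"error": -2.0}, {"error": 4.0}]
estimate = evaluate_tree(lambda u, v: 0.0, nodes)
```

Summing such per-tree estimates over all trees of a stage and adding the result to the current shape gives that stage's updated shape estimate.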
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

CN202010825775.8A, filed 2020-08-17 (priority date 2020-08-17): Method for marking human face; published as CN112668385A (pending)

Priority Applications (1)

Application number: CN202010825775.8A; priority date: 2020-08-17; filing date: 2020-08-17; title: Method for marking human face

Publications (1)

Publication number: CN112668385A; publication date: 2021-04-16

Family ID: 75403983

Family Applications (1): CN202010825775.8A (pending), filed 2020-08-17

Country Status (1): CN, CN112668385A (en)

Citations (5)

* Cited by examiner, † Cited by third party

CN101763636A*, priority 2009-09-23, published 2010-06-30, Institute of Automation, Chinese Academy of Sciences: Method for tracing position and pose of 3D human face in video sequence
CN106909888A*, priority 2017-01-22, published 2017-06-30, Nanjing Kaiwei Network Technology Co Ltd: Face key point tracking system and method applied to mobile devices
CN107169463A*, priority 2017-05-22, published 2017-09-15, Tencent Technology (Shenzhen) Co Ltd: Method for detecting human face, device, computer equipment and storage medium
CN107909034A*, priority 2017-11-15, published 2018-04-13, Tsinghua University Shenzhen Graduate School: Method for detecting human face, device and computer-readable recording medium
CN111368683A*, priority 2020-02-27, published 2020-07-03, Nanjing University of Posts and Telecommunications: Face image feature extraction method and face recognition method based on modular constraint CentreFace

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

L. Cong et al.: "Improved explicit shape regression face alignment algorithm", 2018 Chinese Control and Decision Conference (CCDC)*
Wang Liting et al. (王丽婷 等): "Accurate localization of facial key points based on random forests" (基于随机森林的人脸关键点精确定位方法), Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》)*


Legal Events

PB01: Publication (application publication date: 2021-04-16)
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication
