Detailed description of the preferred embodiments
The invention provides a high-speed face tracking method and a high-speed face tracking system based on a deep cascade neural network.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in fig. 1, a high-speed face tracking method based on a deep cascade neural network provided in an embodiment of the present invention includes the following steps:
step S1: establishing a face tracking model comprising a multilayer cascade neural network;
step S2: training a face tracking model to obtain a trained face tracking model;
step S3: inputting a face video frame into the trained face tracking model; if the face video frame is a first frame or a calibration frame, detecting it through the complete multilayer cascade neural network to obtain the face box position; if the face video frame is a subsequent frame, taking the face candidate boxes output for the preceding frame as input and feeding them into the last layer of the multilayer cascade neural network for detection to obtain the face box position.
In one embodiment, the step S1: establishing a face tracking model comprising a multilayer cascade neural network specifically comprises the following:
establishing a multilayer cascade neural network, wherein the different levels of the multilayer cascade neural network receive input images at different resolution scales and output progressively more accurate face candidate box positions; each layer of the network resizes the face candidate boxes output by the previous layer to its own resolution and takes them as its input.
The face tracking model constructed by the invention comprises a cascade of multiple mutually independent networks. From the lower layers to the higher layers, the networks move farther from the raw input image and closer to the final output: a lower-layer network receives a low-resolution input image and outputs a coarse tracking result, while a higher-layer network receives a high-resolution input image and outputs a high-precision tracking result.
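By way of illustration only, one possible arrangement of such a cascade of independent networks is sketched below in PyTorch; the number of levels, the per-level input resolutions (12, 24, and 48 pixels), and the layer configuration are assumptions made for the example rather than part of the claimed network.

```python
# Illustrative sketch (not the patented architecture): a three-level cascade,
# each level being an independent small CNN that scores a candidate crop and
# refines its bounding box.
import torch
import torch.nn as nn

class CascadeLevel(nn.Module):
    """One independent cascade level: face confidence + box refinement."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.score = nn.Linear(32 * 4 * 4, 1)   # face / non-face confidence
        self.offset = nn.Linear(32 * 4 * 4, 4)  # box refinement (dx, dy, dw, dh)

    def forward(self, crops):                   # crops: (N, 3, H, W)
        f = self.features(crops).flatten(1)
        return torch.sigmoid(self.score(f)), self.offset(f)

# Hypothetical per-level crop resolutions; higher levels receive higher resolution.
LEVEL_RESOLUTIONS = (12, 24, 48)
cascade = nn.ModuleList([CascadeLevel() for _ in LEVEL_RESOLUTIONS])
```

Each level in this sketch outputs a confidence score and a box refinement, matching the role described above of producing progressively more accurate candidate boxes from progressively higher-resolution inputs.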
In one embodiment, the step S2: training a face tracking model to obtain the trained face tracking model specifically comprises:
performing end-to-end training using training data, wherein after each iteration the training data are screened according to the scores of the face candidate boxes, and the proportion of positive and negative samples is adjusted.
The invention trains the face tracking model in the same manner as a face detection network. End-to-end training is performed on a large amount of training data generated offline. After each iteration, the training data are screened according to the scores of the face candidate boxes, and the proportion of positive to negative samples is adjusted as required; in particular, the proportion of hard samples is increased, so that positive samples with lower scores and negative samples with higher scores are emphasized during training.
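As a non-limiting illustration, the following Python sketch shows one way such screening might be performed after an iteration; the score thresholds, the positive-to-negative ratio, and the field names are assumptions made for the example, not values prescribed by the invention.

```python
# Hypothetical sketch of re-screening training samples after an iteration:
# keep low-scoring positives and high-scoring negatives (hard examples) and
# rebalance the positive/negative ratio. Thresholds and ratio are illustrative.
def rescreen_samples(samples, neg_per_pos=3):
    """samples: list of dicts with 'score' (model output) and 'is_positive'."""
    hard_pos = [s for s in samples if s["is_positive"] and s["score"] < 0.7]
    hard_neg = [s for s in samples if not s["is_positive"] and s["score"] > 0.3]
    # Keep the hardest negatives and cap their number so the positive:negative
    # ratio stays roughly 1:neg_per_pos.
    hard_neg = sorted(hard_neg, key=lambda s: s["score"], reverse=True)
    hard_neg = hard_neg[: max(1, neg_per_pos * len(hard_pos))]
    return hard_pos + hard_neg
```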
As shown in fig. 2, in one embodiment, the step S3: inputting a face video frame into the trained face tracking model; if the face video frame is a first frame or a calibration frame, detecting it through the complete multilayer cascade neural network to obtain the face box position; if the face video frame is a subsequent frame, taking the face candidate boxes output for the preceding frame as input and feeding them into the last layer of the multilayer cascade neural network for detection to obtain the face box position, specifically comprises the following steps:
step S31: judging the input face video frame; if the frame is the first frame or a calibration frame, proceeding to step S32; otherwise, proceeding to step S34;
When face tracking is actually carried out, face video frames are input into the trained face tracking model. The input face video frame is first examined: if it is the first frame or a calibration frame, the process proceeds to step S32; otherwise, it proceeds to step S34.
Step S32: detecting the frame through the first layer of the multilayer cascade neural network, and screening the detections with a non-maximum suppression algorithm to obtain face candidate boxes;
When the input frame is the first frame or a calibration frame, it is detected by the complete multilayer cascade neural network. The frame is first detected through the first layer of the multilayer cascade neural network and screened with a non-maximum suppression algorithm: heavily overlapping boxes with lower scores are discarded, and the remaining boxes whose overlap ratio is below a threshold are kept as the output, yielding the face candidate boxes. Face image patches are then cut out at the resolution scale of the next level and used as the input of the subsequent higher-resolution cascade network. If the frame is not the first frame or a calibration frame, the process jumps directly to step S34.
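The non-maximum suppression referred to here is the standard greedy procedure; a minimal sketch is given below for reference, with the overlap (IoU) threshold of 0.5 chosen only for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) array of (x1, y1, x2, y2); returns kept indices."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the highest-scoring box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Keep only boxes whose overlap with the kept box is below the threshold.
        order = order[1:][iou < iou_threshold]
    return keep
```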
Step S33: inputting the face candidate boxes into the next layer of the cascade neural network, and screening with a non-maximum suppression algorithm to obtain refined face candidate boxes;
The face candidate boxes screened in step S32 are input into the next layer of the cascade neural network for face classification and are further screened with a non-maximum suppression algorithm to obtain refined face candidate boxes; face image patches at the corresponding resolution scale are then cut out from the frame and used as the input of the subsequent higher-resolution cascade network.
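Purely as an illustration of this cropping step, the sketch below cuts each candidate box out of the frame and resizes it to the resolution expected by the next level; it assumes OpenCV and NumPy are available, and the helper name is hypothetical.

```python
import numpy as np
import cv2  # OpenCV is assumed to be available for resizing

def crop_candidates(frame, boxes, out_size):
    """Cut each candidate box (x1, y1, x2, y2, in pixels) out of an H x W x 3
    frame and resize it to the square resolution of the next cascade level."""
    h, w = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes.astype(int):
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        if x2 <= x1 or y2 <= y1:
            continue  # skip degenerate boxes
        crops.append(cv2.resize(frame[y1:y2, x1:x2], (out_size, out_size)))
    if not crops:
        return np.empty((0, out_size, out_size, 3), dtype=frame.dtype)
    return np.stack(crops)
```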
Step S34: repeating step S33 until the last layer of the cascade neural network outputs the face box position;
Step S33 is repeated until the last layer of the cascade neural network is reached. As before, the candidates are screened with the non-maximum suppression algorithm, the face candidate boxes whose overlap ratio is below the threshold are selected as the result, and the face box position is output.
Step S35: for a subsequent frame, inputting the output that the preceding frame obtained from the last layer of the cascade neural network into the last layer of the cascade neural network to obtain face candidate boxes, screening them with the non-maximum suppression algorithm, and outputting the face box position.
For subsequent frames that are neither the first frame nor a calibration frame, the output of the preceding frame at the last layer of the cascade neural network is taken as input and fed into the last layer of the cascade neural network to obtain face candidate boxes, which are screened with the non-maximum suppression algorithm, and the face box position is output.
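By way of illustration, the per-frame dispatch of steps S31 to S35 might be organized as follows; the periodic calibration interval and the two callables are assumptions made for the example, since the invention does not prescribe how calibration frames are chosen.

```python
def track(frames, run_full_cascade, run_last_level, calibration_interval=30):
    """Illustrative tracking loop. frames: iterable of video frames.
    run_full_cascade(frame) -> boxes runs every cascade level (steps S32-S34);
    run_last_level(frame, prior_boxes) -> boxes runs only the final level on
    the previous frame's output (step S35). Both callables are assumed."""
    prev_boxes = None
    for idx, frame in enumerate(frames):
        is_calibration = (idx % calibration_interval == 0)
        if idx == 0 or is_calibration or prev_boxes is None or len(prev_boxes) == 0:
            # First or calibration frame: detect with the complete cascade.
            boxes = run_full_cascade(frame)
        else:
            # Subsequent frame: previous boxes go straight to the last level.
            boxes = run_last_level(frame, prev_boxes)
        prev_boxes = boxes
        yield boxes
```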
The high-speed face tracking method based on the deep cascade neural network provided by the invention fully exploits the strong correlation between the image features and semantics of consecutive frames in a face tracking task: the high-precision face box output by the high-layer, high-resolution cascade network for the previous frame is used as prior information and as the starting point of the tracking flow for the next frame, which greatly accelerates the face tracking process. The method provided by the invention achieves a visually real-time tracking speed (more than 24 frames per second), and can therefore perform the face tracking task on mobile terminal platforms with limited computing capability.
Example two
As shown in fig. 3, an embodiment of the present invention provides a high-speed face tracking system based on a deep cascade neural network, including the following modules:
the model construction module is used for establishing a face tracking model comprising a multilayer cascade neural network;
the model training module is used for training the face tracking model to obtain a trained face tracking model;
the face tracking module is used for inputting a face video frame into the trained face tracking model; if the face video frame is a first frame or a calibration frame, it is detected through the complete multilayer cascade neural network to obtain the face box position; if the face video frame is a subsequent frame, the face candidate boxes output for the preceding frame are taken as input and fed into the last layer of the multilayer cascade neural network for detection to obtain the face box position.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.