Detailed description of the preferred embodiments
The invention provides a high-speed face tracking method and a high-speed face tracking system based on a deep cascade neural network.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in fig. 1, a high-speed face tracking method based on a deep cascade neural network provided in an embodiment of the present invention includes the following steps:
step S1: establishing a face tracking model comprising a multilayer cascade neural network;
step S2: training a face tracking model to obtain a trained face tracking model;
step S3: inputting a face video frame into the trained face tracking model; if the face video frame is a first frame or a calibration frame, detecting it through the complete multilayer cascade neural network to obtain the face box position; if the face video frame is a subsequent frame, taking the face candidate boxes output for the preceding frame as input and feeding them into the last layer of the multilayer cascade neural network for detection to obtain the face box position.
In one embodiment, the step S1: establishing a face tracking model comprising a multilayer cascade neural network specifically comprises the following:
establishing a multilayer cascade neural network, wherein the different levels of the multilayer cascade neural network receive input images at different resolution scales and output progressively more accurate face candidate box positions; each layer of the network resizes the face candidate boxes output by the previous layer to its own resolution and takes them as its input.
The face tracking model constructed by the invention comprises a cascade of multiple mutually independent networks. From the lower layers to the higher layers, the networks move farther from the raw input image and closer to the final output: a lower-layer network receives a low-resolution input image and outputs a coarse tracking result, while a higher-layer network receives a high-resolution input image and outputs a high-precision tracking result.
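By way of illustration only, one possible arrangement of such a cascade of independent networks is sketched below in PyTorch; the number of levels, the per-level input resolutions (12, 24, and 48 pixels), and the layer configuration are assumptions made for the example rather than part of the claimed network.

```python
# Illustrative sketch (not the patented architecture): a three-level cascade,
# each level being an independent small CNN that scores a candidate crop and
# refines its bounding box.
import torch
import torch.nn as nn

class CascadeLevel(nn.Module):
    """One independent cascade level: face confidence + box refinement."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.score = nn.Linear(32 * 4 * 4, 1)   # face / non-face confidence
        self.offset = nn.Linear(32 * 4 * 4, 4)  # box refinement (dx, dy, dw, dh)

    def forward(self, crops):                   # crops: (N, 3, H, W)
        f = self.features(crops).flatten(1)
        return torch.sigmoid(self.score(f)), self.offset(f)

# Hypothetical per-level crop resolutions; higher levels receive higher resolution.
LEVEL_RESOLUTIONS = (12, 24, 48)
cascade = nn.ModuleList([CascadeLevel() for _ in LEVEL_RESOLUTIONS])
```

Each level in this sketch outputs a confidence score and a box refinement, matching the role described above of producing progressively more accurate candidate boxes from progressively higher-resolution inputs.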
In one embodiment, the step S2: training a face tracking model to obtain the trained face tracking model specifically comprises:
performing end-to-end training using training data, wherein after each iteration the training data are screened according to the scores of the face candidate boxes, and the proportion of positive and negative samples is adjusted.
The invention trains the face tracking model in the same manner as a face detection network. End-to-end training is performed on a large amount of training data generated offline. After each iteration, the training data are screened according to the scores of the face candidate boxes, and the proportion of positive to negative samples is adjusted as required; in particular, the proportion of hard samples is increased, so that positive samples with lower scores and negative samples with higher scores are emphasized during training.
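As a non-limiting illustration, the following Python sketch shows one way such screening might be performed after an iteration; the score thresholds, the positive-to-negative ratio, and the field names are assumptions made for the example, not values prescribed by the invention.

```python
# Hypothetical sketch of re-screening training samples after an iteration:
# keep low-scoring positives and high-scoring negatives (hard examples) and
# rebalance the positive/negative ratio. Thresholds and ratio are illustrative.
def rescreen_samples(samples, neg_per_pos=3):
    """samples: list of dicts with 'score' (model output) and 'is_positive'."""
    hard_pos = [s for s in samples if s["is_positive"] and s["score"] < 0.7]
    hard_neg = [s for s in samples if not s["is_positive"] and s["score"] > 0.3]
    # Keep the hardest negatives and cap their number so the positive:negative
    # ratio stays roughly 1:neg_per_pos.
    hard_neg = sorted(hard_neg, key=lambda s: s["score"], reverse=True)
    hard_neg = hard_neg[: max(1, neg_per_pos * len(hard_pos))]
    return hard_pos + hard_neg
```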
As shown in fig. 2, in one embodiment, the step S3: inputting a face video frame into the trained face tracking model; if the face video frame is a first frame or a calibration frame, detecting it through the complete multilayer cascade neural network to obtain the face box position; if the face video frame is a subsequent frame, taking the face candidate boxes output for the preceding frame as input and feeding them into the last layer of the multilayer cascade neural network for detection to obtain the face box position, specifically comprises the following steps:
step S31: judging the input face video frame; if the frame is the first frame or a calibration frame, proceeding to step S32; otherwise, proceeding to step S34;
When face tracking is actually carried out, face video frames are input into the trained face tracking model. The input face video frame is first examined: if it is the first frame or a calibration frame, the process proceeds to step S32; otherwise, it proceeds to step S34.
Step S32: detecting the frame through the first layer of the multilayer cascade neural network, and screening the detections with a non-maximum suppression algorithm to obtain face candidate boxes;
When the input frame is the first frame or a calibration frame, it is detected by the complete multilayer cascade neural network. The frame is first detected through the first layer of the multilayer cascade neural network and screened with a non-maximum suppression algorithm: heavily overlapping boxes with lower scores are discarded, and the remaining boxes whose overlap ratio is below a threshold are kept as the output, yielding the face candidate boxes. Face image patches are then cut out at the resolution scale of the next level and used as the input of the subsequent higher-resolution cascade network. If the frame is not the first frame or a calibration frame, the process jumps directly to step S34.
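The non-maximum suppression referred to here is the standard greedy procedure; a minimal sketch is given below for reference, with the overlap (IoU) threshold of 0.5 chosen only for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) array of (x1, y1, x2, y2); returns kept indices."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the highest-scoring box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Keep only boxes whose overlap with the kept box is below the threshold.
        order = order[1:][iou < iou_threshold]
    return keep
```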
Step S33: inputting the face candidate boxes into the next layer of the cascade neural network, and screening with a non-maximum suppression algorithm to obtain refined face candidate boxes;
The face candidate boxes screened in step S32 are input into the next layer of the cascade neural network for face classification and are further screened with a non-maximum suppression algorithm to obtain refined face candidate boxes; face image patches at the corresponding resolution scale are then cut out from the frame and used as the input of the subsequent higher-resolution cascade network.
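Purely as an illustration of this cropping step, the sketch below cuts each candidate box out of the frame and resizes it to the resolution expected by the next level; it assumes OpenCV and NumPy are available, and the helper name is hypothetical.

```python
import numpy as np
import cv2  # OpenCV is assumed to be available for resizing

def crop_candidates(frame, boxes, out_size):
    """Cut each candidate box (x1, y1, x2, y2, in pixels) out of an H x W x 3
    frame and resize it to the square resolution of the next cascade level."""
    h, w = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes.astype(int):
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        if x2 <= x1 or y2 <= y1:
            continue  # skip degenerate boxes
        crops.append(cv2.resize(frame[y1:y2, x1:x2], (out_size, out_size)))
    if not crops:
        return np.empty((0, out_size, out_size, 3), dtype=frame.dtype)
    return np.stack(crops)
```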
Step S34: repeating step S33 until the last layer of the cascade neural network outputs the face box position;
Step S33 is repeated until the last layer of the cascade neural network is reached. As before, the candidates are screened with the non-maximum suppression algorithm, the face candidate boxes whose overlap ratio is below the threshold are selected as the result, and the face box position is output.
Step S35: for a subsequent frame, inputting the output that the preceding frame obtained from the last layer of the cascade neural network into the last layer of the cascade neural network to obtain face candidate boxes, screening them with the non-maximum suppression algorithm, and outputting the face box position.
For subsequent frames that are neither the first frame nor a calibration frame, the output of the preceding frame at the last layer of the cascade neural network is taken as input and fed into the last layer of the cascade neural network to obtain face candidate boxes, which are screened with the non-maximum suppression algorithm, and the face box position is output.
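By way of illustration, the per-frame dispatch of steps S31 to S35 might be organized as follows; the periodic calibration interval and the two callables are assumptions made for the example, since the invention does not prescribe how calibration frames are chosen.

```python
def track(frames, run_full_cascade, run_last_level, calibration_interval=30):
    """Illustrative tracking loop. frames: iterable of video frames.
    run_full_cascade(frame) -> boxes runs every cascade level (steps S32-S34);
    run_last_level(frame, prior_boxes) -> boxes runs only the final level on
    the previous frame's output (step S35). Both callables are assumed."""
    prev_boxes = None
    for idx, frame in enumerate(frames):
        is_calibration = (idx % calibration_interval == 0)
        if idx == 0 or is_calibration or prev_boxes is None or len(prev_boxes) == 0:
            # First or calibration frame: detect with the complete cascade.
            boxes = run_full_cascade(frame)
        else:
            # Subsequent frame: previous boxes go straight to the last level.
            boxes = run_last_level(frame, prev_boxes)
        prev_boxes = boxes
        yield boxes
```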
The high-speed face tracking method based on the deep cascade neural network provided by the invention fully exploits the strong correlation between the image features and semantics of consecutive frames in a face tracking task: the high-precision face box output by the high-layer, high-resolution cascade network for the previous frame is used as prior information and as the starting point of the tracking flow for the next frame, which greatly accelerates the face tracking process. The method provided by the invention achieves a visually real-time tracking speed (more than 24 frames per second), and can therefore perform the face tracking task on mobile terminal platforms with limited computing capability.
Example two
As shown in fig. 3, an embodiment of the present invention provides a high-speed face tracking system based on a deep cascade neural network, including the following modules:
the model construction module is used for establishing a face tracking model comprising a multilayer cascade neural network;
the model training module is used for training the face tracking model to obtain a trained face tracking model;
the face tracking module is used for inputting a face video frame into the trained face tracking model; if the face video frame is a first frame or a calibration frame, it is detected through the complete multilayer cascade neural network to obtain the face box position; if the face video frame is a subsequent frame, the face candidate boxes output for the preceding frame are taken as input and fed into the last layer of the multilayer cascade neural network for detection to obtain the face box position.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.