CN112733667A - Face alignment method and device based on face recognition - Google Patents

Face alignment method and device based on face recognition

Info

Publication number
CN112733667A
Authority
CN
China
Prior art keywords
face
array
target
initial
transformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011626746.5A
Other languages
Chinese (zh)
Other versions
CN112733667B (en)
Inventor
魏舒
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011626746.5A
Publication of CN112733667A
Application granted
Publication of CN112733667B
Legal status: Active
Anticipated expiration

Abstract

The application discloses a face alignment method and device based on face recognition, relating to the technical field of artificial intelligence, which can improve face alignment accuracy. The method comprises the following steps: determining, according to a first face ROI (region of interest) in an acquired video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI and used for aligning face key points; performing de-jittering and smoothing on the initial transformation array to obtain a target transformation array; and performing face key point alignment on a second face ROI in a newly extracted video frame using the target transformation array, to obtain a target face ROI with aligned face key points. The method and device are suitable for various application systems based on face alignment. The application also relates to blockchain technology: the video frames and the target face ROI can be stored in a blockchain to ensure data privacy and security.

Description

Face alignment method and device based on face recognition
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a face alignment method and apparatus based on face recognition.
Background
Face alignment is an active research problem in computer vision, and its quality directly affects the accuracy of face feature extraction and recognition. For video, face alignment is a key factor affecting the stability of the video stream.
Existing solutions for face alignment in video scenes use neural networks for key point detection and alignment (such as MTCNN), which require large amounts of labeling, hardware such as GPUs, and costly training. Tracking methods are also commonly used: on top of key point detection, they improve detection stability between consecutive video frames to ensure face alignment accuracy. Existing solutions therefore have two defects: neural-network-based face key point detection methods such as MTCNN carry high labeling cost, hardware requirements such as GPUs, and high training cost; and methods that align face key points by tracking detection stability between consecutive video frames have low processing speed, which hurts the real-time performance of the system.
Disclosure of Invention
In view of the above, the present application provides a face alignment method and apparatus based on face recognition, mainly aiming to solve two technical problems of existing approaches: neural-network-based face key point detection methods such as MTCNN have high labeling cost, hardware requirements such as GPUs, and high training cost; and methods that align face key points by tracking detection stability between consecutive video frames have low processing speed and poor real-time performance.
According to one aspect of the present application, a face alignment method based on face recognition is provided, the method including:
determining, according to a first face ROI (region of interest) in an acquired video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI and used for aligning face key points, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data;
performing de-jittering and smoothing on the initial transformation array to obtain a target transformation array;
performing face key point alignment on a second face ROI extracted from the video frame using the target transformation array, to obtain a target face ROI with aligned face key points;
wherein the second face ROI is different from the first face ROI and, compared to the first face ROI, further includes a background region.
According to another aspect of the present application, there is provided a face alignment apparatus, the apparatus comprising:
an initial transformation array module, configured to determine, according to a first face ROI in the acquired video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI and used for aligning face key points, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data;
a target transformation array module, configured to perform de-jittering and smoothing on the initial transformation array to obtain a target transformation array;
an alignment module, configured to perform face key point alignment on a second face ROI extracted from the video frame using the target transformation array, to obtain a target face ROI with aligned face key points;
wherein the second face ROI is different from the first face ROI and, compared to the first face ROI, further includes a background region.
According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described face alignment method based on face recognition.
According to still another aspect of the present application, there is provided a computer device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, where the processor executes the computer program to implement the above face alignment method based on face recognition.
By means of the above technical scheme, compared with the existing neural-network-based face key point detection method MTCNN (Multi-task Cascaded Convolutional Networks) and the existing method that aligns face key points by tracking detection stability between consecutive video frames, the face alignment method and device based on face recognition determine, according to a first face ROI in an acquired video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI and used for aligning face key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; perform de-jittering and smoothing on the initial transformation array to obtain a target transformation array; and perform face key point alignment on a second face ROI extracted from the video frame using the target transformation array, obtaining a target face ROI with aligned face key points, where the second face ROI differs from the first face ROI and further includes a background region. Face alignment of the face ROI is thus achieved after adaptive de-jittering and smoothing of the initial transformation array determined from the first face ROI in the video frame, which solves the technical problem of low alignment accuracy when faces jitter or jump between consecutive frames, while simplifying the implementation of face alignment without sacrificing the alignment of video frames. A large amount of time-consuming preliminary work, such as the manual labeling and model training of existing approaches with their high upfront cost, is avoided; the adaptive de-jittering and smoothing greatly improve the generalization ability of the face alignment operation; and compared with existing methods that align face key points by tracking detection stability between consecutive video frames, the processing speed of video data is greatly increased and time consumption is effectively reduced, meeting the real-time requirements of various application systems based on face alignment.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer, so that it can be implemented according to the content of this description, and to make the above and other objects, features, and advantages of the present application more understandable, a detailed description of the present application follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart illustrating a face alignment method based on face recognition according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating another face alignment method based on face recognition according to an embodiment of the present application;
fig. 3 shows a schematic structural diagram of a face alignment apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
This embodiment addresses two technical problems of existing approaches: neural-network-based face key point detection methods such as MTCNN have high labeling cost, hardware requirements such as GPUs, and high training cost; and methods that align face key points by tracking detection stability between consecutive video frames have low processing speed and poor real-time performance. The embodiment provides a face alignment method based on face recognition that can effectively improve face alignment accuracy when faces jitter or jump between consecutive frames, while simplifying the implementation of face alignment. As shown in fig. 1, the method includes:
101. According to a first face ROI in an acquired video frame and preset face standard key points, determine an initial transformation array corresponding to the first face ROI and used for aligning face key points, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data.
In this embodiment, a remote recording controller acquires a video, and the video is split into consecutive frames. Preset face standard key points are then used to preprocess the face key points in the first face ROI of these consecutive frames, yielding an initial transformation array for face key point alignment. Here, preprocessing means comparing the face key points in the first face ROI with the preset face standard key points and calculating the transformation array that would align them; it only determines the initial transformation array and performs no actual alignment, so that the array can subsequently be de-jittered and smoothed over the video frame sequence to improve face alignment accuracy.
Depending on the requirements of the actual application scene, video may be captured by a camera device, with a tripod ensuring capture stability and ring-light fill lighting improving face ROI recognition. The video stream captured by the camera device is input to the remote recording controller, so that when the controller performs de-jittering on the acquired video stream, face alignment accuracy is improved. Video acquisition is not specifically limited here.
102. Perform de-jittering and smoothing on the initial transformation array to obtain a target transformation array.
In this embodiment, the initial transformation array is first de-jittered: a first de-jittering pass over the initial transformation array produces a first de-jittered array, eliminating abnormal points in the face key point detection results within the face ROI; a second de-jittering pass over the first de-jittered array produces a second de-jittered array, reducing large-amplitude jitter in the face ROI to small-amplitude jitter. The de-jittered array is then smoothed: the small-amplitude jitter remaining in the second de-jittered array is smoothed out so that the resulting target transformation array approaches the true transformation, achieving anti-jitter processing for face alignment.
103. Perform face key point alignment on a second face ROI extracted from the video frame using the target transformation array, to obtain a target face ROI with aligned face key points; the second face ROI is different from the first face ROI and, compared to the first face ROI, further includes a background region.
In this embodiment, face ROI extraction is performed again on the acquired video frame, and the target transformation array is applied to the re-extracted second face ROI to align its face key points and obtain the target face ROI. Unlike the first face ROI detected in step 101, the re-extracted second face ROI includes background beyond the face, and may further include hair, hair accessories, a hat, and the like. Performing key point alignment on this second face ROI effectively improves alignment accuracy compared to aligning a first face ROI that contains only the face.
Depending on the actual application scene, the target face ROI or the aligned video frame can be further processed and applied. For example, the target face ROI can be input into a trained face recognition network model, and features extracted from it yield a face recognition result for the video frame; feature extraction from the target face ROI can also drive the generation of a virtual anchor. The application of the aligned target face ROI is not specifically limited.
By the above scheme, an initial transformation array corresponding to a first face ROI and used for aligning face key points is determined according to the first face ROI in the acquired video frame and preset face standard key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; the initial transformation array is de-jittered and smoothed to obtain a target transformation array; and the second face ROI extracted from the video frame is aligned using the target transformation array to obtain a target face ROI with aligned face key points. Compared with the existing neural-network-based face key point detection method MTCNN and the method that aligns face key points by tracking detection stability between consecutive video frames, this scheme aligns the face ROI after adaptively de-jittering and smoothing the initial transformation array determined from the first face ROI, solving the technical problem of low alignment accuracy when faces jitter or jump between consecutive frames while simplifying the implementation of face alignment.
Further, as a refinement and extension of the above embodiment, and to fully describe its implementation, another face alignment method based on face recognition is provided. As shown in fig. 2, the method includes:
201. Split the acquired video into consecutive frames.
202. Process each video frame as follows: perform face detection on the current image of the video frame using a first preset depth model to obtain a first face ROI; perform key point detection on the first face ROI using a second preset depth model to obtain target key points; and determine, from the target key points and preset face standard key points, initial scale transformation data and initial translation transformation data for aligning the face key points.
In this embodiment, a remote recording controller acquires a video, which is split into consecutive frames, and each frame is processed as follows: a first preset depth model detects the face region in the video frame, giving a first processing result, namely the first face ROI whose key points are to be aligned; a second preset depth model detects the face key points within that ROI, giving the coordinates of 68 key points; from these 68 key point coordinates and a 3D standard face template, the scale transformation data S and translation transformation data T that map the key points onto the standard key points of the template are obtained, and the S and T of multiple consecutive frames are stored in the initial transformation arrays set_S and set_T.
Depending on the actual application scene, a first deep residual network ResNet from the computer vision library Dlib can detect face regions, and the detected face region or regions are recorded to obtain the first face ROI; a second deep residual network ResNet from Dlib then detects the face key points within the ROI, recording the coordinates (x, y) of 68 key points. From these 68 coordinates and the 3D standard face template, 1 scale value S and 2 translation values T that align the key points to the template's standard key points are calculated and stored in the transformation arrays set_S and set_T.
The first and second deep residual networks are different: the first ResNet is trained on face ROI samples, while the second is trained on face key point samples. The 3D standard face template is obtained by averaging every dimensional feature over multiple 3D faces, producing a template for face key point comparison; the dimensional features of the 3D faces depend on the actual application scene and are not specifically limited.
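As a concrete illustration of the two-stage detection in step 202, the following is a minimal sketch using the stock Dlib pipeline as a stand-in for the two ResNet models named above; the frontal face detector, the 68-point shape predictor, and the model file name are assumptions for illustration, not the networks specified by the text.

```python
# Minimal sketch of the two-stage detection in step 202, using the stock
# Dlib pipeline as a stand-in for the two ResNet models named above.
# The shape-predictor file path is an assumption.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()               # stage 1: first face ROI
predictor = dlib.shape_predictor(
    "shape_predictor_68_face_landmarks.dat")              # stage 2: 68 key points

def detect_face_keypoints(frame):
    """Return (first face ROI, 68 (x, y) key point coordinates) or None."""
    faces = detector(frame, 1)
    if not faces:
        return None
    roi = faces[0]
    shape = predictor(frame, roi)
    pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
    return roi, pts
```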
To illustrate a specific implementation of step 202, as a preferred embodiment, determining the initial scale transformation data and initial translation transformation data from the target key points and the preset face standard key points specifically includes:
step 2021, calculating a corresponding relationship between the target key point and the preset face standard key point by using a least square method.
Step 2022, obtaining initial scale transformation data and initial translation transformation data from the target key point to the preset face standard key point according to the corresponding relation.
In this embodiment, the initial transformation array is a two-dimensional sequence: one dimension indexes the video frames, and the other holds each frame's scale and translation transformation data for face key point alignment. Following the frame order, the initial scale and translation data from the target key points to the preset face standard key points are each de-jittered and smoothed, giving a target transformation array whose scale transformation data tends to be stable. A sketch of the least-squares fit of steps 2021 and 2022 follows.
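This minimal sketch assumes the transform is pure scale plus translation, as the 1 scale value and 2 translation values above suggest; the closed-form least-squares solution stands in for whatever solver the text intends.

```python
import numpy as np

def fit_scale_translation(pts, std_pts):
    """Least-squares fit of scale s and translation t = (tx, ty) such that
    s * pts + t best matches std_pts; both arrays have shape (68, 2)."""
    p_mean, q_mean = pts.mean(axis=0), std_pts.mean(axis=0)
    p_c, q_c = pts - p_mean, std_pts - q_mean
    # Closed-form minimizer of sum ||s * p_c - q_c||^2 over the scalar s
    s = (p_c * q_c).sum() / (p_c ** 2).sum()
    t = q_mean - s * p_mean          # 2 translation components (x, y)
    return s, t                      # stored per frame into set_S and set_T
```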
203. Perform a first de-jittering pass on the initial transformation array to obtain a first de-jittered array.
204. Perform a second de-jittering pass on the first de-jittered array to obtain a second de-jittered array.
205. Perform smoothing filtering on the second de-jittered array to obtain the target transformation array.
To illustrate the specific implementation of steps 203 and 204, as a preferred embodiment, the first de-jittering pass is a clipping (amplitude-limiting) filter, and the second is a recursive average filter.
In practical application scenes, the prior art focuses on improving the accuracy of face key point detection to address jitter or jumping between consecutive frames, whereas this embodiment achieves anti-jitter face alignment in video by de-jittering and smoothing the transformation array used for alignment, while also improving the effective accuracy of key point detection. Moreover, even when face key point detection itself is accurate with small per-point errors, the final alignment result can still carry a large error, because alignment processes many key points and transformation data over many frames, stacking many small errors. Therefore, on top of accurate key point detection, this application de-jitters and smooths the initial transformation array used for face alignment, greatly reducing the jitter of face alignment in video and bringing the aligned result closer to the ideal.
Because the face in a video moves continuously, the detected key points and transformation data should in theory also be continuously and smoothly distributed; in practice, interference from the face background, illumination, and other factors easily makes the distribution uneven. Based on this, this embodiment applies clipping filtering to the initial transformation arrays set_S and set_T, and then recursive average filtering to the clipped arrays, producing the de-jittered results. Filtering the initial transformation array twice in succession accomplishes the de-jittering: the clipping filter eliminates abnormal points in the key point detection results, and the recursive average filter reduces large-amplitude jitter across consecutive frames (face ROIs) to small-amplitude jitter, so that the data can subsequently be smoothed toward the true transformation, achieving anti-jitter processing for face alignment.
The clipping filter is applied to set_S and set_T separately. Specifically, a maximum deviation allowed between two successive samples (denoted A) is preset, and each new sample is judged: if the difference between the current value and the previous value is at most A, the current value is accepted as valid; if the difference exceeds A, the current value is deemed invalid, discarded, and replaced by the previous value. This overcomes pulse interference caused by accidental factors and eliminates abnormal points in the key point detection results.
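A minimal sketch of the clipping (amplitude-limiting) filter just described; the function and parameter names are illustrative assumptions. It would be applied independently to set_S and to each component of set_T.

```python
import numpy as np

def clip_filter(samples, max_dev):
    """Amplitude-limiting filter: a sample deviating from the previously
    accepted value by more than max_dev (the preset A) is discarded and
    replaced by the previous value, removing abnormal detection points."""
    out = np.asarray(samples, dtype=np.float64).copy()
    for i in range(1, len(out)):
        if abs(out[i] - out[i - 1]) > max_dev:
            out[i] = out[i - 1]      # invalid sample: keep the previous value
    return out
```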
Next, recursive average filtering is applied to the clipped arrays set_S and set_T. Specifically, N consecutive samples are kept as a fixed-length queue of size N; each newly obtained group of transformation data (the scale and translation data of the next video frame) enters the tail of the queue on a first-in-first-out basis, and the arithmetic mean of the N samples in the queue gives the new filtered result. Recursive average filtering of the clipped arrays suppresses periodic interference well and yields high smoothness, compensating for the clipping filter's weakness against periodic interference; the two complement each other, achieving the best de-jittering of the video frames.
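A minimal sketch of the recursive average filter, assuming a fixed queue length N as described; names are illustrative.

```python
from collections import deque
import numpy as np

def recursive_average_filter(samples, n):
    """Recursive average filter: keep the last n samples in a FIFO queue
    and output their arithmetic mean, turning large-amplitude jitter
    into small-amplitude jitter."""
    queue = deque(maxlen=n)          # fixed-length queue, first in first out
    out = []
    for v in samples:
        queue.append(v)              # new transformation datum enters the tail
        out.append(sum(queue) / len(queue))
    return np.array(out)
```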
To illustrate the specific implementation of step 205, as a preferred embodiment, step 205 may specifically include: performing smoothing filtering on the second de-jittered array using an adaptively set window size to obtain the target transformation array, where the adaptively set window size is determined from the video length.
In this embodiment, the de-jittered result is smoothed to obtain the smoothed arrays set_S and set_T. Specifically, the de-jittered result (the second de-jittered array) is padded into a one-dimensional array; within a window of adaptive size, the value at the window center is updated; and the padded region is then removed, giving the target transformation array. The window size is determined from information such as the acquired video length, making the window size adaptive.
An adaptive data value is set according to the actual application scene, and the current window size is computed from the acquired video length, for example win_size = len(input_video) / 3 (with 3 as the adaptive data value). The de-jittered arrays set_S and set_T are padded according to this window size, giving one-dimensional sequences set_S_padding and set_T_padding; the window is then slid from the start of each sequence, and the mean of all values in each window replaces the value at the window center; finally the padding is cleared, yielding the de-jittered and smoothed target arrays set_S and set_T, i.e. a target transformation array whose scale transformation data S tends to be stable.
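A minimal sketch of the adaptive-window smoothing, assuming edge padding and an odd window size so that a center position exists; the divisor 3 follows the win_size example above and is a tunable assumption.

```python
import numpy as np

def smooth_adaptive(arr, video_len, divisor=3):
    """Moving-average smoothing with an adaptively sized window:
    win_size = len(input_video) / divisor, per the example above."""
    win = max(1, int(video_len / divisor))
    if win % 2 == 0:
        win += 1                     # odd window so a center exists (assumption)
    pad = win // 2
    padded = np.pad(np.asarray(arr, dtype=np.float64), pad, mode="edge")
    out = np.empty(len(arr))
    for i in range(len(arr)):        # slide the window; update its center value
        out[i] = padded[i:i + win].mean()
    return out                       # the padded region is discarded
```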
206. Perform face key point scaling on the second face ROI extracted from the video frame using the target scale transformation data in the target transformation array, obtaining a scaled third face ROI.
207. Perform face key point translation on the third face ROI using the target translation transformation data in the target transformation array, obtaining the translated target face ROI.
In this embodiment, the second face ROI is re-extracted from the video frame, and the de-jittered and smoothed target arrays set_S and set_T are applied to it in sequence, scale transformation then translation transformation, giving the aligned target face ROI. Depending on the application scene, the scale transformation may also include rotation, which is not specifically limited. The second face ROI may be re-extracted either by taking the center point of the face key points and extracting a region centered on it, or by extending the first face ROI detected in step 202 by a certain margin; the extraction method is not specifically limited.
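A minimal sketch of steps 206 and 207 applied to the pixel data of the second face ROI, assuming OpenCV; folding the scale and translation into one affine warp is an implementation choice for the sketch, not mandated by the text.

```python
import cv2
import numpy as np

def align_face_roi(roi_img, s, t):
    """Apply the smoothed target transform: scaling (step 206) then
    translation (step 207), expressed as a single affine warp
    M = [[s, 0, tx], [0, s, ty]]."""
    h, w = roi_img.shape[:2]
    M = np.float32([[s, 0.0, t[0]],
                    [0.0, s, t[1]]])
    return cv2.warpAffine(roi_img, M, (w, h))
```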
It should be noted that the video frames and the target face ROI in this embodiment may be stored in a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. It is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It can be seen that this embodiment's face alignment method for artificial-intelligence-based video scenes calculates, from the key point detection result of each frame, the initial transformation data from the face key points to the standard face key points; aligns each frame by de-jittering and smoothing that transformation data; and, on the basis of per-frame alignment, preserves continuity between video frames and avoids face jitter or jumping across consecutive frames.
Applying the technical scheme of this embodiment, an initial transformation array corresponding to a first face ROI and used for aligning face key points is determined from the first face ROI in the acquired video frame and preset face standard key points, the array comprising initial scale and translation transformation data; the array is de-jittered and smoothed into a target transformation array; and the target transformation array aligns the face key points of the second face ROI extracted from the video frame. Compared with the existing neural-network-based face key point detection method MTCNN and the existing method that aligns face key points by tracking detection stability between consecutive video frames, aligning the face ROI after adaptive de-jittering and smoothing of the initial transformation array simplifies the implementation of face alignment while preserving alignment accuracy in video, reducing the consumption of hardware resources such as memory and video memory. It avoids the time-consuming preliminary work of existing approaches, such as manual labeling and model training with their high upfront cost and GPU-class hardware requirements, while the adaptive de-jittering and smoothing greatly improve the generalization ability of the alignment operation; and compared with tracking-based key point alignment methods, it greatly increases the processing speed of video data and effectively reduces time consumption, meeting the real-time requirements of various application systems based on face alignment.
Further, as a specific implementation of the method in fig. 1, an embodiment of the present application provides a face alignment apparatus. As shown in fig. 3, the apparatus includes an initial transformation array module 31, a target transformation array module 32, and an alignment module 33.
The initial transformation array module 31 may be configured to determine, according to a first face ROI in the acquired video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI and used for aligning face key points, where the initial transformation array includes initial scale transformation data and initial translation transformation data.
The target transformation array module 32 may be configured to perform de-jittering and smoothing on the initial transformation array to obtain a target transformation array.
The alignment module 33 may be configured to perform face key point alignment on the second face ROI extracted from the video frame using the target transformation array, obtaining a target face ROI with aligned face key points; the second face ROI is different from the first face ROI and further includes a background region compared to the first face ROI.
In a specific application scenario, the initial transformation array module 31 includes a framing unit 311 and a transformation unit 312.
The framing unit 311 may be configured to split the acquired video into consecutive frames.
The transformation unit 312 may be configured to process each video frame as follows: perform face detection on the current image of the video frame using a first preset depth model to obtain a first face ROI; perform key point detection on the first face ROI using a second preset depth model to obtain target key points; and determine, from the target key points and preset face standard key points, initial scale transformation data and initial translation transformation data for aligning the face key points.
In a specific application scenario, the transformation unit 312 is further configured to: calculate the correspondence between the target key points and the preset face standard key points using the least squares method; and obtain, from that correspondence, the initial scale transformation data and initial translation transformation data from the target key points to the preset face standard key points.
In a specific application scenario, the target transformation array module 32 includes a first de-jittering unit 321, a second de-jittering unit 322, and a smoothing filter unit 323.
The first de-jittering unit 321 may be configured to perform a first de-jittering pass on the initial transformation array to obtain a first de-jittered array.
The second de-jittering unit 322 may be configured to perform a second de-jittering pass on the first de-jittered array to obtain a second de-jittered array.
The smoothing filter unit 323 may be configured to perform smoothing filtering on the second de-jittered array to obtain the target transformation array.
In a specific application scenario, the first de-jittering pass is a clipping (amplitude-limiting) filter, and the second is a recursive average filter.
In a specific application scenario, the smoothing filter unit 323 is further configured to perform smoothing filtering on the second de-jittered array using an adaptively set window size to obtain the target transformation array, where the window size is determined from the video length.
In a specific application scenario, the alignment module 33 includes a scale transformation unit 331 and a translation transformation unit 332.
The scale transformation unit 331 may be configured to perform face key point scaling on the second face ROI extracted from the video frame using the target scale transformation data in the target transformation array, obtaining a scaled third face ROI.
The translation transformation unit 332 may be configured to perform face key point translation on the third face ROI using the target translation transformation data in the target transformation array, obtaining the translated target face ROI.
It should be noted that other descriptions of the functional units of the face alignment apparatus provided in this embodiment may refer to the corresponding descriptions of fig. 1 and fig. 2, and are not repeated here.
Based on the methods shown in fig. 1 and fig. 2, an embodiment of the present application correspondingly provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the face alignment method based on face recognition shown in fig. 1 and fig. 2.
Based on this understanding, the technical solution of the present application may be embodied as a software product stored on a non-volatile storage medium (such as a CD-ROM, USB drive, or removable hard disk), including several instructions that enable a computer device (a personal computer, server, network device, and the like) to execute the methods of the implementation scenarios of the present application.
Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, server, network device, and the like. The physical device includes a storage medium and a processor: the storage medium stores a computer program, and the processor executes the computer program to implement the face alignment method based on face recognition shown in fig. 1 and fig. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a WI-FI module, and so on. The user interface may include a display screen and an input unit such as a keyboard, and optionally a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface or a wireless interface (such as a Bluetooth or WI-FI interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the operation of the information processing program and other software and/or programs. The network communication module enables communication among the components within the storage medium and with other hardware and software in the physical device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, or by hardware. Compared with the existing neural-network-based face key point detection method MTCNN and the existing method that aligns face key points by tracking detection stability between consecutive video frames, this embodiment aligns the face ROI after adaptively de-jittering and smoothing the initial transformation array determined from the first face ROI in the video frame. This simplifies the implementation of face alignment while preserving alignment accuracy in video, reducing the consumption of hardware resources such as memory and video memory; it avoids the time-consuming preliminary work of existing approaches, such as manual labeling and model training with their high upfront cost; the adaptive de-jittering and smoothing greatly improve the generalization ability of the alignment operation; and compared with tracking-based key point alignment, it greatly increases video processing speed and effectively reduces time consumption, meeting the real-time requirements of various application systems based on face alignment.
Those skilled in the art will appreciate that the figures are merely schematic representations of a preferred implementation scenario, and that the blocks or flows in the figures are not necessarily required to practice the present application. Those skilled in the art will also appreciate that the modules in the devices of the implementation scenario may be distributed as described, or located, with corresponding changes, in one or more devices different from the present scenario. The modules may be combined into one module or further split into multiple sub-modules.
The above serial numbers are for description only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure presents only a few specific implementation scenarios of the present application; the application is not limited thereto, and any variation conceivable by those skilled in the art falls within the scope of the present application.

Claims (10)

CN202011626746.5A | Priority 2020-12-30 | Filed 2020-12-30 | Face alignment method and device based on face recognition | Active | Granted as CN112733667B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011626746.5A | 2020-12-30 | 2020-12-30 | Face alignment method and device based on face recognition

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011626746.5A | 2020-12-30 | 2020-12-30 | Face alignment method and device based on face recognition

Publications (2)

Publication Number | Publication Date
CN112733667A (en) | 2021-04-30
CN112733667B (en) | 2024-06-28

Family

ID=75607964

Family Applications (1)

Application Number | Status | Grant | Priority Date | Filing Date | Title
CN202011626746.5A | Active | CN112733667B (en) | 2020-12-30 | 2020-12-30 | Face alignment method and device based on face recognition

Country Status (1)

Country | Link
CN (1) | CN112733667B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116524418A (en)* | 2023-07-03 | 2023-08-01 | 平安银行股份有限公司 | Face and mouth recognition method, device and system and storage medium
WO2024098685A1 (en)* | 2022-11-07 | 2024-05-16 | 广州趣丸网络科技有限公司 | Face driving method and apparatus for virtual character, and terminal device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2019128508A1 (en)* | 2017-12-28 | 2019-07-04 | Oppo广东移动通信有限公司 | Method and apparatus for processing image, storage medium, and electronic device
CN111488774A (en)* | 2019-01-29 | 2020-08-04 | 北京搜狗科技发展有限公司 | Image processing method and device for image processing
CN111523402A (en)* | 2020-04-01 | 2020-08-11 | 车智互联(北京)科技有限公司 | Video processing method, mobile terminal and readable storage medium
CN112017212A (en)* | 2020-08-26 | 2020-12-01 | 北京紫光展锐通信技术有限公司 | Training and tracking method and system of face key point tracking model
CN112036256A (en)* | 2020-08-07 | 2020-12-04 | 深圳数联天下智能科技有限公司 | Human face key point training device



Also Published As

Publication number | Publication date
CN112733667B (en) | 2024-06-28

Similar Documents

Publication | Title
US11928800B2 (en) | Image coordinate system transformation method and apparatus, device, and storage medium
CN115294275A (en) | Three-dimensional model reconstruction method, device and computer-readable storage medium
WO2018176938A1 (en) | Method and device for extracting center of infrared light spot, and electronic device
WO2021115136A1 (en) | Anti-shake method and apparatus for video image, electronic device, and storage medium
CN104281265B (en) | Control method of application program, device and electronic equipment
JP2008511258A (en) | Real-time image stabilization
CN112733667A (en) | Face alignment method and device based on face recognition
EP4009275A1 (en) | Golf ball top-view detection method and system, and storage medium
CN114584785B (en) | Real-time image stabilizing method and device for video image
JP2021517281A (en) | Multi-gesture fine division method for smart home scenes
CN109784301A (en) | Image processing method, device, computer equipment and storage medium
CN109741277A (en) | Image processing method, device, storage medium and server
CN112348112A (en) | Training method and device for image recognition model and terminal equipment
CN111199169A (en) | Image processing method and device
CN111558223B (en) | Method and device for generating tailing special effects, storage medium and computer equipment
CN108229392A (en) | Pupil positioning method, device, equipment and medium
CN115937010B (en) | Image processing method, device, equipment and medium
CN114399425B (en) | Image processing method, video processing method, device, equipment and medium
CN116362975A (en) | Tongue picture skew correction method, device, equipment and storage medium
CN113660420A (en) | Video frame processing method and video frame processing device
CN111046727B (en) | Video feature extraction method and device, electronic equipment and storage medium
CN112764570A (en) | Touch screen event processing method, device and system
CN106934812A (en) | Image-signal processor and its image-signal processing method
CN117291927B (en) | Spine segmentation method, system, electronic device, and non-transitory machine-readable medium
CN105869140A (en) | Image processing method and apparatus

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
