Background
Facial actions are a primary channel through which humans transmit information in daily life and carry much of the burden of conveying emotion, so the recognition, analysis, and synthesis of facial actions are indispensable key technologies for realizing human-computer interaction. With the continuous development of deep learning technology, simulating and generating realistic expression motion has become an important research direction in the computer field, and facial expression transfer is a popular technique. Facial expression transfer has been widely applied to privacy and security protection, bionic agents, special-effect synthesis, data augmentation, and the like. However, existing expression transfer methods produce unsatisfactory results and cannot achieve a satisfactory facial style transfer effect on fine features such as facial texture.
Disclosure of Invention
The invention aims to provide an image-to-image facial style transfer method and system, which aim to solve the above technical problem.
To achieve this purpose, the invention adopts the following technical scheme:
An image-to-image facial style transfer method is provided, comprising the following steps:
1) establishing a mapping relation between expression key points on a source face and a target face;
2) tracking and detecting expression key points of the source face in real time;
3) generating expression key points of the target face by using a generative adversarial network (GAN) according to the detected expression key points on the source face and based on the mapping relation of the expression key points between the source face and the target face;
4) fitting a three-dimensional face model of the target face based on the generated expression key points of the target face and according to the texture features of the target face;
5) and performing face fusion on the source face and the three-dimensional face model to finally generate the target face with the transferred expression.
As a preferred aspect of the present invention, expression key points of a face are recognized and detected based on HOG (histogram of oriented gradients) features of the face.
As a preferred aspect of the invention, the real-time tracking detection of the expression key points of the source face is realized through a camera tracking device.
As a preferred aspect of the present invention, the generated expression key points on the target face include 68 key points, where 17 key points are used to characterize the edge of the face, 5 key points are used to characterize the left eyebrow, 5 key points are used to characterize the right eyebrow, 9 key points are used to characterize the nose, 6 key points are used to characterize the left eye, 6 key points are used to characterize the right eye, and 20 key points are used to characterize the mouth.
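For illustration only, the 68-point layout described above can be written as index ranges. The sketch below assumes the widely used iBUG 300-W / dlib 0-based indexing convention, whose group sizes match the counts above; an implementation of the invention may assign indices differently.

```python
# 68-key-point layout as index ranges (assumes the common iBUG 300-W / dlib
# convention; "left"/"right" follow that convention's ordering).
FACE_68_LAYOUT = {
    "face_edge":     range(0, 17),   # 17 points along the face contour
    "right_eyebrow": range(17, 22),  # 5 points
    "left_eyebrow":  range(22, 27),  # 5 points
    "nose":          range(27, 36),  # 9 points (bridge plus nostril base)
    "right_eye":     range(36, 42),  # 6 points
    "left_eye":      range(42, 48),  # 6 points
    "mouth":         range(48, 68),  # 20 points (outer and inner lips)
}

assert sum(len(r) for r in FACE_68_LAYOUT.values()) == 68
```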
As a preferred aspect of the present invention, a method for training a facial style transfer model by a generative adversarial network (GAN) comprises:
manually calibrating expression key points on the source face and the target face to obtain a data set of at least 100 pairs of expression key point images;
uniformly transforming the paired expression key point images into sample images of 512 × 256 pixels;
taking 75% of the sample images as a training set of the facial style transfer model, and taking the remaining 25% as a test set of the facial style transfer model;
setting model training parameters of the generative adversarial network (GAN);
inputting the sample images in the training set into the generative adversarial network (GAN), and training to form an initial facial style transfer model;
taking the sample images in the test set as test samples, performing a model performance test on the initial facial style transfer model, and adjusting the model training parameters according to the model performance test result;
and taking the sample images in the training set as training samples, and performing update training on the initial facial style transfer model until a facial style transfer model whose performance meets the requirements is formed.
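The disclosure does not specify the GAN architecture, losses, or training parameters. For reference only, the following is a minimal paired image-to-image training skeleton in PyTorch in the style of pix2pix; the toy networks, the loss weighting, and the hyper-parameters are illustrative assumptions, not the disclosed model.

```python
# Minimal pix2pix-style training sketch (illustrative assumption, not the
# disclosed model): the generator maps a source key point image to a target
# image, and the discriminator judges (source, image) pairs.
import torch
import torch.nn as nn

G = nn.Sequential(  # toy generator stand-in for a real encoder-decoder
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
)
D = nn.Sequential(  # toy patch discriminator over concatenated (source, image)
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
)
bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(src, tgt):
    """One adversarial update; src/tgt are (N, 3, 256, 512) tensors in [-1, 1]."""
    fake = G(src)
    # discriminator update: real pairs labeled 1, generated pairs labeled 0
    d_opt.zero_grad()
    d_real = D(torch.cat([src, tgt], dim=1))
    d_fake = D(torch.cat([src, fake.detach()], dim=1))
    d_loss = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    d_loss.backward()
    d_opt.step()
    # generator update: fool the discriminator plus an L1 reconstruction term
    g_opt.zero_grad()
    d_out = D(torch.cat([src, fake], dim=1))
    g_loss = bce(d_out, torch.ones_like(d_out)) + 100.0 * l1(fake, tgt)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

The L1 term (weighted 100, as in pix2pix) keeps each generated image close to its paired ground truth, which the adversarial loss alone does not guarantee.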
As a preferred embodiment of the present invention, a method for performing face fusion on the source face and the three-dimensional face model includes:
identifying and locating the eye and mouth positions of the three-dimensional face model;
respectively creating black masks in the located eye and mouth regions to mark the region positions requiring face fusion;
identifying, on the source face, the eye region and mouth region corresponding to the black mask regions on the three-dimensional face model, and cropping the identified eye region and mouth region to obtain an eye region image and a mouth region image;
and fusing the cropped region images into the corresponding black mask regions on the three-dimensional face model by a Poisson texture fusion method, finally forming the target face with the transferred expression.
The invention also provides an image-to-image facial style transfer system, comprising:
the expression key point mapping relation establishing module is used for establishing a mapping relation between expression key points on a source face and a target face;
the expression key point tracking detection module is used for tracking and detecting the expression key points on the source face in real time;
the target face expression key point generation module is connected with the expression key point tracking detection module and used for generating expression key points of the target face by using a generative adversarial network (GAN) based on the detected expression key points on the source face;
the three-dimensional face model building module is connected with the target face expression key point generation module and used for fitting a three-dimensional face model of the target face based on the generated expression key points of the target face and according to the texture features of the target face;
and the face fusion module is connected with the three-dimensional face model building module and used for performing face fusion on the source face and the three-dimensional face model, finally generating the target face with the transferred expression.
As a preferred aspect of the present invention, the facial style transfer system further includes:
the model training module is connected with the three-dimensional face model building module and used for training and forming a facial style transfer model through a generative adversarial network (GAN), and the model training module specifically comprises:
the expression key point calibration unit is used for enabling a technician to manually calibrate the expression key points on the source face and the target face to obtain a data set of at least 100 pairs of expression key point images;
the expression key point image size transformation unit is connected with the expression key point calibration unit and used for uniformly transforming the paired expression key point images into sample images of 512 × 256 pixels;
the sample set forming unit is connected with the expression key point image size transformation unit and used for automatically classifying 75% of the sample images into a training set for model training and the remaining 25% into a test set for testing model performance;
the model training parameter setting unit is used for enabling a technician to set the model training parameters of the generative adversarial network (GAN);
the model training unit is respectively connected with the sample set forming unit and the model training parameter setting unit and used for inputting the sample images in the training set into the generative adversarial network (GAN) and training according to the set model training parameters to form an initial facial style transfer model;
the model performance testing unit is respectively connected with the sample set forming unit and the model training unit and used for inputting the sample images in the test set into the initial facial style transfer model to perform a model performance test, and adjusting the model training parameters according to the model performance test result;
and the model training unit is further used for performing update training on the initial facial style transfer model by taking the sample images in the training set as training samples according to the adjusted model training parameters, until a facial style transfer model with the expected performance is formed.
In the method, expression key points of the target face are first generated by a generative adversarial network (GAN) from the expression key points on the source face; a three-dimensional face model of the target face is then constructed from the generated key points and the facial texture features of the target face; finally, the facial style region images on the source face are fused into the three-dimensional face model to form the target face after style transfer, greatly improving the fidelity of the target face after facial style transfer.
Detailed Description
The technical scheme of the invention is further explained below through specific embodiments with reference to the accompanying drawings.
The drawings are for illustration only, are shown in schematic rather than actual form, and are not to be construed as limiting the present patent; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the size of an actual product; and it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, it should be understood that terms such as "upper", "lower", "left", "right", "inner", and "outer", if used to indicate an orientation or positional relationship, are based on the orientation or positional relationship shown in the drawings and are used only for convenience and simplification of description; they do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only, are not to be construed as limitations of the present patent, and their specific meanings can be understood by those skilled in the art according to the specific situation.
In the description of the present invention, unless otherwise explicitly specified or limited, terms such as "connected", where they indicate a connection relationship between components, are to be understood broadly: the connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through intervening media; or an internal communication or interaction between two components. The specific meanings of these terms in the present invention can be understood by those skilled in the art in specific cases.
An embodiment of the present invention provides an image-to-image facial style transfer method, as shown in fig. 1, including the following steps:
1) establishing a mapping relation between expression key points on a source face and a target face; in this embodiment, the number of expression key points on the face is 68, where 17 key points are used to characterize the edge of the face, 5 key points are used to characterize the left eyebrow, 5 key points are used to characterize the right eyebrow, 9 key points are used to characterize the nose, 6 key points are used to characterize the left eye, 6 key points are used to characterize the right eye, and 20 key points are used to characterize the mouth.
2) tracking and detecting expression key points on the source face in real time;
3) generating expression key points of the target face by using a generative adversarial network (GAN) according to the detected expression key points on the source face and based on the mapping relation of the expression key points between the source face and the target face;
4) fitting a three-dimensional face model of the target face based on the generated expression key points of the target face and according to the texture features of the target face;
5) and performing face fusion on the source face and the three-dimensional face model to finally generate the target face with the transferred expression.
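The disclosure does not fix the mathematical form of the key point mapping relation of step 1). As one plausible baseline, the mapping could be a least-squares 2D affine transform fitted to paired source and target key points; the sketch below, with synthetic toy points, is an illustrative assumption rather than the claimed method.

```python
# Sketch: fit a 2D affine mapping tgt ~= src @ A.T + b between paired
# 68-point sets by least squares (illustrative baseline, not the invention).
import numpy as np

def fit_affine_mapping(src_pts, tgt_pts):
    """Solve for A (2x2) and b (2,) minimizing ||src @ A.T + b - tgt||^2."""
    src_h = np.hstack([src_pts, np.ones((len(src_pts), 1))])  # (68, 3)
    params, *_ = np.linalg.lstsq(src_h, tgt_pts, rcond=None)  # (3, 2)
    return params[:2].T, params[2]

def apply_mapping(pts, A, b):
    return pts @ A.T + b

# toy usage with synthetic key points
rng = np.random.default_rng(0)
src = rng.uniform(0, 256, size=(68, 2))
M, t = np.array([[1.1, 0.05], [-0.05, 0.9]]), np.array([12.0, -4.0])
tgt = src @ M.T + t
A, b = fit_affine_mapping(src, tgt)
assert np.allclose(apply_mapping(src, A, b), tgt, atol=1e-6)  # exact recovery
```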
The invention preferably recognizes and detects expression key points of the face based on the HOG features of the face. The specific process of identifying facial expression key points from facial HOG features is not within the scope of the claimed invention, so it is not set forth herein.
The invention preferably realizes real-time tracking detection of the source face expression key points through a two-dimensional RGB camera tracking device.
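For illustration, the HOG-based detection and real-time camera tracking above can be realized with the dlib library and its pre-trained 68-point shape predictor. This is a common off-the-shelf realization offered as a sketch only, not necessarily the implementation used by the invention; the model file name below is dlib's standard distribution name and must be downloaded separately.

```python
# Sketch: HOG face detection plus 68-point key point tracking from a live
# 2D RGB camera, using dlib (assumed available) and OpenCV.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # HOG + linear-SVM face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture(0)  # default 2D RGB camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for rect in detector(gray, 0):              # detect faces in the frame
        shape = predictor(gray, rect)           # fit the 68 key points
        for i in range(68):
            cv2.circle(frame, (shape.part(i).x, shape.part(i).y),
                       2, (0, 255, 0), -1)      # visualize each key point
    cv2.imshow("source face key points", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```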
The generative adversarial network (GAN) is essentially a facial style transfer model. The method of the present invention for training the facial style transfer model is shown in fig. 2 and comprises the following steps:
Step L1, manually calibrating the expression key points on the source face and the target face to obtain a data set of at least 100 pairs of expression key point images. As used herein, a "pair" refers to the combination of an expression key point image calibrated on the source face and the corresponding expression key point image calibrated on the target face.
Step L2, uniformly transforming the paired expression key point images into sample images of 512 × 256 pixels;
Step L3, taking 75% of the sample images as a training set of the facial style transfer model, and taking the remaining 25% as a test set of the facial style transfer model;
Step L4, setting model training parameters of the generative adversarial network (GAN);
Step L5, inputting the sample images in the training set into the generative adversarial network (GAN) and training to form an initial facial style transfer model;
Step L6, taking the sample images in the test set as test samples, performing a model performance test on the initial facial style transfer model, and adjusting the model training parameters according to the model performance test result;
And step L7, performing update training on the initial facial style transfer model by taking the sample images in the training set as training samples, until a facial style transfer model whose performance meets the requirements is formed.
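For illustration, steps L2 and L3 could be realized as follows, assuming OpenCV and a hypothetical directory of paired key point images; the file layout, the names, and the width-by-height reading of 512 × 256 are illustrative assumptions.

```python
# Sketch of the data preparation in steps L2-L3 (illustrative assumptions:
# "pairs/" directory, PNG files, and 512 x 256 read as width x height).
import glob
import random

import cv2

pair_paths = sorted(glob.glob("pairs/*.png"))  # hypothetical paired images

samples = []
for path in pair_paths:
    img = cv2.imread(path)
    if img is None:
        continue  # skip unreadable files
    # cv2.resize takes (width, height), so this yields 512 x 256 pixel samples
    samples.append(cv2.resize(img, (512, 256)))

random.seed(42)           # fixed seed so the split is reproducible
random.shuffle(samples)
split = int(0.75 * len(samples))
train_set, test_set = samples[:split], samples[split:]  # 75% / 25%
```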
There are many existing methods for constructing a three-dimensional face model of the target face based on the generated expression key points and the facial texture features of the target face, so the construction process of the three-dimensional face model is not described here.
Experiments show that the eye and mouth poses estimated from the generated expression key points on the target face cannot accurately reproduce the expression pose of the source face; therefore, the three-dimensional face model fitted from the expression key points of the target face does not include the eye and mouth parts. To ensure the effect of the facial style transfer, the invention forms the final target face through face fusion. Specifically, as shown in fig. 3, the method of the present invention for face fusion of the source face and the three-dimensional face model includes:
Step A1, identifying and locating the eye and mouth positions of the three-dimensional face model;
Step A2, respectively creating black masks in the located eye and mouth regions to mark the region positions requiring face fusion;
Step A3, identifying, on the source face, the eye regions and mouth region corresponding to the black mask regions on the three-dimensional face model, and cropping the identified regions to obtain eye region images and a mouth region image;
And step A4, fusing the cropped region images into the corresponding black mask regions on the three-dimensional face model by a Poisson texture fusion method, finally forming the target face with the transferred expression.
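OpenCV's cv2.seamlessClone implements Poisson image editing and can serve as the Poisson texture fusion of step A4. The sketch below works under that assumption; the file names, the toy mouth key points, and the convex-hull mask construction are illustrative, not the disclosed implementation.

```python
# Sketch of steps A1-A4 with OpenCV Poisson blending (cv2.seamlessClone).
# File names and key points are hypothetical placeholders.
import cv2
import numpy as np

source = cv2.imread("source_face.png")        # source face frame
rendered = cv2.imread("rendered_target.png")  # render of the fitted 3D model

def fuse_region(region_pts):
    """Cut the region spanned by region_pts from the source face and
    Poisson-blend it into the matching region of the 3D-model render."""
    mask = np.zeros(source.shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(np.asarray(region_pts, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 255)   # A2: binary mask marks the region
    x, y, w, h = cv2.boundingRect(hull)   # A3: locate the cut region
    center = (x + w // 2, y + h // 2)
    # A4: Poisson texture fusion of the source patch into the render
    return cv2.seamlessClone(source, rendered, mask, center, cv2.NORMAL_CLONE)

# hypothetical mouth key points (indices 48-67 in a 68-point layout)
mouth_pts = [(250, 330), (270, 320), (290, 318), (310, 320), (330, 330),
             (310, 345), (290, 350), (270, 345)]
fused = fuse_region(mouth_pts)  # repeat for each eye region, then display
```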
The present invention also provides an image-to-image facial style transfer system, as shown in fig. 4, the system comprising:
the expression key point mapping relation establishing module is used for establishing a mapping relation between expression key points on a source face and a target face;
the expression key point tracking detection module is used for tracking and detecting expression key points on the source face in real time;
the target face expression key point generation module is connected with the expression key point tracking detection module and used for generating expression key points of the target face by using the generative adversarial network (GAN) based on the detected expression key points on the source face;
the three-dimensional face model building module is connected with the target face expression key point generation module and used for fitting a three-dimensional face model of the target face based on the generated expression key points of the target face and according to the texture features of the target face;
and the face fusion module is connected with the three-dimensional face model building module and used for performing face fusion on the source face and the three-dimensional face model to finally form the target face with the transferred expression.
In the present invention, the generative adversarial network (GAN) is essentially a facial style transfer model capable of implementing facial style transfer. To train and form the facial style transfer model, as shown in fig. 4, the facial style transfer system further includes:
a model training module, configured to form the facial style transfer model through generative adversarial network (GAN) training; specifically, as shown in fig. 5, the model training module includes:
the expression key point calibration unit is used for enabling a technician to manually calibrate the expression key points on the source face and the target face to obtain a data set of at least 100 pairs of expression key point images;
the expression key point image size transformation unit is connected with the expression key point calibration unit and used for uniformly transforming the paired expression key point images into sample images of 512 × 256 pixels;
the sample set forming unit is connected with the expression key point image size transformation unit and used for automatically classifying 75% of the sample images into a training set for model training and the remaining 25% into a test set for testing model performance;
the model training parameter setting unit is used for enabling a technician to set the model training parameters of the generative adversarial network (GAN);
the model training unit is respectively connected with the sample set forming unit and the model training parameter setting unit and used for inputting the sample images in the training set into the generative adversarial network (GAN) and training according to the set model training parameters to form an initial facial style transfer model;
the model performance testing unit is respectively connected with the sample set forming unit and the model training unit and used for inputting the sample images in the test set into the initial facial style transfer model to perform a model performance test, and adjusting the model training parameters according to the model performance test result;
and the model training unit is further used for performing update training on the initial facial style transfer model by taking the sample images in the training set as training samples according to the adjusted model training parameters, until a facial style transfer model with the expected performance is formed.
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and illustrate the technical principles applied. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention; such variations remain within the scope of the invention as long as they do not depart from its spirit. In addition, certain terms used in the specification and claims of the present application are not limiting but are used merely for convenience of description.