Disclosure of Invention
The embodiment of the invention provides a facial feature extraction method and device based on artificial intelligence, an electronic device and a computer-readable storage medium, which can automatically generate a large number of labeled facial image samples by reshaping a standard facial image and thereby improve the efficiency of model training.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a face feature extraction method based on artificial intelligence, which comprises the following steps:
subdividing a standard face image according to registration point labels of the standard face image to obtain a tiling result of the standard face image;
performing reshaping processing on the spatial distribution of the registration point labels of the standard face image according to the tiling result of the standard face image to obtain a face image sample, and
synchronizing the registration point labels of the standard face image according to the reshaping processing to obtain registration point labels corresponding to the face image sample;
training an image registration model based on the face image sample and the corresponding registration point label;
and performing feature extraction processing on the target face image based on the trained image registration model, and taking the extracted registration points as the face features of the target face image.
The embodiment of the invention provides a face feature extraction device based on artificial intelligence, which comprises:
the subdivision module is used for subdividing the standard face image according to registration point labels of the standard face image to obtain a tiling result of the standard face image;
the reshaping module is used for performing reshaping processing on the spatial distribution of the registration point labels of the standard face image according to the tiling result of the standard face image to obtain a face image sample, and
synchronizing the registration point labels of the standard face image according to the reshaping processing to obtain registration point labels corresponding to the face image sample;
the training module is used for training an image registration model based on the face image sample and the corresponding registration point label;
and the extraction module is used for carrying out feature extraction processing on the target face image based on the trained image registration model and taking the extracted registration points as the face features of the target face image.
In the above technical solution, the subdivision module is further configured to connect any three registration points in the registration point labels of the standard face image to obtain a triangular patch corresponding to the any three registration points in the standard face image;
combining the triangular patches to obtain a patch result of the standard face image;
wherein any two of the triangular patches do not intersect or intersect at a common edge.
In the above technical solution, the reshaping module is further configured to adjust the size and the position of a plurality of triangular patches in a patch result of the standard face image; each triangular patch corresponds to the spatial distribution of three registration point labels in the standard face image;
carrying out mapping processing on the adjusted triangular patches based on the textures of the triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures;
and combining the unadjusted triangular patch in the standard face image with the new triangular patch to obtain a face image sample.
In the above technical solution, the reshaping module is further configured to perform, for any triangular patch of the plurality of triangular patches, the following processing:
determining coordinates of three registration points in the triangular patch;
and transforming the coordinates of at least one registration point in the three registration points so as to correspondingly transform the size and the position of the triangular patch.
In the above technical solution, the reshaping module is further configured to perform reshaping processing on the standard face image according to the tiling result of the standard face image to obtain a reshaped standard face image;
and integrally adjusting the reshaped standard facial image to obtain a facial image sample.
In the above technical solution, the reshaping module is further configured to, with the imaging region of the reshaped standard facial image as a reference region, perform at least one of the following adjustment processes on the reshaped standard facial image:
reducing or enlarging the reshaped standard face image in equal proportion, and taking the part within the reference region after reduction or enlargement as a face image sample;
rotating the reshaped standard face image clockwise or counterclockwise by taking any position in the reshaped standard face image as an axis, and taking a part in the reference area after rotation as a face image sample;
performing translation in at least one direction on the reshaped standard face image, and taking the part within the reference region after translation as a face image sample;
and taking at least part of the reshaped standard face image as a noise adding region, adding the color values of the pixel points in the noise adding region to the color values of the noise to be added, and taking the standard face image obtained after the addition as a face image sample.
In the above technical solution, the training module is further configured to perform registration processing on the face image sample through the image registration model to obtain a predicted registration point of the face image sample;
constructing a first loss function of the image registration model according to the predicted registration point of the face image sample and the registration point mark of the face image sample;
constructing a second loss function of the image registration model according to the predicted registration point of the facial image sample, the type of the five sense organs to which the predicted registration point belongs and the registration point mark of the facial image sample;
constructing a third loss function of the image registration model according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample;
performing weighted summation on the first loss function, the second loss function and the third loss function to obtain an overall loss function of the image registration model;
and updating the parameters of the image registration model until the overall loss function is converged, and taking the updated parameters of the image registration model when the overall loss function is converged as the parameters of the trained image registration model.
In the above technical solution, the training module is further configured to determine, in the registration point labeling, a labeling coordinate of a registration point corresponding to the predicted registration point of the face image sample;
and determining the absolute value of the difference value of the coordinates of the predicted registration point of the face image sample and the annotated coordinates as a first loss function of the image registration model.
In the above technical solution, the training module is further configured to determine, in the registration point labels, the labeling coordinates of the registration point corresponding to the predicted registration point of the face image sample;
determining the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeling coordinates;
and determining the ratio of the absolute value to the number of registration points contained in the type of the five sense organs to which the predicted registration points belong as a second loss function of the image registration model.
In the foregoing technical solution, the training module is further configured to weight a ratio of the absolute value to the number of registration points included in the type of the five sense organs to which the predicted registration point belongs based on a weight of the type of the five sense organs to which the predicted registration point belongs, and determine a weighting result as a second loss function of the image registration model.
In the above technical solution, the apparatus further includes:
the registration module is used for carrying out registration processing on a mirror image sample of the face image sample through the image registration model to obtain a prediction registration point of the mirror image sample;
the training module is further used for determining mirror image coordinates of a predicted registration point corresponding to the predicted registration point of the face image sample in the mirror image sample;
determining an absolute value of a difference of coordinates of a predicted registration point of the facial image sample and the mirror image coordinates as a third loss function of the image registration model.
In the above technical solution, the apparatus further includes:
a processing module for performing the following iterative processes on the face image samples:
adjusting the size and position of a plurality of triangular patches in the face image sample; each triangular patch corresponds to the spatial distribution of three registration point labels in the face image sample;
based on the textures of a plurality of triangular patches of the face image sample, carrying out mapping processing on the adjusted triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures;
and combining the unadjusted triangular patch in the face image sample with the new triangular patch, and determining a combination result as a new face image sample.
An embodiment of the present invention provides an electronic device for facial feature extraction, where the electronic device includes:
a memory for storing executable instructions;
and the processor is used for realizing the artificial intelligence-based facial feature extraction method provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the artificial intelligence-based facial feature extraction method provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
the standard face image is reshaped according to its tiling result to change the spatial distribution of the registration point labels of the standard face image and obtain face image samples, so that a large number of labeled face image samples can be generated automatically from a small number of standard face images, improving the efficiency of model training; the image registration model is trained with these accurately labeled face image samples, so that the trained image registration model can accurately extract the face features of the target face image, improving the accuracy of facial feature extraction.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments that can be obtained by a person skilled in the art without creative effort fall within the protection scope of the present invention.
In the description that follows, references to the terms "first", "second", and the like, are intended only to distinguish similar objects and not to indicate a particular ordering for the objects, it being understood that "first", "second", and the like may be interchanged under certain circumstances or sequences of events to enable embodiments of the invention described herein to be practiced in other than the order illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Unsupervised learning: solving various problems in pattern recognition from training samples whose classes are unknown (unlabeled). Unsupervised learning algorithms mainly include principal component analysis, isometric mapping (Isomap), locally linear embedding, Laplacian eigenmaps, Hessian locally linear embedding, local tangent space alignment, and the like.
2) Deep learning network: a newer area of machine learning research whose motivation is to build and simulate neural networks that analyze and learn like the human brain, mimicking the mechanisms by which the human brain interprets data such as images, sound and text. Deep learning is a form of unsupervised learning, and a typical structure is a multilayer perceptron with multiple hidden layers. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, so as to discover distributed feature representations of the data.
3) Face registration: the image registration model locates the feature points of the facial features in an image containing a human face. For example, a registration point of the mouth, which is a characteristic point of the mouth, is located in a face image, and the position of the mouth can be accurately located according to the registration point of the mouth.
4) Standard face image: an image containing a face whose registration points have been labeled, used for subsequently generating new face image samples from the standard face image. For example, the registration points of a certain face image are manually labeled; after the labeling is completed, the face image is used as a standard face image, and new face image samples are then generated according to the registration point labels of the standard face image.
The embodiment of the invention provides a facial feature extraction method and device based on artificial intelligence, an electronic device and a computer-readable storage medium, which can automatically generate a large number of labeled facial image samples by reshaping a standard facial image, improve the training efficiency of the image registration model, and accurately extract facial features with the trained image registration model. The electronic device for facial feature extraction provided by the embodiment of the present invention may be a server, for example a server deployed in the cloud, which subdivides and reshapes a standard facial image according to its registration point labels to obtain facial image samples, trains an image registration model based on the facial image samples and the corresponding registration point labels, extracts the facial features of a target facial image based on the trained image registration model, and performs subsequent processing such as makeup and beautification based on the facial features of the target facial image; the electronic device may also be a terminal, which subdivides and reshapes a standard facial image according to registration point labels of the standard facial image input by a user to obtain facial image samples, trains an image registration model based on the facial image samples and the corresponding registration point labels, extracts the facial features of a target facial image based on the trained image registration model, and performs subsequent processing such as makeup and beautification based on the facial features of the target facial image.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a facial feature extraction system 10 according to an embodiment of the present invention. A terminal 200 is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 may be used to obtain the registration point labels of the standard facial image and the target facial image, for example, the user manually labels the registration points of the standard facial image through the terminal, and after the labeling is completed, the terminal automatically obtains the registration point labels of the standard facial image.
In some embodiments, the method for extracting facial features based on artificial intelligence provided by the embodiments of the present invention is integrated into a Software Development Kit (SDK), and the SDK is integrated into the terminal 200, so that the terminal 200 automatically generates facial image samples and corresponding registration point labels according to the registration point labels of a standard facial image, trains an image registration model, extracts the facial features of a target facial image based on the trained image registration model, and performs subsequent processing such as makeup, beautification and recognition based on the facial features of the target facial image. For example, in an application scenario of facial makeup, a makeup Application (APP) is installed on the terminal 200. After a user inputs a target facial image in the makeup APP, the SDK (which integrates the method for extracting facial features based on artificial intelligence) subdivides the standard facial image according to its registration point labels and reshapes it to obtain facial image samples, trains the image registration model based on the facial image samples and the corresponding registration point labels, extracts the facial features of the target facial image based on the trained image registration model, applies makeup to the target facial image based on the extracted facial features, and displays the made-up target facial image on the terminal 200, so that the makeup accurately follows the facial features and looks more natural.
In some embodiments, the terminal 200 may also call an Application Programming Interface (API) through the network 300 to send a request to the server 100, where the request includes a target facial image input by the user on the terminal 200. The server 100 automatically generates facial image samples and corresponding registration point labels according to the registration point labels of a standard facial image by using the artificial intelligence based facial feature extraction method provided by the embodiment of the present invention, trains an image registration model, and extracts the facial features of the target facial image based on the trained image registration model for subsequent processing such as makeup, beautification and recognition. For example, in an application scenario of facial makeup, a makeup Application is installed on the terminal 200. After the user inputs the target facial image in the makeup Application, the terminal 200 calls the API to send the request including the target facial image to the server 100; after receiving the target facial image, the server 100 extracts the facial features of the target facial image based on the trained image registration model and returns them to the makeup Application. The user can then perform a makeup changing operation according to the facial features of the target facial image displayed in the makeup Application, for example applying red lipstick to the lips. In response to the user's makeup changing operation, the terminal 200 calls the API to send a makeup changing request to the server 100, the server 100 changes the makeup of the target facial image according to the request to obtain a made-up target facial image and returns it to the makeup Application, and the made-up target facial image is displayed on the display interface 210 of the terminal 200, so that the makeup accurately follows the facial features and looks more natural.
In an application scenario of access control, when a user needs to open an access gate, the terminal 200 calls the API to send an access request including the user's facial image to the server 100. After receiving the access request, the server 100 extracts the facial features of the user's facial image based on the trained image registration model and compares them with the facial images in a database (for example, faces already recorded in the access control system, such as residents or company employees). When the user's facial image matches a face whose identity information has been registered, the gate is opened and the user is allowed to pass, which prevents unrelated persons from entering and exiting at will.
The following describes a structure of an electronic device for facial feature extraction according to an embodiment of the present invention, where the electronic device for facial feature extraction may be various terminals, such as a mobile phone, a computer, and the like, and may also be a server 100 as shown in fig. 1.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 for facial feature extraction according to an embodiment of the present invention, and the electronic device 500 for facial feature extraction shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 for facial feature extraction are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among these components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532 including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display screen, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in connection with embodiments of the invention is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating to other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the artificial intelligence based facial feature extraction apparatus provided by the embodiments of the present invention may be implemented in software, and fig. 2 illustrates an artificial intelligence based facial feature extraction apparatus 555 stored in the memory 550, which may be software in the form of programs, plug-ins and the like, and includes a series of modules: a subdivision module 5551, a reshaping module 5552, a training module 5553, an extraction module 5554, a registration module 5555, and a processing module 5556; the subdivision module 5551, the reshaping module 5552, the training module 5553, the extraction module 5554, the registration module 5555, and the processing module 5556 are used for implementing the facial feature extraction function provided by the embodiment of the invention.
As can be understood from the foregoing, the artificial intelligence-based facial feature extraction method provided by the embodiment of the present invention may be implemented by various types of electronic devices for facial feature extraction, such as an intelligent terminal and a server.
The following describes the artificial intelligence-based facial feature extraction method provided by the embodiment of the present invention in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present invention. Referring to fig. 3A, fig. 3A is a schematic flowchart of a method for extracting facial features based on artificial intelligence according to an embodiment of the present invention, which is described with reference to the steps shown in fig. 3A.
In step 101, the standard face image is subdivided according to the registration point labels of the standard face image, and a tiling result of the standard face image is obtained.
For example, a user can input a standard facial image on an input interface of a terminal, manually label registration points of the standard facial image, after the labeling is completed, the terminal automatically acquires registration point labels of the standard facial image, the terminal can forward the registration point labels of the standard facial image to a server, and the server triangulates the standard facial image according to the registration point labels of the standard facial image to obtain a tiling result of the standard facial image.
In some embodiments, the subdividing the standard face image according to the registration point labels of the standard face image to obtain a tiling result of the standard face image includes: connecting any three registration points in the registration point labels of the standard face image to obtain a triangular patch corresponding to any three registration points in the standard face image; combining the triangular patches to obtain a patch result of the standard face image; wherein any two triangular patches do not intersect or intersect at a common edge.
Illustratively, the standard facial image is split into triangular patches according to its registration point labels, where each triangular patch is a curved-edge triangle, and any two triangular patches either do not intersect or intersect only on a common edge (they cannot intersect on two or more edges at the same time).
For example, the standard facial image is subdivided according to the Delaunay triangulation method to obtain the tiling result of the standard facial image, wherein any triangular patch in the tiling result contains only Delaunay edges, and the interior of the circumscribed circle of any triangular patch contains no registration point of the registration point labels. An edge e (whose two end points are the registration points a and b) is a Delaunay edge when there exists a circle passing through a and b whose interior contains no other registration point of the registration point labels (that is, no registration point other than a and b).
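The subdivision step can be pictured with a short sketch. The following is a minimal example, assuming the registration point labels are available as an (N, 2) array of pixel coordinates and using SciPy's Delaunay triangulation as one concrete way of obtaining the triangular patches; the function name and the sample coordinates are illustrative only.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate_registration_points(points: np.ndarray) -> np.ndarray:
    """Return an (M, 3) array of point indices, one row per triangular patch."""
    tri = Delaunay(points)   # the circumcircle of each triangle contains no other point
    return tri.simplices     # indices of the three registration points of each patch

# Hypothetical registration points of a standard face image (pixel coordinates)
points = np.array([[120, 80], [200, 78], [160, 140], [110, 200], [210, 205]],
                  dtype=np.float32)
patches = triangulate_registration_points(points)
print(patches)  # e.g. [[0 2 1] [0 3 2] ...]
```

Combining all rows of `patches` corresponds to the tiling result: any two triangles returned by the triangulation either do not intersect or share exactly one edge.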
In step 102, according to the tiling result of the standard facial image, the spatial distribution of the registration point labels of the standard facial image is reshaped to obtain a facial image sample.
After the server obtains a tiling result of the standard facial image, five sense organs of the standard facial image are reshaped to change the spatial distribution of registration point labels in the standard facial image so as to form a facial image sample different from the standard facial image, so that an image registration model is efficiently trained through the automatically generated facial image sample.
In some embodiments, the reshaping the spatial distribution of the registration point labels of the standard facial image according to the tiling result of the standard facial image to obtain the facial image sample includes: adjusting the size and the position of a plurality of triangular patches in a patch result of the standard face image; each triangular patch corresponds to the spatial distribution of three registration point labels in the standard face image; mapping the adjusted triangular patches based on the textures of the triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures; and combining the unadjusted triangular patch in the standard face image with the new triangular patch to obtain a face image sample.
For example, the standard face image has N triangular patches in the patch result, the size and position of K triangular patches are adjusted to obtain K adjusted triangular patches, the K adjusted triangular patches are mapped based on the texture of the K triangular patches to obtain new triangular patches including the texture corresponding to the K triangular patches, and N-K unadjusted triangular patches are combined with the K new triangular patches to obtain a face image sample, where K is a natural number less than or equal to N. And aiming at any one triangular patch j in the K triangular patches, adjusting the size and the position of the triangular patch j to obtain an adjusted triangular patch, and mapping the adjusted triangular patch based on the texture of the triangular patch j to obtain a new triangular patch which corresponds to the triangular patch j and contains the texture.
In some embodiments, the adjusting of the size and the position for a plurality of triangular patches in the patch result of the standard face image includes: performing the following for any of a plurality of triangular patches: determining the coordinates of three registration points in the triangular patch; and transforming the coordinates of at least one registration point in the three registration points so as to correspondingly transform the size and the position of the triangular patch.
The size and the position of the triangular patch can be adjusted by changing the coordinates of the registration points in the triangular patch. For any triangular patch needing to change the size and the position, three registration points (end points) in the triangular patch need to be determined, and the size and the position of the triangular patch can be transformed by changing the coordinates of at least one registration point in the three registration points.
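To make the patch adjustment and texture mapping concrete, the following is a rough sketch assuming OpenCV is used for the per-triangle warp; perturb_triangle and warp_triangle are illustrative helper names, not terms from the embodiment.

```python
import cv2
import numpy as np

def perturb_triangle(src_tri, max_shift: float = 5.0) -> np.ndarray:
    """Randomly move at least one of the three registration points of a patch."""
    dst_tri = np.asarray(src_tri, dtype=np.float32).copy()
    idx = np.random.randint(3)                               # pick one vertex
    dst_tri[idx] += np.random.uniform(-max_shift, max_shift, size=2)
    return dst_tri

def warp_triangle(image, canvas, src_tri, dst_tri) -> None:
    """Map the texture of src_tri in `image` onto dst_tri in `canvas`."""
    m = cv2.getAffineTransform(np.float32(src_tri), np.float32(dst_tri))
    warped = cv2.warpAffine(image, m, (canvas.shape[1], canvas.shape[0]))
    mask = np.zeros(canvas.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(dst_tri), 1)           # rasterize the new triangle
    canvas[mask == 1] = warped[mask == 1]                    # paste only inside the triangle
```

Applying perturb_triangle and warp_triangle to K of the N patches, and copying the remaining N-K patches unchanged onto the canvas, corresponds to combining the unadjusted patches with the new patches to form a face image sample; the perturbed vertex coordinates double as the synchronized registration point labels.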
In some embodiments, performing reshaping processing on the standard facial image according to the tiling result of the standard facial image to obtain a facial image sample includes: performing reshaping processing on the standard facial image according to the tiling result of the standard facial image to obtain a reshaped standard facial image; and integrally adjusting the reshaped standard facial image to obtain a facial image sample.
Illustratively, after the standard facial image is reshaped, the spatial distribution of the registration points in the standard facial image is changed, yielding the reshaped standard facial image. In order to further augment the facial image samples, the reshaped standard facial image can be adjusted as a whole, without changing the spatial distribution of its registration points, to obtain additional facial image samples.
In some embodiments, the integrally adjusting the reshaped standard facial image to obtain the facial image sample comprises: taking the imaging region of the reshaped standard facial image as a reference region, and performing at least one of the following adjustment processes on the reshaped standard facial image: reducing or enlarging the reshaped standard facial image in equal proportion, and taking the part within the reference region after reduction or enlargement as a facial image sample; rotating the reshaped standard facial image clockwise or counterclockwise about any position in the reshaped standard facial image as an axis, and taking the part within the reference region after rotation as a facial image sample; translating the reshaped standard facial image in at least one direction, and taking the part within the reference region after translation as a facial image sample; and taking at least part of the reshaped standard facial image as a noise adding region, adding the color values of the pixel points in the noise adding region to the color values of the noise to be added, and taking the standard facial image obtained after the addition as a facial image sample.
The overall adjustment may include scaling, rotation, translation, and noise addition, none of which changes the spatial distribution of the registration points in the image; the reshaped standard facial image is subjected to at least one of these adjustments to obtain a facial image sample. A reference region, such as a 100 × 100 box, is set for the imaging region of the reshaped standard facial image. Scaling refers to enlarging or reducing the reshaped standard facial image by a set proportion and using the part still inside the 100 × 100 box after enlargement or reduction as the facial image sample. Rotation refers to rotating the reshaped standard facial image clockwise or counterclockwise about its center point (or any other point in the reshaped standard facial image) as an axis and using the part still inside the 100 × 100 box after rotation as the facial image sample. Translation refers to shifting the reshaped standard facial image in any direction (e.g., left, right, up or down), and the part still inside the 100 × 100 box after translation is used as the facial image sample. Noise addition takes at least a partial region of the reshaped standard facial image (or the whole reshaped standard facial image) as the noise adding region, generates random numbers (the color values of the noise to be added) according to a specified noise type (such as Gaussian noise or salt-and-pepper noise), adds the random numbers (the noise) to the source pixel values (the color values of the pixel points in the noise adding region), and uses the resulting reshaped standard facial image as the facial image sample.
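The following sketch illustrates these four whole-image adjustments with OpenCV and NumPy, using the imaging region of the reshaped image itself as the reference region; the scale factor, rotation angle, shift and noise level are illustrative assumptions, and in practice the registration point labels would be transformed with the same matrices so that they stay synchronized.

```python
import cv2
import numpy as np

def globally_adjust(image: np.ndarray) -> np.ndarray:
    h, w = image.shape[:2]
    # 1) equal-proportion scaling, keeping only the part that stays in the reference region
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle=0, scale=1.1)
    image = cv2.warpAffine(image, m, (w, h))
    # 2) clockwise/counterclockwise rotation about a chosen point (here the center)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle=-7, scale=1.0)
    image = cv2.warpAffine(image, m, (w, h))
    # 3) translation in at least one direction
    m = np.float32([[1, 0, 12], [0, 1, -5]])
    image = cv2.warpAffine(image, m, (w, h))
    # 4) additive Gaussian noise on the pixel color values of the noise adding region
    noise = np.random.normal(0, 8, image.shape)
    image = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return image
```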
In step 103, the registration point labels of the standard face image are synchronized according to the reshaping processing to obtain the registration point labels corresponding to the face image sample.
After the server obtains the face image sample, the registration point labels of the standard face image are synchronized according to the reshaping processing to obtain the registration point labels corresponding to the face image sample, so that the registration points of the face image sample do not need to be labeled manually, which greatly reduces the cost of sample labeling and improves the efficiency of generating labeled samples.
Referring to fig. 3B, fig. 3B is a schematic flow chart of an alternative artificial intelligence-based facial feature extraction method according to an embodiment of the present invention; compared with fig. 3A, fig. 3B further includes steps 106 to 108. The following iterative process is performed on the face image sample: in step 106, adjusting the size and position of a plurality of triangular patches in the face image sample, where each triangular patch corresponds to the spatial distribution of three registration point labels in the face image sample; in step 107, based on the textures of the triangular patches of the face image sample, performing mapping processing on the adjusted triangular patches to obtain new triangular patches containing the textures and corresponding to the triangular patches; in step 108, combining the unadjusted triangular patches in the face image sample with the new triangular patches, and determining the combination result as a new face image sample.
Illustratively, after the server subdivides and reshapes the standard facial image to obtain a facial image sample, the server may further perform reshaping iteration on the facial image sample to generate more new facial image samples, that is, adjust the sizes and positions of a plurality of triangular patches in the facial image sample, map the adjusted triangular patches based on the textures of the plurality of triangular patches of the facial image sample to obtain new triangular patches corresponding to the plurality of triangular patches and including textures, combine the unadjusted triangular patches and the new triangular patches in the facial image sample to obtain a new facial image sample, synchronize registration point labels of the facial image sample according to the reshaping iteration to obtain registration point labels corresponding to the new facial image sample. And sequentially generating new face image samples in an iterative mode to obtain layered face image samples, so that the image registration model is better trained through the layered face image samples, and the registration accuracy of the image registration model is improved.
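A compact way to picture this iteration is sketched below. reshape_once is a hypothetical helper (for instance built from the triangulation and warping sketches above) that perturbs the triangular patches of an image and returns the reshaped image together with the synchronized registration point labels; each round reshapes the previous sample, giving layered face image samples.

```python
def generate_layered_samples(image, labels, reshape_once, rounds=5):
    """Iteratively reshape a sample; each round yields a new labeled sample."""
    samples = []
    for _ in range(rounds):
        image, labels = reshape_once(image, labels)  # reshape the previous sample
        samples.append((image, labels))              # keep image and synchronized labels
    return samples
```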
In step 104, the image registration model is trained based on the facial image samples and the corresponding registration point labels.
After the server automatically generates a large number of labeled facial image samples from the standard facial image, the image registration model can be trained on the facial image samples and the corresponding registration point labels to obtain a trained image registration model, so that the subsequently trained image registration model can register the target facial image and obtain its registration points. After the server generates the face image samples, it determines the loss function value of the image registration model according to the face image samples and the corresponding registration point labels and judges whether the loss function value exceeds a preset threshold; when it does, the server determines an error signal of the image registration model based on the loss function, back-propagates the error information through the image registration model, and updates the model parameters of each layer during the propagation.
Referring to fig. 3C, fig. 3C is a schematic flowchart of an alternative artificial intelligence-based facial feature extraction method according to an embodiment of the present invention, and fig. 3C shows that step 104 in fig. 3A can be implemented through steps 1041 to 1046 shown in fig. 3C. In step 1041, performing registration processing on the face image sample through the image registration model to obtain a predicted registration point of the face image sample; in step 1042, a first loss function of the image registration model is constructed according to the predicted registration point of the face image sample and the registration point label of the face image sample; in step 1043, a second loss function of the image registration model is constructed according to the predicted registration point of the facial image sample, the type of the five sense organs to which the predicted registration point belongs, and the registration point label of the facial image sample; in step 1044, a third loss function of the image registration model is constructed according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample; in step 1045, performing weighted summation on the first loss function, the second loss function, and the third loss function to obtain an overall loss function of the image registration model; in step 1046, the parameters of the image registration model are updated until the overall loss function converges, and the updated parameters of the image registration model when the overall loss function converges are used as the parameters of the trained image registration model.
Illustratively, after the server obtains the face image sample, it performs registration processing on the face image sample through the image registration model to obtain the predicted registration points of the face image sample. A first loss function of the image registration model is constructed from the predicted registration points of the face image sample and the registration point labels of the face image sample; a second loss function is constructed according to the predicted registration points of the face image sample, the types of the five sense organs to which the predicted registration points belong, and the registration point labels of the face image sample; and a third loss function is constructed according to the predicted registration points of the face image sample and the predicted registration points of the mirror image sample corresponding to the face image sample. The first, second and third loss functions are then weighted and summed to obtain the overall loss function of the image registration model. Finally, the parameters of the image registration model are updated until the overall loss function converges, and the updated parameters at convergence are used as the parameters of the trained image registration model. The three loss functions are insensitive to abnormal values (predicted registration points of face image samples with large errors), so the trained image registration model is more robust.
In some embodiments, constructing a first loss function of the image registration model from the predicted registration points of the facial image samples and the registration point annotations of the facial image samples comprises: in the registration point marking, marking coordinates of registration points corresponding to the predicted registration points of the face image sample are determined; and determining the absolute value of the difference value of the coordinates of the predicted registration points of the face image samples and the annotated coordinates as a first loss function of the image registration model.
The absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeled coordinates is determined as the first loss function of the image registration model, and the first loss function can be written as

loss_1 = \frac{1}{K}\sum_{i=1}^{K}\left(\left|x_i-\hat{x}_i\right|+\left|y_i-\hat{y}_i\right|\right)

where l_i denotes a predicted registration point, \hat{l}_i denotes the corresponding true registration point, i.e., the registration point labeled in the face image sample, (x_i, y_i) denotes the coordinates of l_i, (\hat{x}_i, \hat{y}_i) denotes the coordinates of \hat{l}_i, and K denotes the total number of registration points in the registration point labels of the face image sample.
In some embodiments, constructing a second loss function of the image registration model based on the predicted registration points of the facial image sample, the type of the five sense organs to which the predicted registration points belong, and the registration point labels of the facial image sample comprises: in the registration point marking, determining the marking coordinate of a registration point corresponding to the predicted registration point of the face image sample, and determining the absolute value of the difference value between the coordinate of the predicted registration point of the face image sample and the marking coordinate; and determining the ratio of the absolute value to the number of registration points contained in the type of the five sense organs to which the predicted registration points belong as a second loss function of the image registration model.
After the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeled coordinates is determined, the ratio of the absolute value to the number of registration points contained in the type of the five sense organs to which the predicted registration point belongs is determined as the second loss function, which can be written as

loss_2 = \sum_{i\in\Omega}\frac{1}{K_i}\sum_{j\in i}\left(\left|x_j-\hat{x}_j\right|+\left|y_j-\hat{y}_j\right|\right)

where the set of the five sense organs is \Omega = {left eyebrow, right eyebrow, left eye, right eye, nose, mouth, contour}, l_j denotes a predicted registration point, \hat{l}_j denotes the corresponding true registration point, i.e., the registration point labeled in the face image sample, and K_i denotes the number of registration points contained in the type i of the five sense organs to which the predicted registration point belongs.
In some embodiments, determining a ratio of the absolute value to the number of registration points comprised by the type of facial organ to which the predicted registration point belongs as a second loss function of the image registration model comprises: and weighting the ratio of the absolute value to the number of the registration points contained in the type of the five sense organs to which the predicted registration points belong based on the weight of the type of the five sense organs to which the predicted registration points belong, and determining the weighting result as a second loss function of the image registration model.
Following the above example, after the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeled coordinates is determined, the ratio of the absolute value to the number of registration points contained in the type of the five sense organs to which the predicted registration point belongs is weighted by the weight of that type, and the weighted result is determined as the second loss function of the image registration model, which can be written as

loss_2 = \sum_{i\in\Omega}\frac{w_i}{K_i}\sum_{j\in i}\left(\left|x_j-\hat{x}_j\right|+\left|y_j-\hat{y}_j\right|\right)

where \Omega = {left eyebrow, right eyebrow, left eye, right eye, nose, mouth, contour}, l_j denotes a predicted registration point, \hat{l}_j denotes the corresponding true registration point, i.e., the registration point labeled in the face image sample, K_i denotes the number of registration points contained in the type i of the five sense organs to which the predicted registration point belongs, and w_i denotes the weight of that type. Assigning different weights to different five sense organs makes the trained image registration model pay different degrees of attention to them, so that the five sense organs with higher weights are registered more accurately.
In some embodiments, before constructing the third loss function of the image registration model, the method further comprises: carrying out registration processing on a mirror image sample of the face image sample through an image registration model to obtain a predicted registration point of the mirror image sample; constructing a third loss function of the image registration model according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample, wherein the third loss function comprises: determining mirror image coordinates of a predicted registration point corresponding to the predicted registration point of the face image sample in the mirror image sample; and determining the absolute value of the difference value of the coordinates of the predicted registration point of the face image sample and the mirror image coordinates as a third loss function of the image registration model.
After the predicted registration points of the mirror image sample are determined, the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the mirror coordinates is determined as the third loss function of the image registration model, which can be written as

loss_3 = \frac{1}{K}\sum_{i=1}^{K}\left(\left|x_i-\left(W-x'_{mi}\right)\right|+\left|y_i-y'_{mi}\right|\right)

where (x_i, y_i) denotes the coordinates of a predicted registration point of the face image sample, (x'_{mi}, y'_{mi}) denotes the coordinates of the corresponding predicted registration point in the mirror image sample, ((W-x'_{mi}), y'_{mi}) denotes the mirror coordinates of (x'_{mi}, y'_{mi}), W denotes the width of the mirror image sample (or of the face image sample), and K denotes the total number of registration points in the registration point labels of the face image sample. Constructing the third loss function from the mirror characteristics of the mirror image samples makes the output of the image registration model trained with it more stable.
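The three losses and their weighted sum can be sketched in NumPy as follows, following the formulas above; the mapping from each of the five sense organs to its registration point indices (`parts`), the per-part weights, and the overall weights a1, a2, a3 are illustrative assumptions.

```python
import numpy as np

def point_abs_error(pred, target):
    # |x - x_hat| + |y - y_hat| for each registration point; pred/target have shape (K, 2)
    return np.abs(pred - target).sum(axis=1)

def registration_losses(pred, labels, pred_mirror, width, parts, part_weights):
    k = len(pred)
    # first loss: mean absolute coordinate error over all K registration points
    loss1 = point_abs_error(pred, labels).sum() / k
    # second loss: per-part error normalized by the part's point count and weighted
    loss2 = sum(part_weights[p] * point_abs_error(pred[idx], labels[idx]).sum() / len(idx)
                for p, idx in parts.items())
    # third loss: compare predictions with the mirrored predictions of the mirror sample
    mirrored = pred_mirror.copy()
    mirrored[:, 0] = width - mirrored[:, 0]          # mirror coordinates (W - x', y')
    loss3 = point_abs_error(pred, mirrored).sum() / k
    return loss1, loss2, loss3

def overall_loss(loss1, loss2, loss3, a1=1.0, a2=0.5, a3=0.5):
    # weighted sum of the three losses; the weights are illustrative
    return a1 * loss1 + a2 * loss2 + a3 * loss3
```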
To describe back propagation: training sample data are input into the input layer of a neural network model, pass through the hidden layers, and finally reach the output layer, which outputs a result; this is the forward propagation process of the neural network model. Because the output result of the neural network model differs from the actual result, the error between the output result and the actual value is calculated and propagated backward from the output layer through the hidden layers until it reaches the input layer, and during back propagation the values of the model parameters are adjusted according to the error. This process is iterated continuously until convergence. The image registration model described above is such a neural network model.
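As a minimal sketch of one training iteration, assuming the image registration model is a PyTorch module that predicts K registration points per image and that a loss function like the NumPy sketch above has been re-implemented with torch tensors (the function names, loss weights and optimizer are illustrative, not taken from the embodiment):

```python
import torch

def train_step(model, optimizer, images, labels, mirror_images, width, losses_fn,
               a1=1.0, a2=0.5, a3=0.5):
    pred = model(images)                 # forward pass on the face image samples
    pred_mirror = model(mirror_images)   # forward pass on the corresponding mirror samples
    loss1, loss2, loss3 = losses_fn(pred, labels, pred_mirror, width)
    loss = a1 * loss1 + a2 * loss2 + a3 * loss3   # overall loss as a weighted sum
    optimizer.zero_grad()
    loss.backward()                      # back-propagate the error from the output layer
    optimizer.step()                     # update the parameters of each layer
    return loss.item()
```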
In step 105, feature extraction processing is performed on the target face image based on the trained image registration model, and the extracted registration points are used as the face features of the target face image.
The method comprises the steps that a server automatically generates a large number of face image samples with labels, trains an image registration model to obtain a trained image registration model, then registers a target face image according to the trained image registration model to extract registration points of the target face image, and the extracted registration points are used as face features of the target face image. Wherein, the extracted registration points are the registration points of five sense organs, namely the key points included by the left eyebrow, the right eyebrow, the left eye, the right eye, the nose, the mouth and the contour.
Now, the artificial intelligence based facial feature extraction method provided by the embodiment of the present invention has been described with reference to the exemplary application and implementation of the server provided by the embodiment of the present invention, and the following continues to describe a scheme for implementing facial feature extraction by cooperation of the modules in the artificial intelligence based facial feature extraction apparatus 555 provided by the embodiment of the present invention.
The subdivision module 5551 is configured to perform subdivision processing on a standard face image according to registration point labels of the standard face image, so as to obtain a tiling result of the standard face image; a reshaping module 5552, configured to perform reshaping processing on the spatial distribution of registration point labels of the standard facial image according to the tiling result of the standard facial image to obtain a facial image sample, and synchronize the registration point labels of the standard facial image according to the reshaping processing to obtain registration point labels corresponding to the facial image sample; a training module 5553, configured to train an image registration model based on the face image sample and the corresponding registration point label; an extracting module 5554, configured to perform feature extraction processing on the target face image based on the trained image registration model, and use the extracted registration point as the face feature of the target face image.
In some embodiments, the subdivision module 5551 is further configured to connect any three registration points in the registration point labels of the standard facial image to obtain a triangular patch corresponding to the any three registration points in the standard facial image; combining the triangular patches to obtain a patch result of the standard face image; wherein any two of the triangular patches do not intersect or intersect at a common edge.
In some embodiments, the reshaping module 5552 is further configured to perform size and position adjustment for a plurality of triangular patches in the tiling result of the standard facial image; carrying out mapping processing on the adjusted triangular patches based on the textures of the triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures; each triangular patch corresponds to the spatial distribution of three registration point labels in the standard face image; and combining the unadjusted triangular patch in the standard face image with the new triangular patch to obtain a face image sample.
In some embodiments, the reshaping module 5552 is further for performing the following for any of the plurality of triangular patches: determining coordinates of three registration points in the triangular surface patch; and transforming the coordinates of at least one registration point in the three registration points so as to correspondingly transform the size and the position of the triangular patch.
In some embodiments, the reshaping module 5552 is further configured to perform reshaping processing on the standard facial image according to the tiling result of the standard facial image, so as to obtain a reshaped standard facial image; and integrally adjust the reshaped standard facial image to obtain a facial image sample.
In some embodiments, the reshaping module 5552 is further configured to perform at least one of the following for the reshaped standard facial image: carrying out scaling processing on the remolded standard face image; rotating the remolded standard face image; performing translation processing on the remolded standard face image; performing fusion processing on the remolded standard face image and noise; and taking the adjusted standard face image as a face image sample.
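A hedged sketch of these whole-image adjustments, assuming NumPy and OpenCV are available; the function and parameter names are illustrative, not part of the embodiment:

```python
import cv2
import numpy as np

def globally_adjust(image, landmarks, scale=1.0, angle_deg=0.0,
                    shift=(0, 0), noise_sigma=0.0):
    """Apply one whole-image adjustment and keep the labels synchronized.

    image:     H x W x 3 uint8 face image.
    landmarks: (N, 2) array of registration-point coordinates.
    """
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    m = cv2.getRotationMatrix2D(center, angle_deg, scale)  # 2x3 rotation+scale
    m[:, 2] += np.asarray(shift, dtype=float)              # add translation
    out = cv2.warpAffine(image, m, (w, h))
    # Transform the landmarks with the same affine matrix so the labels stay valid.
    ones = np.ones((landmarks.shape[0], 1))
    pts = np.hstack([landmarks, ones]) @ m.T
    if noise_sigma > 0:  # optional fusion with Gaussian noise
        out = np.clip(out.astype(float) +
                      np.random.normal(0, noise_sigma, out.shape),
                      0, 255).astype(np.uint8)
    return out, pts
```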
In some embodiments, the training module 5553 is further configured to perform a registration process on the facial image sample through the image registration model, so as to obtain a predicted registration point of the facial image sample; constructing a first loss function of the image registration model according to the predicted registration point of the face image sample and the registration point mark of the face image sample; constructing a second loss function of the image registration model according to the predicted registration point of the facial image sample, the type of the five sense organs to which the predicted registration point belongs and the registration point mark of the facial image sample; constructing a third loss function of the image registration model according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample; performing weighted summation on the first loss function, the second loss function and the third loss function to obtain an overall loss function of the image registration model; and updating the parameters of the image registration model until the overall loss function is converged, and taking the updated parameters of the image registration model when the overall loss function is converged as the parameters of the trained image registration model.
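For illustration, a rough PyTorch-style sketch of combining the three loss terms into an overall loss and updating parameters until convergence; the weights, convergence check and helper names are assumptions, not the embodiment's exact procedure:

```python
import torch

def total_loss(loss_l1, loss_organ, loss_mirror, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the first, second and third loss terms."""
    w1, w2, w3 = weights
    return w1 * loss_l1 + w2 * loss_organ + w3 * loss_mirror

def train_until_converged(model, optimizer, batches, compute_losses,
                          tol=1e-4, max_epochs=100):
    """Update parameters until the overall loss stops improving.

    `batches` is assumed to be re-iterable (e.g. a list or DataLoader);
    compute_losses(model, batch) is assumed to return the three loss tensors.
    """
    previous = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for batch in batches:
            optimizer.zero_grad()
            l1, organ, mirror = compute_losses(model, batch)
            loss = total_loss(l1, organ, mirror)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if abs(previous - epoch_loss) < tol:  # simple convergence criterion
            break
        previous = epoch_loss
    return model
```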
In some embodiments, the training module 5553 is further configured to determine, in the registration point labeling, labeling coordinates of a registration point corresponding to a predicted registration point of the facial image sample; and determining the absolute value of the difference value of the coordinates of the predicted registration point of the face image sample and the annotated coordinates as a first loss function of the image registration model.
In some embodiments, the training module 5553 is further configured to determine, in the registration point labeling, the labeled coordinates of the registration point corresponding to the predicted registration point of the facial image sample, and determine an absolute value of a difference between the coordinates of the predicted registration point of the facial image sample and the labeled coordinates; and determining the ratio of the absolute value to the number of registration points contained in the type of the five sense organs to which the predicted registration points belong as a second loss function of the image registration model.
In some embodiments, the training module 5553 is further configured to weight a ratio of the absolute value to the number of registration points included in the type of the facial organ to which the predicted registration point belongs based on a weight of the type of the facial organ to which the predicted registration point belongs, and determine a weighting result as a second loss function of the image registration model.
In some embodiments, the artificial intelligence based facial feature extraction device 555 further includes: a registration module 5555, configured to perform registration processing on a mirror image sample of the face image sample through the image registration model, so as to obtain a predicted registration point of the mirror image sample; the training module 5553 is further configured to determine mirror coordinates of predicted registration points corresponding to predicted registration points of the face image sample in the mirror image sample; determining an absolute value of a difference of coordinates of a predicted registration point of the facial image sample and the mirror image coordinates as a third loss function of the image registration model.
In some embodiments, the artificial intelligence based facial feature extraction device 555 further includes: a processing module 5556, configured to perform the following iterative processing on the face image sample: adjusting the size and position of a plurality of triangular patches in the face image sample; each triangular patch corresponds to the spatial distribution of three registration point labels in the face image sample; based on the textures of a plurality of triangular patches of the face image sample, mapping the adjusted triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures; and combining the unadjusted triangular patch in the face image sample with the new triangular patch, and determining a combination result as a new face image sample.
Embodiments of the present invention also provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform an artificial intelligence based facial feature extraction method provided by embodiments of the present invention, for example, the artificial intelligence based facial feature extraction method shown in fig. 3A-3C.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device (a device that includes a smart terminal and a server), or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
The embodiment of the invention can be applied to a facial makeup application scenario. As shown in fig. 1, a terminal 200 is connected to a server 100 deployed in the cloud through a network 300, and a makeup application is installed on the terminal 200. The terminal 200 calls an API to send a request including a target face image to the server 100; the server 100 executes the artificial intelligence based facial feature extraction method (face registration technology) provided by the embodiment of the invention, extracts the facial features of the target face image based on the trained image registration model, and feeds the facial features back to the makeup application of the terminal 200. The user can then perform a makeup-changing operation according to the displayed facial features of the target face image, for example changing the mouth in the facial features to red; the makeup application responds to the user's makeup-changing operation and displays the changed target face image on the display interface 210 of the terminal 200, so that the facial features are made up accurately, the makeup looks more natural, and the user experience is effectively improved.
In the related art, feature points (registration points) of the facial features are located by a face registration technique. Face registration is based on deep learning, which relies on a large amount of training data, but manually labeling feature points of a face is time-consuming and costly; generating a large number of accurate and usable face image samples therefore helps to reduce cost and improve registration accuracy. Although simple operations such as adding noise, translation, rotation and scaling can construct different pairs of input and output face image samples for deep learning and rapidly generate a large number of samples, the coordinate distribution of the labeled points is the same across these samples (i.e. the relative positions of the same feature points A and B are identical in different samples): the spatial distribution of the output points does not change, diversity is lacking, and overfitting is easily caused. The related art also adopts L2-loss as the loss function, i.e. the Euclidean distance between the coordinates of a predicted point and a true-value point (labeled feature point); minimizing this distance makes the image registration model gradually approach the regression target, but with L2-loss alone the model is biased towards learning the feature points with the largest deviations, i.e. if an outlier exists, the model focuses on the outlier and neglects the other parts.
As shown in fig. 4A, the face image is an original image without manually labeled feature points; as shown in fig. 4B, the face image has been manually labeled with feature points, where the feature point 401 is a manually labeled eye feature point. The data amplification method in the related art adds noise to the labeled face image or performs simple operations such as translation, rotation and scaling. For example, when the input of the deep convolutional neural network is 100 × 100 pixels, a square region is taken from the original training image and scaled to 100 × 100 pixels, and the coordinates of the feature points are scaled accordingly so that they remain at the corresponding positions in the 100 × 100 region; the square image region and the coordinate values form one training sample of the face image. Figs. 5A-5D illustrate a translation amplification method, in which a square region is translated to generate training samples: the square region 501 in fig. 5A corresponds to the 100 × 100 pixel training sample shown in fig. 5B, the square region 502 in fig. 5A corresponds to the 100 × 100 pixel training sample shown in fig. 5C, and the square region 503 in fig. 5A corresponds to the 100 × 100 pixel training sample shown in fig. 5D. Figs. 6A-6D illustrate a rotation amplification method, in which a square region is rotated to generate training samples: the square region 601 in fig. 6A corresponds to the training sample shown in fig. 6B, the square region 602 in fig. 6A corresponds to the training sample shown in fig. 6C, and the square region 603 in fig. 6A corresponds to the training sample shown in fig. 6D. Figs. 7A-7D illustrate a scaling amplification method, in which a square region is scaled to generate training samples: the square region 701 in fig. 7A corresponds to the training sample shown in fig. 7B, the square region 702 in fig. 7A corresponds to the training sample shown in fig. 7C, and the square region 703 in fig. 7A corresponds to the training sample shown in fig. 7D.
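For reference, a minimal sketch of this related-art square-crop amplification, assuming NumPy/OpenCV and an illustrative 100 × 100 output size:

```python
import cv2
import numpy as np

def crop_sample(image, landmarks, x0, y0, side, out_size=100):
    """Cut a square region and rescale it (and the labels) to out_size pixels.

    image:     original labeled face image.
    landmarks: (N, 2) feature-point coordinates in the original image.
    x0, y0:    top-left corner of the square region; side: its edge length.
    """
    patch = image[y0:y0 + side, x0:x0 + side]
    patch = cv2.resize(patch, (out_size, out_size))
    scale = out_size / float(side)
    # Shift the coordinates into the crop, then scale them the same way.
    pts = (landmarks - np.array([x0, y0], dtype=float)) * scale
    return patch, pts
```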
To solve the above problems, an embodiment of the present invention provides an artificial intelligence based facial feature extraction method (a data amplification method that rapidly and efficiently changes the spatial distribution of the data): for a face image whose feature points have already been labeled, the face is first triangularly tiled based on the feature points, and the facial features are then reshaped (for example, the distance between the eyes or the thickness of the lips is adjusted), so that a large number of new training samples are obtained quickly. Meanwhile, three loss functions different from L2-loss are adopted; with the same image registration model, the model trained on the training samples generated by the embodiment of the invention achieves noticeably better precision than the model trained on the previous training samples.
As shown in fig. 8, in the embodiment of the present invention, registration points are manually marked on an image with a human face, triangulation is performed according to the registration points to obtain triangular patches, the size, the position, and the like of the triangular patches are adjusted to generate new human face registration points, an old triangular patch is mapped to a new triangular patch to obtain new training data (amplification data), iterative adjustment and mapping operations are sequentially performed to generate a large amount of new training data, and finally, an image registration model is trained using the amplification data. The specific implementation process comprises the following steps:
1) Average face triangle tiling
Every face in the world is unique, but the structure of the five sense organs is fixed: the mouth is below the nose, and the eyes are on either side of the nose. Based on this fixed structure, an average feature point distribution map as shown in fig. 9 can be obtained, in which the feature points 901, 902 and 903 are feature points of the right eye. Delaunay triangulation is applied to the average feature point distribution map of fig. 9 to obtain the triangular tiling result shown in fig. 10; for example, the feature points 901, 902 and 903 in fig. 9 form the triangular patch 1001 in fig. 10.
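A small sketch of this step, assuming SciPy is available and using random stand-in coordinates in place of the averaged landmarks of fig. 9:

```python
import numpy as np
from scipy.spatial import Delaunay

# Illustrative stand-in for the average feature-point distribution of fig. 9;
# the point count is arbitrary here.
mean_points = np.random.rand(86, 2) * 100

tri = Delaunay(mean_points)
# tri.simplices is an (M, 3) array: each row holds the indices of the three
# registration points forming one triangular patch, and the patches only meet
# at shared edges, matching the tiling constraint described earlier.
patches = tri.simplices
```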
2) Triangle tiling of specific faces
For the face image shown in fig. 4B, which has already been manually labeled with feature points, the triangulation shown in fig. 10 is applied to the face in fig. 4B to obtain the specific-face triangle tiling result shown in fig. 11, where the triangular patch 1101 is one triangular patch in this result.
3) Amplification of five-sense remodeling data
Based on the triangular tiling result shown in fig. 11, the distribution of the five sense organs can be changed by moving their feature points (i.e. changing the size and position of the triangular patches), so that new feature point coordinates are obtained. Meanwhile, the texture of each corresponding triangular patch in fig. 11 is mapped onto the transformed triangular patch, so that a new image matching the new feature points is obtained. Moving the feature points of the eyes in fig. 11 yields a new training sample with reduced eyes as shown in fig. 12A, in which the right eye 1101 in fig. 12A is smaller than the right eye in fig. 11, and a new training sample with enlarged eyes as shown in fig. 12B, in which the right eye 1102 in fig. 12B is larger than the right eye in fig. 11. Moving the feature points of the eyes in fig. 11 also yields a new training sample with the eyes brought closer together as shown in fig. 13A, in which the eye distance 1301 in fig. 13A is smaller than the eye distance in fig. 11, and a new training sample with the eyes moved farther apart as shown in fig. 13B, in which the eye distance 1302 in fig. 13B is larger than the eye distance in fig. 11. Moving the feature points of the eyebrows in fig. 11 yields a new training sample with the eyebrows drawn inward as shown in fig. 14A, in which the right eyebrow 1401 in fig. 14A is drawn inward relative to the right eyebrow in fig. 11, and a new training sample with the eyebrows relaxed outward as shown in fig. 14B, in which the right eyebrow 1402 in fig. 14B is relaxed outward relative to the right eyebrow in fig. 11. Moving the feature points of the mouth in fig. 11 yields a new training sample with the mouth stretched laterally as shown in fig. 15A, in which the mouth 1501 in fig. 15A is laterally longer than the mouth in fig. 11, and a new training sample with the mouth contracted laterally as shown in fig. 15B, in which the mouth 1502 in fig. 15B is laterally shorter than the mouth in fig. 11. Other operations, such as moving the mouth up or down or making the face contour fatter or thinner, can likewise generate new training samples.
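A minimal sketch of this reshape-and-remap step, assuming NumPy/OpenCV; the landmark arrays and triangle indices are taken to come from the earlier tiling step, and the function names are illustrative:

```python
import cv2
import numpy as np

def warp_triangle(src_img, dst_img, src_tri, dst_tri):
    """Copy the texture of one source triangle onto its reshaped counterpart.

    src_tri, dst_tri: (3, 2) arrays of triangle vertex coordinates.
    dst_img is modified in place.
    """
    m = cv2.getAffineTransform(np.float32(src_tri), np.float32(dst_tri))
    warped = cv2.warpAffine(src_img, m, (dst_img.shape[1], dst_img.shape[0]))
    mask = np.zeros(dst_img.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(dst_tri), 1)
    dst_img[mask == 1] = warped[mask == 1]

def reshape_face(image, old_pts, new_pts, simplices):
    """Warp every triangle from the old landmark layout to the perturbed one."""
    out = image.copy()
    for tri_idx in simplices:          # simplices from the Delaunay tiling
        warp_triangle(image, out, old_pts[tri_idx], new_pts[tri_idx])
    return out
```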
After the training samples are generated, the image registration model (a deep neural network) is trained with loss functions different from L2-loss. After the model outputs the locations of the key points (registration points) of the five sense organs, a loss against the manually labeled registration points is computed; the conventional L2-loss is

$$\mathcal{L}_{2} = \frac{1}{K} \sum_{i=1}^{K} \left\| l_i - \hat{l}_i \right\|_2^2$$

where $K$ denotes the number of key points, $l_i$ the predicted value, and $\hat{l}_i$ the true value (the manually labeled registration point). Unlike L2-loss, the embodiment of the present invention adopts the following three loss functions (which may be used in any combination):

1. L1-loss, i.e.

$$\mathcal{L}_{1} = \frac{1}{K} \sum_{i=1}^{K} \left( \left| x_i - \hat{x}_i \right| + \left| y_i - \hat{y}_i \right| \right)$$

where $(x_i, y_i)$ are the coordinates of $l_i$ and $(\hat{x}_i, \hat{y}_i)$ are the coordinates of $\hat{l}_i$. L1-loss is less sensitive to outliers and therefore more robust.
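A tiny NumPy sketch of this L1-loss, assuming the predictions and labels are (K, 2) coordinate arrays:

```python
import numpy as np

def l1_loss(pred, gt):
    """Mean absolute coordinate error over all K landmarks.

    pred, gt: (K, 2) arrays of predicted and labeled (x, y) coordinates.
    Large single-point errors contribute linearly, so outliers dominate less
    than with a squared (L2) penalty.
    """
    return np.mean(np.abs(pred - gt))
```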
2. Weighted five-sense-organ point balance loss, which addresses two problems: 1) the numbers of key points of the different five sense organs are inconsistent; 2) the tolerance for error differs among the facial features. As shown in fig. 9, one eye has 8 key points, the mouth has 18 and the contour has 19, so L2-loss obviously pays more attention to the parts with more points. The loss over the points in fig. 9 therefore needs to be balanced, moving from equal weighting of each point to equal weighting of each of the five sense organs, i.e.

$$\mathcal{L}_{organ} = \sum_{j \in \Omega} \frac{1}{|K_j|} \sum_{i \in K_j} \left( \left| x_i - \hat{x}_i \right| + \left| y_i - \hat{y}_i \right| \right)$$

where $|K_j|$ denotes the number of key points of the corresponding organ. This step balances the five sense organs, but in practical use the tolerance for error differs from part to part: mouth registration points are often used for applying lipstick, where a deviation noticeably degrades the lipstick effect or creates a strong sense of incongruity, whereas contour points are often used for face-thinning, where a certain amount of deviation does not affect the result. Different weights can therefore be assigned to the different five sense organs, and the weighted five-sense-organ point balance loss is calculated as

$$\mathcal{L}_{weighted} = \sum_{j \in \Omega} \frac{w_j}{|K_j|} \sum_{i \in K_j} \left( \left| x_i - \hat{x}_i \right| + \left| y_i - \hat{y}_i \right| \right), \qquad \Omega = \{\text{left eyebrow, right eyebrow, left eye, right eye, nose, mouth, contour}\}$$

where $w_j$ denotes the weight of organ $j$.
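A sketch of the weighted five-sense-organ point balance loss in NumPy; the organ index ranges and weights below are purely illustrative assumptions, not the layout used in the embodiment:

```python
import numpy as np

# Hypothetical grouping of landmark indices by facial organ.
ORGANS = {
    "left_eyebrow": range(0, 8),
    "right_eyebrow": range(8, 16),
    "left_eye": range(16, 24),
    "right_eye": range(24, 32),
    "nose": range(32, 45),
    "mouth": range(45, 63),
    "contour": range(63, 82),
}
WEIGHTS = {"mouth": 2.0, "contour": 0.5}  # default weight 1.0 elsewhere

def weighted_organ_loss(pred, gt):
    """Balance the loss per organ, then weight organs by their importance."""
    total = 0.0
    for organ, idx in ORGANS.items():
        idx = list(idx)
        per_organ = np.sum(np.abs(pred[idx] - gt[idx]))  # L1 within the organ
        total += WEIGHTS.get(organ, 1.0) * per_organ / len(idx)
    return total
```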
3. Mirror loss. The human face is a symmetrical structure, so the original image can be mirrored during data amplification, which doubles the training data. For the labeled original image shown in fig. 16A and the labeled mirror image shown in fig. 16B, point 1 (original) and point 4 (mirror), point 2 (original) and point 3 (mirror), point 3 (original) and point 2 (mirror), and point 4 (original) and point 1 (mirror) are mutually symmetrical points, and the following relationships hold for such inputs: 1) the x-coordinate of point 1 (original) = image width minus the x-coordinate of point 4 (mirror); 2) the y-coordinate of point 1 (original) = the y-coordinate of point 4 (mirror); the other symmetrical points satisfy similar relationships. The mirror loss is calculated as

$$\mathcal{L}_{mirror} = \frac{1}{K} \sum_{i=1}^{K} \left( \left| x_i - \left( W - x'_{mi} \right) \right| + \left| y_i - y'_{mi} \right| \right)$$

where $(x'_{mi}, y'_{mi})$ are the coordinates of the key point in the mirror image corresponding to $(x_i, y_i)$ and $W$ is the image width. By exploiting the mirror symmetry of the mirror samples, the image registration model trained with the mirror loss produces more stable output.
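A NumPy sketch of this mirror loss, assuming a precomputed index table that pairs each point with its symmetric counterpart (the table itself is an assumption):

```python
import numpy as np

def mirror_loss(pred, pred_mirror, mirror_index, image_width):
    """Consistency between predictions on an image and on its mirror.

    pred:         (K, 2) predictions on the original image.
    pred_mirror:  (K, 2) predictions on the horizontally flipped image.
    mirror_index: mirror_index[i] is the index of the point in the mirror image
                  that is symmetric to point i (e.g. left eye <-> right eye).
    """
    # Map the mirror-image predictions back into original-image coordinates.
    mapped = pred_mirror[mirror_index].copy()
    mapped[:, 0] = image_width - mapped[:, 0]
    return np.mean(np.abs(pred - mapped))
```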
The image registration network is trained with different data amplification modes and the new loss functions, and the Root Mean Square Error (RMSE) is obtained through testing; the smaller the error value, the higher the accuracy, as shown in Table 1:
TABLE 1
As can be seen from table 1, the embodiment of the present invention significantly improves the registration accuracy with a small cost and a limited data amount.
In summary, the embodiment of the present invention starts from a small number of manually labeled training samples, can rapidly generate a large number of training samples, and, combined with the three proposed loss functions, can significantly improve the accuracy of face registration and obtain a better effect in practical applications. Specifically: the accuracy of the registration task is improved, since with a constant amount of manually labeled training samples, the amplification method of the embodiment of the invention can significantly improve the accuracy of the registration model; labor cost is saved, since the feature points of only one face picture need to be labeled manually, from which the embodiment of the invention automatically generates dozens or even hundreds of different training samples, and a large number of training samples are then constructed in combination with the amplification modes of the related art; the diversity of the training samples is increased, since the amplification modes of the related art cannot change the distribution of the points whereas the amplification method of the embodiment of the invention can, which greatly helps training accuracy; the consistency of the training samples is enhanced, since for the same face image labeled with errors by different annotators, the amplification method of the embodiment of the invention not only amplifies the data volume but also keeps the differently distributed training samples generated from the same face image strongly consistent, which greatly helps training stability; the perspective is expanded from single points to whole facial organs, which better matches the needs of practical production and application (as reflected in the weighted five-sense-organ point balance loss); and the training sample enhancement is not merely an increase in number: the application of the mirror loss also mines the relationship between an original sample and its mirror sample, obtaining a benefit of 1+1 > 2.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.