Face feature extraction method and apparatus, and electronic device
Technical Field
The present invention relates to image processing technologies, and in particular, to a method and an apparatus for extracting facial features based on artificial intelligence, an electronic device, and a computer-readable storage medium.
Background
Artificial Intelligence (AI) is a comprehensive discipline of computer science that studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions. AI technology covers a wide range of fields, such as natural language processing and machine learning/deep learning; as the technology develops, it will be applied in ever more fields and deliver increasingly important value.
In artificial-intelligence-based image processing, feature extraction is an important research direction. Facial features in an image can be extracted automatically by an image registration model, and the facial organs can then be located automatically and accurately based on the extracted features, so that specific regions of the face can be modified and beautified; facial attribute information such as expression and emotion can also be obtained from the extracted features, enabling interactive entertainment functions such as special-effect cameras and dynamic stickers. However, in the related art, training an image registration model depends on a large number of manually labeled face image samples, so model training is inefficient.
Disclosure of Invention
Embodiments of the invention provide a facial feature extraction method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium, which can automatically generate a large number of labeled facial image samples by reshaping a standard facial image, thereby improving the efficiency of model training.
The technical solutions of the embodiments of the invention are implemented as follows:
An embodiment of the invention provides an artificial-intelligence-based facial feature extraction method, which includes:
subdividing a standard face image according to its registration point labels to obtain a tiling result of the standard face image;
reshaping the spatial distribution of the registration point labels of the standard face image according to the tiling result of the standard face image to obtain a face image sample, and
synchronizing the registration point labels of the standard face image according to the reshaping to obtain registration point labels corresponding to the face image sample;
training an image registration model based on the face image sample and the corresponding registration point label;
and performing feature extraction processing on the target face image based on the trained image registration model, and taking the extracted registration points as the face features of the target face image.
An embodiment of the invention provides an artificial-intelligence-based facial feature extraction apparatus, which includes:
a subdivision module, used for subdividing the standard face image according to the registration point labels of the standard face image to obtain a tiling result of the standard face image;
a reshaping module, used for reshaping the spatial distribution of the registration point labels of the standard face image according to the tiling result of the standard face image to obtain a face image sample, and
synchronizing the registration point labels of the standard face image according to the reshaping to obtain registration point labels corresponding to the face image sample;
the training module is used for training an image registration model based on the face image sample and the corresponding registration point label;
and the extraction module is used for carrying out feature extraction processing on the target face image based on the trained image registration model and taking the extracted registration points as the face features of the target face image.
In the above technical solution, the subdivision module is further configured to connect any three registration points in the registration point labels of the standard face image to obtain a triangular patch corresponding to the any three registration points in the standard face image;
combining the triangular patches to obtain a tiling result of the standard face image;
wherein any two of the triangular patches do not intersect or intersect at a common edge.
In the above technical solution, the reshaping module is further configured to adjust the size and the position of a plurality of triangular patches in the tiling result of the standard face image; each triangular patch corresponds to the spatial distribution of three registration point labels in the standard face image;
carrying out mapping processing on the adjusted triangular patches based on the textures of the triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures;
and combining the unadjusted triangular patch in the standard face image with the new triangular patch to obtain a face image sample.
In the foregoing technical solution, the reshaping module is further configured to perform the following processing on any triangular patch of the plurality of triangular patches:
determining coordinates of three registration points in the triangular patch;
and transforming the coordinates of at least one registration point in the three registration points so as to correspondingly transform the size and the position of the triangular patch.
In the above technical solution, the reshaping module is further configured to reshape the standard face image according to the tiling result of the standard face image to obtain a reshaped standard face image;
and integrally adjusting the reshaped standard facial image to obtain a facial image sample.
In the above technical solution, the reshaping module is further configured to, with the imaging region of the reshaped standard facial image as a reference region, perform at least one of the following adjustment processes on the reshaped standard facial image:
reducing or enlarging the reshaped standard face image in equal proportion, and taking the part within the reference region after the reduction or enlargement as a face image sample;
rotating the reshaped standard face image clockwise or counterclockwise by taking any position in the reshaped standard face image as an axis, and taking a part in the reference area after rotation as a face image sample;
performing translation in at least one direction on the reshaped standard facial image, and taking the part within the reference region after the translation as a facial image sample;
and taking at least part of the reshaped standard face image as a noise-adding region, adding the color values of the pixels in the noise-adding region to the color values of the noise to be added, and taking the standard face image obtained after the addition as a face image sample.
In the above technical solution, the training module is further configured to perform registration processing on the face image sample through the image registration model to obtain a predicted registration point of the face image sample;
constructing a first loss function of the image registration model according to the predicted registration point of the face image sample and the registration point mark of the face image sample;
constructing a second loss function of the image registration model according to the predicted registration point of the facial image sample, the facial organ type to which the predicted registration point belongs, and the registration point label of the facial image sample;
constructing a third loss function of the image registration model according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample;
performing weighted summation on the first loss function, the second loss function and the third loss function to obtain an overall loss function of the image registration model;
and updating the parameters of the image registration model until the overall loss function converges, and taking the updated parameters of the image registration model at convergence as the parameters of the trained image registration model.
In the above technical solution, the training module is further configured to determine, in the registration point labeling, a labeling coordinate of a registration point corresponding to the predicted registration point of the face image sample;
and determining the absolute value of the difference value of the coordinates of the predicted registration point of the face image sample and the annotated coordinates as a first loss function of the image registration model.
In the above technical solution, the training module is further configured to determine, in the registration point labels, the labeled coordinates of the registration point corresponding to the predicted registration point of the face image sample, and
determine the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeled coordinates;
and determining the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs as a second loss function of the image registration model.
In the foregoing technical solution, the training module is further configured to weight the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs, based on the weight of that facial organ type, and to determine the weighted result as a second loss function of the image registration model.
In the above technical solution, the apparatus further includes:
the registration module is used for carrying out registration processing on a mirror image sample of the face image sample through the image registration model to obtain a prediction registration point of the mirror image sample;
the training module is further used for determining mirror image coordinates of a predicted registration point corresponding to the predicted registration point of the face image sample in the mirror image sample;
determining an absolute value of a difference of coordinates of a predicted registration point of the facial image sample and the mirror image coordinates as a third loss function of the image registration model.
In the above technical solution, the apparatus further includes:
a processing module for performing the following iterative processing on the facial image sample:
adjusting the size and position of a plurality of triangular patches in the face image sample; each triangular patch corresponds to the spatial distribution of three registration point labels in the face image sample;
based on the textures of a plurality of triangular patches of the face image sample, carrying out mapping processing on the adjusted triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures;
and combining the unadjusted triangular patch in the face image sample with the new triangular patch, and determining a combination result as a new face image sample.
An embodiment of the present invention provides an electronic device for facial feature extraction, where the electronic device includes:
a memory for storing executable instructions;
and the processor is used for realizing the artificial intelligence-based facial feature extraction method provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the artificial intelligence-based facial feature extraction method provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
the standard face image is reshaped according to its tiling result to change the spatial distribution of the registration point labels of the standard face image and obtain a face image sample, so that a large number of labeled face image samples can be generated automatically from a small number of standard face images, improving the efficiency of model training; the image registration model is trained with these accurately labeled face image samples, so that the trained image registration model can accurately extract the facial features of a target face image, improving the accuracy of facial feature extraction.
Drawings
Fig. 1 is a schematic view of an application scenario of a facial feature extraction system 10 according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device 500 for facial feature extraction according to an embodiment of the present invention;
3A-3C are schematic flow charts of artificial intelligence-based facial feature extraction methods provided by embodiments of the present invention;
fig. 4A is an original image without manually labeled feature points according to an embodiment of the present invention;
fig. 4B is a face image with manually labeled feature points according to an embodiment of the present invention;
FIGS. 5A-5D are schematic diagrams of training samples generated by the translation augmentation method provided in an embodiment of the invention;
FIGS. 6A-6D are schematic diagrams of training samples generated by the rotation augmentation method provided in an embodiment of the present invention;
FIGS. 7A-7D are schematic diagrams of training samples generated by the scaling augmentation method according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart of a method for extracting facial features based on artificial intelligence according to an embodiment of the present invention;
FIG. 9 is a graph of mean feature point distribution provided by an embodiment of the present invention;
FIG. 10 is a triangular patch result provided by an embodiment of the invention;
FIG. 11 is a triangle tiling result of a specific face provided by an embodiment of the present invention;
FIG. 12A is a schematic diagram of a new training sample with the eyes reduced in size provided by an embodiment of the present invention;
FIG. 12B is a schematic diagram of a new training sample with the eyes enlarged provided by an embodiment of the present invention;
FIG. 13A is a schematic diagram of a new training sample with the two eyes moved closer together provided by an embodiment of the present invention;
FIG. 13B is a schematic diagram of a new training sample with the two eyes moved farther apart provided by an embodiment of the present invention;
FIG. 14A is a schematic diagram of a new training sample with the eyebrows tightened inward according to an embodiment of the present invention;
FIG. 14B is a schematic diagram of a new training sample with the eyebrows spread outward according to an embodiment of the present invention;
FIG. 15A is a schematic view of a new training sample for lateral mouth stretching provided by an embodiment of the present invention;
FIG. 15B is a schematic view of a new training sample with a laterally contracted mouth provided by an embodiment of the present invention;
FIG. 16A is an original image with labels provided by an embodiment of the present invention;
fig. 16B is a mirror image with labels provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
In the description that follows, references to the terms "first", "second", and the like, are intended only to distinguish similar objects and not to indicate a particular ordering for the objects, it being understood that "first", "second", and the like may be interchanged under certain circumstances or sequences of events to enable embodiments of the invention described herein to be practiced in other than the order illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Unsupervised learning: solving various problems in pattern recognition from training samples whose classes are unknown (unlabeled). Unsupervised learning algorithms mainly include principal component analysis, isometric mapping (Isomap), locally linear embedding, Laplacian eigenmaps, Hessian locally linear embedding, local tangent space alignment, and the like.
2) Deep learning network: a new field of machine learning research motivated by building neural networks that simulate the human brain for analytical learning, mimicking the mechanisms of the human brain to interpret data such as images, sounds, and text. Deep learning is a form of unsupervised learning and comprises multilayer perceptrons with multiple hidden layers. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, so as to discover a distributed feature representation of the data.
3) Face registration: the image registration model locates the feature points of the facial features in an image containing a human face. For example, a registration point of the mouth, which is a characteristic point of the mouth, is located in a face image, and the position of the mouth can be accurately located according to the registration point of the mouth.
4) Standard face image: the face-containing image with the registration points labeled for subsequent generation of a new face image sample from the standard face image. For example, the registration point of a certain face image is manually labeled, after the labeling is completed, the face image is used as a standard face image, and then a new face image sample is generated according to the registration point labeling of the standard face image.
Embodiments of the invention provide a facial feature extraction method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium, which can automatically generate a large number of labeled facial image samples by reshaping a standard facial image, improve the training efficiency of the image registration model, and accurately extract facial features with the trained image registration model. The electronic device for facial feature extraction provided by the embodiments of the present invention may be a server, for example a server deployed in the cloud, which subdivides and reshapes a standard facial image according to its registration point labels to obtain facial image samples, trains an image registration model based on the facial image samples and the corresponding registration point labels, extracts facial features of a target facial image based on the trained image registration model, and performs processing such as makeup and beautification based on the facial features of the target facial image; the electronic device may also be a terminal, which subdivides and reshapes a standard facial image according to registration point labels of the standard facial image input by a user to obtain facial image samples, trains an image registration model based on the facial image samples and the corresponding registration point labels, extracts facial features of a target facial image based on the trained image registration model, and performs subsequent processing such as makeup and beautification based on the facial features of the target facial image.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a facialfeature extraction system 10 according to an embodiment of the present invention, a terminal 200 is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 may be used to obtain the registration point labels of the standard facial image and the target facial image, for example, the user manually labels the registration points of the standard facial image through the terminal, and after the labeling is completed, the terminal automatically obtains the registration point labels of the standard facial image.
In some embodiments, the artificial-intelligence-based facial feature extraction method provided by the embodiments of the present invention is integrated into a Software Development Kit (SDK), and the SDK is integrated into the terminal 200, so that the terminal 200 automatically generates facial image samples and corresponding registration point labels from the registration point labels of standard facial images, trains an image registration model, extracts facial features of a target facial image based on the trained image registration model, and performs subsequent processing such as makeup, beautification, and recognition based on the facial features of the target facial image. For example, in an application scenario of facial makeup, a makeup Application (APP) is installed on the terminal 200. After a user inputs a target facial image through the makeup application, the SDK of the makeup application (into which the artificial-intelligence-based facial feature extraction method is integrated) subdivides and reshapes the standard facial image according to its registration point labels to obtain facial image samples, trains an image registration model based on the facial image samples and the corresponding registration point labels, and extracts the facial features of the target facial image based on the trained image registration model. The facial features of the target facial image are displayed on the display interface 210 of the terminal 200, and the user can perform a makeup-changing operation according to the displayed facial features, for example applying red lipstick to the lips in the facial features; in response to the user's makeup-changing operation, the target facial image after the makeup change is displayed on the display interface 210 of the terminal 200, so that makeup can be changed precisely on the facial organs and looks more natural.
In some embodiments, the terminal 200 may also call an Application Programming Interface (API) through the network 300 to send a request to the server 100, where the request includes a target facial image input by the user on the terminal 200. The server 100 automatically generates facial image samples and corresponding registration point labels from the registration point labels of a standard facial image using the artificial-intelligence-based facial feature extraction method provided by the embodiments of the present invention, trains an image registration model, and extracts the facial features of the target facial image based on the trained image registration model for subsequent processing such as makeup, beautification, and recognition. For example, in an application scenario of facial makeup, a makeup application is installed on the terminal 200. After the user inputs a target facial image in the makeup application, the terminal 200 calls the API to send a request including the target facial image to the server 100; after receiving the target facial image, the server 100 extracts the facial features of the target facial image based on the trained image registration model and returns them to the makeup application. The user can perform a makeup-changing operation according to the facial features displayed in the makeup application, for example applying red lipstick to the lips in the facial features; in response to the user's makeup-changing operation, the terminal 200 calls the API to send a makeup-changing request to the server 100, the server 100 changes the makeup of the target facial image according to the request and returns the result to the makeup application, and the target facial image after the makeup change is displayed on the display interface 210 of the terminal 200, so that makeup can be changed precisely on the facial organs and looks more natural. In an access-control application scenario, when a user needs to open an access gate, the terminal 200 calls the API to send an access request including the user's face image to the server 100; after receiving the access request, the server 100 extracts the facial features of the user's face image based on the trained image registration model and compares them with the face images in a database (for example, faces already recorded in the access control system, such as residents or company employees). When the user's face image matches a face whose identity information has been registered, the gate is opened and the user is allowed to pass, so that unauthorized persons can be prevented from entering and leaving at will.
The following describes a structure of an electronic device for facial feature extraction according to an embodiment of the present invention, where the electronic device for facial feature extraction may be various terminals, such as a mobile phone, a computer, and the like, and may also be a server 100 as shown in fig. 1.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 for facial feature extraction according to an embodiment of the present invention, and the electronic device 500 for facial feature extraction shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 for facial feature extraction are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in connection with embodiments of the invention is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating to other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the artificial intelligence based facial feature extraction apparatus provided by the embodiments of the present invention may be implemented in software, and fig. 2 illustrates an artificial intelligence based facial feature extraction apparatus 555 stored in a memory 550, which may be software in the form of programs and plug-ins, etc., and includes a series of modules: a subdivision module 5551, a reshaping module 5552, a training module 5553, an extraction module 5554, a registration module 5555, and a processing module 5556; the subdivision module 5551, the reshaping module 5552, the training module 5553, the extraction module 5554, the registration module 5555, and the processing module 5556 are used for implementing the facial feature extraction function provided by the embodiment of the invention.
As can be understood from the foregoing, the artificial intelligence-based facial feature extraction method provided by the embodiment of the present invention may be implemented by various types of electronic devices for facial feature extraction, such as an intelligent terminal and a server.
The following describes the artificial intelligence-based facial feature extraction method provided by the embodiment of the present invention in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present invention. Referring to fig. 3A, fig. 3A is a schematic flowchart of a method for extracting facial features based on artificial intelligence according to an embodiment of the present invention, which is described with reference to the steps shown in fig. 3A.
In step 101, the standard face image is subdivided according to the registration point labels of the standard face image, and a tiling result of the standard face image is obtained.
For example, a user can input a standard facial image on an input interface of a terminal, manually label registration points of the standard facial image, after the labeling is completed, the terminal automatically acquires registration point labels of the standard facial image, the terminal can forward the registration point labels of the standard facial image to a server, and the server triangulates the standard facial image according to the registration point labels of the standard facial image to obtain a tiling result of the standard facial image.
In some embodiments, the subdividing the standard face image according to the registration point labels of the standard face image to obtain a tiling result of the standard face image includes: connecting any three registration points in the registration point labels of the standard face image to obtain a triangular patch corresponding to those three registration points in the standard face image; combining the triangular patches to obtain a tiling result of the standard face image; wherein any two triangular patches do not intersect or intersect at a common edge.
Illustratively, the standard facial image is split into triangular patches according to its registration point labels, where each triangular patch is a curved-edge triangle and any two triangular patches either do not intersect or intersect only on one common edge (they cannot share two or more edges at the same time).
For example, the standard facial image is subdivided according to the Delaunay triangulation method to obtain the tiling result of the standard facial image, where every triangular patch in the tiling result contains only Delaunay edges, and the interior of the circumscribed circle of any triangular patch does not contain any registration point of the registration point labels. An edge e whose two endpoints are the registration points a and b is a Delaunay edge when a circumscribed circle through a and b contains no other registration point of the registration point labels (i.e., no registration point other than a and b) in its interior.
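To make this step concrete, the following sketch shows how a set of labeled registration points could be triangulated with an off-the-shelf Delaunay routine; the point coordinates and the use of scipy are illustrative assumptions, not part of the embodiment.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical registration point labels of a standard face image,
# given as an (N, 2) array of (x, y) coordinates.
registration_points = np.array([
    [120, 200], [180, 195], [150, 240],   # e.g. around the left eye
    [220, 200], [280, 195], [250, 240],   # e.g. around the right eye
    [200, 300], [160, 360], [240, 360],   # e.g. nose and mouth region
], dtype=np.float32)

# Delaunay triangulation: the circumscribed circle of each resulting
# triangular patch contains no other registration point, and any two
# patches either do not intersect or share exactly one common edge.
triangulation = Delaunay(registration_points)

# Each row is one triangular patch, expressed as three point indices.
print(triangulation.simplices)
```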
In step 102, according to the tiling result of the standard facial image, the spatial distribution of the registration point labels of the standard facial image is reshaped to obtain a facial image sample.
After the server obtains the tiling result of the standard facial image, the facial organs of the standard facial image are reshaped to change the spatial distribution of the registration point labels in the standard facial image, forming a facial image sample different from the standard facial image, so that the image registration model can be trained efficiently with the automatically generated facial image samples.
In some embodiments, the reshaping the spatial distribution of the registration point labels of the standard facial image according to the tiling result of the standard facial image to obtain the facial image sample includes: adjusting the size and the position of a plurality of triangular patches in the tiling result of the standard face image, each triangular patch corresponding to the spatial distribution of three registration point labels in the standard face image; mapping the adjusted triangular patches based on the textures of the triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures; and combining the unadjusted triangular patches in the standard face image with the new triangular patches to obtain a face image sample.
For example, suppose the tiling result of the standard face image contains N triangular patches. The size and position of K of them are adjusted to obtain K adjusted triangular patches; the K adjusted triangular patches are mapped based on the textures of the original K triangular patches to obtain new texture-filled triangular patches corresponding to the K triangular patches; and the N-K unadjusted triangular patches are combined with the K new triangular patches to obtain a face image sample, where K is a natural number less than or equal to N. For any triangular patch j of the K triangular patches, the size and position of triangular patch j are adjusted to obtain an adjusted triangular patch, and the adjusted triangular patch is mapped based on the texture of triangular patch j to obtain a new texture-filled triangular patch corresponding to triangular patch j.
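The per-patch texture mapping described above can be sketched with OpenCV affine warps; this is one common way to realize such a mapping, under the assumption that the images are float32 arrays, and it is not necessarily the implementation of the embodiment.

```python
import cv2
import numpy as np

def warp_triangle(src_img, dst_img, src_tri, dst_tri):
    """Map the texture of one triangular patch onto its adjusted counterpart.

    src_img / dst_img are float32 images; src_tri / dst_tri are 3x2 arrays of
    registration point coordinates before and after adjustment (a sketch only).
    """
    # Bounding rectangles of the original and the adjusted triangle.
    r1 = cv2.boundingRect(np.float32([src_tri]))
    r2 = cv2.boundingRect(np.float32([dst_tri]))

    # Triangle coordinates relative to their bounding rectangles.
    src_rect = [(p[0] - r1[0], p[1] - r1[1]) for p in src_tri]
    dst_rect = [(p[0] - r2[0], p[1] - r2[1]) for p in dst_tri]

    # Affine transform defined by the three registration points of the patch.
    M = cv2.getAffineTransform(np.float32(src_rect), np.float32(dst_rect))
    src_patch = src_img[r1[1]:r1[1] + r1[3], r1[0]:r1[0] + r1[2]]
    warped = cv2.warpAffine(src_patch, M, (r2[2], r2[3]),
                            flags=cv2.INTER_LINEAR,
                            borderMode=cv2.BORDER_REFLECT_101)

    # Write back only the pixels inside the new triangular patch.
    mask = np.zeros((r2[3], r2[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(dst_rect), (1.0, 1.0, 1.0))
    roi = dst_img[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]]
    dst_img[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]] = roi * (1 - mask) + warped * mask
```

Calling such a helper once for each adjusted patch, while copying the unadjusted patches unchanged, would yield the combined face image sample.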
In some embodiments, the adjusting of the size and the position of a plurality of triangular patches in the tiling result of the standard face image includes: performing the following for any of the plurality of triangular patches: determining the coordinates of the three registration points in the triangular patch; and transforming the coordinates of at least one of the three registration points so as to correspondingly transform the size and the position of the triangular patch.
The size and the position of the triangular patch can be adjusted by changing the coordinates of the registration points in the triangular patch. For any triangular patch needing to change the size and the position, three registration points (end points) in the triangular patch need to be determined, and the size and the position of the triangular patch can be transformed by changing the coordinates of at least one registration point in the three registration points.
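As an illustration of transforming registration point coordinates, the helper below pushes the points of one facial organ away from (or toward) its center; the index grouping and scale factor are hypothetical, and every triangular patch built on those points changes size and position accordingly.

```python
import numpy as np

def scale_organ_points(points, organ_indices, scale=1.2):
    """Move the registration points of one facial organ relative to its center.

    points is an (N, 2) array of registration point coordinates; organ_indices
    selects the points of one organ (the values used here are assumptions).
    scale > 1 enlarges the organ, scale < 1 shrinks it.
    """
    points = points.astype(np.float32).copy()
    center = points[organ_indices].mean(axis=0)
    points[organ_indices] = center + scale * (points[organ_indices] - center)
    return points
```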
In some embodiments, performing reshaping processing on the standard facial image according to the tiling result of the standard facial image to obtain a facial image sample includes: reshaping the standard facial image according to the tiling result of the standard facial image to obtain a reshaped standard facial image; and adjusting the reshaped standard facial image as a whole to obtain a facial image sample.
Illustratively, after the standard facial image is reshaped, the spatial distribution of the registration points in the standard facial image is changed, yielding the reshaped standard facial image. To further augment the facial image samples, the reshaped standard facial image can be adjusted as a whole, without changing the spatial distribution of its registration points, to obtain additional facial image samples.
In some embodiments, the integrally adjusting the reshaped standard facial image to obtain the facial image sample comprises: taking the imaging region of the reshaped standard facial image as a reference region, and performing at least one of the following adjustment processes on the reshaped standard facial image: reducing or enlarging the reshaped standard face image in equal proportion, and taking the part within the reference region after the reduction or enlargement as a face image sample; rotating the reshaped standard face image clockwise or counterclockwise about any position in the reshaped standard face image, and taking the part within the reference region after the rotation as a face image sample; translating the reshaped standard facial image in at least one direction, and taking the part within the reference region after the translation as a face image sample; and taking at least part of the reshaped standard face image as a noise-adding region, adding the color values of the pixels in the noise-adding region to the color values of the noise to be added, and taking the resulting standard face image as a face image sample.
The overall adjustment may include scaling, rotation, translation, and noise addition; it does not change the spatial distribution of the registration points in the image, and the reshaped standard facial image is adjusted as a whole (by at least one of scaling, rotation, translation, and noise addition) to obtain a facial image sample. A reference region, for example a 100 × 100 box, is set for the imaging region of the reshaped standard face image. Scaling means enlarging or reducing the reshaped standard face image by a set proportion and taking the part that remains inside the 100 × 100 box after the enlargement or reduction as a face image sample. Rotation means rotating the reshaped standard face image clockwise or counterclockwise about its center point (or any point in the reshaped standard face image) and taking the part that remains inside the 100 × 100 box after the rotation as a face image sample. Translation means shifting the reshaped standard face image in any direction (e.g., left, right, up, or down) and taking the part that remains inside the 100 × 100 box after the translation as a face image sample. Noise addition means taking at least a partial area of the reshaped standard facial image (or the whole image) as a noise-adding region, generating random numbers (the color values of the noise to be added) according to a specified noise type (such as Gaussian noise or salt-and-pepper noise), adding the random numbers (the noise) to the source pixel values (the color values of the pixels in the noise-adding region) of the reshaped standard facial image, and taking the resulting reshaped standard facial image as a facial image sample.
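A minimal sketch of such a whole-image adjustment, assuming OpenCV and NumPy; the function name, parameters, and the Gaussian noise choice are illustrative, and the registration point labels are transformed together with the image so that they stay synchronized.

```python
import cv2
import numpy as np

def adjust_whole_image(img, points, angle=0.0, scale=1.0, shift=(0.0, 0.0), noise_std=0.0):
    """Scale/rotate/translate the reshaped image as a whole and optionally add noise.

    The imaging region of the input acts as the reference region: the output has
    the same size, so only the part remaining inside that region is kept.
    """
    h, w = img.shape[:2]
    # Rotation about the image center combined with uniform scaling.
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
    M[:, 2] += shift                                   # translation in x and y
    out = cv2.warpAffine(img, M, (w, h))

    # Keep the registration point labels synchronized with the adjustment.
    ones = np.ones((len(points), 1), dtype=np.float32)
    new_points = np.hstack([points.astype(np.float32), ones]) @ M.T

    if noise_std > 0:                                  # e.g. Gaussian noise
        noise = np.random.normal(0.0, noise_std, out.shape)
        out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return out, new_points
```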
In step 103, the registration point labels of the standard face image are synchronized according to the reshaping process to obtain the registration point labels corresponding to the face image sample.
After the server obtains the face image sample, the registration point labels of the standard face image are synchronized according to the reshaping process to obtain the registration point labels corresponding to the face image sample, so that the registration points of the face image sample do not need to be labeled manually, which greatly reduces the cost of sample labeling and improves the efficiency of generating labeled samples.
Referring to fig. 3B, fig. 3B is an optional flowchart of the artificial intelligence-based facial feature extraction method according to the embodiment of the present invention, and fig. 3B shows that fig. 3A further includes steps 106 to 108. The following iterative processing is performed on the face image sample: in step 106, the size and position of a plurality of triangular patches in the face image sample are adjusted, each triangular patch corresponding to the spatial distribution of three registration point labels in the face image sample; in step 107, based on the textures of the plurality of triangular patches of the face image sample, the adjusted triangular patches are mapped to obtain new texture-filled triangular patches corresponding to the plurality of triangular patches; in step 108, the unadjusted triangular patches in the face image sample are combined with the new triangular patches, and the combined result is determined as a new face image sample.
Illustratively, after the server subdivides and reshapes the standard facial image to obtain a facial image sample, the server may further perform reshaping iterations on the facial image sample to generate more new facial image samples: the sizes and positions of a plurality of triangular patches in the facial image sample are adjusted, the adjusted triangular patches are mapped based on the textures of the plurality of triangular patches of the facial image sample to obtain new texture-filled triangular patches corresponding to the plurality of triangular patches, the unadjusted triangular patches and the new triangular patches in the facial image sample are combined to obtain a new facial image sample, and the registration point labels of the facial image sample are synchronized according to the reshaping iteration to obtain the registration point labels corresponding to the new facial image sample. New facial image samples are generated iteratively in this way to obtain layered facial image samples, so that the image registration model is better trained with the layered facial image samples and its registration accuracy is improved.
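The iteration itself can be sketched as a simple loop in which each new sample is reshaped from the previous one; reshape_sample stands for the patch-wise reshaping described above and is a hypothetical helper.

```python
def iterate_reshaping(standard_image, standard_points, reshape_sample, num_rounds=3):
    """Generate a chain of labeled samples, each reshaped from the previous one.

    reshape_sample(image, points) must return (new_image, new_points), i.e. the
    reshaped image together with its synchronized registration point labels.
    num_rounds is an illustrative assumption.
    """
    samples = [(standard_image, standard_points)]
    for _ in range(num_rounds):
        image, points = samples[-1]
        samples.append(reshape_sample(image, points))
    return samples
```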
In step 104, the image registration model is trained based on the facial image samples and the corresponding registration point labels.
After the server automatically generates a large number of facial image samples with labels according to the standard facial image, the image registration model can be trained according to the facial image samples and the corresponding registration point labels to obtain a trained image registration model, so that the image registration model after subsequent training can register the target facial image to obtain the registration point of the target facial image. After the server generates the face image sample, determining a loss function value of the image registration model according to the face image sample and the corresponding registration point label, judging whether the loss function value exceeds a preset threshold value, determining an error signal of the image registration model based on the loss function when the loss function value exceeds the preset threshold value, reversely propagating error information in the image registration model, and updating model parameters of each layer in the propagation process.
Referring to fig. 3C, fig. 3C is an optional flowchart of the artificial intelligence based facial feature extraction method according to the embodiment of the present invention, and fig. 3C illustrates that step 104 in fig. 3A can be implemented by steps 1041 to 1046 shown in fig. 3C. In step 1041, registration processing is performed on the face image sample through the image registration model to obtain predicted registration points of the face image sample; in step 1042, a first loss function of the image registration model is constructed according to the predicted registration points of the face image sample and the registration point labels of the face image sample; in step 1043, a second loss function of the image registration model is constructed according to the predicted registration points of the face image sample, the facial organ types to which the predicted registration points belong, and the registration point labels of the face image sample; in step 1044, a third loss function of the image registration model is constructed according to the predicted registration points of the face image sample and the predicted registration points of the mirror sample corresponding to the face image sample; in step 1045, the first loss function, the second loss function, and the third loss function are weighted and summed to obtain an overall loss function of the image registration model; in step 1046, the parameters of the image registration model are updated until the overall loss function converges, and the updated parameters of the image registration model at convergence are used as the parameters of the trained image registration model.
Illustratively, after the server obtains a face image sample, registration processing is performed on the face image sample through the image registration model to obtain predicted registration points of the face image sample. A first loss function of the image registration model is constructed from the predicted registration points of the face image sample and the registration point labels of the face image sample; a second loss function is constructed from the predicted registration points of the face image sample, the facial organ types to which the predicted registration points belong, and the registration point labels of the face image sample; a third loss function is constructed from the predicted registration points of the face image sample and the predicted registration points of the mirror sample corresponding to the face image sample; and the first, second, and third loss functions are weighted and summed to obtain the overall loss function of the image registration model. Finally, the parameters of the image registration model are updated until the overall loss function converges, and the updated parameters at convergence are used as the parameters of the trained image registration model. The three loss functions are insensitive to outliers (predicted registration points of face image samples with large errors), so the trained image registration model is more robust.
In some embodiments, constructing a first loss function of the image registration model from the predicted registration points of the facial image samples and the registration point annotations of the facial image samples comprises: in the registration point marking, marking coordinates of registration points corresponding to the predicted registration points of the face image sample are determined; and determining the absolute value of the difference value of the coordinates of the predicted registration points of the face image samples and the annotated coordinates as a first loss function of the image registration model.
Determining the absolute value of the difference between the coordinates of the predicted registration points of the face image sample and the labeled coordinates as the first loss function of the image registration model, the first loss function can be written as

$$L_{1} = \sum_{i=1}^{K} \left( \left| x_{i} - \hat{x}_{i} \right| + \left| y_{i} - \hat{y}_{i} \right| \right)$$

where $l_{i}$ denotes a predicted registration point, $\hat{l}_{i}$ denotes the corresponding true registration point, i.e., the labeled registration point in the face image sample, $(x_{i}, y_{i})$ denotes the coordinates of $l_{i}$, $(\hat{x}_{i}, \hat{y}_{i})$ denotes the coordinates of $\hat{l}_{i}$, and $K$ denotes the total number of registration points in the registration point labels of the face image sample.
In some embodiments, constructing a second loss function of the image registration model based on the predicted registration points of the facial image sample, the facial organ types to which the predicted registration points belong, and the registration point labels of the facial image sample comprises: determining, in the registration point labels, the labeled coordinates of the registration point corresponding to the predicted registration point of the face image sample, and determining the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeled coordinates; and determining the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs as the second loss function of the image registration model.
After the absolute value of the difference between the coordinates of the predicted registration point of the face image sample and the labeled coordinates is determined, the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs is determined as the second loss function, which can be written as

$$L_{2} = \sum_{i \in \Omega} \frac{1}{K_{i}} \sum_{j \in i} \left| l_{j} - \hat{l}_{j} \right|$$

where the set of facial organs is $\Omega = \{\text{left eyebrow}, \text{right eyebrow}, \text{left eye}, \text{right eye}, \text{nose}, \text{mouth}, \text{contour}\}$, $j$ ranges over the registration points of facial organ $i$, $l_{j}$ denotes a predicted registration point, $\hat{l}_{j}$ denotes the corresponding true registration point, i.e., the labeled registration point in the face image sample, and $K_{i}$ denotes the number of registration points contained in the facial organ type to which the predicted registration point belongs.
In some embodiments, determining the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs as the second loss function of the image registration model comprises: weighting the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs based on the weight of that facial organ type, and determining the weighted result as the second loss function of the image registration model.
Following the above example, after the absolute value of the difference between the coordinates of the predicted registration point of the facial image sample and the labeled coordinates is determined, the ratio of the absolute value to the number of registration points contained in the facial organ type to which the predicted registration point belongs is weighted by the weight of that facial organ type, and the weighted result is determined as the second loss function of the image registration model:

$$L_{2} = \sum_{i \in \Omega} \frac{w_{i}}{K_{i}} \sum_{j \in i} \left| l_{j} - \hat{l}_{j} \right|$$

where $\Omega = \{\text{left eyebrow}, \text{right eyebrow}, \text{left eye}, \text{right eye}, \text{nose}, \text{mouth}, \text{contour}\}$, $j$ ranges over the registration points of facial organ $i$, $l_{j}$ denotes a predicted registration point, $\hat{l}_{j}$ denotes the corresponding true registration point, i.e., the labeled registration point in the face image sample, $K_{i}$ denotes the number of registration points contained in the facial organ type to which the predicted registration point belongs, and $w_{i}$ denotes the weight of that facial organ type. Assigning different weights to different facial organs makes the trained image registration model pay different degrees of attention to the organs, so that organs with higher weights are registered more accurately.
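In code, the first loss term amounts to a point-wise L1 distance between predicted and labeled coordinates, and the weighted second term sums per-organ averages. The sketch below shows both, assuming PyTorch tensors of shape (K, 2); the grouping of registration point indices by facial organ and the per-organ weights are assumptions made for illustration.

```python
import torch

# Hypothetical grouping of registration point indices by facial organ,
# and hypothetical per-organ weights w_i.
ORGAN_INDICES = {"left_eye": [0, 1, 2], "right_eye": [3, 4, 5], "mouth": [6, 7, 8]}
ORGAN_WEIGHTS = {"left_eye": 2.0, "right_eye": 2.0, "mouth": 1.0}

def first_loss(pred, target):
    """Point-wise L1 registration loss (sketch of the first loss term)."""
    return torch.abs(pred - target).sum()

def second_loss(pred, target):
    """Per-organ L1 loss, normalized by the number of points K_i in each organ
    and weighted by the organ's weight (sketch of the second loss term)."""
    loss = pred.new_zeros(())
    for organ, idx in ORGAN_INDICES.items():
        diff = torch.abs(pred[idx] - target[idx]).sum()
        loss = loss + ORGAN_WEIGHTS[organ] * diff / len(idx)
    return loss
```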
In some embodiments, before constructing the third loss function of the image registration model, the method further comprises: carrying out registration processing on a mirror image sample of the face image sample through an image registration model to obtain a predicted registration point of the mirror image sample; constructing a third loss function of the image registration model according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample, wherein the third loss function comprises: determining mirror image coordinates of a predicted registration point corresponding to the predicted registration point of the face image sample in the mirror image sample; and determining the absolute value of the difference value of the coordinates of the predicted registration point of the face image sample and the mirror image coordinates as a third loss function of the image registration model.
After the predicted registration points of the mirror sample are determined, the absolute value of the difference between the coordinates of the predicted registration points of the face image sample and the mirror coordinates is determined as the third loss function of the image registration model:

$$L_{3} = \sum_{i=1}^{K} \left( \left| x_{i} - \left( W - x'_{mi} \right) \right| + \left| y_{i} - y'_{mi} \right| \right)$$

where $(x_{i}, y_{i})$ denotes the coordinates of a predicted registration point of the face image sample, $(x'_{mi}, y'_{mi})$ denotes the coordinates of the corresponding predicted registration point in the mirror sample, $\left( (W - x'_{mi}), y'_{mi} \right)$ denotes the mirror coordinates of $(x'_{mi}, y'_{mi})$, $W$ denotes the width of the mirror sample or of the face image sample, and $K$ denotes the total number of registration points in the registration point labels of the face image sample. Constructing the third loss function from the mirror property of the mirror samples makes the output of the image registration model trained with it more stable.
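The mirror-consistency term can be sketched as follows, assuming the predicted points of the mirror sample have already been re-indexed so that point i of the mirror sample corresponds to point i of the original sample.

```python
import torch

def third_loss(pred, pred_mirror, width):
    """Mirror-consistency loss (sketch of the third loss term).

    pred        : (K, 2) predicted coordinates on the face image sample
    pred_mirror : (K, 2) predicted coordinates on the horizontally flipped sample,
                  already re-indexed to correspond point-by-point with pred
    width       : image width W, so (W - x', y') maps a mirror point back
    """
    mapped_back = pred_mirror.clone()
    mapped_back[:, 0] = width - mapped_back[:, 0]
    return torch.abs(pred - mapped_back).sum()
```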
To describe back-propagation: training sample data is input into the input layer of the neural network model, passes through the hidden layers, and finally reaches the output layer, which outputs a result; this is the forward propagation process of the neural network model. Because there is an error between the output of the neural network model and the actual result, the error between the output and the actual value is calculated and propagated backward from the output layer through the hidden layers toward the input layer, and the values of the model parameters are adjusted according to the error during this back-propagation. The process is iterated until convergence, and the image registration model is such a neural network model.
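Putting the pieces together, a minimal training-loop sketch is given below; it assumes the three loss functions sketched above, a PyTorch model, and a data loader that yields one labeled sample (and its mirror) at a time, with the optimizer, learning rate, loss weights, and epoch count all being illustrative assumptions rather than values prescribed by the embodiment.

```python
import torch

def train(model, loader, num_epochs=10, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three loss terms, back-propagation, parameter updates."""
    w1, w2, w3 = weights
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(num_epochs):
        for image, labels, mirror_image, width in loader:
            pred = model(image)                  # predicted registration points
            pred_mirror = model(mirror_image)    # predictions on the mirror sample
            loss = (w1 * first_loss(pred, labels)
                    + w2 * second_loss(pred, labels)
                    + w3 * third_loss(pred, pred_mirror, width))
            optimizer.zero_grad()
            loss.backward()     # error propagates backward from the output layer
            optimizer.step()    # parameters of every layer are updated
    return model
```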
In step 105, feature extraction processing is performed on the target face image based on the trained image registration model, and the extracted registration points are used as the face features of the target face image.
The server automatically generates a large number of labeled face image samples and uses them to train the image registration model, obtaining a trained image registration model; the trained image registration model then performs registration on the target face image to extract its registration points, and the extracted registration points are used as the face features of the target face image. The extracted registration points are the registration points of the five sense organs, namely the key points of the left eyebrow, the right eyebrow, the left eye, the right eye, the nose, the mouth and the contour.
Now, the artificial intelligence based facial feature extraction method provided by the embodiment of the present invention has been described with reference to the exemplary application and implementation of the server provided by the embodiment of the present invention, and the following continues to describe a scheme for implementing facial feature extraction by cooperation of the modules in the artificial intelligence based facial feature extraction apparatus 555 provided by the embodiment of the present invention.
The subdivision module 5551 is configured to perform subdivision processing on a standard face image according to registration point labels of the standard face image, so as to obtain a tiling result of the standard face image; a remodeling module 5552, configured to perform remodeling processing on spatial distribution of registration point labels of the standard facial image according to a tiling result of the standard facial image to obtain a facial image sample, and synchronize the registration point labels of the standard facial image according to the remodeling processing to obtain registration point labels corresponding to the facial image sample; a training module 5553, configured to train an image registration model based on the facial image sample and the corresponding registration point label; an extracting module 5554, configured to perform feature extraction processing on the target face image based on the trained image registration model, and use the extracted registration point as the face feature of the target face image.
In some embodiments, the subdivision module 5551 is further configured to connect any three registration points in the registration point labels of the standard facial image to obtain a triangular patch corresponding to the any three registration points in the standard facial image; combining the triangular patches to obtain a patch result of the standard face image; wherein any two of the triangular patches do not intersect or intersect at a common edge.
In some embodiments, the reshaping module 5552 is further configured to perform size and position adjustment for a plurality of triangular patches in the tiling result of the standard facial image; carrying out mapping processing on the adjusted triangular patches based on the textures of the triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures; each triangular patch corresponds to the spatial distribution of three registration point labels in the standard face image; and combining the unadjusted triangular patch in the standard face image with the new triangular patch to obtain a face image sample.
In some embodiments, the reshaping module 5552 is further for performing the following for any of the plurality of triangular patches: determining coordinates of three registration points in the triangular patch; and transforming the coordinates of at least one registration point in the three registration points so as to correspondingly transform the size and the position of the triangular patch.
In some embodiments, the reshaping module 5552 is further configured to perform reshaping processing on the standard facial image according to a tiling result of the standard facial image, so as to obtain a reshaped standard facial image; and to perform overall adjustment on the reshaped standard facial image to obtain a facial image sample.
In some embodiments, the reshaping module 5552 is further configured to perform at least one of the following for the reshaped standard facial image: carrying out scaling processing on the remolded standard face image; rotating the remolded standard face image; performing translation processing on the remolded standard face image; performing fusion processing on the remolded standard face image and noise; and taking the adjusted standard face image as a face image sample.
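As a rough illustration of these overall adjustments (scaling, rotation, translation, and fusion with noise), the following sketch uses OpenCV and NumPy; the helper name and parameter values are purely illustrative, and the registration point labels would have to be transformed with the same matrix to stay synchronized:

```python
import cv2
import numpy as np

def adjust_reshaped_image(img, scale=1.1, angle=5.0, shift=(4, -3), noise_std=3.0):
    """Scale, rotate and translate the reshaped image, then fuse it with Gaussian noise."""
    h, w = img.shape[:2]
    # scaling and rotation about the image center
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += shift                                            # translation
    adjusted = cv2.warpAffine(img, M, (w, h))
    noise = np.random.normal(0.0, noise_std, adjusted.shape)    # fusion with noise
    return np.clip(adjusted.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```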
In some embodiments, the training module 5553 is further configured to perform a registration process on the facial image sample through the image registration model, so as to obtain a predicted registration point of the facial image sample; constructing a first loss function of the image registration model according to the predicted registration point of the face image sample and the registration point mark of the face image sample; constructing a second loss function of the image registration model according to the predicted registration point of the facial image sample, the type of the five sense organs to which the predicted registration point belongs and the registration point mark of the facial image sample; constructing a third loss function of the image registration model according to the predicted registration point of the face image sample and the predicted registration point of the mirror image sample corresponding to the face image sample; performing weighted summation on the first loss function, the second loss function and the third loss function to obtain an overall loss function of the image registration model; and updating the parameters of the image registration model until the overall loss function is converged, and taking the updated parameters of the image registration model when the overall loss function is converged as the parameters of the trained image registration model.
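The weighted summation into the overall loss can be sketched as follows; the weight values are hypothetical, since the text only states that the three losses are weighted and summed:

```python
# Hypothetical weights for the first, second and third loss functions.
LOSS_WEIGHTS = (1.0, 1.0, 0.5)

def overall_loss(first_loss, second_loss, third_loss, weights=LOSS_WEIGHTS):
    """Weighted summation of the three loss functions of the image registration model."""
    w1, w2, w3 = weights
    return w1 * first_loss + w2 * second_loss + w3 * third_loss
```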
In some embodiments, the training module 5553 is further configured to determine, in the registration point labeling, labeling coordinates of a registration point corresponding to a predicted registration point of the facial image sample; and determining the absolute value of the difference value of the coordinates of the predicted registration point of the face image sample and the annotated coordinates as a first loss function of the image registration model.
In some embodiments, the training module 5553 is further configured to determine, in the registration point labeling, a labeled coordinate of a registration point corresponding to the predicted registration point of the facial image sample, and determine an absolute value of a difference between the coordinate of the predicted registration point of the facial image sample and the labeled coordinate; and determine the ratio of the absolute value to the number of registration points contained in the type of the five sense organs to which the predicted registration point belongs as a second loss function of the image registration model.
In some embodiments, the training module 5553 is further configured to weight a ratio of the absolute value to the number of registration points included in the type of the facial organ to which the predicted registration point belongs based on a weight of the type of the facial organ to which the predicted registration point belongs, and determine a weighting result as a second loss function of the image registration model.
In some embodiments, the artificial intelligence based facial feature extraction device 555 further includes: a registration module 5555, configured to perform registration processing on a mirror image sample of the face image sample through the image registration model to obtain a predicted registration point of the mirror image sample; the training module 5553 is further configured to determine, in the mirror image sample, mirror coordinates of a predicted registration point corresponding to a predicted registration point of the face image sample; determining an absolute value of a difference of coordinates of a predicted registration point of the facial image sample and the mirror image coordinates as a third loss function of the image registration model.
In some embodiments, the artificial intelligence based facial feature extraction device 555 further includes: a processing module 5556, configured to perform the following iterative processing on the face image sample: adjusting the size and position of a plurality of triangular patches in the face image sample; each triangular patch corresponds to the spatial distribution of three registration point labels in the face image sample; based on the textures of a plurality of triangular patches of the face image sample, carrying out mapping processing on the adjusted triangular patches to obtain new triangular patches which correspond to the triangular patches and contain the textures; and combining the unadjusted triangular patch in the face image sample with the new triangular patch, and determining a combination result as a new face image sample.
Embodiments of the present invention also provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform an artificial intelligence based facial feature extraction method provided by embodiments of the present invention, for example, the artificial intelligence based facial feature extraction method shown in fig. 3A-3C.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily correspond, to a file in a file system, and may be stored in a portion of a file that holds other programs or data, for example, in one or more scripts stored in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device (a device that includes a smart terminal and a server), or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
The embodiment of the present invention may be applied to an application scenario of facial makeup. As shown in fig. 1, a terminal 200 is connected to a server 100 deployed in the cloud via a network 300, and a makeup application is installed on the terminal 200. The terminal 200 invokes an API to send a request including a target face image to the server 100; the server 100 executes the artificial intelligence based facial feature extraction method (face registration technique) provided in the embodiment of the present invention, extracts the facial features of the target face image based on the trained image registration model, and feeds the facial features of the target face image back to the makeup application of the terminal 200. A user may then perform a makeup change operation according to the displayed facial features of the target face image, for example, changing the mouth among the facial features to red; the makeup application responds to the makeup change operation, and the target face image after the makeup change is displayed on a display interface 210 of the terminal 200. In this way, accurate makeup changing of the facial features is achieved, the makeup looks more natural, and the user experience is effectively improved.
The data amplification method in the related art (i.e., the method for generating face image samples) adds noise to a face image already labeled with registration points, or performs simple operations such as translation, rotation and scaling; this greatly reduces labeling cost, helps registration accuracy, and quickly generates a large number of samples. However, the coordinate distribution of the points in the generated samples stays the same (that is, the relative positions of the same feature points A and B are completely consistent across different samples), so the spatial distribution of the output points does not change, the samples lack diversity, and overfitting is easily caused. In addition, the related art adopts L2-loss as the loss function, i.e., the squared distance between a predicted registration point and the corresponding true-value point; L2-loss is sensitive to abnormal registration points, so when abnormal samples are present, the trained model tends to concentrate on reducing the prediction distance for those abnormal points, which biases the predictions for the remaining samples.
As shown in fig. 4A, the face image is an original image (face image) without manually labeled feature points; as shown in fig. 4B, the face image is a face image that has been manually labeled with feature points, where the feature point 401 is a manually labeled eye feature point. The data amplification method in the related art adds noise to the labeled face image or performs simple operations such as translation, rotation and scaling. For example, the input of the deep convolutional neural network is 100 × 100 pixels, that is, a square region is taken from the original training image and then scaled to 100 × 100 pixels, and the coordinates of the feature points are scaled correspondingly so that they remain at the corresponding positions in the 100 × 100 region. The square image region and the coordinate values constitute one training sample of the face image. Figs. 5A-5D illustrate a translation amplification method, that is, a square region is translated to generate training samples: the square region 501 in fig. 5A corresponds to the 100 × 100 pixel training sample shown in fig. 5B, the square region 502 in fig. 5A corresponds to the 100 × 100 pixel training sample shown in fig. 5C, and the square region 503 in fig. 5A corresponds to the 100 × 100 pixel training sample shown in fig. 5D. Figs. 6A-6D illustrate a rotation amplification method, that is, a square region is rotated to generate training samples: the square region 601 in fig. 6A corresponds to the 100 × 100 pixel training sample shown in fig. 6B, the square region 602 in fig. 6A corresponds to the 100 × 100 pixel training sample shown in fig. 6C, and the square region 603 in fig. 6A corresponds to the 100 × 100 pixel training sample shown in fig. 6D. Figs. 7A-7D illustrate a scaling amplification method, that is, a square region is scaled to generate training samples: the square region 701 in fig. 7A corresponds to the 100 × 100 pixel training sample shown in fig. 7B, the square region 702 in fig. 7A corresponds to the 100 × 100 pixel training sample shown in fig. 7C, and the square region 703 in fig. 7A corresponds to the 100 × 100 pixel training sample shown in fig. 7D.
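As a hedged sketch of this crop-and-scale step (take a square region, scale it to the 100 × 100 network input, and scale the feature point coordinates accordingly), with illustrative names:

```python
import cv2
import numpy as np

def crop_sample(image, points, x0, y0, side, out_size=100):
    """Crop a square region and scale it (and the labeled points) to out_size x out_size."""
    region = image[y0:y0 + side, x0:x0 + side]
    sample = cv2.resize(region, (out_size, out_size))
    scale = out_size / float(side)
    # shift the feature points into the crop, then apply the same scaling
    scaled_points = (np.asarray(points, dtype=np.float32) - [x0, y0]) * scale
    return sample, scaled_points
```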
In order to solve the above problems, an embodiment of the present invention provides an artificial intelligence based facial feature extraction method (a data amplification method that rapidly and efficiently changes the spatial distribution of the data): for a face image whose feature point labeling has already been completed, the face is triangularly tiled based on the feature points and the five sense organs are then reshaped (for example, adjusting the distance between the eyes or the thickness of the lips), so that a large number of new training samples is obtained rapidly; meanwhile, 3 loss functions different from L2-loss are adopted. With the same image registration model, a model trained on the training samples generated by the embodiment of the present invention achieves better accuracy than a model trained on conventional training samples.
As shown in fig. 8, in the embodiment of the present invention, registration points are manually marked on an image containing a human face, triangulation is performed according to the registration points to obtain triangular patches, and the size, position and the like of the triangular patches are adjusted to generate new face registration points; the old triangular patches are then mapped onto the new triangular patches to obtain new training data (amplification data). The adjustment and mapping operations are iterated to generate a large amount of new training data, and finally an image registration model is trained using the amplification data. The specific implementation process comprises the following steps:
1) Average face triangle tiling
Everyone in the world has a unique face, but the structure of the five sense organs is fixed: the mouth is located below the nose, and the eyes are arranged on the two sides of the nose wings. Based on this fixed structure, an average feature point distribution map as shown in fig. 9 can be obtained, in which the feature points 901, 902 and 903 are feature points in the right eye. Delaunay triangulation is applied to the average feature point distribution map of fig. 9 to obtain the triangular patch result shown in fig. 10; for example, the feature points 901, 902 and 903 in fig. 9 are divided to obtain a triangular patch 1001 in fig. 10.
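A minimal sketch of this Delaunay triangulation step, assuming SciPy and an average feature point array of shape (K, 2); the function name is illustrative:

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate(mean_points):
    """Return the triangle list; each row holds the indices of three registration points."""
    tri = Delaunay(np.asarray(mean_points, dtype=np.float64))
    return tri.simplices  # shape (num_triangles, 3)
```

Because the triangles are expressed as indices into the point array, the same triangulation can be reused directly on any specific face whose feature points are labeled in the same order, which is what step 2) below does.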
2) Triangle tiling of specific faces
For the face image shown in fig. 4B, which has been manually labeled with feature points, the triangulation shown in fig. 10 is applied to that face, and the specific face triangle tiling result shown in fig. 11 is obtained, where the triangle patch 1101 is one triangle patch in the specific face triangle tiling result.
3) Amplification of five sense organ remodeling data
Based on the specific face triangle tiling result shown in fig. 11, the distribution of the five sense organs can be changed by moving the feature points of the five sense organs (changing the size and position of the triangular patches), i.e., five sense organ remodeling, to obtain new feature point coordinates. Meanwhile, the texture of each corresponding triangular patch in fig. 11 is mapped onto the corresponding transformed triangular patch, so as to obtain a new image corresponding to the new feature points. For example, the feature points of the eyes in fig. 11 are moved to obtain a new training sample with reduced eyes as shown in fig. 12A, in which the right eye 1101 in fig. 12A is relatively small compared with the right eye in fig. 11, and a new training sample with enlarged eyes as shown in fig. 12B, in which the right eye 1102 in fig. 12B is relatively large compared with the right eye in fig. 11; the feature points of the eyes in fig. 11 are moved to obtain a new training sample with the two eyes brought closer together as shown in fig. 13A, in which the distance 1301 between the eyes in fig. 13A is smaller than the distance between the eyes in fig. 11, and a new training sample with the two eyes moved further apart as shown in fig. 13B, in which the distance 1302 between the eyes in fig. 13B is larger than the distance between the eyes in fig. 11; the feature points of the eyebrows in fig. 11 are moved to obtain a new training sample in which the eyebrows are tightened inward as shown in fig. 14A, in which the right eyebrow 1401 in fig. 14A is tightened inward relative to the right eyebrow in fig. 11, and a new training sample in which the eyebrows are relaxed outward as shown in fig. 14B, in which the right eyebrow 1402 in fig. 14B is relaxed outward relative to the right eyebrow in fig. 11; the feature points of the mouth in fig. 11 are moved to obtain a new training sample with the mouth stretched laterally as shown in fig. 15A, in which the mouth 1501 in fig. 15A has a longer lateral extent than the mouth in fig. 11, and a new training sample with the mouth contracted laterally as shown in fig. 15B, in which the mouth 1502 in fig. 15B has a shorter lateral extent than the mouth in fig. 11. Other operations, such as moving the mouth position up and down or making the face contour fatter or thinner, may also generate new training samples.
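A sketch of the per-triangle texture mapping used in this remodeling step, assuming OpenCV; `triangles` is the index list from the triangulation sketch above, `src_points` are the original labeled feature points, and `dst_points` are the moved feature points (all names illustrative):

```python
import cv2
import numpy as np

def remap_texture(image, src_points, dst_points, triangles):
    """Warp every triangular patch from its original position to its new position."""
    out = np.zeros_like(image)
    for tri in triangles:
        src = np.float32([src_points[i] for i in tri])
        dst = np.float32([dst_points[i] for i in tri])
        M = cv2.getAffineTransform(src, dst)
        warped = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
        mask = np.zeros(image.shape[:2], dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst), 1)
        out[mask == 1] = warped[mask == 1]    # keep only the warped triangle
    return out
```

The pair formed by the remapped image and the moved feature points then constitutes one new training sample; repeating the move-and-remap step with different displacements yields the amplification data.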
After the training samples are generated, the image registration model (a deep neural network) is trained with loss functions other than L2-loss. In the related art, after the localization results of the key points (registration points) of the five sense organs are obtained by the image registration model, L2-loss is calculated with the manually labeled registration points, i.e.

$$\text{L2-loss} = \frac{1}{K} \sum_{i=1}^{K} \left\lVert l_i - \hat{l}_i \right\rVert_2^2$$

where K represents the number of key points, $l_i$ represents the predicted value, and $\hat{l}_i$ represents the true value (the manually labeled registration point). Unlike L2-loss, the embodiment of the present invention employs the following 3 loss functions (the three loss functions can be used in any combination):
1. L1-loss, i.e.

$$\text{L1-loss} = \frac{1}{K} \sum_{i=1}^{K} \left( \left| x_i - \hat{x}_i \right| + \left| y_i - \hat{y}_i \right| \right)$$

where $(x_i, y_i)$ represents the coordinates of the predicted point $l_i$, and $(\hat{x}_i, \hat{y}_i)$ represents the coordinates of the true point $\hat{l}_i$. L1-loss is less sensitive to outliers and more robust.
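A hedged PyTorch sketch of this L1-loss, where `pred` and `target` hold the K predicted and labeled (x, y) coordinate pairs:

```python
import torch

def l1_loss(pred, target):
    """Mean over the K points of |dx| + |dy|; pred and target have shape (K, 2)."""
    return torch.abs(pred - target).sum(dim=-1).mean()
```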
2. Weighted five sense organ point balance loss, which solves two problems: 1) the numbers of key points of different five sense organs are inconsistent, and 2) the tolerance to the precision of each of the five sense organs is different. As shown in fig. 9, one eye has 8 key points, the mouth has 18 points and the contour has 19 points, so L2-loss obviously pays more attention to the organs with more points. Therefore, the loss of the points in fig. 9 needs to be balanced, and the equal weighting of each point is changed into the equal weighting of each of the five sense organs, i.e.

$$\sum_{i \in \Omega} \frac{1}{\left| K_i \right|} \sum_{j \in K_i} \left| l_j - \hat{l}_j \right|$$

where $\left| K_i \right|$ represents the number of key points in the corresponding five sense organ. This step balances the five sense organs, but in practical use the tolerance for error differs by part: for example, the mouth registration points are often used for applying lipstick, and deviation of the mouth registration points greatly degrades the lipstick effect or causes a strong sense of incongruity, whereas the contour points are often used for face slimming, where a certain degree of deviation does not affect the face slimming effect. Therefore, different weights can be assigned to different five sense organs, i.e., the weighted five sense organ point balance loss is calculated as

$$\sum_{i \in \Omega} \frac{w_i}{\left| K_i \right|} \sum_{j \in K_i} \left| l_j - \hat{l}_j \right|$$

where Ω = {left eyebrow, right eyebrow, left eye, right eye, nose, mouth, contour} and $w_i$ represents the weight of the corresponding five sense organ.
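A sketch of the weighted five sense organ point balance loss, under the assumption that each of the five sense organs is represented by a list of point indices and a weight; the grouping and weight values below are illustrative:

```python
import torch

# Illustrative grouping: organ name -> indices of its points in the (K, 2) prediction tensor.
ORGAN_INDICES = {"left_eye": [0, 1, 2], "right_eye": [3, 4, 5], "mouth": [6, 7, 8, 9]}
ORGAN_WEIGHTS = {"left_eye": 1.0, "right_eye": 1.0, "mouth": 2.0}

def weighted_organ_loss(pred, target, organ_indices=ORGAN_INDICES, organ_weights=ORGAN_WEIGHTS):
    loss = pred.new_zeros(())
    for organ, idx in organ_indices.items():
        per_point = torch.abs(pred[idx] - target[idx]).sum(dim=-1)   # L1 error per point
        loss = loss + organ_weights[organ] * per_point.mean()        # divide by |K_i|, weight by w_i
    return loss
```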
3. Mirror image loss. The human face has a symmetrical structure, so the original image can be mirrored during data amplification, which doubles the training data. Note that for the labeled original image shown in fig. 16A and the labeled mirror image shown in fig. 16B, point 1 (original image) and point 4 (mirror image), point 2 (original image) and point 3 (mirror image), point 3 (original image) and point 2 (mirror image), and point 4 (original image) and point 1 (mirror image) are symmetrical points of each other, and for such input the following relationships hold: 1) the x-coordinate of point 1 (original image) equals the image width minus the x-coordinate of point 4 (mirror image); 2) the y-coordinate of point 1 (original image) equals the y-coordinate of point 4 (mirror image); other symmetrical points have similar relationships. The formula for calculating the mirror loss is

$$\frac{1}{K} \sum_{i=1}^{K} \left( \left| x_i - \left( W - x'_{mi} \right) \right| + \left| y_i - y'_{mi} \right| \right)$$

where $(x'_{mi}, y'_{mi})$ represents the coordinates of the key point in the mirror image corresponding to $(x_i, y_i)$, W represents the image width, and K represents the total number of registration points. Through the mirror characteristics of the mirror image samples, the image registration model trained with the mirror loss produces more stable output.
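A sketch of the mirror loss, where `pred` holds the predictions for the original sample, `pred_mirror` the predictions for its horizontally flipped version, `width` is the image width, and `mirror_index` is a hypothetical index map pairing each point with its symmetric counterpart (e.g., point 1 with point 4 in fig. 16A and fig. 16B):

```python
import torch

def mirror_loss(pred, pred_mirror, mirror_index, width):
    """pred, pred_mirror: (K, 2); mirror_index[i] is the symmetric counterpart of point i."""
    idx = torch.as_tensor(mirror_index)
    # map the flipped prediction of each counterpart back into the original coordinate frame:
    # x -> width - x, y unchanged
    mirrored_x = width - pred_mirror[idx, 0]
    mirrored_y = pred_mirror[idx, 1]
    mirrored = torch.stack([mirrored_x, mirrored_y], dim=-1)
    return torch.abs(pred - mirrored).sum(dim=-1).mean()
```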
Therefore, the image registration network is trained using different data amplification modes and the new loss functions, and the Root Mean Square Error (RMSE) is obtained through testing; the smaller the error value, the higher the accuracy, as shown in Table 1:
TABLE 1
| Data amplification method | RMSE |
| --- | --- |
| Data amplification method in the related art | 2.21 |
| Data amplification method of the embodiment of the invention | 1.83 |
| Data amplification + three loss functions in the embodiment of the invention | 1.72 |
As can be seen from table 1, the embodiment of the present invention significantly improves the registration accuracy with a small cost and a limited data amount.
In summary, starting from a small number of manually labeled training samples, the embodiment of the invention can rapidly generate a large number of training samples and, combined with the three loss functions, can obviously improve the precision of face registration, achieving good results in practical applications. The precision of the registration task is improved: for a given amount of manual labeling, the amplification mode of the embodiment of the invention can significantly improve the precision of the registration model. Labor cost is saved: only the feature points of one face picture need to be labeled manually, and tens to hundreds of different training samples can then be generated automatically by the embodiment of the invention; combined with the amplification modes of the related art, a large number of training samples can be constructed. The diversity of the training samples is increased: the amplification modes of the related art cannot change the distribution of the points, whereas the amplification method of the embodiment of the invention can change the distribution of the points, which increases the diversity of the training samples and is of great help to the accuracy of the trained model. The consistency of the training samples is enhanced: different people labeling the same face picture introduce labeling errors, whereas the amplification mode of the embodiment of the invention not only increases the data amount but also ensures that the samples generated from the same training sample have consistent label distributions.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.