
Face image processing method, device, computer equipment and storage medium

Info

Publication number
CN111553284B
CN111553284B
Authority
CN
China
Prior art keywords
facial
dimensional
face
expression
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010357461.XA
Other languages
Chinese (zh)
Other versions
CN111553284A (en)
Inventor
王骞
周满
赵艺
李琦
李丰廷
沈超
毕明伟
丁守鸿
黄飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Wuhan University WHU
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd, Wuhan University WHU
Priority to CN202010357461.XA
Publication of CN111553284A
Application granted
Publication of CN111553284B
Active
Anticipated expiration

Abstract

Translated from Chinese


The present application relates to a facial image processing method, device, computer equipment and storage medium. The method comprises: obtaining a first facial image of a first user; generating a corresponding three-dimensional projection image based on the facial features of the first facial image; obtaining a second facial video of a second user, extracting facial expression features based on each second facial image in the second facial video; fusing each extracted facial expression feature into the three-dimensional projection image to reconstruct the expression and obtain a synthetic facial video; projecting the synthetic facial video onto a three-dimensional entity model for playback, and using the projected three-dimensional entity model for facial recognition. The use of this method can effectively improve the verification efficiency of facial recognition.

Description

Face image processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a face image processing method, a device, a computer device, and a storage medium.
Background
Face recognition is a biometric technology that performs identity recognition based on facial feature information. With the rapid development of computer technology, face recognition is used for identity authentication in more and more application scenarios. In order to verify the recognition effect of a face recognition system, face recognition can be performed with real test users or 3D mask models. These current verification methods typically require recruiting a large number of volunteer users or customizing a large number of 3D silicone masks, which consumes substantial resources, incurs high costs, and results in low verification efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a face image processing method, apparatus, computer device, and storage medium that can effectively improve the verification efficiency of face recognition.
A face image processing method, the method comprising:
acquiring a first face image of a first user;
generating a corresponding three-dimensional projection image based on the face features of the first face image;
acquiring a second face video of a second user, and extracting facial expression features based on each second face image in the second face video;
respectively fusing each extracted facial expression feature into the three-dimensional projection image for expression reconstruction, to obtain a synthetic face video; and
projecting the synthesized face video onto a three-dimensional solid model for playback, wherein the projected three-dimensional solid model is used for face recognition.
A face image processing apparatus, the apparatus comprising:
an image acquisition module, used for acquiring a first face image of a first user;
an image conversion module, used for generating a corresponding three-dimensional projection image based on the face features of the first face image;
an expression extraction module, used for acquiring a second face video of a second user and extracting facial expression features based on each second face image in the second face video;
an expression reconstruction module, used for respectively fusing each extracted facial expression feature into the three-dimensional projection image for expression reconstruction, to obtain a synthetic face video; and
a video projection module, used for projecting the synthesized face video onto a three-dimensional solid model for playback, wherein the projected three-dimensional solid model is used for face recognition.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a first face image of a first user;
generating a corresponding three-dimensional projection image based on the face features of the first face image;
acquiring a second face video of a second user, and extracting facial expression features based on each second face image in the second face video;
respectively fusing each extracted facial expression feature into the three-dimensional projection image for expression reconstruction, to obtain a synthetic face video; and
projecting the synthesized face video onto a three-dimensional solid model for playback, wherein the projected three-dimensional solid model is used for face recognition.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a first face image of a first user;
generating a corresponding three-dimensional projection image based on the face features of the first face image;
acquiring a second face video of a second user, and extracting facial expression features based on each second face image in the second face video;
respectively fusing each extracted facial expression feature into the three-dimensional projection image for expression reconstruction, to obtain a synthetic face video; and
projecting the synthesized face video onto a three-dimensional solid model for playback, wherein the projected three-dimensional solid model is used for face recognition.
According to the face image processing method, the face image processing device, the computer equipment and the storage medium, after the first face image of the first user is acquired, the corresponding three-dimensional projection image is generated based on the face characteristics of the first face image, so that the face image which is used for being projected to the three-dimensional entity model and showing the entity face effect can be effectively generated. The second facial video of the second user is obtained, after facial expression features are extracted based on each second facial image in the second facial video, each facial expression feature obtained through extraction is respectively fused into a three-dimensional projection image for carrying out expression reconstruction, so that the expression features in the three-dimensional projection image can be accurately and effectively subjected to expression reconstruction, and the synthetic facial video which contains expression response and has higher authenticity can be effectively generated. And projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition. The effect of the face recognition system can be effectively verified by carrying out face recognition on the projection of the synthesized face video based on the fusion of the face of the first user and the expression of the second user in the three-dimensional solid model. By generating the synthetic face video with higher accuracy and authenticity and projecting the synthetic face video onto the reusable three-dimensional entity model for face recognition, the verification cost can be effectively saved, and the efficiency of a face recognition system can be effectively improved.
Drawings
FIG. 1 is an application environment diagram of a face image processing method in one embodiment;
FIG. 2 is a flow chart of a face image processing method in one embodiment;
FIG. 3 is a schematic diagram of a process of performing expression reconstruction on face images according to one embodiment;
FIG. 4 is a schematic diagram of video projection of a synthetic face onto a three-dimensional solid model in one embodiment;
FIG. 5 is a flowchart of a face image processing method according to another embodiment;
FIG. 6 is a flowchart of a face image processing method according to another embodiment;
FIG. 7 is a flowchart of a face image processing method according to still another embodiment;
FIG. 8 is a flowchart of a face image processing method in one embodiment;
FIG. 9 is a block diagram showing a face image processing apparatus in one embodiment;
fig. 10 is an internal structural view of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The scheme provided by the embodiments of the present application relates to technologies such as biometric recognition, computer vision, and image processing based on artificial intelligence. Artificial Intelligence (AI) is a theory, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as recognition, tracking, and measurement on targets, and further performs image processing so that the result is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
The face image processing method provided by the present application can be applied to a terminal or a server, and can also be applied to a system including a terminal and a server and implemented through interaction between the terminal and the server. Referring to fig. 1, the method can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 sends a first face image to the server 104; the server 104 obtains the first face image of the first user and generates a corresponding three-dimensional projection image based on the face features of the first face image; the server 104 then obtains a second face video of a second user collected by the terminal 102, extracts facial expression features based on the second face images in the second face video, respectively fuses the extracted facial expression features into the three-dimensional projection image for expression reconstruction to obtain a synthetic face video, and sends the synthetic face video to the terminal 102. The terminal 102 projects the synthesized face video onto a three-dimensional solid model for playback, and the projected three-dimensional solid model is used for face recognition. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices, and the server 104 may be implemented by a stand-alone server or a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a face image processing method is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
S202, a first face image of a first user is acquired.
A face image refers to an image that includes a user's face. A face image may be obtained by extracting the region corresponding to the face from an image or video stream that includes the user's face.
The terminal can obtain an image or video including the face of the first user from a local database in advance and extract the first face image corresponding to the face region. The terminal can also crawl an image or video including the first user's face from the network and extract the face region as the first face image to be processed.
In one embodiment, the terminal may further directly collect an image or video including the face of the first user, and extract a first face image corresponding to the first user from the collected image or video.
After the terminal acquires an initial face image including the face of the first user, since various kinds of noise and random interference may exist in the acquired initial face image, image preprocessing such as gray-level correction and noise filtering needs to be performed on the directly acquired initial face image. Specifically, the terminal first detects the face region in the initial face image and preprocesses the image based on the face region detection result. For a face image, the preprocessing process may include light compensation, gray-level transformation, histogram equalization, normalization, geometric correction, filtering, sharpening, and the like. After preprocessing the initial face image, the first face image of the first user is obtained.
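By way of illustration only, the following Python sketch shows one possible form of such preprocessing using OpenCV; the cascade detector, filter sizes, and file path are assumptions made for the example, not details disclosed above.

```python
# Minimal preprocessing sketch (assumed, not from the patent): detect the face
# region, then apply histogram equalization, denoising, and normalization.
import cv2
import numpy as np

def preprocess_face(image_bgr, target_size=(256, 256)):
    # Detect the face region with OpenCV's bundled Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]

    face = cv2.equalizeHist(face)                 # gray-level / histogram equalization
    face = cv2.GaussianBlur(face, (3, 3), 0)      # noise filtering
    face = cv2.resize(face, target_size)          # geometric normalization
    face = face.astype(np.float32) / 255.0        # intensity normalization
    return face

if __name__ == "__main__":
    img = cv2.imread("first_user.jpg")            # hypothetical input path
    if img is not None:
        face = preprocess_face(img)
```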
S204, generating a corresponding three-dimensional projection image based on the face features of the first face image.
The first face image is a two-dimensional image, i.e. a planar image that does not contain depth information. Three-dimensional (3D) refers to a space system formed by adding a direction vector to a two-dimensional system. A three-dimensional projection image is an image of a three-dimensional model mapped onto a two-dimensional (2D) plane; after the three-dimensional projection image is projected onto a three-dimensional solid model, the corresponding stereoscopic image is displayed.
The first face image comprises facial feature points such as eyebrows, eyes, a nose, lips, chin and the like, and the facial feature points are mainly distributed at facial contour positions such as the eyebrows, nose bridges, eye contours, lip contours, mandible and the like, so that the contour and the expression of the face can be reflected. After the terminal obtains a first face image of a first user, feature points of the first face image are extracted, face gestures are estimated according to the feature points, and face features of the first face image are obtained based on the extracted feature points and the face gestures.
The face image acquired by the terminal is a two-dimensional image. When the image is projected to the three-dimensional solid model for use, a mapping relation between the 2D characteristic points and the 3D characteristic points of the human face needs to be established. The terminal can respectively match and map the two-dimensional coordinate information of the characteristic points in the first face image with the characteristic points in the preset three-dimensional face model to obtain corresponding three-dimensional coordinate information.
Specifically, after the terminal obtains the face features of the first face image, face modeling can be performed through three-dimensional model mapping based on the obtained two-dimensional face feature points, and the three-dimensional coordinate information corresponding to the face features can be obtained. Face modeling may be performed using, for example, a 3D deformation model. The terminal then corrects the face pose according to the face orientation to obtain face features after expression normalization. The terminal further generates a three-dimensional projection image based on the face features mapped and corrected through the three-dimensional deformation; the obtained three-dimensional projection image can be directly used for projection onto the three-dimensional solid model.
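As a rough illustration of estimating a face pose from 2D feature points and a preset 3D reference, the sketch below uses OpenCV's solvePnP with a generic six-point model; the 3D reference coordinates, camera parameters, and landmark values are placeholders assumed for the example, not the model used in this application.

```python
# Sketch of pose estimation from facial feature points (assumptions: a generic
# six-point 3D face reference and an approximate pinhole camera).
import cv2
import numpy as np

# Generic 3D reference points (nose tip, chin, eye corners, mouth corners),
# in arbitrary model units; illustrative only.
MODEL_POINTS_3D = np.array([
    (0.0, 0.0, 0.0),          # nose tip
    (0.0, -330.0, -65.0),     # chin
    (-225.0, 170.0, -135.0),  # left eye outer corner
    (225.0, 170.0, -135.0),   # right eye outer corner
    (-150.0, -150.0, -125.0), # left mouth corner
    (150.0, -150.0, -125.0),  # right mouth corner
], dtype=np.float64)

def estimate_pose(landmarks_2d, image_size):
    """landmarks_2d: 6x2 points ordered like MODEL_POINTS_3D; image_size: (h, w)."""
    h, w = image_size
    focal = w  # rough focal-length guess
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(
        MODEL_POINTS_3D, np.asarray(landmarks_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    rotation_matrix, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix (face pose)
    return rotation_matrix, tvec

if __name__ == "__main__":
    # Illustrative 2D landmark positions in a 640x480 image (assumed values).
    pts = [(320, 240), (325, 380), (230, 190), (410, 190), (270, 320), (370, 320)]
    R, t = estimate_pose(pts, image_size=(480, 640))
    print(R)
```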
In one embodiment, the face image processing method further includes: identifying occluded feature points in the first face image, calculating a side-offset angle according to the feature points in the face image, and correcting the occluded feature points based on the side-offset angle to obtain the face features of the first face image.
If the face image is not a frontal face image, some face feature points may be occluded, and these feature points need to be restored. The terminal can detect whether occluded feature points exist in the first face image, and when occluded feature points are identified, they need to be corrected and restored. If the face is approximately regarded as a cylinder, then when the expression and pose of the face change, the feature points move along parallel circular arcs on the surface of the cylinder, i.e. along the intersection of the cylinder surface with the tangent plane that passes through the feature point and is parallel to the base. Based on this principle, the occluded feature points can be restored.
Specifically, after the terminal extracts the feature points in the first face image and recognizes the occluded feature points, it calculates the side-offset angle according to the feature points in the face features. Specifically, feature points that remain unoccluded during the face pose change (such as the nose tip and eyebrows) can be obtained; taking their positions on the frontal face as starting points and their positions after deflection as end points, the side-offset angle of the face can be determined from the central angle corresponding to the connecting arc. The occluded feature points are then corrected according to the side-offset angle and the unoccluded feature points, so that the face features of the first face image can be obtained accurately and effectively.
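The following toy sketch illustrates the cylinder assumption above: the side-offset (yaw) angle is read off the arc displacement of an unoccluded landmark and used to rotate occluded landmarks back to their frontal positions. The radius and coordinates are invented for the example.

```python
# Illustrative sketch of the cylinder model: landmarks move along arcs parallel
# to the cylinder base, so the side-offset angle can be read off one visible
# landmark and used to restore occluded ones. All values are assumed.
import numpy as np

def estimate_yaw(x_frontal, x_observed, radius):
    """Central angle between the frontal and deflected positions of a visible landmark."""
    return np.arcsin(np.clip(x_observed / radius, -1, 1)) - \
           np.arcsin(np.clip(x_frontal / radius, -1, 1))

def restore_occluded_x(x_observed, yaw, radius):
    """Rotate an occluded landmark back by the estimated yaw along its own arc."""
    angle = np.arcsin(np.clip(x_observed / radius, -1, 1)) - yaw
    return radius * np.sin(angle)

if __name__ == "__main__":
    radius = 90.0                       # assumed face "cylinder" radius in pixels
    yaw = estimate_yaw(x_frontal=0.0,   # nose tip centred in the frontal view
                       x_observed=30.0, # nose tip after the head has turned
                       radius=radius)
    print(restore_occluded_x(x_observed=70.0, yaw=yaw, radius=radius))
```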
S206, acquiring a second face video of the second user, and extracting facial expression features based on each second face image in the second face video.
The second user refers to another user different from the first user. A video is a sequence of images whose continuous change per second exceeds a preset number of frames. The second face video refers to a video including the second user's face region, and contains multiple consecutive frames of images that include the face. Specifically, the second face video may be a video of a face performing an expression response according to specified instructions. For example, in the face recognition process, the user is required to respond according to the instructions of the identity verification system, and the response video is recorded for further processing.
The terminal acquires each frame of second face image in the second face video, and performs expression recognition and expression feature extraction on each frame of second face image so as to extract the facial expression features in the second face video.
The facial expression features represent expression state features corresponding to facial actions of the face, such as eye blinking, mouth opening, head shaking and the like. Expression recognition refers to separating a particular expression state from a given still image or motion video sequence. Expression feature extraction refers to positioning and extracting organ features, texture regions and predefined feature points of a human face.
In one embodiment, the terminal may acquire a static second facial image of each frame, identify the expression state feature of each frame, and further determine the facial expression feature of the whole second user according to the expression state features of all frames, so as to effectively extract the facial expression feature from the facial video.
In one embodiment, the terminal may directly analyze the face images in the continuous dynamic video sequence, and identify and extract the overall facial expression features corresponding to the second user from the dynamic sequence of video frames.
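Purely as an illustration of per-frame expression state features, the sketch below computes simple eye and mouth aspect ratios from landmark groups; these particular descriptors are common heuristics assumed here, not the feature definition of this application.

```python
# Simplified per-frame expression descriptors (assumed heuristics): an eye
# aspect ratio for blinking and a mouth aspect ratio for mouth opening.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: 6x2 landmark array ordered around the eye contour."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def mouth_aspect_ratio(mouth):
    """mouth: 4x2 array [left corner, right corner, top lip, bottom lip]."""
    return np.linalg.norm(mouth[2] - mouth[3]) / np.linalg.norm(mouth[0] - mouth[1])

def expression_features(frame_landmarks):
    """Build one feature vector per frame from left eye, right eye, mouth landmarks."""
    feats = []
    for lm in frame_landmarks:  # lm: dict of landmark groups for one frame
        feats.append([eye_aspect_ratio(lm["left_eye"]),
                      eye_aspect_ratio(lm["right_eye"]),
                      mouth_aspect_ratio(lm["mouth"])])
    return np.asarray(feats)   # per-frame expression feature sequence

if __name__ == "__main__":
    eye = np.array([[0, 0], [2, 2], [4, 2], [6, 0], [4, -2], [2, -2]], dtype=float)
    mouth = np.array([[0, 0], [10, 0], [5, 3], [5, -3]], dtype=float)
    frame = {"left_eye": eye, "right_eye": eye, "mouth": mouth}
    print(expression_features([frame, frame]))
```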
S208, respectively fusing the extracted facial expression features into three-dimensional projection images for carrying out expression reconstruction to obtain a synthetic facial video.
The expression reconstruction means that new expression characteristics are obtained by adjusting expression characteristics in the face so as to obtain a face image with a specific expression. Facial expression images corresponding to various expressions can be obtained by carrying out expression reconstruction on the basis of expression classification and quantization.
The synthetic face refers to a face which performs face fusion processing on two or more face images, and the generated face simultaneously has the appearance characteristics of the two faces. The face fusion technology is to recognize the image features, the facial features, the expression features and the like of the uploaded photo of the user through a face recognition algorithm, and fuse the recognized features onto the template picture.
Specifically, each obtained facial expression feature may be a continuous sequence of expression features. The three-dimensional projection image further comprises initial expression features corresponding to the first user face, wherein the initial expression features can be first facial expression features obtained after the expression normalization processing is carried out on the first facial image.
After extracting facial expression features of a second user from the second facial video, the terminal respectively fuses each facial expression feature into the three-dimensional projection image for carrying out expression reconstruction. And inputting the facial expression features into a three-dimensional projection image of the first user, performing expression feature conversion in a corresponding expression region, and converting feature points in the three-dimensional projection image into facial expression features to obtain a three-dimensional facial image with the expression features. The obtained three-dimensional face image is a synthetic face image.
For example, a subspace deformation transfer approach can be adopted to transmit the facial expression features of the second user to the three-dimensional projection image corresponding to the first user in real time, without affecting the target face features. Assuming that the facial features of the first user and the facial features of the second user are relatively fixed, after the facial expression features are extracted from the face video of the second user, the facial expression features, the three-dimensional projection image, and the initial expression features in the three-dimensional projection image are used as input for the conversion, and the target three-dimensional face image with the expression is directly output in a reduced subspace of the parameter prior. In this way, expression reconstruction can be performed accurately and effectively on the expression features in the three-dimensional projection image.
Because the dynamic expression features of the continuous frames are extracted, the obtained facial expression features are respectively fused into the three-dimensional projection images for carrying out expression reconstruction, and then the three-dimensional facial images of the continuous frames including the facial expressions can be obtained. The terminal then generates a composite face video from the three-dimensional face images of the successive frames, thereby enabling efficient generation of a face video containing an expressive response.
S210, projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition.
The three-dimensional solid model is a three-dimensional model corresponding to a real, tangible physical object, and is used for displaying the effect of a physical face after the three-dimensional face image is projected onto it. For example, the three-dimensional solid model may be a 3D face mold, a 3D silicone facial mask, or the like.
After the terminal generates the synthesized face video with the expression response, the synthesized face video is projected to the three-dimensional solid model for playing, so that the solid face model with the expression response can be displayed on the projected three-dimensional solid model. Specifically, the terminal can project the synthesized face video to the three-dimensional solid model for playing through a projection device in the terminal, and can also project the synthesized face video to the three-dimensional solid model for playing through projection equipment connected with the terminal. For example, a high resolution (e.g., 1080 x 1920) micro projector may be used for projection.
Specifically, the projected three-dimensional solid model is used for face recognition. For example, in face recognition, images or videos of a face of a user need to be acquired for recognition. In a face recognition system based on tag response, a user is required to perform expression response according to an instruction, and a corresponding response video is acquired to perform face recognition. When a response video in the face recognition process is collected, the video of the projected three-dimensional entity model in the embodiment can be collected through the identity verification equipment, so that the face video with the expression response is obtained, and further the identity verification based on face recognition is carried out based on the collected face video. The face recognition is carried out on the face video synthesized by adopting the face of the first user and the expression of the second user, so that the effect of the face recognition system can be effectively verified, the face recognition system can be further maintained and updated, and the safety of face recognition is improved.
Referring to fig. 3, fig. 3 is a schematic flow chart of performing expression reconstruction on a first user face image and a second user image in one embodiment. Extracting facial features from a first facial image, extracting facial expression features from each second facial image of a second user, converting each obtained facial expression feature into a three-dimensional projection image corresponding to the facial features of the first user, and carrying out expression reconstruction, thereby obtaining a synthesized facial image with expression response.
In one embodiment, before the terminal projects the synthesized face video to the three-dimensional solid model for playing, the terminal may first project the three-dimensional projection image corresponding to the obtained first face image onto the three-dimensional solid model. The distance between the projection equipment and the three-dimensional solid model is continuously adjusted and kept opposite to each other, so that the three-dimensional projection image projected onto the three-dimensional solid model is ensured to be a relatively accurate front face image. And then, the focal length of the projection equipment is adjusted, so that the human face texture projected on the three-dimensional solid model is as clear as possible. The terminal can directly project the synthesized face video to the three-dimensional entity model after obtaining the synthesized face video containing the expression response in the face recognition process, so that the identity verification equipment can acquire the face image displayed by the three-dimensional entity model to perform face recognition authentication. Referring to fig. 4, fig. 4 illustrates a schematic diagram of projecting a composite face video onto a three-dimensional solid model in one embodiment. For example, the three-dimensional solid model can adopt a common silica gel face model, has higher simulation performance and lower cost on the human face skin, and can be reused by replacing the projected human face picture. Therefore, the cost for verifying the face recognition system can be effectively saved.
In the face image processing method, after the first face image of the first user is acquired, the corresponding three-dimensional projection image is generated based on the face features of the first face image, so that the face image which is used for being projected to the three-dimensional entity model and showing the entity face effect can be effectively generated. The second facial video of the second user is obtained, after facial expression features are extracted based on each second facial image in the second facial video, each facial expression feature obtained through extraction is respectively fused into a three-dimensional projection image for carrying out expression reconstruction, so that the expression features in the three-dimensional projection image can be accurately and effectively subjected to expression reconstruction, and the synthetic facial video which contains expression response and has higher authenticity can be effectively generated. And projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition. The face recognition is carried out on the projection of the synthesized face video based on the fusion of the face of the first user and the expression of the second user in the three-dimensional solid model, so that the effect of the face recognition system can be effectively verified, and the safety of face recognition can be further improved. By generating the synthetic face video with higher accuracy and authenticity and projecting the synthetic face video onto the reusable three-dimensional entity model for face recognition, the verification cost can be effectively saved, and the efficiency of a face recognition system can be effectively improved.
In one embodiment, generating a three-dimensional projection image based on the face features of the first face image includes obtaining two-dimensional coordinate information of feature points in the face features, determining three-dimensional face features of the first face image based on the two-dimensional coordinate information, and performing three-dimensional mapping processing on the three-dimensional face features to generate a three-dimensional projection image corresponding to the first face image.
The two-dimensional coordinates refer to a coordinate system formed by two mutually perpendicular numerical axes with a common origin in the same plane. Three-dimensional refers to a space system formed by adding a direction vector to a planar two-dimensional system. Pose estimation in the computer vision field is represented by a relative translation and rotation matrix of the target. The pose estimate may be obtained from a transformation relationship matrix between coordinates of n points of the object in the 3-dimensional world coordinate system and a set of points corresponding projected into the 2-dimensional image coordinate system.
The terminal can establish a mapping relation between the 2D characteristic points and the 3D characteristic points of the face based on a preset perspective projection model. After the terminal acquires the first face image of the first user, feature points corresponding to the facial features are extracted from the first face image, and two-dimensional coordinate information corresponding to the feature points is established. The terminal can respectively match and map the characteristic points in the first face image with the characteristic points in the preset three-dimensional face substrate to obtain three-dimensional coordinate points corresponding to the characteristic points in the preset three-dimensional face substrate, acquire depth information of the two-dimensional coordinate information in the three-dimensional space according to the three-dimensional coordinate points, and determine three-dimensional coordinates of the two-dimensional coordinate information mapped in the three-dimensional space according to the depth information, so that three-dimensional coordinate information corresponding to the face characteristics is obtained. And further determining three-dimensional face features of the first face image based on the two-dimensional coordinate information and the three-dimensional coordinate information.
And the terminal obtains the three-dimensional face features of the first face image based on the three-dimensional mapping relation according to the two-dimensional coordinate information of the feature points, and then further performs three-dimensional mapping processing on the three-dimensional face image. For example, the three-dimensional projection image corresponding to the first face image can be obtained based on the perspective projection model principle. The obtained three-dimensional projection image can be directly projected onto a three-dimensional solid model, so that the effect of the solid face can be accurately and effectively displayed on the three-dimensional solid model.
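A minimal sketch of the mapping idea, under the assumption of index-aligned placeholder coordinates: each 2D feature point borrows the depth of its counterpart in a preset 3D face base and can then be re-projected onto the plane.

```python
# Toy sketch of lifting 2D feature points to 3D using a preset face base
# (placeholder coordinates); depth is borrowed from the matched base point and
# the lifted points can then be orthographically re-projected.
import numpy as np

def lift_to_3d(points_2d, base_points_3d):
    """points_2d: Nx2; base_points_3d: Nx3, index-aligned with points_2d."""
    depths = base_points_3d[:, 2:3]               # depth from the 3D base
    return np.hstack([points_2d, depths])         # Nx3 three-dimensional features

def orthographic_project(points_3d, scale=1.0):
    """Drop the depth axis to obtain projection-image coordinates."""
    return scale * points_3d[:, :2]

if __name__ == "__main__":
    pts2d = np.array([[120.0, 140.0], [180.0, 140.0], [150.0, 200.0]])  # assumed landmarks
    base3d = np.array([[-30.0, 20.0, 45.0], [30.0, 20.0, 45.0], [0.0, -40.0, 60.0]])
    print(orthographic_project(lift_to_3d(pts2d, base3d)))
```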
In one embodiment, determining the three-dimensional face feature of the first face image according to the two-dimensional coordinate information comprises estimating a face gesture according to the feature points, obtaining mapping coordinate information of a preset three-dimensional face substrate mapped on a two-dimensional plane, determining three-dimensional mapping parameters corresponding to the feature points according to the two-dimensional coordinate information and the mapping coordinate information, updating the feature points and the corresponding three-dimensional mapping parameters according to the face gesture and the two-dimensional coordinate information, and generating the three-dimensional face feature based on the updated feature points and the three-dimensional mapping parameters.
The face pose estimation mainly obtains angle information of face orientation. It can be generally expressed in terms of a rotation matrix, a rotation vector, a quaternion or euler angles (these four quantities can also be converted into each other). The three-dimensional face substrate can be a general three-dimensional face substrate model obtained in advance, or a three-dimensional face substrate model obtained based on training of a large number of face model samples.
After extracting feature points corresponding to each facial feature from the first face image, the terminal estimates the face gesture according to the feature points of the face, further establishes a two-dimensional and three-dimensional mapping model on the basis of a preset three-dimensional face base model, and corrects the face gesture to obtain a corresponding 3D projection image.
Specifically, the terminal may obtain mapping coordinate information of a preset three-dimensional face substrate mapped on a two-dimensional plane, and determine three-dimensional coordinate information and three-dimensional mapping parameters corresponding to the feature points according to the two-dimensional coordinate information and the mapping coordinate information. The terminal can further estimate the face gesture according to the three-dimensional coordinate information corresponding to the feature points and the three-dimensional mapping parameters. The terminal can also establish a two-dimensional and three-dimensional mapping model according to the face feature points, the two-dimensional coordinate information and the three-dimensional mapping parameters. The terminal further updates the feature points and the corresponding three-dimensional mapping parameters according to the face gesture and the two-dimensional coordinate information, so that the three-dimensional face features corresponding to the first user can be generated according to the updated three-dimensional mapping parameters and the face features. The obtained three-dimensional face features comprise three-dimensional coordinate information corresponding to two-dimensional coordinate points of each feature point.
For example, a new 2D and 3D mapping model can be established based on a preset three-dimensional deformation model, and the face pose can be corrected by mapping of the 3D deformation model. The formulation of a specific mapping model may be as follows:
F_{2D\_local} = k \cdot M_1 \cdot R \cdot (\bar{S} + P_{id}\,\lambda_{id} + P_{exp}\,\lambda_{exp}) + T_{3D}
wherein F_{2D\_local} represents the positions of the feature points on the 2-dimensional plane, \bar{S} represents the average shape of a human face, k is a scale factor, M_1 is an orthographic projection matrix, R is a 3×3 rotation matrix, T_{3D} represents the feature point conversion (translation) vector, P_{id} represents the facial shape features and \lambda_{id} the shape weights, and P_{exp} represents the facial expression features and \lambda_{exp} the expression weights.
Specifically, the terminal may initialize \lambda_{id} and \lambda_{exp} to zero, use a weak perspective projection model, roughly estimate the face pose from the extracted facial feature points, update the face feature points according to the face orientation, and compute each three-dimensional mapping parameter according to the above formula, so as to effectively obtain the three-dimensional face features corresponding to the first user.
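For concreteness, the sketch below transcribes the mapping formula above into numpy with randomly generated placeholder bases; it only demonstrates the array shapes and the order of operations, not real model data.

```python
# Numpy transcription of the 2D/3D mapping model above. All bases and weights
# are random placeholders; only the shapes and operation order are illustrative.
import numpy as np

def project_2d(k, R, t_2d, s_mean, p_id, lam_id, p_exp, lam_exp):
    """F_2Dlocal = k * M1 * R * (s_mean + p_id@lam_id + p_exp@lam_exp) + t."""
    M1 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])                      # orthographic projection matrix
    shape = s_mean + p_id @ lam_id + p_exp @ lam_exp      # (3N,) deformed face shape
    shape = shape.reshape(-1, 3).T                        # -> (3, N) point coordinates
    return k * (M1 @ (R @ shape)) + t_2d.reshape(2, 1)    # (2, N) projected feature points

if __name__ == "__main__":
    n_pts, n_id, n_exp = 68, 40, 20
    rng = np.random.default_rng(0)
    f2d = project_2d(k=1.0, R=np.eye(3), t_2d=np.zeros(2),
                     s_mean=rng.normal(size=3 * n_pts),
                     p_id=rng.normal(size=(3 * n_pts, n_id)), lam_id=np.zeros(n_id),
                     p_exp=rng.normal(size=(3 * n_pts, n_exp)), lam_exp=np.zeros(n_exp))
    print(f2d.shape)  # (2, 68)
```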
In one embodiment, performing three-dimensional mapping processing based on three-dimensional face features to generate a three-dimensional projection image corresponding to a first face image comprises constructing a three-dimensional face mapping matrix according to the three-dimensional face features, positioning face contours of the face features, and adjusting boundaries of the three-dimensional face mapping matrix based on face contour features in the face features to obtain the three-dimensional projection image corresponding to the first face image.
Wherein the matrix is a set of complex or real numbers arranged in a rectangular array. The three-dimensional face mapping matrix may be a linear mapping matrix, which is a quantitative representation of linear mapping, and is used to represent the mapping relationship between the face feature points and the two-dimensional image and the three-dimensional stereo model. The face contour may include a facial feature local contour and a face outer contour, the face contour being an important feature among facial features of the face. The facial features may include local feature point information and global feature information.
After the pose and expression of the face image are normalized, there is usually a blank area between the background and the face, and the terminal can adjust the boundary position of the three-dimensional projection image according to the face contour.
Specifically, after the terminal obtains the three-dimensional face feature corresponding to the first user based on the face feature points, the two-dimensional coordinate information and the three-dimensional mapping parameters, a three-dimensional face mapping matrix is constructed according to the three-dimensional face feature, and the face contour of the face feature is positioned. The terminal adjusts the boundary of the three-dimensional face mapping matrix according to the face contour features in the face features, so that a three-dimensional projection image corresponding to the face of the first user with higher accuracy is obtained.
Specifically, the formula for adjusting the boundary of the three-dimensional face mapping matrix may be as follows:
(x_{2\_new},\, y_{2\_new}) = (x_2 + (x_{1\_con} - x_1),\; y_2 + (y_{1\_con} - y_1))
where (x_{2\_new}, y_{2\_new}) represents the new position of connection anchor 2 to be solved, (x_{1\_con}, y_{1\_con}) represents the predefined facial contour position of boundary anchor 1, and (x_1, y_1) and (x_2, y_2) represent the coordinates of anchors 1 and 2 before adjustment.
In one embodiment, the terminal may automatically adjust the boundary of the three-dimensional face mapping matrix according to a preset program, so as to effectively obtain a three-dimensional projection image corresponding to the face of the first user.
In another embodiment, manual anchor point adjustment may also be performed on the terminal. By adjusting the anchor points around the human face, the anchor points are overlapped with the human face outline as much as possible, namely, boundary anchor points are moved to the predefined position on the 3D human face model, and the space distance is kept unchanged. The boundary of the three-dimensional face mapping matrix is adjusted, so that a three-dimensional projection image corresponding to the face of the first user is effectively obtained.
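The following toy sketch interprets the anchor adjustment as a rigid shift: the boundary anchor snaps to its predefined contour position and the connected anchor moves by the same displacement, so their spatial distance is preserved. Coordinates are assumed values.

```python
# Toy sketch of boundary-anchor adjustment: snap the boundary anchor to its
# predefined facial-contour position and shift the connected anchor by the same
# displacement, preserving their spatial distance. Values are assumed.
import numpy as np

def adjust_anchors(boundary_anchor, contour_position, connected_anchor):
    """Return (new boundary anchor, new connected anchor)."""
    displacement = contour_position - boundary_anchor
    return contour_position, connected_anchor + displacement

if __name__ == "__main__":
    a1 = np.array([110.0, 240.0])        # boundary anchor 1 before adjustment
    a1_con = np.array([118.0, 236.0])    # predefined facial contour position for anchor 1
    a2 = np.array([95.0, 250.0])         # connection anchor 2 before adjustment
    new_a1, new_a2 = adjust_anchors(a1, a1_con, a2)
    print(new_a1, new_a2)
```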
In one embodiment, the face image processing method further comprises the steps of extracting illumination parameters and face parameters corresponding to the first face image, and filling face details into the three-dimensional face mapping matrix based on the illumination parameters and the face parameters.
Illumination is a factor that affects face imaging and face image formation, and illumination change is a key factor affecting face recognition performance. The illumination parameters of a face describe its behavior under illumination and reflection, for example light intensity, glossiness, highlights, ambient light, directional light, specular reflection parameters, and ambient reflection parameters. Facial parameters refer to parameters of the person's face, including, for example, the reflection parameters of the face under light, as well as facial texture parameters, facial surface color parameters, and the like. For example, the illumination parameters of the face may be estimated by modeling illumination changes: the changes caused by illumination are represented in a suitable subspace, and the model parameters are estimated from the face features in the first face image. This can be implemented, for example, using a subspace projection method, the quotient image method, the illumination cone method, or an algorithm based on spherical harmonic basis images.
When the side-offset angle of the face in the original face image is too large, the invisible area needs to be further filled in. The reflection of the face under illumination is simulated to obtain the illumination parameters of the face, and then face details are filled in according to the symmetry of the face, the illumination parameters, and the like.
Specifically, after obtaining a three-dimensional face feature corresponding to a first user based on the face feature points, the two-dimensional coordinate information and the three-dimensional mapping parameters, the terminal extracts illumination parameters and face parameters corresponding to the first face image, and performs face detail filling on the three-dimensional face mapping matrix based on the illumination reflection parameters and the face parameters.
For example, a linear combination based on spherical harmonic reflection base can be used to simulate the reflection condition of a human face under illumination so as to obtain illumination parameters in a human face image. The formula for extracting the illumination parameter corresponding to the first face image may be as follows:
I(P) = \sum_{i=1}^{9} \beta_i D_i(P)
wherein \beta is the 9-dimensional illumination parameter, P corresponds to a 3D pixel point, I(P) denotes its reflected intensity, and D represents the spherical harmonic reflection basis.
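As an illustrative stand-in for this estimation, the sketch below forms the first nine real spherical-harmonic basis values from surface normals and fits a 9-dimensional β to observed intensities by least squares; the basis expressions and synthetic inputs are assumptions of the example, not the patent's exact procedure.

```python
# Sketch of 9-dimensional spherical-harmonic illumination estimation: build the
# first nine real SH basis values from per-point surface normals and fit beta to
# observed intensities by least squares. Inputs are synthetic placeholders.
import numpy as np

def sh_basis(normals):
    """normals: Nx3 unit vectors -> Nx9 spherical harmonic basis matrix D."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    one = np.ones_like(x)
    return np.stack([one, x, y, z,
                     x * y, x * z, y * z,
                     x ** 2 - y ** 2, 3 * z ** 2 - 1], axis=1)

def estimate_illumination(normals, intensities):
    """Solve min_beta ||D beta - I||^2 for the 9-dim illumination parameter."""
    D = sh_basis(normals)
    beta, *_ = np.linalg.lstsq(D, intensities, rcond=None)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = rng.normal(size=(500, 3))
    n /= np.linalg.norm(n, axis=1, keepdims=True)      # unit normals
    true_beta = rng.normal(size=9)
    intensities = sh_basis(n) @ true_beta              # synthetic observations
    print(np.allclose(estimate_illumination(n, intensities), true_beta))
```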
The terminal can supplement face details by mirroring, according to the symmetry of the face. Specifically, the terminal can seamlessly insert the source object into the image by adopting an image editing approach based on the Poisson equation, i.e. by solving a Poisson partial differential equation with boundary conditions, which can be expressed as follows:
\Delta\, pic = \Delta\, tex \quad \text{over } \Omega, \qquad pic\,|_{\partial\Omega} = pic_0\,|_{\partial\Omega}
where pic represents the processed image to be solved, \Delta represents the Laplacian operator, \Delta\, tex represents the Laplacian values of the texture to be inserted, \Omega represents the edit region, \partial\Omega represents the boundary of the edit region, and pic_0 represents the input image to be processed.
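OpenCV's seamlessClone solves this kind of Poisson boundary-value problem; the sketch below shows how a mirrored detail region could be inserted that way. The file paths and the full-region mask are assumptions of the example.

```python
# Sketch of Poisson-equation-based insertion using OpenCV's seamlessClone,
# which solves the same boundary-value problem described above. Paths assumed.
import cv2
import numpy as np

def insert_texture(target_bgr, source_bgr, mask, center):
    """Seamlessly insert the masked source region into the target at `center`."""
    return cv2.seamlessClone(source_bgr, target_bgr, mask, center, cv2.NORMAL_CLONE)

if __name__ == "__main__":
    target = cv2.imread("face_projection.png")      # hypothetical target image
    source = cv2.imread("mirrored_detail.png")      # hypothetical texture to insert
    if target is not None and source is not None:
        mask = np.full(source.shape[:2], 255, dtype=np.uint8)
        h, w = target.shape[:2]
        blended = insert_texture(target, source, mask, (w // 2, h // 2))
        cv2.imwrite("filled_face.png", blended)
```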
In one embodiment, extracting facial expression features based on each second face image in the second face video comprises extracting second face images of key frames in the second face video, extracting face parameters in each frame of second face images, estimating normal expression distribution according to the face parameters in each frame of second face images, and obtaining facial expression features of the second user based on the normal expression distribution.
The second face video may be a face video that performs an expression response according to a specified instruction, where the face video includes a plurality of images including a face. The face parameters comprise parameters such as face feature points, face gestures, local expression features and the like, and the local expression features are states of various parts of the face, such as eyes opening, eyes closing, mouth opening and the like. Wherein the key frames comprise at least two or more video frames.
When the terminal extracts facial expression features based on each second facial image in the second facial video, only the key frame facial image corresponding to the expression response can be extracted. Specifically, the terminal can determine the key frame corresponding to the expression response according to the expression change action by identifying the expression change action in the face video, so as to extract the second face image of the key frame in the second face video.
In one embodiment, after the terminal identifies the expression change motion in the face video, the terminal determines the video frame corresponding to the expression response according to the expression change motion, and further extracts the key frame related to the expression change motion from the frames, for example, a part of the more important frames may be extracted from the video frames according to the expression change amplitude, and the extracted video frames are determined as the key frames. Therefore, important frames only related to expression change actions can be effectively obtained from the face video, and the computing resources for processing the face image by the terminal are reduced.
The terminal extracts a second face image corresponding to the key frame from the face video and generates a key frame set. And the terminal performs feature extraction on the second face images of each frame in the key frame set, extracts facial feature points and expression state features in the second face images of each frame, and obtains face parameters according to the facial feature points and the expression state features. The terminal further estimates normal expression distribution according to the face parameters in the second face image of each frame, and facial expression characteristics of the second user can be obtained based on the normal expression distribution.
For example, the terminal can estimate all parameters on k key frames of the input video sequence simultaneously. The parameters to be estimated mainly include face parameters, global landmarks, intrinsic parameters, the unknown pose of each frame, illumination parameters, and so on, so as to capture the user's expression. For example, an iteratively reweighted least squares (IRLS) solver may be used to solve the system of normal equations corresponding to the face parameters of the entire keyframe set at once, so as to estimate a normal expression distribution, and dynamic facial expression features are then obtained from the normal expression distribution.
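As one possible heuristic for the keyframe selection described in the preceding paragraphs, the sketch below scores each frame by the change of its expression feature vector relative to the previous frame and keeps the frames with the largest change; the scoring rule and the number of keyframes are assumptions, not the criterion of this application.

```python
# Sketch of keyframe selection: score each frame by the change of its expression
# feature vector and keep the top-k frames. The rule and k are assumed heuristics.
import numpy as np

def select_keyframes(expression_features, k=5):
    """expression_features: TxF per-frame features -> sorted indices of k keyframes."""
    diffs = np.linalg.norm(np.diff(expression_features, axis=0), axis=1)
    change = np.concatenate([[0.0], diffs])          # change magnitude per frame
    keyframes = np.argsort(change)[-k:]              # frames with the largest change
    return np.sort(keyframes)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    feats = np.cumsum(rng.normal(size=(60, 3)), axis=0)   # synthetic feature track
    print(select_keyframes(feats, k=5))
```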
In one embodiment, the method for obtaining the synthetic face video comprises the steps of respectively fusing the extracted facial expression features into three-dimensional projection images for carrying out expression reconstruction, carrying out model reconstruction on the three-dimensional projection images based on the second facial images to obtain three-dimensional facial images, carrying out expression conversion on the facial features in the three-dimensional facial images based on the facial expression features to obtain three-dimensional facial images comprising the facial expression features, and generating the synthetic face video according to the three-dimensional facial images comprising the facial expression features.
The three-dimensional face image is a projection image for displaying a three-dimensional face after being projected to the three-dimensional solid model.
And after extracting the facial expression features corresponding to the second user face from the face video, the terminal transmits the facial expression of the second user face to the first user face on the premise of not affecting the facial features of the first user. And converting the facial expression characteristics of the second user face and the facial expression of the first user face to obtain a facial image with the expression.
In face modeling, the face model may include position coordinates of key feature points (such as eyebrows, eye outlines, nose, lips, etc.) of the face, reflectivity of the key feature points, facial expression, and the like. The position coordinates of key points of the human face and the reflectivity of key feature points are mainly used for representing the shape of the human face. When carrying out expression reconstruction, a neutral face model, namely an expression-free face model, needs to be established for the first user face and the second user face. Therefore, the face model with the expression can be output by taking the face model of the second user and the neutral face model as inputs.
Specifically, after obtaining a second face image of a second user, the terminal performs model reconstruction based on the second face image and a three-dimensional projection image corresponding to the first face image to obtain a target face model, namely a neutral face model between the first face and the second face. And obtaining a reconstructed three-dimensional face image, wherein the three-dimensional face image is an image obtained by projecting a three-dimensional target face model on a two-dimensional plane.
The terminal further performs expression conversion on the facial features in the three-dimensional facial image based on the facial expression features. Specifically, the terminal inputs each facial expression characteristic of the second user into a target facial model corresponding to the three-dimensional facial image, and replaces the original facial expression characteristic according to each dynamic facial expression characteristic, so that the original facial expression of the target facial model is converted into the facial expression of the second user. Thus, three-dimensional face images including facial expression features can be obtained effectively.
For example, the terminal may perform expression reconstruction by using a subspace deformation transfer technique, re-modeling the face of the second user through the extracted facial expression features of the second user. The position coordinates and the reflectivity of the key feature points of the first user's face are kept unchanged, and only the original expression part of the target face model is modified, so that expression conversion can be realized efficiently without affecting the shape features of the face.
After the facial features in the three-dimensional facial image are subjected to the facial feature conversion based on the dynamic facial expression features, the continuous multi-frame three-dimensional facial image comprising the dynamic facial expression features can be effectively obtained. And the terminal synthesizes the three-dimensional face images of the continuous frames into corresponding videos, so as to generate the corresponding synthesized face videos.
In this embodiment, by performing expression reconstruction and conversion on the three-dimensional face image corresponding to the first user face based on each facial expression feature of the second user face, the three-dimensional face image corresponding to the first user face with the expression can be accurately and effectively obtained, and further the synthesized face video with the expression response can be effectively obtained.
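The subspace deformation transfer itself is involved; as a simplified stand-in for the fusion step, the sketch below keeps the first user's neutral (identity) shape fixed and adds only weighted expression offsets per frame, blendshape-style. All shapes, bases, and weights are synthetic placeholders, not the method disclosed above.

```python
# Simplified stand-in for expression transfer: keep the target's neutral shape
# (identity) and add only the expression offsets estimated from the source
# frames. Shapes, basis, and weights are synthetic placeholders.
import numpy as np

def transfer_expression(target_neutral, exp_basis, exp_weights_per_frame):
    """target_neutral: Nx3; exp_basis: KxNx3 offsets; weights: TxK per frame."""
    frames = []
    for w in exp_weights_per_frame:                    # one weight vector per video frame
        offsets = np.tensordot(w, exp_basis, axes=1)   # weighted sum of expression offsets
        frames.append(target_neutral + offsets)        # identity preserved, expression added
    return np.stack(frames)                            # TxNx3 expressive face sequence

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    neutral = rng.normal(size=(68, 3))                 # placeholder neutral face model
    basis = rng.normal(size=(20, 68, 3)) * 0.05        # placeholder expression basis
    weights = rng.uniform(0, 1, size=(30, 20))         # weights estimated from second user
    print(transfer_expression(neutral, basis, weights).shape)  # (30, 68, 3)
```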
In one embodiment, as shown in fig. 5, there is provided a face image processing method, including the steps of:
S502, a first face image of a first user is acquired.
S504, generating a corresponding three-dimensional projection image based on the face features of the first face image.
S506, acquiring a second face video of the second user, and extracting facial expression features based on each second face image in the second face video.
And S508, carrying out model reconstruction on the three-dimensional projection image based on each second face image to obtain a three-dimensional face image.
And S510, recognizing the expression region corresponding to the facial expression feature, and carrying out expression transformation on the feature points of the expression region in the three-dimensional projection image according to the facial expression feature to obtain the three-dimensional facial image comprising the facial expression feature.
And S512, generating a synthetic face video according to the three-dimensional face image comprising the facial expression features.
S514, projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition.
The expression region refers to the region of key feature points that change with the expression action in an expressive face; it may be a local feature region or the whole face region. For example, when the expression is "mouth opening", the expression region is only the lip area, while when the expression is "smiling" or "head shaking", the expression region may be the whole face area. The expression transformation can adopt affine transformation, also called affine mapping, which in geometry refers to applying a linear transformation to a vector space followed by a translation, mapping it into another vector space.
After the terminal obtains the facial expression features corresponding to the second face images from the second face video, it identifies the expression region corresponding to these expression features and then performs expression conversion on that region only. Specifically, the terminal performs model reconstruction based on the second face images and the three-dimensional projection image corresponding to the first face image; after the target face model is obtained, each facial expression feature of the second user is input into the target face model corresponding to the three-dimensional face image, and the feature points corresponding to the expression region in the target face model are transformed according to the facial expression features. In particular, an affine transformation is applied between the facial expression features and the feature points corresponding to the expression region, thereby realizing the expression conversion. The original expression features are transformed according to the dynamic facial expression features, so that the original expression of the target face model is converted into the facial expression of the second user.
For example, taking the expression of the second user's face as "mouth opening": after the terminal obtains the facial expression features corresponding to the second face images from the second face video, it can recognize the expression region as the mouth-opening region. The facial expression features corresponding to this expression region may include the lip portion and the oral cavity portion. When the facial expression features are applied to the feature points corresponding to the expression region in the target face model, the lip features of the first user can be retained and only the oral cavity portion is modified, so that the mouth interior is synthesized while the first user's lips are preserved. In this way the expression of the second user can be effectively merged into the face of the first user.
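A simplified two-dimensional sketch of this "keep the lips, replace the mouth interior" step is given below, assuming the inner-mouth region has already been located as a polygon on two aligned images; in the method itself the operation is carried out on the three-dimensional model, so the snippet is illustrative only.

```python
import cv2
import numpy as np

def blend_inner_mouth(source_face, target_face, inner_mouth_polygon):
    """Replace only the inner-mouth area of the target face with content from the
    source face, keeping the target user's lips and face shape unchanged.

    source_face, target_face : aligned BGR images of identical size
    inner_mouth_polygon      : (M, 2) int32 array outlining the oral cavity
    """
    mask = np.zeros(target_face.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, inner_mouth_polygon, 255)
    # Poisson (seamless) blending keeps the transition around the lips smooth.
    x, y, w, h = cv2.boundingRect(inner_mouth_polygon)
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(source_face, target_face, mask, center, cv2.NORMAL_CLONE)
```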
In one embodiment, as shown in fig. 6, there is provided a face image processing method, including the steps of:
S602, acquiring a first face image of a first user.
S604, generating a corresponding three-dimensional projection image based on the face features of the first face image.
S606, acquiring a second face video of the second user, and extracting facial expression features based on each second face image in the second face video.
S608, model reconstruction is performed on the three-dimensional projection image based on each second face image to obtain a three-dimensional face image.
S610, an expression region corresponding to the facial expression features is identified, and a facial expression image of the first user corresponding to the expression region and the facial expression features is acquired.
S612, target expression features of the expression region in the facial expression image are spliced into the three-dimensional face image to obtain a three-dimensional face image including the facial expression features.
S614, a synthesized face video is generated according to the three-dimensional face image including the facial expression features.
S616, projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition.
The facial expression image of the first user can be obtained from an expression sample corresponding to the first user. The expression sample may be an offline sample collected from the target user in advance, and includes a sequence of facial expression pictures of the target user. The facial expression picture sequence may contain the complete face or only a local expression region.
The terminal performs model reconstruction based on the second face images and the three-dimensional projection image corresponding to the first face image; after the target face model is obtained, each facial expression feature of the second user is input into the target face model corresponding to the three-dimensional face image, and the expression region in the target face model is transformed according to the facial expression features. Specifically, the terminal acquires, from the expression sample corresponding to the first user, an expression picture sequence that matches the facial expression features, splices the expression region of the acquired picture sequence onto the three-dimensional face image corresponding to the first user, and fuses the target expression features in the picture sequence with the facial features in the three-dimensional face image, thereby obtaining a three-dimensional face image of the first user that includes the facial expression features. In this way, a three-dimensional face image sequence that is consistent with the expression of the second user and has high fidelity can be obtained effectively.
For example, taking the expression of the second user's face as "mouth opening": after the terminal obtains the facial expression features corresponding to the second face images from the second face video, it can recognize the expression region as the mouth-opening region. The facial expression features corresponding to this expression region may include the opening degree of the upper and lower lips, the distance between the left and right mouth corners, the deflection degree, and so on. The terminal then acquires, from the expression sample corresponding to the first user, an expression picture sequence that matches these facial expression features. Specifically, the terminal may compute the distance between the description matrix of each sample frame and the description matrix of the target mouth-opening expression frame, where the description matrix contains the expression feature parameters. The terminal may then apply a K-means algorithm, for example grouping roughly 10 frames into one cluster, extract from each cluster the frame with the smallest distance, i.e. the frame closest to the target mouth shape, as the representative of that cluster, and finally select from all cluster representatives the single frame with the smallest distance as the final matching result. By matching the mouth shape with the highest similarity to the target face from the offline sample according to this feature-similarity measure, a realistic mouth image can be generated effectively, and an expression picture sequence of the first user that closely matches the expression of the second user can be extracted. Frames that match the specified expression action are selected from the various mouth-shape pictures of the target person, and the extracted expression frames are spliced into the three-dimensional face image of the first user, so that a three-dimensional face image sequence consistent with the expression of the second user and with high fidelity can be obtained effectively.
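A minimal sketch of this clustering-based frame matching is shown below, assuming each offline sample frame has already been reduced to a flat description vector of expression parameters; the helper name, the vector layout, and the use of scikit-learn's KMeans are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def match_mouth_frame(sample_features, target_feature, frames_per_cluster=10):
    """Pick the offline sample frame whose expression parameters are closest to
    the target mouth shape.

    sample_features : (F, D) array, one description vector per offline sample frame
    target_feature  : (D,) array, description vector of the target expression frame
    """
    distances = np.linalg.norm(sample_features - target_feature, axis=1)
    n_clusters = max(1, len(sample_features) // frames_per_cluster)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(sample_features)

    # Representative of each cluster: the frame closest to the target mouth shape.
    representatives = [
        np.where(labels == c)[0][np.argmin(distances[labels == c])]
        for c in range(n_clusters)
    ]
    # Final match: the representative with the overall smallest distance.
    return min(representatives, key=lambda idx: distances[idx])
```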
In one embodiment, the face image processing method further comprises: acquiring a first face video of the first user, extracting face features based on each first face image in the first face video, and generating corresponding three-dimensional projection images according to the face features of each first face image.
The first face video refers to a video that includes the face of the first user and comprises a sequence of first-user face images over consecutive frames.
After acquiring the first face video of the first user, the terminal extracts the first face images of consecutive frames from this video. The terminal extracts the face features in each first face image separately, and then generates a corresponding three-dimensional projection image according to the face features of each first face image. Specifically, the terminal estimates the face pose based on the dynamic feature points corresponding to each first face image, and performs face modeling through three-dimensional model mapping based on the two-dimensional feature points to obtain the three-dimensional coordinate information corresponding to the face features. The terminal corrects the face pose according to the face orientation to obtain face features after expression normalization, and then generates the three-dimensional projection image based on the corrected face features after three-dimensional deformation mapping. By extracting the dynamic face features of the first user's face from the face video, a three-dimensional face model with high accuracy can be constructed effectively, and the corresponding three-dimensional projection images can be generated effectively.
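One common way to estimate a face pose from two-dimensional feature points is the perspective-n-point formulation sketched below; the patent does not commit to a specific solver, so the generic 3D landmark template, the pinhole camera assumption, and the function name are assumptions made for the example.

```python
import cv2
import numpy as np

def estimate_head_pose(landmarks_2d, model_points_3d, frame_size):
    """Estimate the face pose of one frame from its 2D feature points, using a
    generic 3D landmark template and a simple pinhole camera model.

    landmarks_2d    : (N, 2) float32 array of detected feature points
    model_points_3d : (N, 3) float32 array of the corresponding template points
    frame_size      : (height, width) of the face image
    """
    h, w = frame_size
    focal = float(w)  # rough focal-length guess; no camera calibration assumed
    camera_matrix = np.array([[focal, 0.0, w / 2.0],
                              [0.0, focal, h / 2.0],
                              [0.0, 0.0, 1.0]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(model_points_3d, landmarks_2d,
                                  camera_matrix, dist_coeffs)
    return rvec, tvec  # head rotation (Rodrigues vector) and translation
```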
In one embodiment, as shown in fig. 7, a face image processing method is provided, which specifically includes the following steps:
S702, a first face video of a first user is acquired.
S704, face features are extracted based on each first face image in the first face video.
S706, corresponding three-dimensional projection images are generated according to the face features of the first face images.
S708, acquiring a second face video of the second user, and extracting facial expression features based on each second face image in the second face video.
S710, respectively fusing the extracted facial expression features into the three-dimensional projection images to reconstruct the expression, and obtaining the synthesized facial video.
S712, projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition.
After extracting the face features based on each first face image in the first face video, the terminal generates a corresponding three-dimensional projection image according to the face features of each first face image, thereby obtaining a sequence of three-dimensional projection images over consecutive frames.
The terminal acquires the second face video of the second user, extracts facial expression features based on each second face image in the second face video, and then fuses each extracted facial expression feature into the corresponding three-dimensional projection image for expression reconstruction. Specifically, following the frame order, the terminal may fuse the facial expression features of each frame into the three-dimensional projection image corresponding to that frame to perform expression reconstruction, thereby obtaining the three-dimensional face image of each frame. The terminal then synthesizes the face video from the dynamic expression response of each frame of three-dimensional face image. By extracting the facial expression features of each frame in the second user's face video and synthesizing them into the consecutive three-dimensional projection images frame by frame, a synthesized face video with a dynamic expression response and high authenticity can be generated effectively.
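The per-frame loop described above might be organized as in the sketch below; extract_expression and fuse_expression are hypothetical placeholders standing in for the feature-extraction and expression-reconstruction steps, and the codec and frame rate are arbitrary assumptions.

```python
import cv2

def synthesize_face_video(second_face_video, projection_frames, out_path,
                          extract_expression, fuse_expression, fps=25):
    """Fuse the expression of each frame of the second user's video into the
    corresponding three-dimensional projection frame and write the result out.

    extract_expression(frame)          -> expression features of one frame (assumed helper)
    fuse_expression(projection, feats) -> fused BGR frame (assumed helper)
    """
    capture = cv2.VideoCapture(second_face_video)
    writer = None
    for projection in projection_frames:
        ok, frame = capture.read()
        if not ok:
            break  # the second face video ran out of frames
        fused = fuse_expression(projection, extract_expression(frame))
        if writer is None:
            h, w = fused.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
        writer.write(fused)
    capture.release()
    if writer is not None:
        writer.release()
```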
In a specific embodiment, as shown in fig. 8, a face image processing method is provided, which includes the following steps:
S802, acquiring a first face image of a first user.
S804, two-dimensional coordinate information corresponding to the feature points in the face features is obtained, and the face pose is estimated according to the feature points.
S806, mapping coordinate information of a preset three-dimensional face base mapped onto a two-dimensional plane is obtained, and three-dimensional mapping parameters corresponding to the feature points are determined according to the two-dimensional coordinate information and the mapping coordinate information.
S808, the feature points and the corresponding three-dimensional mapping parameters are updated according to the face pose and the two-dimensional coordinate information, and three-dimensional face features are generated based on the updated feature points and three-dimensional mapping parameters.
S810, three-dimensional mapping processing is performed on the three-dimensional face features to generate a three-dimensional projection image corresponding to the first face image.
S812, constructing a three-dimensional face mapping matrix according to the three-dimensional face features, and positioning the face outline of the face features.
S814, adjusting the boundary of the three-dimensional face mapping matrix based on the face contour features in the face features to obtain a three-dimensional projection image corresponding to the first face image.
S816, extracting a second face image of a key frame in the second face video, and extracting face parameters in the second face image of each frame.
S818, estimating normal expression distribution according to face parameters in the second face image of each frame, and obtaining face expression characteristics of the second user based on the normal expression distribution.
S820, performing model reconstruction on the three-dimensional projection image based on each second face image to obtain a three-dimensional face image.
S822, carrying out expression conversion on the facial features in the three-dimensional facial image based on the facial expression features to obtain the three-dimensional facial image comprising the facial expression features.
S824, generating a synthetic face video according to the three-dimensional face image comprising the facial expression features.
S826, projecting the synthesized face video to a three-dimensional solid model for playing, wherein the projected three-dimensional solid model is used for face recognition.
In this embodiment, the effect of the face recognition system can be verified effectively by performing face recognition on the three-dimensional solid model onto which the synthesized face video, which fuses the face of the first user with the expression of the second user, has been projected, which in turn helps to further improve the security of face recognition. By generating a synthesized face video with high accuracy and authenticity and projecting it onto a reusable three-dimensional solid model for face recognition, the verification cost can be saved effectively and the efficiency of verifying the face recognition system can be improved effectively.
The present application further provides an application scenario in which the above face image processing method is applied to verify the effect of face recognition. Specifically, when an identity verification device needs to perform face recognition on the first user for identity authentication, the terminal acquires the first face image of the first user and generates the corresponding three-dimensional projection image based on the face features of the first face image. The second user performs an expression response according to the instruction issued during identity authentication; the terminal acquires the second face video of the second user responding to the identity authentication, extracts facial expression features based on each second face image in the second face video, and fuses each extracted facial expression feature into the three-dimensional projection image for expression reconstruction, thereby generating a synthesized face video containing the expression response. The terminal projects the synthesized face video onto the three-dimensional solid model for playing, and the identity verification device captures the face video corresponding to the projected three-dimensional solid model and performs face recognition to obtain a recognition result, so that the effect of face recognition can be verified according to the recognition result.
The present application further provides an application scenario in which the above face image processing method is applied to test a face recognition system. Specifically, when the face recognition system needs to perform face recognition on the first user for identity authentication, the terminal acquires the first face image of the first user and generates the corresponding three-dimensional projection image based on the face features of the first face image. The second user performs an expression response according to the instruction issued during identity authentication; the terminal acquires the second face video of the second user responding to the identity authentication, extracts facial expression features based on each second face image in the second face video, and fuses each extracted facial expression feature into the three-dimensional projection image for expression reconstruction, thereby generating a synthesized face video containing the expression response. The terminal projects the synthesized face video onto the three-dimensional solid model for playing; the identity verification device captures the face video corresponding to the projected three-dimensional solid model, and the captured face video is used for face recognition so as to carry out an attack test on the face recognition system and obtain a face recognition result. Test result data can then be obtained from the face recognition result and the data produced during the recognition process. By carrying out attack tests on the face recognition system, the security of the face recognition system can be further improved effectively.
It should be understood that, although the steps in the flowcharts of FIG. 2 and FIGS. 5-8 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 2 and FIGS. 5-8 may include multiple steps or stages that are not necessarily executed at the same time but may be executed at different moments, and the execution order of these steps or stages is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the steps or stages in other steps.
In one embodiment, as shown in fig. 9, a facial image processing apparatus 900 is provided, which may employ a software module or a hardware module, or a combination of both, as a part of a computer device, and specifically includes an image acquisition module 902, an image conversion module 904, an expression extraction module 906, an expression reconstruction module 908, and a video projection module 910, where:
An image acquisition module 902, configured to acquire a first face image of a first user;
an image conversion module 904, configured to generate a corresponding three-dimensional projection image based on a face feature of the first face image;
the expression extraction module 906 is configured to obtain a second face video of a second user, and extract facial expression features based on each second face image in the second face video;
The expression reconstruction module 908 is configured to respectively fuse each facial expression feature obtained by extraction into a three-dimensional projection image for performing expression reconstruction, so as to obtain a synthetic facial video;
the video projection module 910 is configured to project the synthesized face video onto a three-dimensional solid model for playing, where the projected three-dimensional solid model is used for face recognition.
In one embodiment, the image conversion module 904 is further configured to identify occluded feature points in the first face image, calculate a sideways angle according to the feature points in the face features, and correct the occluded feature points based on the sideways angle to obtain the face features of the first face image.
In one embodiment, the image conversion module 904 is further configured to obtain two-dimensional coordinate information corresponding to the feature points in the face feature, determine a three-dimensional face feature of the first face image based on the two-dimensional coordinate information, and perform three-dimensional mapping processing on the three-dimensional face feature to generate a three-dimensional projection image corresponding to the first face image.
In one embodiment, the image conversion module 904 is further configured to estimate a face pose according to the feature point, obtain mapping coordinate information of a preset three-dimensional face substrate mapped on a two-dimensional plane, determine a three-dimensional mapping parameter corresponding to the feature point according to the two-dimensional coordinate information and the mapping coordinate information, update the feature point and the corresponding three-dimensional mapping parameter according to the face pose and the two-dimensional coordinate information, and generate a three-dimensional face feature based on the updated feature point and the three-dimensional mapping parameter.
In one embodiment, the image conversion module 904 is further configured to construct a three-dimensional face mapping matrix according to the three-dimensional face features, position a face contour of the face features, and adjust a boundary of the three-dimensional face mapping matrix based on the face contour features in the face features to obtain a three-dimensional projection image corresponding to the first face image.
In one embodiment, the image conversion module 904 is further configured to extract illumination parameters and facial parameters corresponding to the first face image, and perform face detail filling on the three-dimensional face mapping matrix based on the illumination parameters and the facial parameters.
In one embodiment, the expression extraction module 906 is further configured to extract a second face image of a key frame in the second face video, extract face parameters in each frame of the second face image, estimate a normal expression distribution according to the face parameters in each frame of the second face image, and obtain facial expression features of the second user based on the normal expression distribution.
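One minimal interpretation of "estimating a normal expression distribution" over the key-frame face parameters is to fit a per-dimension Gaussian and normalize the features against it, as sketched below; the exact statistics used by the method are not specified, so the mean/standard-deviation formulation is an assumption.

```python
import numpy as np

def estimate_expression_distribution(frame_parameters):
    """Fit a per-dimension normal distribution over the expression parameters
    extracted from the key frames of the second face video.

    frame_parameters : (F, D) array, one parameter vector per key frame
    Returns the mean, the standard deviation, and the z-scored features.
    """
    mean = frame_parameters.mean(axis=0)
    std = frame_parameters.std(axis=0) + 1e-8  # avoid division by zero
    normalized = (frame_parameters - mean) / std
    return mean, std, normalized
```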
In one embodiment, the expression reconstruction module 908 is further configured to reconstruct a model of the three-dimensional projection image based on each second facial image to obtain a three-dimensional facial image, perform expression conversion on facial features in the three-dimensional facial image based on each facial expression feature to obtain a three-dimensional facial image including each facial expression feature, and generate a synthetic facial video according to the three-dimensional facial image including each facial expression feature.
In one embodiment, the expression reconstruction module 908 is further configured to identify an expression region corresponding to the facial expression feature, and perform expression transformation on feature points of the expression region in the three-dimensional projection image according to the facial expression feature to obtain a three-dimensional facial image including the facial expression feature.
In one embodiment, the expression reconstruction module 908 is further configured to identify an expression region corresponding to the facial expression feature, obtain a facial expression image corresponding to the first user and the expression region and the facial expression feature, and splice a target expression feature of the expression region in the facial expression image into the three-dimensional facial image to obtain the three-dimensional facial image including the facial expression feature.
In one embodiment, the image acquisition module 902 is further configured to acquire a first face video of the first user, the image conversion module 904 is further configured to extract a face feature based on each first face image in the first face video, and generate a corresponding three-dimensional projection image according to the face feature of each first face image.
In one embodiment, the expression reconstruction module 908 is further configured to fuse each facial expression feature obtained by extraction into each three-dimensional projection image for performing expression reconstruction, so as to obtain a synthetic facial video.
For specific limitations of the face image processing apparatus, reference may be made to the limitations of the face image processing method above, which are not repeated here. Each of the modules in the above face image processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be implemented through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a face image processing method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, or keys, a track ball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the flows of the above-described method embodiments may be accomplished by instructing relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be regarded as falling within the scope of this specification.
The above examples illustrate only a few embodiments of the application, and they are described specifically and in detail, but they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, and all of these fall within the protection scope of the application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (27)

(Translated from Chinese)

1. A face image processing method, characterized in that the method comprises:
acquiring a first face image of a first user;
generating a corresponding three-dimensional projection image based on face features of the first face image;
acquiring a second face video of a second user, and extracting facial expression features based on each second face image in the second face video, the second face video being a response video recorded when the second user performs an expression response according to an instruction;
performing model reconstruction based on each second face image and the three-dimensional projection image to obtain a three-dimensional neutral face model;
performing expression conversion on face features of the neutral face model according to the facial expression features to obtain a synthesized face video; and
projecting the synthesized face video onto a three-dimensional solid model for playing, the projected three-dimensional solid model being used for face recognition.

2. The method according to claim 1, characterized in that the method further comprises:
identifying occluded feature points in the first face image;
calculating a sideways angle according to feature points in the face features; and
correcting the occluded feature points based on the sideways angle to obtain the face features of the first face image.

3. The method according to claim 1, characterized in that the generating a corresponding three-dimensional projection image based on face features of the first face image comprises:
obtaining two-dimensional coordinate information corresponding to feature points in the face features;
determining three-dimensional face features of the first face image based on the two-dimensional coordinate information; and
performing three-dimensional mapping processing based on the three-dimensional face features to generate the three-dimensional projection image corresponding to the first face image.

4. The method according to claim 3, characterized in that the determining three-dimensional face features of the first face image based on the two-dimensional coordinate information comprises:
estimating a face pose according to the feature points;
obtaining mapping coordinate information of a preset three-dimensional face base mapped onto a two-dimensional plane;
determining three-dimensional mapping parameters corresponding to the feature points according to the two-dimensional coordinate information and the mapping coordinate information;
updating the feature points and the corresponding three-dimensional mapping parameters according to the face pose and the two-dimensional coordinate information; and
generating the three-dimensional face features based on the updated feature points and three-dimensional mapping parameters.

5. The method according to claim 3, characterized in that the performing three-dimensional mapping processing based on the three-dimensional face features to generate the three-dimensional projection image corresponding to the first face image comprises:
constructing a three-dimensional face mapping matrix according to the three-dimensional face features;
positioning a face contour of the face features; and
adjusting a boundary of the three-dimensional face mapping matrix based on face contour features in the face features to obtain the three-dimensional projection image corresponding to the first face image.

6. The method according to claim 5, characterized in that the method further comprises:
extracting illumination parameters and facial parameters corresponding to the first face image; and
performing face detail filling on the three-dimensional face mapping matrix based on the illumination parameters and the facial parameters.

7. The method according to claim 1, characterized in that the extracting facial expression features based on each second face image in the second face video comprises:
extracting second face images of key frames in the second face video;
extracting face parameters in each frame of second face image; and
estimating a normal expression distribution according to the face parameters in each frame of second face image, and obtaining the facial expression features of the second user based on the normal expression distribution.

8. The method according to claim 1, characterized in that the performing expression conversion on face features of the neutral face model according to the facial expression features to obtain a synthesized face video comprises:
obtaining a three-dimensional face image obtained by projecting the neutral face model onto a two-dimensional plane;
performing expression conversion on face features in the three-dimensional face image based on each of the facial expression features to obtain a three-dimensional face image including each of the facial expression features; and
generating the synthesized face video according to the three-dimensional face image including each of the facial expression features.

9. The method according to claim 8, characterized in that the performing expression conversion on face features in the three-dimensional face image based on each of the facial expression features to obtain a three-dimensional face image including each of the facial expression features comprises:
identifying an expression region corresponding to the facial expression features; and
performing expression transformation on feature points of the expression region in the three-dimensional projection image according to the facial expression features to obtain the three-dimensional face image including the facial expression features.

10. The method according to claim 8, characterized in that the performing expression conversion on face features in the three-dimensional face image based on each of the facial expression features to obtain a three-dimensional face image including each of the facial expression features comprises:
identifying an expression region corresponding to the facial expression features;
obtaining a facial expression image of the first user corresponding to the expression region and the facial expression features; and
splicing target expression features of the expression region in the facial expression image into the three-dimensional face image to obtain the three-dimensional face image including the facial expression features.

11. The method according to any one of claims 1 to 10, characterized in that the method further comprises:
acquiring a first face video of the first user;
extracting face features based on each first face image in the first face video; and
generating corresponding three-dimensional projection images according to the face features of each first face image.

12. The method according to claim 11, characterized in that the method further comprises:
fusing each extracted facial expression feature into each of the three-dimensional projection images respectively for expression reconstruction to obtain the synthesized face video.

13. A face image processing apparatus, characterized in that the apparatus comprises:
an image acquisition module, configured to acquire a first face image of a first user;
an image conversion module, configured to generate a corresponding three-dimensional projection image based on face features of the first face image;
an expression extraction module, configured to acquire a second face video of a second user and extract facial expression features based on each second face image in the second face video, the second face video being a response video recorded when the second user performs an expression response according to an instruction;
a model reconstruction module, configured to perform model reconstruction based on each second face image and the three-dimensional projection image to obtain a three-dimensional neutral face model;
an expression conversion module, configured to perform expression conversion on face features of the neutral face model according to the facial expression features to obtain a synthesized face video; and
a video projection module, configured to project the synthesized face video onto a three-dimensional solid model for playing, the projected three-dimensional solid model being used for face recognition.

14. The face image processing apparatus according to claim 13, characterized in that the image conversion module is further configured to identify occluded feature points in the first face image, calculate a sideways angle according to feature points in the face features, and correct the occluded feature points based on the sideways angle to obtain the face features of the first face image.

15. The face image processing apparatus according to claim 13, characterized in that the image conversion module is further configured to obtain two-dimensional coordinate information corresponding to feature points in the face features, determine three-dimensional face features of the first face image based on the two-dimensional coordinate information, and perform three-dimensional mapping processing based on the three-dimensional face features to generate the three-dimensional projection image corresponding to the first face image.

16. The face image processing apparatus according to claim 15, characterized in that the image conversion module is further configured to estimate a face pose according to the feature points; obtain mapping coordinate information of a preset three-dimensional face base mapped onto a two-dimensional plane; determine three-dimensional mapping parameters corresponding to the feature points according to the two-dimensional coordinate information and the mapping coordinate information; update the feature points and the corresponding three-dimensional mapping parameters according to the face pose and the two-dimensional coordinate information; and generate the three-dimensional face features based on the updated feature points and three-dimensional mapping parameters.

17. The face image processing apparatus according to claim 15, characterized in that the image conversion module is further configured to construct a three-dimensional face mapping matrix according to the three-dimensional face features, position a face contour of the face features, and adjust a boundary of the three-dimensional face mapping matrix based on face contour features in the face features to obtain the three-dimensional projection image corresponding to the first face image.

18. The face image processing apparatus according to claim 17, characterized in that the image conversion module is further configured to extract illumination parameters and facial parameters corresponding to the first face image, and perform face detail filling on the three-dimensional face mapping matrix based on the illumination parameters and the facial parameters.

19. The face image processing apparatus according to claim 13, characterized in that the expression extraction module is further configured to extract second face images of key frames in the second face video, extract face parameters in each frame of second face image, estimate a normal expression distribution according to the face parameters in each frame of second face image, and obtain the facial expression features of the second user based on the normal expression distribution.

20. The face image processing apparatus according to claim 13, characterized in that the expression conversion module is further configured to obtain a three-dimensional face image obtained by projecting the neutral face model onto a two-dimensional plane, perform expression conversion on face features in the three-dimensional face image based on each of the facial expression features to obtain a three-dimensional face image including each of the facial expression features, and generate the synthesized face video according to the three-dimensional face image including each of the facial expression features.

21. The face image processing apparatus according to claim 20, characterized in that the expression conversion module is further configured to identify an expression region corresponding to the facial expression features, and perform expression transformation on feature points of the expression region in the three-dimensional projection image according to the facial expression features to obtain the three-dimensional face image including the facial expression features.

22. The face image processing apparatus according to claim 20, characterized in that the expression conversion module is further configured to identify an expression region corresponding to the facial expression features, obtain a facial expression image of the first user corresponding to the expression region and the facial expression features, and splice target expression features of the expression region in the facial expression image into the three-dimensional face image to obtain the three-dimensional face image including the facial expression features.

23. The face image processing apparatus according to any one of claims 13 to 22, characterized in that the image acquisition module is further configured to acquire a first face video of the first user, and the image conversion module is further configured to extract face features based on each first face image in the first face video and generate corresponding three-dimensional projection images according to the face features of each first face image.

24. The face image processing apparatus according to claim 23, characterized in that the expression conversion module is further configured to fuse each extracted facial expression feature into each of the three-dimensional projection images respectively for expression reconstruction to obtain the synthesized face video.

25. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the method according to any one of claims 1 to 12.

26. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 12.

27. A computer program product, comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method according to any one of claims 1 to 12.
CN202010357461.XA — Face image processing method, device, computer equipment and storage medium — filed 2020-04-29 — Active — granted as CN111553284B (en)

Priority Applications (1)

Application Number: CN202010357461.XA (CN111553284B) · Priority Date: 2020-04-29 · Filing Date: 2020-04-29 · Title: Face image processing method, device, computer equipment and storage medium
Publications (2)

CN111553284A (en) — published 2020-08-18
CN111553284B (en) — published 2025-07-15

Family

ID: 72004242

Family Applications (1)

CN202010357461.XA (CN111553284B) · Priority Date: 2020-04-29 · Filing Date: 2020-04-29 · Title: Face image processing method, device, computer equipment and storage medium · Status: Active

Country Status (1)

CN: CN111553284B (en)
Families Citing this family (10)

* Cited by examiner, † Cited by third party

- CN112257552B (en)* — priority 2020-10-19, published 2023-09-05 — 腾讯科技(深圳)有限公司 — Image processing method, device, equipment and storage medium
- CN112288851B (en)* — priority 2020-10-23, published 2022-09-13 — 武汉大学 — A 3D Face Modeling Method Based on Dual-Traffic Network
- CN112562090B (en)* — priority 2020-11-30, published 2025-04-01 — 厦门美图之家科技有限公司 — Virtual makeup method, system and device
- CN112562027A (en)* — priority 2020-12-02, published 2021-03-26 — 北京百度网讯科技有限公司 — Face model generation method and device, electronic equipment and storage medium
- CN113221619B (en)* — priority 2021-01-28, published 2024-02-20 — 深圳市雄帝科技股份有限公司 — Face image highlight removing method and system based on Poisson reconstruction and storage medium thereof
- CN114081496A (en)* — priority 2021-11-09, published 2022-02-25 — 中国第一汽车股份有限公司 — Test system, method, equipment and medium for driver state monitoring device
- CN113963425B (en)* — priority 2021-12-22, published 2022-03-25 — 北京的卢深视科技有限公司 — Testing method and device of human face living body detection system and storage medium
- CN114581978B (en)* — priority 2022-02-28, published 2024-10-22 — 支付宝(杭州)信息技术有限公司 — Face recognition method and system
- CN115631527B (en)* — priority 2022-10-31, published 2024-06-14 — 福州大学至诚学院 — Angle self-adaption-based hairstyle attribute editing method and system
- CN115955582A (en)* — priority 2022-12-15, published 2023-04-11 — 中国平安人寿保险股份有限公司 — Short video generation method and device, computer equipment and storage medium
Citations (2)

* Cited by examiner, † Cited by third party

- CN107610209A (en)* — priority 2017-08-17, published 2018-01-19 — 上海交通大学 — Human face countenance synthesis method, device, storage medium and computer equipment
- CN108985220A (en)* — priority 2018-07-11, published 2018-12-11 — 腾讯科技(深圳)有限公司 — A kind of face image processing process, device and storage medium

Family Cites Families (4)

- CN106303233B (en)* — priority 2016-08-08, published 2019-03-15 — 西安电子科技大学 — A Video Privacy Protection Method Based on Expression Fusion
- CN108921795A (en)* — priority 2018-06-04, published 2018-11-30 — 腾讯科技(深圳)有限公司 — A kind of image interfusion method, device and storage medium
- CN109087379B (en)* — priority 2018-08-09, published 2020-01-17 — 北京华捷艾米科技有限公司 — Facial expression migration method and facial expression migration device
- CN109377544B (en)* — priority 2018-11-30, published 2022-12-23 — 腾讯科技(深圳)有限公司 — Human face three-dimensional image generation method and device and readable medium

Non-Patent Citations (2)

- Xiangyu Zhu et al., "High-fidelity Pose and Expression Normalization for Face Recognition in the Wild," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015-10-15, pp. 787-796.*
- 黄诚 (Huang Cheng), "基于Candide-3算法的图像中面部替换技术" (Face replacement in images based on the Candide-3 algorithm), 《计算技术与自动化》 (Computing Technology and Automation), 2018-06-15, Vol. 37, No. 2, pp. 100-104.*

Also Published As

CN111553284A (en) — published 2020-08-18

Similar Documents

Publication — Title
CN111553284B (en) Face image processing method, device, computer equipment and storage medium
CN111710036B (en)Method, device, equipment and storage medium for constructing three-dimensional face model
Li et al.Monocular real-time volumetric performance capture
Jiang et al.3D face reconstruction with geometry details from a single image
CN110675487B (en) Three-dimensional face modeling and recognition method and device based on multi-angle two-dimensional face
CN105427385B (en)A kind of high-fidelity face three-dimensional rebuilding method based on multilayer deformation model
CN113570684B (en) Image processing method, device, computer equipment and storage medium
CN112633191B (en)Three-dimensional face reconstruction method, device, equipment and storage medium
JP6207210B2 (en) Information processing apparatus and method
Gou et al.Cascade learning from adversarial synthetic images for accurate pupil detection
CN118071968B (en)Intelligent interaction deep display method and system based on AR technology
JP4284664B2 (en) Three-dimensional shape estimation system and image generation system
CN113822965B (en)Image rendering processing method, device and equipment and computer storage medium
CN111815768B (en)Three-dimensional face reconstruction method and device
CN110660076A (en)Face exchange method
CN113628327A (en)Head three-dimensional reconstruction method and equipment
Yu et al.A video-based facial motion tracking and expression recognition system
CN114820907A (en)Human face image cartoon processing method and device, computer equipment and storage medium
Kang et al.Appearance-based structure from motion using linear classes of 3-d models
Purps et al.Reconstructing facial expressions of hmd users for avatars in vr
CN115375832A (en)Three-dimensional face reconstruction method, electronic device, storage medium, and program product
Guo et al.Photo‐realistic face images synthesis for learning‐based fine‐scale 3d face reconstruction
Bouafif et al.Monocular 3D head reconstruction via prediction and integration of normal vector field
Jian et al.Realistic face animation generation from videos
Abeysundera et al.Nearest neighbor weighted average customization for modeling faces

Legal Events

- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
- GR01 — Patent grant
