WO2024099319A1 - Virtual video image generation method and apparatus, and device and medium


Info

Publication number
WO2024099319A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image, image, dimensional, person, head posture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/130255
Other languages
French (fr)
Chinese (zh)
Inventor
张玉兵
马雪浩
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd and Guangzhou Shiyuan Electronics Thecnology Co Ltd
Publication of WO2024099319A1
Anticipated expiration
Current legal status: Ceased

Abstract

The present application belongs to the field of image processing, and provides a virtual video image generation method and apparatus, a device and a medium. The method comprises: acquiring a video image; acquiring a standard three-dimensional face model and, according to the standard three-dimensional face model, extracting from the video image a head posture angle and an expression base coefficient of a first person; acquiring a head posture angle adjustment amount; reconstructing, according to the head posture angle adjustment amount and the expression base coefficient, an original three-dimensional face model corresponding to the video image, so as to obtain a reconstructed three-dimensional face model; performing portrait rendering on the reconstructed three-dimensional face model to obtain a rendered two-dimensional facial image; and generating a virtual video image according to the rendered two-dimensional facial image and an image of a second person. The method eliminates the angle-of-view deviation generated because the camera is not located at the center of the screen; the second person is the first person or another person, so the method can be adapted to various remote video scenarios; and only the head posture angle and expression base coefficient need to be transmitted, which improves data transmission efficiency.

Description

(Translated from Chinese)
Virtual video image generation method, device, equipment and medium

This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on November 11, 2022, with application number 202211413271.0 and invention title "Virtual Video Image Generation Method, Device, Equipment and Medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of image processing technology, and relates, for example, to a virtual video image generation method, device, equipment and medium.

Background

In remote video conferencing, the sense of presence is very important. Because the camera is not located at the center of the screen, a perspective deviation forms among the camera, the person's position, the person's eye height and the screen center. The captured local person therefore appears in a downward-looking posture, which gives the remote participant a poor experience.

发明内容Summary of the invention

本申请目的在于:提供一种虚拟视频图像生成方法、装置、设备和介质,其能够解决视频图像中的远程人物处于俯视姿态的问题。The purpose of the present application is to provide a method, device, equipment and medium for generating a virtual video image, which can solve the problem that a remote person in a video image is in a downward-looking posture.

为达到上述目的,第一方面,本申请提供了一种虚拟视频图像生成方法,包括:To achieve the above objectives, in a first aspect, the present application provides a method for generating a virtual video image, comprising:

获取视频图像,所述视频图像为二维图像;Acquire a video image, wherein the video image is a two-dimensional image;

获取标准三维人脸模型,根据所述标准三维人脸模型提取所述视频图像第一人物的头部姿态角度和表情基系数;Acquire a standard three-dimensional face model, and extract the head posture angle and expression base coefficient of the first person in the video image according to the standard three-dimensional face model;

获取头部姿态角度调整量;Get the head posture angle adjustment amount;

根据所述头部姿态角度调整量和所述表情基系数重构与所述视频图像对应的原始三维人脸模型,得到重构后三维人脸模型;Reconstructing an original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain a reconstructed three-dimensional face model;

对所述重构后三维人脸模型进行人像渲染,得到二维的渲染后人脸图像;Performing portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image;

根据所述二维的渲染后人脸图像和第二人物图像生成虚拟视频图像,所述第二人物为所述第一人物或其他人物。A virtual video image is generated according to the two-dimensional rendered face image and a second person image, where the second person is the first person or another person.

In a second aspect, the present application provides a virtual video image generation device, comprising:

a video image acquisition module, configured to acquire a video image, the video image being a two-dimensional image;

a parameter extraction module, configured to acquire a standard three-dimensional face model and extract a head posture angle and an expression base coefficient of a first person in the video image according to the standard three-dimensional face model;

a head posture angle adjustment amount acquisition module, configured to acquire a head posture angle adjustment amount;

an original three-dimensional face model reconstruction module, configured to reconstruct an original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient, to obtain a reconstructed three-dimensional face model;

a portrait rendering module, configured to perform portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image; and

a virtual video image generation module, configured to generate a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person.

The present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of any one of the virtual video image generation methods described above.

The present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any one of the virtual video image generation methods described above.

A virtual video image generation method of the present application includes: acquiring a video image, the video image being a two-dimensional image; and acquiring a standard three-dimensional face model and extracting the head posture angle and expression base coefficient of the first person in the video image according to the standard three-dimensional face model. The head posture angle and the expression base coefficient are independent of the person's identity, so the method decouples the head posture angle and the expression base coefficient from the person's identity. A head posture angle adjustment amount is acquired, and the original three-dimensional face model corresponding to the video image is reconstructed according to the head posture angle adjustment amount and the expression base coefficient, which eliminates the perspective deviation caused by the camera not being located at the center of the screen and yields the reconstructed three-dimensional face model. Before the original three-dimensional face model is reconstructed, only the head posture angle and the expression base coefficient need to be transmitted, rather than the conference video or video images, which reduces network bandwidth usage and improves data transmission efficiency. Portrait rendering is performed on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image. The two-dimensional rendered face image is converted into a virtual video image of a second person according to a second person image: on formal occasions the second person is the first person and a virtual video image of the first person in the video image is output, while on informal occasions the second person is another person and a virtual video image of that person is output. The method thus adapts to a variety of remote video scenarios and relaxes the real-time requirements of informal remote video scenarios.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of a virtual video image generation method according to an embodiment;

FIG. 2 is a schematic flow chart of acquiring a head posture angle adjustment amount according to an embodiment;

FIG. 3 is a schematic flow chart of extracting the head posture angle and expression base coefficient of a video image according to an embodiment;

FIG. 4 is a schematic flow chart of training a deep convolutional neural network to be trained according to an embodiment;

FIG. 5 is a schematic structural block diagram of a virtual video image generation device according to an embodiment;

FIG. 6 is a schematic structural block diagram of a computer device according to an embodiment.

The realization of the purpose, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the above" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of features, integers, steps, operations, elements, modules and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, modules, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the related art, and will not be interpreted in an idealized or overly formal sense unless specifically so defined herein.

During remote video conferencing or video recording, the sense of presence is very important. Because the camera is not located at the center of the screen, a perspective deviation forms among the camera, the person's position, the person's eye height and the screen center. Taking a camera located above the screen as an example, the camera shoots from a downward angle, so the viewer sees the person in the video image in a downward-looking posture and cannot make face-to-face eye contact with that person through the screen, which makes for a poor meeting experience. In the related art, the method for addressing a person in a video image appearing in a downward-looking posture improves the interactive experience by adjusting the gaze direction of that person's eyes. This method cannot adjust the head posture angle of the person in the video image and therefore cannot solve the problem.

In one embodiment, referring to FIG. 1, a schematic flow chart of the virtual video image generation method disclosed in the present application, the method includes the following steps S1-S6:

S1: Acquire a video image, the video image being a two-dimensional image.

The video image is a two-dimensional real image containing the face of a first person, captured by the camera during a remote video conference or video recording. The camera is not at the center of the screen, so a perspective deviation forms among the camera, the person's position, the person's eye height and the screen center. Taking a camera placed above the screen of a remote video conference as an example, the camera shoots from above, and the first person in the resulting video image may appear in a downward-looking posture.

Optionally, the video image may also be an artificially generated virtual image containing a human face, or a video image of another real person.

S2: Acquire a standard three-dimensional face model, and extract the head posture angle and expression base coefficient of the first person in the video image according to the standard three-dimensional face model.

Image segmentation is performed on the video image to obtain a two-dimensional face region;

the two-dimensional face region is mapped onto the standard three-dimensional face model to obtain a three-dimensional face region; and

the head posture angle and the expression base coefficient are extracted from the three-dimensional face region using a facial motion capture method.

The standard three-dimensional face model has no identity attribute; its parameters are obtained by averaging the parameters of the three-dimensional face models of multiple specific users.

The head posture angle includes the downward or upward looking angle of the head, the left or right turning angle, and the head-tilt angle. The downward or upward looking angle corresponds to the angle in the z direction in three-dimensional space, the left or right turning angle corresponds to the angle in the y direction, and the head-tilt angle corresponds to the angle in the x direction.

The expression base coefficient includes the facial expression of the face and the mouth-shape information of the face.
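Since the three head posture angles act as Euler-style angles about the three spatial axes, they can be composed into a single rotation matrix. A minimal sketch, assuming the axis mapping above and an illustrative composition order (neither is specified by the application):

```python
import numpy as np

def head_pose_rotation(pitch_z: float, yaw_y: float, roll_x: float) -> np.ndarray:
    """Compose a rotation matrix from the three head posture angles (degrees).

    Axis mapping follows the text: looking down/up about z, turning left/right
    about y, tilting the head about x. The z-y-x composition order is an
    illustrative assumption.
    """
    pz, yy, rx = np.radians([pitch_z, yaw_y, roll_x])
    Rz = np.array([[np.cos(pz), -np.sin(pz), 0.0],
                   [np.sin(pz),  np.cos(pz), 0.0],
                   [0.0,         0.0,        1.0]])
    Ry = np.array([[ np.cos(yy), 0.0, np.sin(yy)],
                   [ 0.0,        1.0, 0.0       ],
                   [-np.sin(yy), 0.0, np.cos(yy)]])
    Rx = np.array([[1.0, 0.0,         0.0        ],
                   [0.0, np.cos(rx), -np.sin(rx)],
                   [0.0, np.sin(rx),  np.cos(rx)]])
    return Rz @ Ry @ Rx
```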

S3: Acquire the head posture angle adjustment amount.

The position of the first person, the camera position, the eye height of the first person and the screen center position are acquired;

the perspective deviation formed among the position of the first person, the camera position, the eye height of the first person and the screen center position is acquired; and

the head posture angle adjustment amount is determined according to the perspective deviation.

The perspective deviation includes the downward or upward looking angle deviation of the head in the video image, the left or right turning angle deviation, and the head-tilt angle deviation. The perspective deviation reflects the deviation, caused by the camera position, between the head posture angle in the video image and the real head posture angle.

The deviation angles of the perspective deviation in the three directions of three-dimensional space are the head posture angle adjustment amounts in those three directions. For example, if the real head posture angle of the first person is looking down 30 degrees and the perspective deviation includes looking down 30 degrees, the head posture angle of the first person in the acquired video image is looking down 60 degrees.

S4: Reconstruct the original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient, to obtain the reconstructed three-dimensional face model.

The original three-dimensional face model is reconstructed according to the following formula:

S = S̄ + αB1 + βB2

where B1 is the head posture angle adjustment amount, B2 is the expression base, α is the first coefficient corresponding to the head posture angle adjustment amount, β is the expression base coefficient, S is the reconstructed three-dimensional face model, and S̄ is the original three-dimensional face model.

The expression of the first person in the video image is the same as that of the original three-dimensional face model; for example, if the first person in the video image has a joyful expression, the original three-dimensional face model also has a joyful expression.

The head posture angle adjustment amount is a vector, including a first component (the x-direction adjustment component), a second component (the y-direction adjustment component) and a third component (the z-direction adjustment component).

By changing the first coefficient corresponding to the head posture angle adjustment amount, the degree to which the head posture of the original three-dimensional face model is adjusted can be changed. In one embodiment, the first coefficient is set to 1, so that the head posture angle of the reconstructed three-dimensional face model is the same as the real head posture angle of the first user; for example, if the real head posture angle of the first user is looking down 30 degrees, the head posture angle of the reconstructed three-dimensional face model is also looking down 30 degrees, thereby eliminating the perspective error.
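A minimal sketch of this reconstruction formula, assuming the model and bases are stored as flattened vertex arrays (shapes and names are illustrative, not the application's implementation):

```python
import numpy as np

def reconstruct_face(S_bar: np.ndarray, B1: np.ndarray, B2: np.ndarray,
                     alpha: float, beta: np.ndarray) -> np.ndarray:
    """S = S_bar + alpha*B1 + beta*B2, the formula above.

    S_bar: original 3D face model vertices, shape (3N,)
    B1:    vertex displacement induced by the head posture angle adjustment, shape (3N,)
    B2:    expression base, shape (K, 3N); beta: expression base coefficients, shape (K,)
    alpha: first coefficient; alpha = 1 applies the full pose adjustment, which is
           the setting that eliminates the perspective error in the embodiment above.
    """
    return S_bar + alpha * B1 + beta @ B2
```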

S5: Perform portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image.

A differentiable rendering method is used to perform portrait rendering on the reconstructed three-dimensional face model to obtain the two-dimensional rendered face image.

Performing portrait rendering on the reconstructed three-dimensional face model using the differentiable rendering method to obtain the two-dimensional rendered face image includes:

rasterizing the reconstructed three-dimensional face model to obtain a rasterized image;

interpolating the rasterized image to obtain an interpolated image;

generating a texture for the interpolated image to obtain a texture image; and

performing anti-aliasing on the texture image to obtain the two-dimensional rendered face image.

The two-dimensional rendered face image is a two-dimensional face template and has no identity attribute. It shows the contour of the reconstructed three-dimensional face model in two dimensions, and the person in it has the same expression as the reconstructed three-dimensional face model.

S6: Generate a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person.

Generating the virtual video image according to the two-dimensional rendered face image and the second person image includes:

inputting the two-dimensional rendered face image and the second person image into a virtual video image generation model to generate the virtual video image, the virtual video image generation model being obtained by training a deep convolutional neural network to be trained.

Training the deep convolutional neural network to be trained includes:

acquiring a training video image set;

generating the virtual video image corresponding to each video image in the training video image set;

calculating a first loss function value according to the video image and the virtual video image;

calculating a second loss function value according to the image details of the video image and the image details of the virtual video image;

calculating a third loss function value according to the image features of the video image and the image features of the virtual video image;

calculating a final loss function value according to the first loss function value, the second loss function value and the third loss function value; and

performing back propagation according to the final loss function value to update the network parameters of the deep convolutional neural network to be trained.

A virtual video image of the second person is generated according to the two-dimensional rendered face image and the second person image. The second person in the second person image may be the first person of the video image in step S1, another person in a database, or a cartoon character.

The deep convolutional neural network to be trained may adopt one of a Transformer neural network, a ResNet neural network, a ShuffleNet neural network or a MobileNet neural network.
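A minimal PyTorch sketch of such a generation model's interface, concatenating the two-dimensional rendered face image with the second person image along the channel axis; the tiny convolutional network is an illustrative stand-in for whichever backbone (Transformer, ResNet, ShuffleNet or MobileNet) is actually chosen:

```python
import torch
import torch.nn as nn

class VirtualImageGenerator(nn.Module):
    """Illustrative conditional generator: (rendered face template, identity image) -> virtual image."""

    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),   # 3 template + 3 identity channels
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid(),            # RGB virtual video image
        )

    def forward(self, rendered_face: torch.Tensor, second_person: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, 3, H, W) in [0, 1]; the identity image conditions the output.
        return self.net(torch.cat([rendered_face, second_person], dim=1))
```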

Converting the two-dimensional rendered face image into a virtual video image of the second person, outputting a virtual video image corresponding to the first person in the video image on formal occasions and a virtual video image of another person on informal occasions, adapts to a variety of remote video scenarios and relaxes the real-time requirements of informal remote video scenarios.

A virtual video image generation method of the present application includes: acquiring a video image, the video image being a two-dimensional image; and acquiring a standard three-dimensional face model and extracting the head posture angle and expression base coefficient of the first person in the video image according to the standard three-dimensional face model. The head posture angle and the expression base coefficient are independent of the person's identity, so the method decouples the head posture angle and the expression base coefficient from the person's identity. A head posture angle adjustment amount is acquired, and the original three-dimensional face model corresponding to the video image is reconstructed according to the head posture angle adjustment amount and the expression base coefficient, which eliminates the perspective deviation caused by the camera not being located at the center of the screen and yields the reconstructed three-dimensional face model. Before the original three-dimensional face model is reconstructed, only the head posture angle and the expression base coefficient need to be transmitted, rather than the conference video or video images, which reduces network bandwidth usage and improves data transmission efficiency. Portrait rendering is performed on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image, which is converted into a virtual video image of the second person according to the second person image: the virtual video image of the first person in the video image is output on formal occasions, and the virtual video image of the second person is output on informal occasions. The method thus adapts to a variety of remote video scenarios and relaxes the real-time requirements of informal remote video scenarios.

In one embodiment, referring to FIG. 2, acquiring the head posture angle adjustment amount includes:

S31: Acquire the position of the first person, the camera position, the eye height of the first person, and the screen center position.

A three-dimensional space is established; the position of the first person in that space is acquired to obtain a first coordinate, the camera position is acquired to obtain a second coordinate, and the screen center position is acquired to obtain a third coordinate.

The first, second and third coordinates reflect the relative positional relationship among the first person, the camera position and the screen center position.

S32: Acquire the perspective deviation formed among the position of the first person, the camera position, the eye height of the first person and the screen center position.

The included angles among the position of the first person, the camera position, the eye height of the first person and the screen center position are calculated according to the first, second and third coordinates, and the perspective deviation is obtained from these angles. The perspective deviation includes the downward-looking posture angle deviation, the upward-looking posture angle deviation, the left rotation angle deviation, the right rotation angle deviation and the head-tilt angle deviation.

When the perspective deviation includes a downward-looking posture angle deviation, the head posture angle needs to be adjusted upward; when it includes an upward-looking posture angle deviation, the head posture angle needs to be adjusted downward.

When the perspective deviation includes a left turning angle deviation, the head posture angle needs to be adjusted to the right; when it includes a right turning angle deviation, the head posture angle needs to be adjusted to the left.

When the perspective deviation includes a clockwise head-tilt deviation, the head posture angle needs to be adjusted counterclockwise; when it includes a counterclockwise head-tilt deviation, the head posture angle needs to be adjusted clockwise.

S33: Determine the head posture angle adjustment amount according to the perspective deviation.

The head posture angle may be adjusted in the three directions of three-dimensional space in any order. The amount by which the head posture angle is adjusted upward or downward is taken as the first component of the head posture angle adjustment amount: positive for an upward adjustment, negative for a downward adjustment. The amount adjusted to the left or right is taken as the second component: positive for a left adjustment, negative for a right adjustment. The amount adjusted clockwise or counterclockwise is taken as the third component: positive for a clockwise adjustment, negative for a counterclockwise adjustment.

The head posture angle adjustment amount is used to reconstruct the original three-dimensional face model and can eliminate the perspective deviation caused by the included angles among the camera position, the person's position, the screen center position and the height of the person's eyes.

For example, if the perspective deviation is looking down 30 degrees, turning left 40 degrees and tilting the head clockwise 50 degrees, then the first component of the head posture angle adjustment amount is positive 30 degrees, the second component is negative 40 degrees, and the third component is negative 50 degrees.
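A minimal sketch of S33 under the sign conventions above, mapping a perspective deviation to the three adjustment components (the reduced inputs and function name are illustrative assumptions):

```python
import numpy as np

def head_pose_adjustment(down_deg: float, left_deg: float, cw_tilt_deg: float) -> np.ndarray:
    """Map signed deviation angles (degrees) to the three adjustment components.

    Up/left/clockwise adjustments are positive, and each adjustment opposes
    the corresponding deviation, as described in the text.
    """
    return np.array([
        +down_deg,     # first component: looking-down deviation -> adjust upward
        -left_deg,     # second component: left-turn deviation -> adjust to the right
        -cw_tilt_deg,  # third component: clockwise tilt deviation -> adjust counterclockwise
    ])

# Example from the text: deviation (down 30, left 40, clockwise 50) -> (+30, -40, -50).
print(head_pose_adjustment(30, 40, 50))
```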

As described above, acquiring the head posture angle adjustment amount includes acquiring the position of the first person, the camera position, the eye height of the first person and the screen center position; acquiring the perspective deviation formed among them; and determining the head posture angle adjustment amount according to the perspective deviation. The head posture angle adjustment amount is used to reconstruct the original three-dimensional face model and can eliminate the perspective deviation caused by the included angles among the camera position, the person's position, the screen center position and the height of the person's eyes.

In one embodiment, reconstructing the original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain the reconstructed three-dimensional face model includes:

reconstructing the original three-dimensional face model according to the following formula:

S = S̄ + αB1 + βB2

where B1 is the head posture angle adjustment amount, B2 is the expression base, α is the first coefficient corresponding to the head posture angle adjustment amount, β is the expression base coefficient, S is the reconstructed three-dimensional face model, and S̄ is the original three-dimensional face model.

The original three-dimensional face model is obtained by three-dimensional modeling of the video image; three-dimensional modeling software such as 3dmax or AUTOCAD may be used.

By changing the first coefficient corresponding to the head posture angle adjustment amount, the degree to which the head posture of the original three-dimensional face model is adjusted can be changed.

The larger the first coefficient, the greater the degree to which the original three-dimensional face model is adjusted by the head posture angle adjustment amount; the smaller the first coefficient, the smaller the degree of adjustment.

Both the original three-dimensional face model and the standard three-dimensional face model may use one of the BFM, ARKit or FLAME face models, and the two use the same face model.

The head posture angle adjustment amount and the expression base coefficient are both vectors. B1 can be expressed as (B11, B12, B13), where B11 is the first component of the head posture angle adjustment amount, B12 is the second component, and B13 is the third component.

An original three-dimensional face model corresponds to a unique expression base; once the original model is determined, the corresponding expression base can be determined. Keeping the expression base coefficient unchanged yields a reconstructed three-dimensional face model with the same expression as the video image.

Optionally, each expression corresponds to a standard expression base coefficient. The standard expression base coefficient corresponding to a target expression is acquired and the expression base coefficient is adjusted; when the difference between the expression base coefficient and the standard expression base coefficient of the target expression is smaller than an expression-base-coefficient difference threshold, the adjustment ends. The expression of the standard three-dimensional face model defaults to a joyful expression; adjusting the expression base coefficient can change the expression of the original three-dimensional face model to an expression different from the one in the video image.
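A minimal sketch of this optional adjustment, stepping the expression base coefficient toward the target expression's standard coefficient until the difference falls below the threshold (the update rule and distance metric are illustrative assumptions):

```python
import numpy as np

def adjust_expression(beta: np.ndarray, beta_target: np.ndarray,
                      threshold: float = 1e-2, step: float = 0.1) -> np.ndarray:
    """Move the expression base coefficient toward the target expression's
    standard coefficient; stop once the difference is below the threshold."""
    while np.linalg.norm(beta - beta_target) >= threshold:
        beta = beta + step * (beta_target - beta)
    return beta
```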

According to the head posture angle adjustment amount and the expression base coefficient, the original three-dimensional face model can be reconstructed by combining the three directions in three-dimensional space with the expression, eliminating the perspective error.

As described above, the original three-dimensional face model is reconstructed according to the head posture angle adjustment amount and the expression base coefficient to obtain the reconstructed three-dimensional face model. This reconstruction combines the three directions in three-dimensional space with the expression and eliminates the perspective error.

In one embodiment, referring to FIG. 3, acquiring the standard three-dimensional face model and extracting the head posture angle and expression base coefficient of the first person in the video image according to the standard three-dimensional face model includes:

S21: Perform image segmentation on the video image to obtain a two-dimensional face region.

A pixel-value-based image segmentation method, a label-based image segmentation method or another image segmentation method may be used.

Using a face region that contains only the face avoids real-time processing of the real background region, prevents errors during background extraction, and reduces the demand for computing power.

S22: Map the two-dimensional face region onto the standard three-dimensional face model to obtain a three-dimensional face region.

The mapping may be non-linear or linear; in one embodiment, non-linear mapping is adopted.

Optionally, three-dimensional modeling software is used to convert the two-dimensional face region directly into a three-dimensional face region; the software may be 3dmax, AUTOCAD or other three-dimensional modeling software.

S23: Extract the head posture angle and the expression base coefficient from the three-dimensional face region using a facial motion capture method.

The three-dimensional face region is input into a regression model for non-linear regression to obtain the head posture angle and the expression base coefficient; the regression model is obtained by training a ResNet50 neural network to be trained.

The head posture angle includes the downward or upward looking angle of the head, the left or right turning angle, and the head-tilt angle.

The expression base coefficient includes the facial expression coefficient of the face and the mouth-shape information coefficient of the face.

In one embodiment, key point detection is performed on the three-dimensional face region to obtain three-dimensional face key points. Different standard three-dimensional face models correspond to different expression bases; the three-dimensional face region, the three-dimensional face key points and the expression base are input into the regression model for non-linear regression to obtain the head posture angle and the expression base coefficient.
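A minimal PyTorch sketch of such a regression model: a ResNet50 backbone whose classification layer is replaced by a head that regresses 3 head posture angles plus K expression base coefficients (the output split and the torchvision usage are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class PoseExpressionRegressor(nn.Module):
    """ResNet50 backbone regressing head posture angles and expression base coefficients."""

    def __init__(self, num_expr: int = 64):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()            # keep the 2048-d pooled feature
        self.backbone = backbone
        self.head = nn.Linear(2048, 3 + num_expr)

    def forward(self, x: torch.Tensor):
        out = self.head(self.backbone(x))      # x: (B, 3, H, W) face region
        pose, beta = out[:, :3], out[:, 3:]    # 3 posture angles, K expression coefficients
        return pose, beta
```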

After the head posture angle and expression base coefficient of the video image are extracted, only the head posture angle and expression base coefficient need to be transmitted, not the video image of step S1, which reduces network bandwidth usage and improves data transmission efficiency.

The two-dimensional rendered face image is a two-dimensional face template, and a virtual video image of the first person or the second person can be generated from it.

As described above, extracting the head posture angle and expression base coefficient of the video image includes performing image segmentation on the video image to obtain a two-dimensional face region, mapping the two-dimensional face region onto the standard three-dimensional face model to obtain a three-dimensional face region, and extracting the head posture angle and expression base coefficient from the three-dimensional face region using a facial motion capture method. Only the head posture angle and expression base coefficient need to be transmitted, not the video image, which reduces network bandwidth usage and improves data transmission efficiency.

In one embodiment, performing portrait rendering on the reconstructed three-dimensional face model to obtain the two-dimensional rendered face image includes:

performing portrait rendering on the reconstructed three-dimensional face model using a differentiable rendering method to obtain the two-dimensional rendered face image.

Portrait rendering converts the three-dimensional reconstructed face model into a two-dimensional rendered face image, which is used to generate the virtual video image.

Performing portrait rendering on the reconstructed three-dimensional face model using the differentiable rendering method to obtain the two-dimensional rendered face image includes:

S511: Rasterize the reconstructed three-dimensional face model to obtain a rasterized image.

Rasterization converts the mesh vertices of the reconstructed three-dimensional face model into multiple rasterized pixels, which form a two-dimensional rasterized image.

S512: Interpolate the rasterized image to obtain an interpolated image.

The mesh vertices of the reconstructed three-dimensional face model carry parameters such as texture coordinates, vertex normals and reflection vectors; interpolation transfers all of these parameters into the rasterized image to obtain the interpolated image. The interpolation may use an interior interpolation method or an extrapolation method.

S513: Generate the texture of the interpolated image to obtain a texture image.

The two-dimensional texture image reflects the facial features of the reconstructed three-dimensional face model well.

S514: Perform anti-aliasing on the texture image to obtain the two-dimensional rendered face image.

The edges of the texture image are jagged; anti-aliasing softens the jagged edges and yields the two-dimensional rendered face image.
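The four stages S511-S514 map one-to-one onto the stages of a differentiable rasterization library such as nvdiffrast; a minimal sketch, assuming the reconstructed model is supplied as clip-space vertex positions, triangle indices, texture coordinates and a texture map:

```python
import torch
import nvdiffrast.torch as dr

def render_face(pos, tri, uv, tex, resolution=(512, 512)):
    """Differentiable portrait rendering in four stages (cf. S511-S514).

    pos: (1, V, 4) clip-space vertex positions; tri: (F, 3) int32 triangles;
    uv:  (1, V, 2) texture coordinates;        tex: (1, H, W, 3) texture map.
    """
    glctx = dr.RasterizeCudaContext()                                     # or dr.RasterizeGLContext()
    rast, rast_db = dr.rasterize(glctx, pos, tri, resolution=resolution)  # S511: rasterize
    uv_pix, _ = dr.interpolate(uv, rast, tri)                             # S512: interpolate vertex attributes
    color = dr.texture(tex, uv_pix, filter_mode='linear')                 # S513: sample the texture
    color = dr.antialias(color, rast, pos, tri)                           # S514: anti-alias the edges
    return color  # (1, H, W, 3) two-dimensional rendered face image
```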

As described above, portrait rendering is performed on the reconstructed three-dimensional face model to obtain the two-dimensional rendered face image by sequentially rasterizing, interpolating, generating the texture and anti-aliasing. The resulting two-dimensional rendered face image reflects the facial features of the reconstructed three-dimensional face model well.

In one embodiment, generating the virtual video image according to the two-dimensional rendered face image and the second person image includes:

inputting the two-dimensional rendered face image and the second person image into the virtual video image generation model to generate the virtual video image, the virtual video image generation model being obtained by training a deep convolutional neural network to be trained.

The deep convolutional neural network to be trained may adopt one of a Transformer neural network, a ResNet neural network, a ShuffleNet neural network or a MobileNet neural network.

The two-dimensional rendered face image and the second person image are input into the virtual video image generation model at the same time, directly generating a virtual video image corresponding to the second person in the second person image.

Optionally, the second person image is saved into the virtual video image generation model before the two-dimensional rendered face image is input. When the two-dimensional rendered face image is input, the saved second person image is retrieved, and the two images together generate a virtual video image corresponding to the second person in the second person image.

The second person in the second person image may be the first person of the video image in step S1; in that case, the head posture angle and expression of the second person in the second person image may be the same as or different from those of the first person. The second person may also be another person, or a cartoon character.

Different second person images yield different second persons in the generated virtual video image. Without adjusting the expression base coefficient, the expression of the second person in the virtual video image is the same as the expression of the person in the two-dimensional rendered face image.

For example, if the first person of the video image in step S1 is person A and the second person in the second person image is person B, who is watching the remote video, then the second person image and the two-dimensional rendered face image are input into the virtual video image generation model together to generate a virtual video image of person B, in which the head posture angle of person B is the same as the real head posture angle of person A. Without adjusting the expression base coefficient, the expression of person B in the virtual video image is the same as the expression of person A in the video image.

The virtual video image includes a face region and a background region; the face region contains the specific face of the second person generated from the two-dimensional rendered face image, and the background region is blank and contains no background information.

Compared with the video image of step S1, the virtual video image eliminates the perspective deviation and can improve the experience of the participants in a remote video conference.

As described above, generating the virtual video image includes inputting the two-dimensional rendered face image and the second person image into the virtual video image generation model to generate the virtual video image; the virtual video image generation model is obtained by training a deep convolutional neural network to be trained. Compared with the video image of step S1, the virtual video image eliminates the perspective deviation and can improve the experience of the participants in a remote video conference.

在一个实施例中,参照图4,所述对待训练深度卷积神经网络进行训练,包括:In one embodiment, referring to FIG. 4 , the training of the deep convolutional neural network to be trained includes:

S51’:获取训练视频图像集。S51’: Obtain a training video image set.

训练视频图像集包括多个视频图像,多个视频图像可以是同一个远程视频会议中的视频图像,也可以是多个远程视频会议中的视频图像。The training video image set includes multiple video images, and the multiple video images may be video images in the same remote video conference or video images in multiple remote video conferences.

在一实施例中,训练视频图像集中的人物为精心着装后形象较好的人物,并在具有良好的打光效果的摄影棚中录制训练视频图像集。In one embodiment, the characters in the training video image set are characters with good images after being carefully dressed, and the training video image set is recorded in a studio with good lighting effects.

S52’:生成与所述训练视频图像集中每一张所述视频图像对应的所述虚拟视频图像。S52’: Generate the virtual video image corresponding to each of the video images in the training video image set.

按照本申请实施例中的步骤S1-S6生成训练视频图像集中每一张视频图像对应的虚拟视频图像。According to steps S1-S6 in the embodiment of the present application, a virtual video image corresponding to each video image in the training video image set is generated.

S53’:根据所述视频图像与所述虚拟视频图像计算第一损失函数值。S53’: Calculate a first loss function value based on the video image and the virtual video image.

第一损失函数值的计算公式如下:
loss1=|img-v_img|;
The calculation formula of the first loss function value is as follows:
loss1 = |img-v_img|;

其中,loss1为第一损失函数值,img为视频图像,v_img为由视频图像生成的虚拟视频图像。Among them, loss1 is the first loss function value, img is the video image, and v_img is the virtual video image generated by the video image.

第一损失函数值用于度量视频图像和虚拟视频图像之间的差异。The first loss function value is used to measure the difference between the video image and the virtual video image.

S54’:根据所述视频图像的图像细节与所述虚拟视频图像的图像细节计算第二损失函数值。S54’: Calculate a second loss function value based on the image details of the video image and the image details of the virtual video image.

第二损失函数值的计算公式如下:
loss2=|Edge(img)-Edge(v_img)|;
The calculation formula of the second loss function value is as follows:
loss2 = |Edge(img)-Edge(v_img)|;

其中,loss2为第二损失函数值,Edge为边缘提取算子,边缘提取算子可以为Sobel算子,也可以为Laplace算子,还可以为其他算子。Among them, loss2 is the second loss function value, Edge is the edge extraction operator, and the edge extraction operator can be a Sobel operator, a Laplace operator, or other operators.

S55’:根据所述视频图像的图像特征和所述虚拟视频图像的图像特征计算第三损失函数值。S55’: Calculate the third loss function value based on the image features of the video image and the image features of the virtual video image.

第三损失函数值的计算公式如下:
loss3=||F(img)-F(v_img)||;
The calculation formula of the third loss function value is as follows:
loss3 = ||F(img)-F(v_img)||;

其中,loss3为第三损失函数值,F为人脸特征提取算子,人脸特征提取算子可以是HOG算子,也可以是LBP算子。Among them, loss3 is the third loss function value, F is the face feature extraction operator, and the face feature extraction operator can be a HOG operator or a LBP operator.

可选地,F为已训练人脸特征提取模型,已训练人脸特征提取模型由对待训练卷积神经网络训练得到。Optionally, F is a trained facial feature extraction model, and the trained facial feature extraction model is obtained by training the convolutional neural network to be trained.

S56': Calculate a final loss function value based on the first loss function value, the second loss function value, and the third loss function value.

The final loss function value is calculated as follows:

loss = loss1 + λ1·loss2 + λ2·loss3;

where loss is the final loss function value, λ1 is a first adjustment parameter, and λ2 is a second adjustment parameter; adjusting the first adjustment parameter and the second adjustment parameter changes the influence of the second loss function value and the third loss function value on the final loss function value.

The first loss function value reflects differences at the image pixel level, the second loss function value reflects differences at the edge detail level, and the third loss function value reflects differences at the high-level feature level.

S57': Perform back propagation according to the final loss function value to update the network parameters of the deep convolutional neural network to be trained.

Training the deep convolutional neural network to be trained involves multiple training iterations; each iteration computes a final loss function value, performs back propagation based on that value, and updates the network parameters of the network to be trained.

The network parameters include the learning rate, the network weights, and the network biases.

Training against the final loss function value combines the image pixel level, the edge detail level, and the high-level feature level, yielding a virtual video image generation model capable of generating accurate virtual video images.
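Combining the three terms, a single training iteration might look like the sketch below; the generator interface, the values of λ1 and λ2, and the optimizer are illustrative assumptions (loss2 and loss3 are the helpers sketched above).

```python
import torch

lambda1, lambda2 = 1.0, 0.1   # example adjustment parameters, tuned in practice
# `generator` is the deep convolutional neural network under training (assumed defined).
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

def train_step(img, rendered_face, second_person):
    """One iteration: forward pass, combined loss, back propagation, update."""
    v_img = generator(rendered_face, second_person)   # virtual video image
    l1 = (img - v_img).abs().mean()                   # pixel-level term
    l2 = loss2(img, v_img)                            # edge-detail term
    l3 = loss3(img, v_img)                            # high-level feature term
    loss = l1 + lambda1 * l2 + lambda2 * l3           # final loss function value
    optimizer.zero_grad()
    loss.backward()                                   # back propagation
    optimizer.step()                                  # update network parameters
    return loss.item()
```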

As described above, training the deep convolutional neural network to be trained includes obtaining a training video image set and generating a virtual video image corresponding to each video image in the set. A first loss function value is calculated from the video image and the virtual video image, a second loss function value from their image details, and a third loss function value from their image features; a final loss function value is then calculated from the first, second, and third loss function values. Back propagation is performed according to the final loss function value to update the network parameters of the network to be trained. Training against the final loss function value combines the image pixel level, the edge detail level, and the high-level feature level, yielding a virtual video image generation model capable of generating accurate virtual video images.

In one embodiment, after generating the virtual video image according to the two-dimensional rendered face image and the second person image, the method further includes:

S71: Select a background image from a background image set.

The background image may include only the real background of the video image in step S1, or only a solid-color or non-solid-color background; the embodiments of the present application take a background image containing only a solid-color background as an example.

S72: Fill the background image into the background area of the virtual video image to obtain an image to be displayed.

The background area of the generated virtual video image is blank; filling it with a solid-color background area yields the image to be displayed. Leaving the real background area unprocessed and filling the background area of the virtual video image with a solid-color background image reduces the amount of computation and improves computational efficiency, as the sketch below illustrates.
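A minimal sketch of the filling step, assuming the blank background produced by the generator is all-zero pixels; the mask convention and the colour are illustrative.

```python
import numpy as np

def fill_background(v_img: np.ndarray, bg_color=(0, 120, 215)) -> np.ndarray:
    """Fill the blank background of an (H, W, 3) RGB frame with a solid colour."""
    out = v_img.copy()
    blank = (v_img == 0).all(axis=-1)   # pixels the generator left empty
    out[blank] = bg_color               # paint them with the solid background colour
    return out
```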

As described above, after the virtual video image is generated from the two-dimensional rendered face image, a background image is selected from a background image set and filled into the background area of the virtual video image to obtain the image to be displayed. Leaving the real background area unprocessed and filling the background area of the virtual video image with a solid-color image reduces the amount of computation and improves computational efficiency.

Referring to FIG. 5, which is a schematic block diagram of a virtual video image generating device disclosed in the present application, the device includes:

a video image acquisition module 10, configured to acquire a video image, the video image being a two-dimensional image;

a parameter extraction module 20, configured to acquire a standard three-dimensional face model and to extract, according to the standard three-dimensional face model, the head posture angle and the expression base coefficient of the first person in the video image;

a head posture angle adjustment amount acquisition module 30, configured to acquire a head posture angle adjustment amount;

an original three-dimensional face model reconstruction module 40, configured to reconstruct the original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient, obtaining a reconstructed three-dimensional face model;

a portrait rendering module 50, configured to perform portrait rendering on the reconstructed three-dimensional face model, obtaining a two-dimensional rendered face image;

a virtual video image generation module 60, configured to generate a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person.

In one embodiment, the head posture angle adjustment amount acquisition module 30 further includes:

a position acquisition unit, configured to acquire the position of the first person, the camera position, the eye height of the first person, and the center position of the screen;

a viewing angle deviation acquisition unit, configured to acquire the viewing angle deviation formed between the position of the first person, the camera position, the eye height of the first person, and the center position of the screen;

a head posture angle adjustment amount acquisition unit, configured to determine the head posture angle adjustment amount according to the viewing angle deviation.
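As a rough geometric illustration of the viewing angle deviation, the pitch and yaw offsets can be derived from the eye position, the camera position, and the screen centre; the coordinate convention below is an assumption of the sketch.

```python
import math

def head_pose_adjustment(eye_pos, camera_pos, screen_center):
    """Pitch/yaw adjustment (degrees) turning the camera ray onto the screen-centre ray.

    All positions are (x, y, z) tuples in a common frame whose z axis points
    from the person towards the screen plane.
    """
    def ray_angles(src, dst):
        dx, dy, dz = (d - s for s, d in zip(src, dst))
        yaw = math.degrees(math.atan2(dx, dz))     # horizontal component
        pitch = math.degrees(math.atan2(dy, dz))   # vertical component
        return pitch, yaw

    cam_pitch, cam_yaw = ray_angles(eye_pos, camera_pos)
    scr_pitch, scr_yaw = ray_angles(eye_pos, screen_center)
    return scr_pitch - cam_pitch, scr_yaw - cam_yaw
```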

In one embodiment, the original three-dimensional face model reconstruction module 40 further includes:

a three-dimensional face model reconstruction unit, configured to reconstruct the original three-dimensional face model according to the following formula:

S = S̄ + α·B1 + β·B2;

where B1 is the head posture angle adjustment amount, B2 is the expression base, α is a first coefficient corresponding to the head posture angle adjustment amount, β is the expression base coefficient, S is the reconstructed three-dimensional face model, and S̄ is the original three-dimensional face model.
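The formula admits a direct vectorised implementation; the array shapes, and the generalisation of β to a vector of K coefficients, are assumptions for illustration.

```python
import numpy as np

def reconstruct(S_bar: np.ndarray, B1: np.ndarray, B2: np.ndarray,
                alpha: float, beta: np.ndarray) -> np.ndarray:
    """S = S_bar + alpha * B1 + B2 @ beta for a mesh with V vertices.

    S_bar: (V, 3) original three-dimensional face model
    B1:    (V, 3) head posture adjustment basis
    B2:    (V, 3, K) expression bases
    beta:  (K,) expression base coefficients
    """
    return S_bar + alpha * B1 + B2 @ beta   # (V, 3) reconstructed model
```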

In one embodiment, the parameter extraction module 20 further includes:

an image segmentation unit, configured to perform image segmentation on the video image to obtain a two-dimensional face region;

a mapping unit, configured to map the two-dimensional face region onto the standard three-dimensional face model to obtain a three-dimensional face region;

a facial motion capture unit, configured to extract the head posture angle and the expression base coefficient from the three-dimensional face region using a facial motion capture method.

In one embodiment, the portrait rendering module 50 further includes:

a portrait rendering unit, configured to perform portrait rendering on the reconstructed three-dimensional face model using a differentiable rendering method to obtain the two-dimensional rendered face image.

In one embodiment, the virtual video image generation module 60 further includes:

a first virtual video image generation unit, configured to input the two-dimensional rendered face image and the second person image into a virtual video image generation model to generate the virtual video image; the virtual video image generation model is obtained by training a deep convolutional neural network to be trained.

In one embodiment, the virtual video image generation module 60 further includes:

a training video image set acquisition unit, configured to acquire a training video image set;

a second virtual video image generation unit, configured to generate the virtual video image corresponding to each video image in the training video image set;

a first loss function value calculation unit, configured to calculate a first loss function value according to the video image and the virtual video image;

a second loss function value calculation unit, configured to calculate a second loss function value according to the image details of the video image and the image details of the virtual video image;

a third loss function value calculation unit, configured to calculate a third loss function value according to the image features of the video image and the image features of the virtual video image;

a final loss function value calculation unit, configured to calculate a final loss function value according to the first loss function value, the second loss function value, and the third loss function value;

a network parameter update unit, configured to perform back propagation according to the final loss function value to update the network parameters of the deep convolutional neural network to be trained.

In one embodiment, the facial motion capture unit further includes:

a nonlinear regression subunit, configured to input the three-dimensional face region into a regression model for nonlinear regression to obtain the head posture angle and the expression base coefficient; the regression model is obtained by training a ResNet50 neural network to be trained.
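One way to realise such a regression model is a ResNet50 backbone whose final layer regresses three pose angles plus K expression coefficients; the head size K and the input convention are illustrative assumptions.

```python
import torch
import torchvision.models as models

class PoseExpressionRegressor(torch.nn.Module):
    """ResNet50 regressing head posture angles and expression base coefficients."""

    def __init__(self, num_expressions: int = 52):   # K = 52 is an assumed head size
        super().__init__()
        self.backbone = models.resnet50(weights=None)
        self.backbone.fc = torch.nn.Linear(2048, 3 + num_expressions)

    def forward(self, face: torch.Tensor):
        out = self.backbone(face)          # (N, 3 + K)
        pose_angles = out[:, :3]           # yaw, pitch, roll
        expression_coeffs = out[:, 3:]     # expression base coefficients
        return pose_angles, expression_coeffs

# Example: angles, coeffs = PoseExpressionRegressor()(torch.randn(1, 3, 224, 224))
```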

In one embodiment, the portrait rendering unit further includes:

a rasterization subunit, configured to rasterize the reconstructed three-dimensional face model to obtain a rasterized image;

an interpolation subunit, configured to interpolate the rasterized image to obtain an interpolated image;

a texture generation subunit, configured to generate the texture of the interpolated image to obtain a texture image;

an anti-aliasing subunit, configured to perform anti-aliasing on the texture image to obtain the two-dimensional rendered face image.
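The four sub-steps map directly onto a differentiable rasterisation library; the sketch below uses nvdiffrast as one possible backend, and both that choice and the tensor shapes are assumptions rather than part of the disclosed embodiments.

```python
import nvdiffrast.torch as dr

def render_face(glctx, clip_pos, tris, uvs, tex, resolution=512):
    """Rasterise, interpolate, texture, and anti-alias the reconstructed model.

    clip_pos: (1, V, 4) clip-space vertex positions
    tris:     (T, 3) int32 triangle indices
    uvs:      (1, V, 2) per-vertex texture coordinates
    tex:      (1, H, W, 3) texture map
    """
    rast, _ = dr.rasterize(glctx, clip_pos, tris, resolution=[resolution, resolution])
    uv, _ = dr.interpolate(uvs, rast, tris)             # interpolation step
    color = dr.texture(tex, uv)                         # texture generation step
    color = dr.antialias(color, rast, clip_pos, tris)   # anti-aliasing step
    return color                                        # (1, R, R, 3) rendered face

# glctx = dr.RasterizeCudaContext()  # created once on a CUDA device
```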

In one embodiment, the virtual video image generating device further includes:

a background image selection module, configured to select a background image from a background image set;

a filling module, configured to fill the background image into the background area of the virtual video image to obtain an image to be displayed.

Referring to FIG. 6, an embodiment of the present application further provides a computer device, whose internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the head posture angles, the expression base coefficients, and the like. The network interface of the computer device communicates with external terminals through a network connection. In one embodiment, the computer device may further be provided with an input apparatus, a display screen, and the like. When executed by the processor, the computer program implements the virtual video image generation method, including the following steps: acquiring a video image, the video image being a two-dimensional image; acquiring a standard three-dimensional face model, and extracting the head posture angle and the expression base coefficient of the first person in the video image according to the standard three-dimensional face model; acquiring a head posture angle adjustment amount; reconstructing the original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain a reconstructed three-dimensional face model; performing portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image; and generating a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person. Those skilled in the art will understand that the structure shown in FIG. 6 is a block diagram of only part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the virtual video image generation method, including the following steps: acquiring a video image, the video image being a two-dimensional image; acquiring a standard three-dimensional face model, and extracting the head posture angle and the expression base coefficient of the first person in the video image according to the standard three-dimensional face model; acquiring a head posture angle adjustment amount; reconstructing the original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain a reconstructed three-dimensional face model; performing portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image; and generating a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person. It can be understood that the computer-readable storage medium in this embodiment may be a volatile readable storage medium or a non-volatile readable storage medium.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program; the computer program may be stored in a non-volatile computer-readable storage medium, and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other media provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes the element.

The above descriptions are only some embodiments of the present application and do not limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims (13)

1. A method for generating a virtual video image, the method comprising:
acquiring a video image, the video image being a two-dimensional image;
acquiring a standard three-dimensional face model, and extracting a head posture angle and an expression base coefficient of a first person in the video image according to the standard three-dimensional face model;
acquiring a head posture angle adjustment amount;
reconstructing an original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain a reconstructed three-dimensional face model;
performing portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image; and
generating a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person.

2. The method according to claim 1, wherein acquiring the head posture angle adjustment amount comprises:
acquiring the position of the first person, the camera position, the eye height of the first person, and the center position of the screen;
acquiring a viewing angle deviation formed between the position of the first person, the camera position, the eye height of the first person, and the center position of the screen; and
determining the head posture angle adjustment amount according to the viewing angle deviation.

3. The method according to claim 1, wherein reconstructing the original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain the reconstructed three-dimensional face model comprises:
reconstructing the original three-dimensional face model according to the following formula:
S = S̄ + α·B1 + β·B2;
where B1 is the head posture angle adjustment amount, B2 is the expression base, α is a first coefficient corresponding to the head posture angle adjustment amount, β is the expression base coefficient, S is the reconstructed three-dimensional face model, and S̄ is the original three-dimensional face model.

4. The method according to claim 1, wherein acquiring the standard three-dimensional face model and extracting the head posture angle and the expression base coefficient of the first person in the video image according to the standard three-dimensional face model comprises:
performing image segmentation on the video image to obtain a two-dimensional face region;
mapping the two-dimensional face region onto the standard three-dimensional face model to obtain a three-dimensional face region; and
extracting the head posture angle and the expression base coefficient from the three-dimensional face region using a facial motion capture method.

5. The method according to claim 1, wherein performing portrait rendering on the reconstructed three-dimensional face model to obtain the two-dimensional rendered face image comprises:
performing portrait rendering on the reconstructed three-dimensional face model using a differentiable rendering method to obtain the two-dimensional rendered face image.

6. The method according to claim 1, wherein generating the virtual video image according to the two-dimensional rendered face image and the second person image comprises:
inputting the two-dimensional rendered face image and the second person image into a virtual video image generation model to generate the virtual video image, the virtual video image generation model being obtained by training a deep convolutional neural network to be trained.

7. The method according to claim 6, wherein training the deep convolutional neural network to be trained comprises:
acquiring a training video image set;
generating the virtual video image corresponding to each video image in the training video image set;
calculating a first loss function value according to the video image and the virtual video image;
calculating a second loss function value according to the image details of the video image and the image details of the virtual video image;
calculating a third loss function value according to the image features of the video image and the image features of the virtual video image;
calculating a final loss function value according to the first loss function value, the second loss function value, and the third loss function value; and
performing back propagation according to the final loss function value to update the network parameters of the deep convolutional neural network to be trained.

8. The method according to claim 4, wherein extracting the head posture angle and the expression base coefficient from the three-dimensional face region using the facial motion capture method comprises:
inputting the three-dimensional face region into a regression model for nonlinear regression to obtain the head posture angle and the expression base coefficient, the regression model being obtained by training a ResNet50 neural network to be trained.

9. The method according to claim 5, wherein performing portrait rendering on the reconstructed three-dimensional face model using the differentiable rendering method to obtain the two-dimensional rendered face image comprises:
rasterizing the reconstructed three-dimensional face model to obtain a rasterized image;
interpolating the rasterized image to obtain an interpolated image;
generating a texture of the interpolated image to obtain a texture image; and
performing anti-aliasing on the texture image to obtain the two-dimensional rendered face image.

10. The method according to claim 1, wherein after generating the virtual video image according to the two-dimensional rendered face image and the second person image, the method further comprises:
selecting a background image from a background image set; and
filling the background image into the background area of the virtual video image to obtain an image to be displayed.

11. A virtual video image generating device, comprising:
a video image acquisition module, configured to acquire a video image, the video image being a two-dimensional image;
a parameter extraction module, configured to acquire a standard three-dimensional face model and to extract a head posture angle and an expression base coefficient of a first person in the video image according to the standard three-dimensional face model;
a head posture angle adjustment amount acquisition module, configured to acquire a head posture angle adjustment amount;
an original three-dimensional face model reconstruction module, configured to reconstruct an original three-dimensional face model corresponding to the video image according to the head posture angle adjustment amount and the expression base coefficient to obtain a reconstructed three-dimensional face model;
a portrait rendering module, configured to perform portrait rendering on the reconstructed three-dimensional face model to obtain a two-dimensional rendered face image; and
a virtual video image generation module, configured to generate a virtual video image according to the two-dimensional rendered face image and a second person image, the second person being the first person or another person.

12. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the virtual video image generation method according to any one of claims 1 to 10.

13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the virtual video image generation method according to any one of claims 1 to 10.
PCT/CN2023/130255 — WO2024099319A1 — Virtual video image generation method and apparatus, and device and medium — priority date 2022-11-11, filed 2023-11-07 — status: Ceased

Applications Claiming Priority (2)

CN202211413271.0A — priority date 2022-11-11 — Virtual video image generation method, device, equipment and medium
CN202211413271.0 — priority date 2022-11-11

Publications (1)

WO2024099319A1 — published 2024-05-16

Family

ID=90990131

Family Applications (1)

PCT/CN2023/130255 — WO2024099319A1 — Virtual video image generation method and apparatus, and device and medium — priority date 2022-11-11, filed 2023-11-07 — status: Ceased

Country Status (2)

CN (1): CN118037939A (en)
WO (1): WO2024099319A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party

CN114863002A * — priority 2022-05-25, published 2022-08-05 — Oppo广东移动通信有限公司 — Virtual image generation method and apparatus, terminal device, and computer-readable medium
CN118736108A * — priority 2024-06-05, published 2024-10-01 — 浙江大学 — High-fidelity and drivable reconstruction method of human face based on 3D Gaussian splattering
CN119417954A * — priority 2025-01-08, published 2025-02-11 — 江西外语外贸职业学院(江西省开放型经济人才培训中心) — A virtual product digital image processing method and system

Families Citing this family (2)

CN118657882B * — priority 2024-06-25, published 2025-02-07 — 广东横琴数说故事信息科技有限公司 — A method and device for generating three-dimensional model based on audio and image fusion
CN120747310A * — priority 2024-12-27, published 2025-10-03 — 南京硅基智能科技集团股份有限公司 — Digital human expression driving method

Patent Citations (5)

* Cited by examiner, † Cited by third party

US20160127690A1 * — priority 2014-11-05, published 2016-05-05 — Northrop Grumman Systems Corporation — Area monitoring system implementing a virtual environment
CN105763829A * — priority 2014-12-18, published 2016-07-13 — 联想(北京)有限公司 — Image processing method and electronic device
CN108965767A * — priority 2018-07-26, published 2018-12-07 — 吴铁 — A kind of method for processing video frequency and system improving the experience of person to person's video interactive
CN110321849A * — priority 2019-07-05, published 2019-10-11 — 腾讯科技(深圳)有限公司 — Image processing method, device and computer readable storage medium
CN113689538A * — priority 2020-05-18, published 2021-11-23 — 北京达佳互联信息技术有限公司 — Video generation method and device, electronic equipment and storage medium


Also Published As

CN118037939A — published 2024-05-14


Legal Events

121 — EP: the EPO has been informed by WIPO that EP was designated in this application — Ref document number: 23888008 — Country of ref document: EP — Kind code of ref document: A1

NENP — Non-entry into the national phase — Ref country code: DE
