Technical Field
The present application relates to the technical field of live webcasting, and in particular to a training method and apparatus for a face prediction model, a face multi-task prediction method and apparatus, a live streaming system, a computer device, and a computer-readable storage medium.
Background
With the development of live streaming technology, special effects such as beautification, virtual makeup, and face reshaping are widely used in webcasts, improving the reach of the high-quality content shared in them. During a live broadcast, facial keypoint detection and segmentation are required to accurately locate the face and apply special effects; keypoint detection and segmentation algorithms therefore form the foundation of beautification, makeup, reshaping, and other special-effect technologies for live streaming and short video.
At present, when performing 2D keypoint, 3D keypoint, and image segmentation tasks on face images, a separate face prediction model is usually trained for each task: a 2D keypoint model for predicting 2D keypoints, a 3D keypoint model for predicting 3D keypoints, a segmentation model for image segmentation, and so on. However, in live streaming scenarios that require several prediction tasks at once, running all of these models simultaneously cannot meet the strict real-time requirements of live broadcasting. This degrades the computational efficiency of beautification and makeup algorithms, makes it difficult to satisfy the multi-task demands of the live streaming business, and easily harms the quality of the broadcast.
Summary
In view of the above technical problems, it is necessary to provide a training method and apparatus for a face prediction model, a face multi-task prediction method and apparatus, a live streaming system, a computer device, and a computer-readable storage medium that realize multi-task prediction with a single model and improve the real-time performance of live streaming.
In a first aspect, the present application provides a training method for a face prediction model, comprising:
grouping a face image set according to prediction tasks, and labeling each face image group according to its corresponding prediction task;
constructing a feature pyramid network and a prediction task layer for each prediction task, and connecting each prediction task layer to a feature map output by the feature pyramid network to obtain a face multi-task prediction model;
configuring, for each prediction task layer of the face multi-task prediction model, a loss function used in prediction training; and
training the face multi-task prediction model with the labeled face images according to the loss functions.
In one embodiment, constructing the feature pyramid network and the prediction task layer for each prediction task comprises:
constructing a feature pyramid network for extracting image features, wherein the feature pyramid network takes a face image as input and outputs a plurality of feature maps of gradually increasing resolution; and
building, after the feature pyramid network, a corresponding prediction task layer for each prediction task to be performed.
In one embodiment, connecting the prediction task layers to the feature maps output by the feature pyramid network to obtain the face multi-task prediction model comprises:
selecting, for each prediction task layer, at least one feature map from the feature maps output by the feature pyramid network as its input; and
connecting each prediction task layer to the feature map selected for its prediction task, to obtain the face multi-task prediction model.
In one embodiment, training the face multi-task prediction model with the labeled face images according to the loss functions comprises:
reading each group of labeled face images and feeding them into the face multi-task prediction model;
computing the prediction result of each prediction task output by the face multi-task prediction model under the joint influence of the loss functions; and
adjusting the parameters of the face multi-task prediction model according to the prediction results until the prediction results output by the model meet a set target.
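The loop described above, in which labeled groups are fed through one shared model and the parameters are updated under the joint influence of several loss functions, can be sketched as follows. This is a minimal NumPy illustration with a toy linear backbone and two hypothetical task heads; all names, shapes, and learning-rate values are invented for illustration and are not the actual model of this application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled groups: one input batch, one target per task (all invented).
X = rng.normal(size=(32, 8))          # input features
Y1 = X @ rng.normal(size=(8, 4))      # task-1 targets (e.g. 2D keypoints)
Y2 = X @ rng.normal(size=(8, 6))      # task-2 targets (e.g. segmentation logits)

# One shared "backbone" weight plus one head per prediction task.
W = rng.normal(size=(8, 8)) * 0.1
H1 = rng.normal(size=(8, 4)) * 0.1
H2 = rng.normal(size=(8, 6)) * 0.1

def joint_loss():
    Z = X @ W
    return np.mean((Z @ H1 - Y1) ** 2) + np.mean((Z @ H2 - Y2) ** 2)

lr, first = 0.05, joint_loss()
for _ in range(300):
    Z = X @ W
    e1 = (Z @ H1 - Y1) / Y1.size      # gradient factor of task 1's mean-squared loss
    e2 = (Z @ H2 - Y2) / Y2.size      # gradient factor of task 2's mean-squared loss
    # The shared weight receives gradients from both task losses at once.
    W -= lr * 2 * (X.T @ (e1 @ H1.T + e2 @ H2.T))
    H1 -= lr * 2 * (Z.T @ e1)
    H2 -= lr * 2 * (Z.T @ e2)
last = joint_loss()
```

After training, the combined loss is lower than before, which is the "adjust parameters until the set target is met" behavior in miniature.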
In one embodiment, the prediction tasks include predicting 2D keypoints of a face image and predicting 3D keypoints of a face image;
and labeling each face image group according to its corresponding prediction task comprises:
annotating a number of 2D keypoints on the facial regions of each face image in the 2D keypoint group; and
rendering a frontal face image with a 3D face base model and determining the correspondence between the vertices of the 3D base model and the 2D keypoints; and annotating, according to the correspondence, a number of 2D keypoints on the facial regions of each face image in the 3D keypoint group.
In one embodiment, rendering a frontal face image with the 3D face base model and determining the correspondence between the vertices of the 3D base model and the 2D keypoints comprises:
reducing the dimensionality of the expression basis and shape basis of the 3D face base model;
rendering a frontal face image from the mean face of the 3D face base model, and annotating a number of 2D keypoints on the frontal face image;
computing the 2D projection points of the 3D vertices on the frontal face image and, for each 2D keypoint, determining the projection point at minimum distance, thereby obtaining the correspondence between 3D base model vertices and 2D keypoints for a frontal face; and
adjusting the 2D keypoints along the cheek contour of the frontal face image to obtain the correspondence between 3D base model vertices and 2D keypoints for a profile face.
In one embodiment, the prediction tasks further include predicting the per-pixel classification of the visible face region in a face image and predicting the 3D mesh formed by the 3D face keypoints;
and labeling each face image group according to its corresponding prediction task further comprises:
separating the face region from the background region in each face image of the face segmentation group, labeling the face region as foreground pixels and the background region as background pixels; and
determining the 3D keypoint connectivity of the face images in the 3D keypoint group according to the vertex connectivity of the 3D base model.
In one embodiment, the prediction task layers include: a first prediction task layer that predicts the 2D keypoints of a face image, a second prediction task layer that predicts the per-pixel classification of the visible face region in the face image, a third prediction task layer that predicts the 3D keypoints of the face image, and a fourth prediction task layer that predicts the 3D mesh formed by the 3D face keypoints;
the feature pyramid network outputs feature maps F1, F2, F3, and F4 of gradually increasing resolution;
wherein the first and second prediction task layers are convolutional neural network layers taking feature map F4 as input, and the third and fourth prediction task layers are linear layers taking feature map F1 as input.
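As a shape-level illustration of this wiring, the NumPy sketch below builds stand-in feature maps F1 to F4 by repeated pooling, attaches spatial heads to the high-resolution F4, and flat linear heads to the low-resolution F1. The pooling operation, channel counts, random head weights, and the 300-keypoint output size are all assumptions for illustration; the real FPN and task layers are learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool2(x):
    """Halve spatial resolution with 2x2 average pooling (stride 2)."""
    h, w, c = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def fpn(image):
    """Stand-in pyramid: F1..F4 with gradually increasing resolution."""
    f4 = image            # highest resolution
    f3 = avg_pool2(f4)
    f2 = avg_pool2(f3)
    f1 = avg_pool2(f2)    # lowest resolution
    return f1, f2, f3, f4

image = rng.normal(size=(64, 64, 3))
f1, f2, f3, f4 = fpn(image)

# Heads 1 and 2 (2D keypoints, segmentation) keep spatial layout and read F4.
Wc = rng.normal(size=(3, 300)) * 0.01
heatmaps = f4 @ Wc                            # one 64x64 heatmap per 2D keypoint
seg_logits = f4.mean(axis=2, keepdims=True)   # 1-channel segmentation map

# Heads 3 and 4 (3D keypoints, mesh) are linear layers on the flattened F1.
flat = f1.reshape(-1)
W3 = rng.normal(size=(flat.size, 300 * 3)) * 0.01
kp3d = (flat @ W3).reshape(300, 3)            # 300 predicted 3D keypoints
```

The dense per-pixel tasks thus consume the large feature map, while the global regression tasks consume the small one, which is the division of labor the embodiment describes.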
In one embodiment, configuring the loss function of each prediction task layer of the face multi-task prediction model in prediction training comprises:
configuring the loss function of the first prediction task layer as L2d = Σi ||pred2d,i − gt2d,i||, where pred2d,i denotes the i-th keypoint coordinate predicted by the face multi-task prediction model and gt2d,i denotes the annotated i-th 2D keypoint coordinate;
configuring the loss function of the second prediction task layer as Lseg = −(gtseg*log(predseg) + (1 − gtseg)*log(1 − predseg)), where gtseg is the foreground/background pixel label and predseg is the prediction result of the face multi-task prediction model; and
configuring the loss function of the third and fourth prediction task layers as L3d = Σi ||proj3d,i − gt2d,i||, where proj3d,i is the 2D projection point, on the face image, of the i-th 3D face keypoint predicted by the face multi-task prediction model, and gt2d,i denotes the annotated i-th 2D keypoint coordinate.
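A minimal NumPy sketch of the three loss functions above. The orthographic projection (dropping the z axis) used for the 3D branch is an assumption for illustration, since the projection model is not fixed at this point in the text:

```python
import numpy as np

def l2d(pred, gt):
    """Sum of Euclidean distances between predicted and annotated 2D keypoints."""
    return float(np.sum(np.linalg.norm(pred - gt, axis=1)))

def lseg(gt, pred, eps=1e-7):
    """Per-pixel binary cross-entropy between mask labels and predictions."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))))

def project(points3d):
    """Toy orthographic projection onto the image plane: drop the z axis."""
    return points3d[:, :2]

def l3d(pred3d, gt2d):
    """Distance between projections of predicted 3D keypoints and 2D labels."""
    return float(np.sum(np.linalg.norm(project(pred3d) - gt2d, axis=1)))
```

A perfect prediction drives each loss to (numerically) zero, which is the behavior the training in S11-S12 optimizes toward.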
In a second aspect, the present application provides a training apparatus for a face prediction model, comprising:
a data labeling module, configured to group a face image set according to prediction tasks and label each face image group according to its corresponding prediction task;
a model building module, configured to construct a feature pyramid network and a prediction task layer for each prediction task, and connect each prediction task layer to a feature map output by the feature pyramid network to obtain a face multi-task prediction model;
a function configuration module, configured to configure, for each prediction task layer of the face multi-task prediction model, a loss function used in prediction training; and
a model training module, configured to train the face multi-task prediction model with the labeled face images according to the loss functions.
In a third aspect, the present application provides a face multi-task prediction method, comprising:
acquiring a target face image of a streamer;
inputting the target face image into the face multi-task prediction model, wherein the face multi-task prediction model is obtained by the training method for a face prediction model described above; and
selecting, according to prediction task requirements, corresponding target task prediction results from the prediction results output by the face multi-task prediction model.
In a fourth aspect, the present application provides a face multi-task prediction apparatus, comprising:
an image acquisition module, configured to acquire a target face image of a streamer;
a model prediction module, configured to input the target face image into the face multi-task prediction model, wherein the face multi-task prediction model is obtained by the training method for a face prediction model described above; and
a result selection module, configured to select, according to prediction task requirements, corresponding target task prediction results from the prediction results output by the face multi-task prediction model.
In a fifth aspect, the present application provides a live streaming system, comprising a streamer client, a viewer client, and a live streaming server, wherein the streamer client and the viewer client are each connected to the live streaming server through a communication network;
the streamer client is configured to provide access for the streamer of a live room, and to capture the streamer's live video stream and upload it to the live streaming server;
the live streaming server is configured to forward the live broadcast between the streamer client and the viewer client and deliver live video to the viewer client, and to obtain a target face image of the streamer from the live video stream, obtain target task prediction results for the target face image using the face multi-task prediction method described above, and add special effects; and
the viewer client is configured to provide access for the viewer users of the live room and to receive and play the live video.
In a sixth aspect, the present application provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the training method for a face prediction model or the steps of the face multi-task prediction method described above.
In a seventh aspect, the present application provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the steps of the training method for a face prediction model or the steps of the face multi-task prediction method described above.
In the technical solutions provided by the above embodiments, during training of the face prediction model, the face image set is grouped according to the prediction tasks and labeled; a feature pyramid network and a prediction task layer for each prediction task are constructed, and the feature maps are connected to the prediction task layers to obtain a face multi-task prediction model; the loss function of each prediction task layer is then configured, and finally the labeled face images and the loss functions are used to train the face multi-task prediction model. This solution yields a single prediction model that performs multiple prediction tasks simultaneously, improving the efficiency of face multi-task prediction; it is particularly suited to the prediction needs of live streaming scenarios and satisfies the requirements of different real-time live streaming settings.
Further, in face multi-task prediction, a target face image of the streamer is first acquired and input into the face multi-task prediction model; corresponding target task prediction results are then selected from the model's output according to the prediction task requirements. From a single input target face image, this solution predicts and outputs the results of multiple tasks at once, realizing single-model multi-task prediction; the prediction results can be selected as needed to add live special effects, better meeting the strict real-time requirements that the live streaming business places on its algorithms.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an exemplary live streaming application scenario;
FIG. 2 is a flowchart of a training method for a face prediction model according to an embodiment;
FIG. 3 is a schematic diagram of exemplary 3D keypoint annotation results on a face image;
FIG. 4 is a schematic diagram of an exemplary conversion between 3D keypoints and 2D keypoints;
FIG. 5 is a schematic structural diagram of an exemplary face multi-task prediction model;
FIG. 6 is a schematic structural diagram of a training apparatus for a face prediction model according to an embodiment;
FIG. 7 is a flowchart of a face multi-task prediction method according to an embodiment;
FIG. 8 is a schematic structural diagram of a face multi-task prediction apparatus according to an embodiment;
FIG. 9 is a schematic structural diagram of an exemplary live streaming system;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application, not to limit it.
The technical solutions provided by the embodiments of the present application can be applied in the application scenario of the related methods shown in FIG. 1, a schematic diagram of an exemplary live streaming application scenario. The live streaming system may include a live streaming server, a streamer client, and a viewer client; the streamer client and the viewer client communicate with the live streaming server over a communication network, so that the streamer at the streamer client and the viewer users at the viewer client can take part in real-time webcasts. The terminal devices of the streamer and viewer clients may be, but are not limited to, personal computers, laptops, smartphones, and tablets, and the live streaming server may be implemented as a standalone server or as a server cluster composed of multiple servers.
Embodiments of the training method for a face prediction model of the present application are described below. The present application can be applied to face information prediction scenarios and achieves multi-task prediction simultaneously with a single model. Referring to FIG. 2, a flowchart of a training method for a face prediction model according to an embodiment, the method may include the following steps:
S11: Group a face image set according to prediction tasks, and label each face image group according to its corresponding prediction task.
In this step, face images containing human faces can be collected from the Internet and other sources, grouped according to the prediction tasks to obtain a face image group for each task, and then labeled according to the respective tasks.
For example, if the prediction tasks include predicting the 2D keypoints and the 3D keypoints of face images, a 2D keypoint face image group and a 3D keypoint face image group can be obtained in advance, and labeling the face image groups may include the following:
(1) Annotate a number of 2D keypoints on the facial regions of each face image in the 2D keypoint group.
2D keypoints are the keypoints of the visible face region in a face image. In general, when the face is in profile, a cheek contour point does not lie at the cheek position of the face in the real 3D world, but at the position in the visible region of the 2D image closest to the cheek of the real 3D face.
Illustratively, the 2D keypoints of each face image in the 2D keypoint group can be annotated, for example 300 2D keypoints per face image.
(2) Render a frontal face image with a 3D face base model and determine the correspondence between the vertices of the 3D base model and the 2D keypoints; according to the correspondence, annotate a number of 2D keypoints on the facial regions of each face image in the 3D keypoint group.
3D keypoints are the keypoints of the real 3D face corresponding to the face image. In general, when the face is in profile, a cheek contour point lies in the cheek region of the real 3D face, even though that region is occluded and invisible in the face image.
Illustratively, a frontal face image can be rendered with the 3D face base model, and the correspondence between the 3D base model vertices and the 2D keypoints then determined. When annotating 3D keypoints, the annotations can be expressed as 2D keypoints; the conversion between 3D and 2D keypoints is realized through the correspondence determined by the 3D base model.
As an embodiment, the correspondence between the vertices of the 3D base model and the 2D keypoints, and hence the 3D-2D conversion relationship, can be determined with a BFM (Basel Face Model) 3D face base model, specifically as follows:
a. Reduce the dimensionality of the expression basis and shape basis of the 3D face base model.
For example, using the BFM2019 3D face base model (a parametric model), the PCA (Principal Component Analysis) dimensionality reduction technique can first reduce the expression basis and shape basis of the BFM2019 model to 80 dimensions.
b. Render a frontal face image from the mean face of the 3D face base model, and annotate a number of 2D keypoints on it.
For example, a frontal face image is rendered using the mean face (meanshape) of the BFM2019 model and, following the 2D keypoint annotation method, 300 2D keypoints are annotated on the frontal face image.
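This dimensionality reduction step can be sketched with SVD-based PCA. The random matrix standing in for the basis is invented for illustration (the real input would be the BFM2019 shape/expression basis; since BFM bases are themselves PCA components, reduction there amounts to keeping the leading 80):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a high-dimensional basis: rows are sample faces,
# columns are flattened vertex coordinates.
samples = rng.normal(size=(300, 1000))

def pca_reduce(data, k):
    """Project mean-centered data onto its top-k principal components."""
    centered = data - data.mean(axis=0)
    # SVD of the data matrix yields the principal directions in Vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]               # k x dim principal directions
    coeffs = centered @ basis.T  # n x k low-dimensional coordinates
    return coeffs, basis

coeffs, basis = pca_reduce(samples, 80)
```

Each face is thereafter described by 80 coefficients instead of 1000 coordinates, which keeps the downstream regression heads small.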
c. Compute the 2D projection points of the 3D vertices on the frontal face image and, for each 2D keypoint, determine the projection point at minimum distance, obtaining the frontal-face correspondence between 3D base model vertices and 2D keypoints.
For example, compute the 2D projection of every 3D vertex of the mean face (meanshape); for each of the 300 annotated 2D keypoints, find the projection point at minimum distance; in the frontal case, this yields the correspondence between that projection point's 3D vertex and the 2D keypoint.
Referring to FIG. 3, a schematic diagram of exemplary 3D keypoint annotation results on a face image, the "·" points are the annotated 2D keypoints and the "×" points are the projections of the corresponding 3D keypoints on the face image; the two nearly coincide, yielding the frontal 3D-2D keypoint conversion relationship.
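The minimum-distance matching of step c can be sketched as follows, assuming a toy orthographic projection (drop z); the vertex coordinates and keypoints below are invented for illustration:

```python
import numpy as np

def match_vertices(vertices3d, keypoints2d):
    """For each annotated 2D keypoint, return the index of the 3D vertex whose
    orthographic 2D projection (x, y) lies at minimum distance."""
    proj = vertices3d[:, :2]  # toy projection: drop the z coordinate
    # Pairwise distances: keypoints along axis 0, vertex projections along axis 1.
    d = np.linalg.norm(proj[None, :, :] - keypoints2d[:, None, :], axis=2)
    return np.argmin(d, axis=1)  # one vertex index per keypoint

vertices = np.array([[0.0, 0.0, 1.0], [5.0, 5.0, 1.0], [9.0, 0.0, 1.0]])
keypoints = np.array([[5.1, 4.9], [0.2, 0.1]])
matched = match_vertices(vertices, keypoints)
```

Here keypoint 0 lands on vertex 1 and keypoint 1 on vertex 0, giving exactly the frontal vertex-to-keypoint table described in the text.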
d. Adjust the 2D keypoints along the cheek contour of the frontal face image to obtain the profile-face correspondence between 3D base model vertices and 2D keypoints.
Illustratively, when the face image is rotated, the 3D keypoints at the cheek positions differ substantially from the 2D keypoints; the keypoints of the cheek contour are therefore adjusted so that the correspondence between 3D base model vertices and 2D keypoints, and hence the profile 3D-2D conversion relationship, can also be obtained for profile faces.
For example, referring to FIG. 4, a schematic diagram of an exemplary conversion between 3D keypoints and 2D keypoints: suppose that, for a frontal face image, 3D vertex No. 1999, v1999 = (x, y, z), corresponds to cheek 2D keypoint No. 1, pt1 = (w, h). A horizontal line of length half the face width can be defined at the Y-axis coordinate of v1999, and the indices of all 3D vertices on this line are added to the 3D vertex set ω1 = {1999, ...} of 2D keypoint pt1. When the face image shows the left profile, all indexed 3D vertices in ω1 are retrieved, and the 3D vertex with the largest x coordinate corresponds to the 2D keypoint pt1 in that pose; by analogy, the 2D keypoint corresponding to each 3D keypoint on the face image can be obtained.
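The profile-case selection can be sketched as a small helper: given the candidate vertex set ω built for a cheek keypoint, pick the extreme-x vertex according to which profile is shown. The function name and the right-profile (smallest-x) rule are assumptions for illustration; the text only specifies the left-profile, largest-x case:

```python
import numpy as np

def profile_vertex(vertices3d, omega, side="left"):
    """Among candidate vertex indices omega (the horizontal strip built for a
    cheek keypoint), pick the vertex visible in profile: largest x for a left
    profile; by symmetry (an assumption here), smallest x for a right profile."""
    xs = vertices3d[np.asarray(omega), 0]
    pick = int(np.argmax(xs)) if side == "left" else int(np.argmin(xs))
    return omega[pick]

verts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
left_pick = profile_vertex(verts, [0, 1, 2], side="left")
```

For the toy vertices above, the left-profile pick is the vertex with x = 2, matching the largest-x rule in the example.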
As in the above embodiment, since the 3D keypoints are converted into 2D keypoints for annotation, the 3D keypoint face image group can share a single set of face images with the 2D keypoint group.
In one embodiment, the prediction tasks may further include predicting the per-pixel classification of the visible face region in a face image and predicting the 3D mesh formed by the 3D face keypoints; accordingly, a face segmentation group of face images can first be obtained, and labeling the face image groups may include the following:
(3) Separate the face region from the background region in each face image of the face segmentation group, labeling the face region as foreground pixels and the background region as background pixels.
Face region segmentation is the pixel-wise classification of the visible face region in a face image; the segmented image content generally includes only the visible face region, excluding hair, clothing, and other background regions.
Illustratively, the face region of each face image in the face segmentation group is first separated from hair, clothing, and other background regions; the face region is then labeled as foreground pixels (pixel value 1) and the background region as background pixels (usually pixel value 0), thereby segmenting the face region from the background region.
(4) Determine the 3D keypoint connectivity of the face images in the 3D keypoint group according to the vertex connectivity of the 3D base model.
A 3D mesh (3D mesh of the real 3D face corresponding to the face image) is composed of the 3D keypoints and the connections between them. Since the vertex connectivity of the 3D base model is fixed, when labeling the 3D mesh data, the connectivity between the 3D keypoints can be obtained from the base model's vertex connectivity and the previously annotated 3D keypoints (which are determined from the 2D keypoints and the correspondence).
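Deriving keypoint connectivity from the fixed vertex connectivity of the base model can be sketched as follows, assuming the base model's connectivity is given as a triangle list (a common representation; the helper and its tiny inputs are hypothetical):

```python
def keypoint_edges(faces, keypoint_vertices):
    """Derive connectivity between 3D keypoints from the fixed triangle list
    of the base model: keep only edges whose both endpoints are keypoint
    vertices, re-indexed into keypoint space."""
    index_of = {v: i for i, v in enumerate(keypoint_vertices)}
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            if u in index_of and v in index_of:
                # Store edges with sorted endpoints so (u, v) == (v, u).
                edges.add(tuple(sorted((index_of[u], index_of[v]))))
    return sorted(edges)

# Two triangles sharing an edge; keypoints are base-model vertices 1, 2, 3.
mesh_edges = keypoint_edges([(0, 1, 2), (1, 2, 3)], [1, 2, 3])
```

Because the triangle list of the base model never changes, this edge table is computed once and reused for every image in the 3D keypoint group.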
The solution of the above embodiment designs prediction tasks for the 2D keypoints of a face image, face region segmentation, and the 3D keypoints of a face image together with their 3D mesh, and labels the data by converting 3D keypoints into 2D keypoints, thereby facilitating multi-task model training in a single model.
S12,构建特征图金字塔网络以及各个预测任务对应的预测任务层,将所述预测任务层分别连接到所述特征图金字塔网络输出的特征图上得到人脸多任务预测模型。S12. Construct a feature map pyramid network and prediction task layers corresponding to each prediction task, and respectively connect the prediction task layers to the feature maps output by the feature map pyramid network to obtain a face multi-task prediction model.
此步骤中,为搭建人脸多任务预测模型的过程,利用特征图金字塔网络(Feature Pyramid Networks,FPN)来提取输入的人脸图像的特征图,并且根据各个预测任务搭建对应的预测任务层,再根据不同预测任务中对于不同特征图的需求将预测任务层分别连接到特征图金字塔网络输出的相应特征图上,从而搭建出人脸多任务预测模型。This step builds the face multi-task prediction model: a Feature Pyramid Network (FPN) is used to extract feature maps from the input face image, a prediction task layer is built for each prediction task, and each prediction task layer is then connected to the appropriate FPN output feature map according to the feature maps that task requires, thereby assembling the face multi-task prediction model.
在一个实施例中,对于步骤S12中的构建特征图金字塔网络以及各个预测任务对应的预测任务层的过程,可以包括如下:In one embodiment, the process of constructing the feature map pyramid network in step S12 and the prediction task layer corresponding to each prediction task may include the following:
S201,构建一个提取图像特征的特征图金字塔网络;其中,所述特征图金字塔网络以人脸图像为输入,输出分辨率逐渐增大的多个特征图。S201. Construct a feature map pyramid network for extracting image features; wherein, the feature map pyramid network takes a face image as input and outputs a plurality of feature maps with gradually increasing resolutions.
例如,构建图像特征的特征图金字塔网络输出分辨率逐渐增大的特征图F1、特征图F2、特征图F3和特征图F4;其中,特征图F1至F4的分辨率逐渐增大。For example, the constructed feature map pyramid network outputs feature maps F1, F2, F3 and F4, whose resolutions increase gradually from F1 to F4.
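A minimal numeric sketch of this pyramid structure, in pure NumPy: average pooling stands in for the convolutional backbone, and nearest-neighbour upsampling with lateral additions stands in for the top-down pathway. Sizes and the single-channel simplification are illustrative assumptions, not the patent's architecture.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (H, W) map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_feature_maps(image):
    # Bottom-up pathway: progressively halve the resolution
    # (a stand-in for a convolutional backbone).
    c1 = avg_pool2(image)   # 1/2
    c2 = avg_pool2(c1)      # 1/4
    c3 = avg_pool2(c2)      # 1/8
    c4 = avg_pool2(c3)      # 1/16
    # Top-down pathway with lateral additions: F1 is the coarsest map,
    # F4 the finest, so resolution increases from F1 to F4 as in the text.
    f1 = c4
    f2 = upsample2(f1) + c3
    f3 = upsample2(f2) + c2
    f4 = upsample2(f3) + c1
    return f1, f2, f3, f4

img = np.random.rand(64, 64)
f1, f2, f3, f4 = fpn_feature_maps(img)
```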
S202,根据需要执行的各个预测任务,在所述特征图金字塔网络之后分别搭建对应的预测任务层。S202. According to each prediction task that needs to be performed, respectively build corresponding prediction task layers after the feature map pyramid network.
例如,如前面实施例,可以设置预测人脸图像的2D关键点、预测人脸图像的3D关键点、预测人脸图像中人脸可见区域像素分类和预测3D人脸关键点所形成3D网格四个预测任务。For example, as in the previous embodiment, four prediction tasks can be set: predicting the 2D key points of the face image, predicting the 3D key points of the face image, predicting the pixel classification of the visible face region in the face image, and predicting the 3D mesh formed by the 3D face key points.
对应的,搭建预测任务层包括:预测人脸图像的2D关键点的第一预测任务层,预测人脸图像中人脸可见区域像素分类的第二预测任务层,预测人脸图像的3D关键点的第三预测任务层和预测3D人脸关键点所形成3D网格的第四预测任务层。Correspondingly, the prediction task layers built include: a first prediction task layer that predicts the 2D key points of the face image, a second prediction task layer that predicts the pixel classification of the visible face region in the face image, a third prediction task layer that predicts the 3D key points of the face image, and a fourth prediction task layer that predicts the 3D mesh formed by the 3D face key points.
优选的,第一预测任务层和第二预测任务层可以为卷积神经网络层,第三预测任务层和第四预测任务层可以为线性层。Preferably, the first prediction task layer and the second prediction task layer may be convolutional neural network layers, and the third prediction task layer and the fourth prediction task layer may be linear layers.
在上述实施例中,对于步骤S12中的将所述预测任务层分别连接到所述特征图金字塔网络输出的特征图上得到人脸多任务预测模型的过程,可以包括如下:In the foregoing embodiment, for the process of connecting the prediction task layer to the feature map output by the feature map pyramid network in step S12 to obtain the multi-task prediction model of the face, it may include the following:
S203,针对于各个预测任务层,分别从所述特征图金字塔网络输出的特征图中选择至少一个特征图作为输入图像。S203. For each prediction task layer, respectively select at least one feature map from the feature maps output by the feature map pyramid network as an input image.
示例性的,针对于上述四个预测任务层,可以从特征图金字塔网络输出的特征图中选择不同的特征图作为该预测任务层的输入图像;优选的,可以选择分辨率最大的特征图F4作为第一预测任务层和第二预测任务层的输入图像,选择分辨率最小的特征图F1作为第三预测任务层和第四预测任务层的输入图像;当然也可以通过其他形式的选择方案,比如选择特征图F1和F2作为第三预测任务层和第四预测任务层的输入图像;选择特征图F3和F4作为第一预测任务层和第二预测任务层的输入图像等;具体可以根据不同情况下的训练需求进行设定。Exemplarily, for the four prediction task layers above, different feature maps may be selected from the FPN outputs as the input of each layer. Preferably, the highest-resolution feature map F4 is selected as the input of the first and second prediction task layers, and the lowest-resolution feature map F1 as the input of the third and fourth prediction task layers. Other selection schemes are of course possible, for example feature maps F1 and F2 as inputs of the third and fourth layers, or feature maps F3 and F4 as inputs of the first and second layers; the choice can be made according to the training requirements of the specific situation.
S204,根据各个预测任务对应选择的输入图像将所述预测任务层连接到所述特征图上,得到人脸多任务预测模型。S204. Connect the prediction task layer to the feature map according to the selected input image corresponding to each prediction task to obtain a face multi-task prediction model.
如图5所示,图5是一个示例的人脸多任务预测模型结构示意图,图中第一预测任务层和第二预测任务层以特征图F4作为输入图像,第三预测任务层和第四预测任务层以特征图F1作为输入图像,其中:As shown in Figure 5, which is a schematic diagram of an example face multi-task prediction model: the first and second prediction task layers take feature map F4 as their input image, and the third and fourth prediction task layers take feature map F1 as their input image, where:
第一预测任务层以特征图F4作为输入图像,采用卷积神经网络的操作函数conv2d函数来得到热图heatmap,然后利用热图heatmap并使用soft-argmax算法计算得到2D关键点坐标。The first prediction task layer takes feature map F4 as input, applies the convolution operation (conv2d) to obtain a heatmap, and then computes the 2D key point coordinates from the heatmap with the soft-argmax algorithm.
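The soft-argmax step can be sketched as follows: a softmax turns the heatmap into a probability map, and the expected pixel coordinate under that distribution gives a differentiable key point estimate. The temperature `beta` is an assumed detail, not specified in the text.

```python
import numpy as np

def soft_argmax(heatmap, beta=100.0):
    """Differentiable argmax: softmax over the heatmap, then the
    expectation of the (x, y) pixel grid under that distribution."""
    h, w = heatmap.shape
    flat = heatmap.ravel() * beta
    probs = np.exp(flat - flat.max())   # numerically stable softmax
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    x = float((probs * xs.ravel()).sum())
    y = float((probs * ys.ravel()).sum())
    return x, y

hm = np.zeros((8, 8))
hm[3, 5] = 1.0                 # a sharp peak at (row 3, col 5)
x, y = soft_argmax(hm)         # recovers x ≈ 5, y ≈ 3
```

Unlike a hard argmax, this estimate varies smoothly with the heatmap values, which is what lets the 2D key point loss backpropagate through the prediction layer.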
第二预测任务层为以特征图F4作为输入图像且为卷积神经网络层,采用卷积神经网络的操作函数conv2d函数来预测人脸图像的分割图像。The second prediction task layer is a convolutional layer that takes feature map F4 as input and applies conv2d to predict the segmentation image of the face.
第三预测任务层利用线性函数linear函数来预测BFM基模型的系数和相机参数,其中系数包括形状系数Wid∈R^(80×1)、表情系数Wexp∈R^(80×1);相机参数包括平移参数T∈R^(3×1)和旋转参数R∈R^(3×1)等,预测的3D人脸关键点集合V=meanshape+Wid*Bid+Wexp*Bexp。The third prediction task layer uses a linear layer to predict the coefficients of the BFM base model and the camera parameters, where the coefficients include the shape coefficients Wid∈R^(80×1) and the expression coefficients Wexp∈R^(80×1), and the camera parameters include the translation parameters T∈R^(3×1) and the rotation parameters R∈R^(3×1); the predicted set of 3D face key points is V = meanshape + Wid*Bid + Wexp*Bexp.
第四预测任务层是利用第三预测任务层3D人脸关键点的预测结果及3D关键点连接关系得到,即利用3D人脸关键点集合V加上预定义的3D顶点三角形的连接关系即得到3Dmesh。The fourth prediction task layer is obtained from the 3D face key point predictions of the third layer together with the 3D key point connection relationship: adding the predefined 3D vertex-triangle connectivity to the 3D face key point set V yields the 3D mesh.
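The basis expansion V = meanshape + Wid*Bid + Wexp*Bexp and the mesh assembly can be sketched with toy dimensions. The 80-dimensional coefficient vectors match the text; the tiny vertex count, random basis values, and triangle list are illustrative stand-ins (a real BFM basis has tens of thousands of vertices).

```python
import numpy as np

# Toy dimensions: N = 4 vertices (each with x, y, z), K = 80 basis
# coefficients, matching Wid, Wexp ∈ R^(80x1) in the text.
N, K = 4, 80
rng = np.random.default_rng(0)
meanshape = rng.normal(size=(3 * N, 1))   # mean face shape
B_id = rng.normal(size=(3 * N, K))        # identity (shape) basis
B_exp = rng.normal(size=(3 * N, K))       # expression basis
W_id = rng.normal(size=(K, 1))            # predicted shape coefficients
W_exp = rng.normal(size=(K, 1))           # predicted expression coefficients

# V = meanshape + Wid*Bid + Wexp*Bexp, written as a basis expansion,
# then reshaped to one (x, y, z) row per vertex.
V = (meanshape + B_id @ W_id + B_exp @ W_exp).reshape(N, 3)

# The mesh is V plus the fixed triangle connectivity of the base model.
faces = np.array([[0, 1, 2], [1, 2, 3]])  # predefined vertex triangles
mesh = (V, faces)
```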
上述实施例,提供了搭建人脸多任务预测模型的技术方案,该人脸多任务预测模型包含了当前直播场景中常用的几种预测任务,特别是覆盖了美颜、美妆、整形等直播、短视频特效技术常用的人脸关键点检测与分割算法,极大地提高了直播场景下的算法效率。The above embodiments provide a technical solution for building a face multi-task prediction model. The model covers several prediction tasks commonly used in current live broadcast scenes, in particular the face key point detection and segmentation algorithms commonly used in beautification, makeup, plastic-surgery and other live broadcast and short-video special-effect techniques, greatly improving algorithm efficiency in live broadcast scenarios.
S13,分别配置所述人脸多任务预测模型的各个预测任务层在预测训练中的损失函数。S13. Respectively configure loss functions of each prediction task layer of the face multi-task prediction model in prediction training.
此步骤中,基于所搭建的人脸多任务预测模型,设计各个预测任务层在预测训练中的损失函数,由于人脸多任务预测模型是单模型多任务预测结构,在训练时各个预测任务层的损失函数相互影响,最终输出是多个预测任务同时输出预测结果。In this step, the loss function of each prediction task layer in prediction training is designed on top of the constructed face multi-task prediction model. Since the model is a single-model multi-task prediction structure, the loss functions of the task layers influence one another during training, and the final output is the prediction results of multiple tasks emitted simultaneously.
在一个实施例中,以预测人脸图像的2D关键点、预测人脸图像的3D关键点、预测人脸图像中人脸可见区域像素分类和预测3D人脸关键点所形成3D网格四个预测任务为例,对应的,配置各个预测任务层在预测训练中的损失函数,可以包括如下:In one embodiment, taking the four prediction tasks as an example — predicting the 2D key points of the face image, predicting the 3D key points of the face image, predicting the pixel classification of the visible face region, and predicting the 3D mesh formed by the 3D face key points — configuring the loss function of each prediction task layer in prediction training may include the following:
配置第一预测任务层的损失函数为L2d=Σi‖pred2d,i−gt2d,i‖,其中,pred2d,i表示人脸多任务预测模型预测的第i个关键点坐标,gt2d,i表示标注的第i个2D关键点坐标。The loss function of the first prediction task layer is configured as L2d = Σi‖pred2d,i − gt2d,i‖, where pred2d,i denotes the i-th key point coordinate predicted by the face multi-task prediction model, and gt2d,i denotes the annotated i-th 2D key point coordinate.
配置第二预测任务层的损失函数为Lseg=−(gtseg*log(predseg)+(1−gtseg)*log(1−predseg)),其中,gtseg为前景像素或者背景像素的标注信息,即像素属于前景像素(1)还是背景像素(0),predseg为人脸多任务预测模型的预测结果(0~1)。The loss function of the second prediction task layer is configured as the binary cross-entropy Lseg = −(gtseg*log(predseg) + (1−gtseg)*log(1−predseg)), where gtseg is the foreground/background annotation of a pixel, i.e. whether it belongs to the foreground (1) or the background (0), and predseg is the prediction of the face multi-task prediction model (a value in 0–1).
配置第三预测任务层和第四预测任务层的损失函数为L3d=Σi‖proj2d,i−gt2d,i‖,其中,proj2d,i为人脸多任务预测模型预测的3D人脸关键点在人脸图像上的2D投影点,gt2d,i表示标注的第i个2D关键点坐标;其中,3D关键点与2D关键点的对应关系为3D基模型顶点与2D关键点的对应关系。The loss functions of the third and fourth prediction task layers are configured as L3d = Σi‖proj2d,i − gt2d,i‖, where proj2d,i is the 2D projection onto the face image of the i-th 3D face key point predicted by the model, and gt2d,i denotes the annotated i-th 2D key point coordinate; the correspondence between 3D and 2D key points is the correspondence between the 3D base model vertices and the 2D key points.
上述实施例的方案,根据预测任务分别设计了各个预测任务层在预测训练中对应的损失函数,各个损失函数在模型训练过程中相互影响,从而可以形成多个预测任务同时输出预测结果的功能。In the solution of the above embodiment, the loss function of each prediction task layer in prediction training is designed according to its prediction task; the loss functions influence one another during model training, enabling multiple prediction tasks to output prediction results simultaneously.
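The three loss terms above can be sketched in NumPy as follows. Where the text leaves details open (the choice of norm, the reduction over pixels, the clipping epsilon), the values here are assumptions.

```python
import numpy as np

def l2d_loss(pred, gt):
    """Sum of per-keypoint distances between predicted and annotated
    2D key points; pred, gt have shape (num_points, 2)."""
    return float(np.linalg.norm(pred - gt, axis=1).sum())

def seg_loss(pred, gt):
    """Per-pixel binary cross-entropy: gt in {0, 1}, pred in (0, 1)."""
    eps = 1e-7
    p = np.clip(pred, eps, 1 - eps)
    return float(-(gt * np.log(p) + (1 - gt) * np.log(1 - p)).mean())

def l3d_loss(proj_2d, gt_2d):
    """3D key points are supervised through their 2D projections, so this
    loss has the same form as the 2D key point loss."""
    return float(np.linalg.norm(proj_2d - gt_2d, axis=1).sum())

pred_kp = np.array([[1.0, 2.0], [3.0, 4.0]])
gt_kp = np.array([[1.0, 2.0], [3.0, 5.0]])   # second point is off by 1 pixel
kp_err = l2d_loss(pred_kp, gt_kp)

seg_err = seg_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```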
S14,利用所述标注的人脸图像并根据所述损失函数对所述人脸多任务预测模型进行训练。S14, using the labeled face image and training the face multi-task prediction model according to the loss function.
此步骤中,是利用各个人脸图像分组,在设定的损失函数下对人脸多任务预测模型进行训练,从而得到可以在实际场景中预测使用的人脸多任务预测模型。In this step, each face image group is used to train the face multi-task prediction model under the set loss function, so as to obtain a face multi-task prediction model that can be used for prediction in actual scenes.
在一个实施例中,对于训练的过程,可以包括如下:In one embodiment, the training process may include the following:
S401,读取各组已标注的人脸图像,并分别输入到人脸多任务预测模型;具体的,将标注好的人脸图像分别输入到人脸多任务预测模型中。S401. Read each group of annotated face images and input them into the face multi-task prediction model; specifically, the annotated face images are fed into the face multi-task prediction model separately.
S402,计算在各个所述损失函数共同影响下人脸多任务预测模型输出的各个预测任务的预测结果;在训练过程中,各个预测任务层均进行训练并同时输出预测结果。S402. Compute the prediction results of each prediction task output by the face multi-task prediction model under the joint influence of the loss functions; during training, every prediction task layer is trained and outputs its prediction results simultaneously.
S403,根据所述预测结果对人脸多任务预测模型(包括特征图金字塔网络和各个预测任务层)的参数进行调整,直至人脸多任务预测模型输出的预测结果达到设定指标要求。S403. Adjust the parameters of the face multi-task prediction model (including the feature map pyramid network and each prediction task layer) according to the prediction results, until the prediction results output by the model meet the set index requirements.
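One joint training step of this kind can be sketched as follows: the per-task losses share the model parameters, are combined into a single total loss, and the parameters are adjusted to reduce it. The quadratic stand-in losses, the task weights, and the learning rate are purely illustrative assumptions, not the patent's objectives.

```python
import numpy as np

TARGETS = np.array([1.0, 2.0, 3.0, 4.0])  # toy optima, one per task

def task_losses(theta):
    """Each entry stands in for one task's loss (2D kp, segmentation,
    3D kp, mesh) on the shared parameter vector theta."""
    return (theta - TARGETS) ** 2

def training_step(theta, lr=0.1, weights=np.ones(4)):
    """Sum the weighted task losses and take one gradient step on the
    shared parameters, so all tasks are trained at the same time."""
    total = float((weights * task_losses(theta)).sum())
    grad = weights * 2.0 * (theta - TARGETS)
    return theta - lr * grad, total

theta = np.zeros(4)
history = []
for _ in range(100):
    theta, total = training_step(theta)
    history.append(total)
```

Because there is a single total loss over shared parameters, every update moves all four task heads at once, which is the mechanism behind the "loss functions influence one another" behaviour described above.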
上述实施例的方案,利用单个模型的多个预测任务层的损失函数下同时进行模型训练,从而可以使得人脸多任务预测模型可以对单个人脸图像进行多个任务同时预测,从而提升了模型预测算法效率。In the solution of the above embodiment, training is performed simultaneously under the loss functions of the multiple prediction task layers of a single model, so that the face multi-task prediction model can predict multiple tasks on a single face image at the same time, improving the efficiency of the prediction algorithm.
应该理解的是,虽然如上所述的各实施例所涉及的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,如上所述的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that, although the steps in the flowcharts involved in the above embodiments are displayed sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential either, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
基于同样的发明构思,本申请还提供了一种用于实现上述所涉及的相关方法的装置。该装置所提供的解决问题的实现方案与上述方法中所记载的实现方案相似,故下面所提供的一个或多个相关装置实施例中的具体限定可以参见上文中对于相关方法的限定,在此不再赘述。Based on the same inventive concept, the present application further provides a device for implementing the related methods described above. The problem-solving implementation provided by the device is similar to that recorded in the above methods, so for the specific limitations in the one or more related device embodiments provided below, reference may be made to the limitations of the related methods above, which will not be repeated here.
参考图6所示,图6是一个实施例的人脸预测模型的训练装置结构示意图,该装置包括:Referring to Fig. 6, Fig. 6 is a schematic structural diagram of a training device for a face prediction model of an embodiment, the device comprising:
数据标注模块11,用于根据预测任务对人脸图像集进行分组,并根据各个预测任务分别对所述人脸图像分组进行数据标注;The data labeling module 11 is used for grouping the human face image set according to the prediction task, and carrying out data labeling to the grouping of the human face images according to each prediction task;
模型搭建模块12,用于构建特征图金字塔网络以及各个预测任务对应的预测任务层,将所述预测任务层分别连接到所述特征图金字塔网络输出的特征图上得到人脸多任务预测模型;Model building module 12, is used for constructing feature map pyramid network and the prediction task layer corresponding to each prediction task, described prediction task layer is respectively connected on the feature map that described feature map pyramid network outputs and obtains face multi-task prediction model;
函数配置模块13,用于分别配置所述人脸多任务预测模型的各个预测任务层在预测训练中的损失函数;Function configuration module 13, for respectively configuring the loss function of each prediction task layer of described human face multi-task prediction model in prediction training;
模型训练模块14,用于利用所述标注的人脸图像并根据所述损失函数对所述人脸多任务预测模型进行训练。The model training module 14 is configured to use the marked face image and train the face multi-task prediction model according to the loss function.
上述装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。Each module in the above-mentioned device can be fully or partially realized by software, hardware and a combination thereof. The above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
本实施例的人脸预测模型的训练装置可执行本申请的实施例所提供的一种人脸预测模型的训练方法,其实现原理相类似,本申请各实施例中的人脸预测模型的训练装置中的各模块所执行的动作是与本申请各实施例中的人脸预测模型的训练方法中的步骤相对应的,对于人脸预测模型的训练装置的各模块的详细功能描述具体可以参见前文中所示的对应的人脸预测模型的训练方法中的描述,此处不再赘述。The face prediction model training device of this embodiment can execute the face prediction model training method provided in the embodiments of the present application, and its implementation principle is similar. The actions performed by each module of the training device in the embodiments of the present application correspond to the steps of the training method; for a detailed functional description of each module of the training device, reference may be made to the description of the corresponding training method above, which will not be repeated here.
基于前述实施例提供的人脸预测模型的训练方案,本申请还提供一种人脸多任务预测方法,在该方法中,利用了前述任意实施例的人脸预测模型的训练方法得到的人脸多任务预测模型。Based on the face prediction model training scheme provided in the foregoing embodiments, the present application further provides a face multi-task prediction method that uses the face multi-task prediction model obtained by the training method of any of the foregoing embodiments.
参考图7所示,图7是一个实施例的人脸多任务预测方法流程图,包括:Referring to Fig. 7, Fig. 7 is a flow chart of a face multi-task prediction method of an embodiment, including:
S21,获取主播的目标人脸图像。S21. Acquire the target face image of the anchor.
具体的,在直播场景中,直播服务器可以获取主播端上传的主播图像,根据主播图像进行人脸识别得到主播的目标人脸图像。Specifically, in a live broadcast scene, the live broadcast server may obtain the anchor image uploaded by the anchor terminal, and perform face recognition based on the anchor image to obtain the target face image of the anchor.
S22,将所述目标人脸图像输入所述人脸多任务预测模型。S22. Input the target face image into the face multi-task prediction model.
具体的,将需要添加特效等处理的目标人脸图像输入到预先训练的人脸多任务预测模型进行多任务预测得到相应的预测结果。Specifically, the target face image to be processed (e.g. for adding special effects) is input into the pre-trained face multi-task prediction model for multi-task prediction to obtain the corresponding prediction results.
例如,可以输入一张目标人脸图像,人脸多任务预测模型即可输出人脸图像的2D关键点、人脸图像分割图、人脸图像的3D关键点及其3Dmesh的预测结果。For example, a target face image can be input, and the face multi-task prediction model can output the 2D key points of the face image, the segmentation map of the face image, the 3D key points of the face image and its 3Dmesh prediction results.
S23,根据预测任务需求从所述人脸多任务预测模型输出的预测结果中选择相应的目标任务预测结果。S23. Select a corresponding target task prediction result from the prediction results output by the face multi-task prediction model according to the prediction task requirement.
具体的,直播服务器根据使用需求,可以从人脸多任务预测模型输出的多个预测结果中选择所需的目标任务预测结果;例如,如果当前需要获取人脸图像的2D关键点和人脸图像分割图,则可以选择这两个输出预测结果进行使用。Specifically, according to its usage requirements, the live broadcast server can select the required target task prediction results from the multiple prediction results output by the face multi-task prediction model; for example, if the 2D key points and the segmentation map of the face image are currently needed, these two outputs can be selected for use.
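Selecting the required outputs from the joint prediction can be sketched as a simple dictionary lookup. The task names and placeholder values below are illustrative, not identifiers from the patent.

```python
def select_outputs(outputs, tasks):
    """Pick only the requested task predictions from the joint output dict."""
    return {t: outputs[t] for t in tasks}

# Hypothetical joint output of one forward pass of the model:
all_outputs = {
    "kp2d": [[10, 20], [30, 40]],   # 2D key points
    "seg": [[1, 0], [0, 1]],        # face segmentation map
    "kp3d": [[1.0, 2.0, 3.0]],      # 3D key points
    "mesh": "mesh-placeholder",     # 3D mesh (V plus triangle faces)
}

# E.g. a beautification effect that only needs 2D key points and the mask:
chosen = select_outputs(all_outputs, ["kp2d", "seg"])
```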
如本申请提供的技术方案,可以用一个预测模型同时输出人脸的2D关键点、人脸分割图、3D关键点及其3Dmesh等预测结果,可以显著提高使用人脸关键点检测算法、人脸分割算法、人脸3D关键点与3Dmesh算法场景中的算法计算效率,更好地满足直播业务对算法的高实时性要求。With the technical solution provided by this application, a single prediction model can simultaneously output the 2D key points of the face, the face segmentation map, and the 3D key points together with the 3D mesh, which can significantly improve computational efficiency in scenarios that use face key point detection, face segmentation, and face 3D key point and 3D mesh algorithms, and better meet the high real-time requirements of the live broadcast business.
应该理解的是,虽然如上所述的各实施例所涉及的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,如上所述的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that, although the steps in the flowcharts involved in the above embodiments are displayed sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential either, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
基于同样的发明构思,本申请还提供了一种用于实现上述所涉及的相关方法的装置。该装置所提供的解决问题的实现方案与上述方法中所记载的实现方案相似,故下面所提供的一个或多个相关装置实施例中的具体限定可以参见上文中对于相关方法的限定,在此不再赘述。Based on the same inventive concept, the present application further provides a device for implementing the related methods described above. The problem-solving implementation provided by the device is similar to that recorded in the above methods, so for the specific limitations in the one or more related device embodiments provided below, reference may be made to the limitations of the related methods above, which will not be repeated here.
参考图8所示,图8是一个实施例的人脸多任务预测装置结构示意图,该装置包括:Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a face multi-task prediction device of an embodiment, which includes:
图像获取模块21,用于获取主播的目标人脸图像;Image acquisition module 21, for acquiring the target face image of anchor;
模型预测模块22,用于将所述目标人脸图像输入所述人脸多任务预测模型;其中,所述人脸多任务预测模型采用上述任意实施例的人脸预测模型的训练方法得到;A model prediction module 22, configured to input the target face image into the face multi-task prediction model; wherein, the face multi-task prediction model is obtained by using the training method of the face prediction model in any of the above-mentioned embodiments;
结果选择模块23,用于根据预测任务需求从所述人脸多任务预测模型输出的预测结果中选择相应的目标任务预测结果。The result selection module 23 is configured to select a corresponding target task prediction result from the prediction results output by the face multi-task prediction model according to the prediction task requirements.
上述装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。Each module in the above-mentioned device can be fully or partially realized by software, hardware and a combination thereof. The above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
本实施例的人脸多任务预测装置可执行本申请的实施例所提供的一种人脸多任务预测方法,其实现原理相类似,本申请各实施例中的人脸多任务预测装置中的各模块所执行的动作是与本申请各实施例中的人脸多任务预测方法中的步骤相对应的,对于人脸多任务预测装置的各模块的详细功能描述具体可以参见前文中所示的对应的人脸多任务预测方法中的描述,此处不再赘述。The face multi-task prediction device of this embodiment can execute the face multi-task prediction method provided in the embodiments of the present application, and its implementation principle is similar. The actions performed by each module of the device correspond to the steps of the face multi-task prediction method in the embodiments; for a detailed functional description of each module, reference may be made to the description of the corresponding method above, which will not be repeated here.
下面阐述直播系统的实施例。Embodiments of the live broadcast system are described below.
本实施例提供的直播系统,参考图9所示,图9是一个示例的直播系统结构示意图,该直播系统包括:主播端、观众端以及直播服务器;其中主播端和观众端分别通过通信网络连接至直播服务器。The live broadcast system provided by this embodiment is shown in FIG. 9, a schematic structural diagram of an example live broadcast system. The system includes an anchor terminal, an audience terminal and a live broadcast server, where the anchor terminal and the audience terminal are each connected to the live broadcast server through a communication network.
对于主播端,其是用于接入直播间的主播用户以及采集主播直播视频流上传至直播服务器;对于直播服务器,其是用于进行主播端与观众端之间的直播转发和向观众端下发直播视频;从主播直播视频流中获取主播的目标人脸图像,利用上述任意实施例的人脸多任务预测方法获取所述目标人脸图像的目标任务预测结果并添加特效;对于所述观众端,其是用于接入直播间的观众用户以及接收所述直播视频进行播放。The anchor terminal serves the anchor user accessing the live broadcast room and collects the anchor's live video stream for upload to the live broadcast server. The live broadcast server forwards the live broadcast between the anchor terminal and the audience terminals and delivers the live video to the audience terminals; it obtains the anchor's target face image from the live video stream, uses the face multi-task prediction method of any of the above embodiments to obtain the target task prediction results for the target face image, and adds special effects. The audience terminal serves the audience users accessing the live broadcast room and receives the live video for playback.
如图9所示,假设观众用户A、B、C……通过App客户端访问直播间观看主播的直播画面,当主播用户需要使用美颜、美妆等特效时,需要调用人脸多任务预测模型来对其人脸相关信息进行多任务预测,如人脸的2D关键点、人脸分割图、3D关键点及其3Dmesh等,此时,主播用户可以通过其客户端将视频流上传到直播服务器之后,由直播服务器调用人脸多任务预测模型来获取相关预测结果,并添加相应的特效效果;然后直播服务器可以将添加了特效效果之后的视频画面生成直播视频流下发到各个观众用户A、B、C……的客户端上进行播放。由于上述直播系统采用了本申请的人脸多任务预测模型,在进行人脸相关信息的多任务预测时,能够同时输出多个任务的预测结果,提高了模型算法效率,能够更好的服务于网络直播业务中的美颜、美妆等特效技术。As shown in FIG. 9, suppose audience users A, B, C, ... access the live broadcast room through the App client to watch the anchor's live stream. When the anchor user needs special effects such as beautification or makeup, the face multi-task prediction model must be invoked to perform multi-task prediction on the face-related information, such as the 2D key points, the face segmentation map, and the 3D key points and their 3D mesh. The anchor user uploads the video stream through the client to the live broadcast server, which invokes the face multi-task prediction model to obtain the prediction results and adds the corresponding special effects; the server then generates a live video stream from the processed frames and delivers it to the clients of audience users A, B, C, ... for playback. Since the above live broadcast system adopts the face multi-task prediction model of the present application, it can output the prediction results of multiple tasks simultaneously when performing multi-task prediction on face-related information, which improves the efficiency of the model algorithm and better serves special-effect techniques such as beautification and makeup in the webcasting business.
下面阐述本申请的计算机设备及计算机可读存储介质的实施例。Embodiments of the computer device and the computer-readable storage medium of the present application are set forth below.
参考图10所示,图10是一个示例的计算机设备结构示意图,该计算机设备可以是直播服务器应用的设备,也可以是观众端和主播端应用的设备,该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质和内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储人脸图像数据集等数据。该计算机设备的网络接口用于与外部的设备通过通信网络连接。该计算机程序被处理器执行时以实现本申请实施例所提供的相关方法。Referring to FIG. 10, a schematic structural diagram of an example computer device, which may be the device running the live broadcast server or the device running the audience or anchor terminal: the computer device includes a processor, a memory and a network interface connected through a system bus. The processor provides computation and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database stores data such as face image datasets. The network interface connects to external devices through a communication network. When executed by the processor, the computer program implements the related methods provided in the embodiments of the present application.
本领域技术人员可以理解,上述实施例提供的计算机设备结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Those skilled in the art can understand that the computer device structure provided by the above embodiments is merely a block diagram of a partial structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
本申请还提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述各实施例的方法中的步骤。本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存、光存储器、高密度嵌入式非易失性存储器、阻变存储器(ReRAM)、磁变存储器(Magnetoresistive RandomAccess Memory,MRAM)、铁电存储器(Ferroelectric Random Access Memory,FRAM)、相变存储器(Phase Change Memory,PCM)、石墨烯存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器等。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。本申请所提供的各实施例中所涉及的数据库可包括关系型数据库和非关系型数据库中至少一种。非关系型数据库可包括基于区块链的分布式数据库等,不限于此。本申请所提供的各实施例中所涉及的处理器可为通用处理器、中央处理器、图形处理器、数字信号处理器、可编程逻辑器、基于量子计算的数据处理逻辑器等,不限于此。The present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps in the methods of the foregoing embodiments are implemented. Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above-mentioned embodiments can be completed by instructing related hardware through computer programs, and the computer programs can be stored in a non-volatile computer-readable memory In the medium, when the computer program is executed, it may include the processes of the embodiments of the above-mentioned methods. Wherein, any reference to storage, database or other media used in the various embodiments provided in the present application may include at least one of non-volatile and volatile storage. 
Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and so on. Volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM can take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database; a non-relational database may include, without limitation, a blockchain-based distributed database. The processors involved in the embodiments provided in this application may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data processing logic devices based on quantum computing.
It should be noted that the user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are all information and data authorized by the user or fully authorized by all parties.
The technical features of the above embodiments may be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combination of these technical features involves no contradiction, it should be considered to be within the scope described in this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be determined by the appended claims.
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202310666069.7A | 2023-06-06 | 2023-06-06 | Training and multitask prediction method and device of face prediction model and live broadcast system |
| Publication Number | Publication Date | 
|---|---|
| CN116681986A | 2023-09-01 |
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN202310666069.7A (pending) | Training and multitask prediction method and device of face prediction model and live broadcast system | 2023-06-06 | 2023-06-06 |
| Country | Link | 
|---|---|
| CN (1) | CN116681986A (en) | 
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN117475497A (en)* | 2023-12-11 | 2024-01-30 | 四川爱生赛斯健康科技有限公司 | High-precision face acupoint distinguishing method and distinguishing model | 
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN111178183A (en)* | 2019-12-16 | 2020-05-19 | 深圳市华尊科技股份有限公司 | Face detection method and related device | 
| CN112069992A (en)* | 2020-09-04 | 2020-12-11 | 西安西图之光智能科技有限公司 | Face detection method, system and storage medium based on multi-supervision dense alignment | 
| CN112488003A (en)* | 2020-12-03 | 2021-03-12 | 深圳市捷顺科技实业股份有限公司 | Face detection method, model creation method, device, equipment and medium | 
| CN113628322A (en)* | 2021-07-26 | 2021-11-09 | 阿里巴巴(中国)有限公司 | Image processing method, AR display live broadcast method, AR display equipment, AR display live broadcast equipment and storage medium | 
| CN115880740A (en)* | 2021-09-27 | 2023-03-31 | 腾讯科技(深圳)有限公司 | Face living body detection method and device, computer equipment and storage medium | 
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |