Technical Field
The invention belongs to the technical field of artificial intelligence, and in particular relates to a face parsing and emotion recognition method based on a multi-task collaborative network.
Background Art
Face parsing is a fine-grained semantic segmentation task commonly applied to photo retouching and beautification. Facial emotion recognition is a classification task with applications in human-computer interaction, mental health assessment, and other fields. The present method develops a new multi-task collaborative network that performs face parsing and emotion recognition simultaneously. Compared with other methods, both the inference speed and the accuracy are significantly improved, and the network can be deployed on mobile devices such as smartphones.
Summary of the Invention
To overcome the deficiencies of the prior art, the present invention proposes a face parsing and emotion recognition method based on a multi-task collaborative network, realized by a deep learning model named MPENet. The specific steps are as follows:
Step 1: preprocess the experimental data;
Step 2: build the MPENet network model;
Step 3: train the MPENet network model;
Step 4: conduct experiments on multiple face parsing datasets with the trained MPENet network model and evaluate the experimental results.
Step 1 specifically comprises the following steps:
Step 1.1: to improve the generalization ability of the model, the images are first normalized;
Step 1.2: the normalized images are cropped to a size of 512×512;
Step 1.3: data augmentation is applied to the cropped images, specifically random rotation and random scaling;
Step 1.4: the data are divided into a training set, a validation set, and a test set (see the sketch below).
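A minimal preprocessing sketch consistent with Step 1, assuming a PyTorch/torchvision pipeline; the normalization statistics, the rotation and scale ranges, and the split ratios are illustrative assumptions rather than values given in the text.

```python
import torchvision.transforms as T
from torch.utils.data import random_split

# Steps 1.1-1.3: normalization, 512x512 crop, random rotation and scaling (parameters assumed).
train_transform = T.Compose([
    T.Resize(512),
    T.CenterCrop(512),                                  # Step 1.2: crop to 512x512
    T.RandomRotation(degrees=15),                       # Step 1.3: random rotation
    T.RandomAffine(degrees=0, scale=(0.75, 1.25)),      # Step 1.3: random scaling
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],             # Step 1.1: normalization
                std=[0.229, 0.224, 0.225]),
])

def split_dataset(dataset):
    """Step 1.4: 80/10/10 split into training, validation, and test sets (ratios assumed)."""
    n = len(dataset)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return random_split(dataset, [n_train, n_val, n - n_train - n_val])
```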
Step 2 comprises the following steps:
Step 2.1: ResNet18 is adopted as the backbone network of the encoder to extract semantic information from the input image;
Step 2.2: build the edge-aware branch, and add a detail perception module (DPM) and a feature fusion module (FFM) to the edge-aware branch.
The second-layer features of ResNet18 first pass through the detail perception module DPM; the DPM output and the 2× upsampled third-layer features of ResNet18 are fused by the feature fusion module FFM to obtain fused feature I.
Further, fused feature I passes through the DPM again, and its output is fused with the 4× upsampled fourth-layer features of ResNet18 by the FFM to obtain fused feature II. Finally, fused feature II passes through the DPM once more and is 4× upsampled, then fed into two Detail Heads to obtain the binary face boundary map and the multi-class face boundary map. Each Detail Head consists of a 3×3 convolution, a batch normalization layer, a ReLU activation, and a 1×1 convolution.
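A minimal sketch of the Detail Head described above, assuming a PyTorch implementation; the channel widths and the number of boundary classes are assumptions, not values given in the text.

```python
import torch.nn as nn

class DetailHead(nn.Module):
    """Detail Head: 3x3 conv -> batch norm -> ReLU -> 1x1 conv."""
    def __init__(self, in_ch, mid_ch, num_classes):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, num_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.block(x)

# The two heads at the end of the edge-aware branch: a binary boundary map and a
# multi-class boundary map (19 face-parsing classes are assumed here).
binary_boundary_head = DetailHead(in_ch=128, mid_ch=64, num_classes=2)
multi_boundary_head = DetailHead(in_ch=128, mid_ch=64, num_classes=19)
```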
The main structure of the detail perception module DPM is as follows:
For an input feature X, a spatial attention map I is obtained by passing X through a global max pooling layer and two 1×1 convolutional layers, and a spatial attention map II is obtained by passing X through a global average pooling layer and two 1×1 convolutional layers. Spatial attention maps I and II are added and passed through a softmax function to obtain the final spatial attention map, which is multiplied with the input feature X to obtain the output feature y.
Further, the output feature y is taken as a new input: it passes through a global max pooling layer to obtain channel attention map I and through a global average pooling layer to obtain channel attention map II. Channel attention maps I and II are added, then passed through a 1×1 convolutional layer and a softmax function to obtain the final channel attention map, which is multiplied with the input feature to obtain the final output of the detail perception module.
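A minimal PyTorch sketch of the DPM described above. The pooling axes are my interpretation (channel-wise pooling to form the spatial maps, spatial pooling to form the channel maps), and the hidden width of the 1×1 convolutions is an assumption.

```python
import torch
import torch.nn as nn

class DPM(nn.Module):
    """Detail Perception Module sketch: spatial attention followed by channel attention."""
    def __init__(self, channels, hidden=4):
        super().__init__()
        def spatial_branch():
            # two 1x1 convolutions applied to a single-channel spatial descriptor
            return nn.Sequential(
                nn.Conv2d(1, hidden, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, 1, kernel_size=1),
            )
        self.spatial_max = spatial_branch()
        self.spatial_avg = spatial_branch()
        self.channel_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Spatial attention: max/avg pooling along the channel axis (interpretation).
        max_map, _ = x.max(dim=1, keepdim=True)                       # B x 1 x H x W
        avg_map = x.mean(dim=1, keepdim=True)                         # B x 1 x H x W
        s = self.spatial_max(max_map) + self.spatial_avg(avg_map)     # maps I + II
        b, _, h, w = s.shape
        s = torch.softmax(s.view(b, 1, -1), dim=-1).view(b, 1, h, w)  # softmax over positions
        y = x * s                                                     # output feature y

        # Channel attention: global max/avg pooling over spatial dims, 1x1 conv, softmax.
        c = torch.amax(y, dim=(2, 3), keepdim=True) + y.mean(dim=(2, 3), keepdim=True)
        c = torch.softmax(self.channel_conv(c), dim=1)                # B x C x 1 x 1
        return y * c
```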
The main structure of the feature fusion module FFM is as follows:
Given input features Z1 and Z2, the two features are first concatenated and then passed through a global average pooling layer, a 1×1 convolutional layer, and a softmax function to obtain the branch attention map. The branch attention map is split along the channel dimension and multiplied with Z1 and Z2 according to the preset channel indices to obtain the final fused output feature Z. For example, if the branch attention map has 512 channels, the weights of channels 0-255 are multiplied with Z1 and the weights of channels 256-511 are multiplied with Z2.
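A minimal PyTorch sketch of the FFM described above; summing the two re-weighted branches to form Z is an assumption, since the text only specifies the channel-wise weighting.

```python
import torch
import torch.nn as nn

class FFM(nn.Module):
    """Feature Fusion Module sketch: concat -> GAP -> 1x1 conv -> softmax -> branch weighting."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, z1, z2):
        z = torch.cat([z1, z2], dim=1)                        # B x 2C x H x W
        attn = torch.softmax(self.conv(self.pool(z)), dim=1)  # branch attention map, B x 2C x 1 x 1
        c = z1.shape[1]
        w1, w2 = attn[:, :c], attn[:, c:]                     # channels 0..C-1 -> Z1, C..2C-1 -> Z2
        return z1 * w1 + z2 * w2                              # fused output Z (sum is assumed)
```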
Step 2.3: build the segmentation branch, which outputs the face parsing result and provides a supervised face parsing output.
For the fifth-layer features of the encoder ResNet18, a decoder with five identical stages is designed; each stage consists of a 3×3 convolutional layer and an upsampling operation. After the five decoder stages, the features are restored to the original resolution, yielding the supervised face parsing result.
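A minimal sketch of the five-stage supervised decoder, assuming PyTorch; the channel widths, the batch normalization inside each stage, and the number of face-parsing classes are assumptions.

```python
import torch.nn as nn

class DecoderStage(nn.Module):
    """One of the five identical decoder stages: 3x3 convolution followed by 2x upsampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.up(self.conv(x))

# Five stages take the stride-32 fifth-layer features back to the input resolution,
# followed by a 1x1 classifier for the supervised parsing result (19 classes assumed).
supervised_decoder = nn.Sequential(
    DecoderStage(512, 256),
    DecoderStage(256, 128),
    DecoderStage(128, 64),
    DecoderStage(64, 32),
    DecoderStage(32, 32),
    nn.Conv2d(32, 19, kernel_size=1),
)
```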
For the fifth-layer features of the encoder ResNet18, 8× upsampling is first performed to obtain feature Y2. The feature Y1 produced by the last detail perception module DPM in the edge-aware branch and the 8× upsampled feature Y2 are then fed together into the dual-graph adaptive learning module DGALM, and the output of the DGALM passes through a Seg Head to obtain the final face parsing result. The Seg Head consists of a 3×3 convolution, a batch normalization layer, a ReLU activation, and a 1×1 convolution.
The main structure of the dual-graph adaptive learning module is as follows. Feature Y1 and feature Y2 are first concatenated, and the concatenated feature is passed through two separate 1×1 convolutions to obtain the semantic feature map Z_semantic and the detail feature map Z_detail. The binary face boundary map obtained in the boundary-aware branch is scaled to 1/4 of its original size, and the binary boundary is then used to separate Z_semantic and Z_detail into boundary pixels and non-boundary pixels, according to the following formulas:
[Z_detail_edge, Z_detail_nonedge] = Z_detail ⊙ [Mask, A − Mask]
[Z_semantic_edge, Z_semantic_nonedge] = Z_semantic ⊙ [Mask, A − Mask]
where ⊙ denotes element-wise (Hadamard) multiplication; Z_detail_nonedge is the detail feature map excluding boundary pixels and Z_detail_edge is the detail feature map containing boundary pixels; Z_semantic_nonedge is the semantic feature map excluding boundary pixels and Z_semantic_edge is the semantic feature map containing boundary pixels; A is an all-ones matrix; and the boundary mask Mask is obtained with argmax_{dim=2}, i.e. the index of the maximum value along the second dimension of the binary boundary prediction.
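A minimal sketch of the boundary masking step above, assuming PyTorch; the tensor layouts (class dimension of the boundary prediction, feature resolution at 1/4 of the boundary map) are assumptions.

```python
import torch
import torch.nn.functional as F

def split_by_boundary(z_semantic, z_detail, binary_boundary_logits):
    """Split Z_semantic and Z_detail into boundary / non-boundary parts (DGALM masking).

    z_semantic, z_detail:    B x C x H x W feature maps from the two 1x1 convolutions
    binary_boundary_logits:  B x 2 x 4H x 4W binary boundary prediction
    """
    # Scale the boundary map to 1/4 of its size and take argmax over the class dimension.
    logits = F.interpolate(binary_boundary_logits, scale_factor=0.25,
                           mode='bilinear', align_corners=False)
    mask = logits.argmax(dim=1, keepdim=True).float()   # 1 on boundary pixels, 0 elsewhere
    ones = torch.ones_like(mask)                         # the all-ones matrix A

    z_detail_edge      = z_detail * mask
    z_detail_nonedge   = z_detail * (ones - mask)
    z_semantic_edge    = z_semantic * mask
    z_semantic_nonedge = z_semantic * (ones - mask)
    return z_detail_edge, z_detail_nonedge, z_semantic_edge, z_semantic_nonedge
```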
Further, the supervised face parsing result obtained in the segmentation branch is scaled to 1/4 of its original size, and the Top-k elements are selected from it as the face components, which serve as the vertices of the graphs.
where Z_graph_semantic denotes the semantic face-component features, Z_graph_detail denotes the detail face-component features, Z_semantic_nonedge is the semantic feature map excluding the boundary, Z_detail_edge is the detail feature map containing the boundary, and C is the number of feature channels.
Further, one layer of graph convolution is applied for graph reasoning; the message passing of the graph neural network establishes long-range interactions between the pixels of different face components, yielding the updated semantic and detail graph features.
Further, projection matrices P1 and P2 are constructed to map the features into the original geometric space.
Further, the transposes of the projection matrices are multiplied with the graph-reasoned features to map them back to the original geometric space; the final output feature X_out is obtained by concatenating the semantic feature map and the detail feature map that have been mapped back to the original geometric space.
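A minimal sketch of the graph-reasoning step in the DGALM, assuming PyTorch. Only the outline given in the text (top-k pixels become graph vertices, one graph-convolution layer performs message passing, the transposed projection matrix maps the vertices back) is followed; the adjacency construction, the residual connection, and the value of k are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphReasoning(nn.Module):
    """Top-k vertex selection, one graph-convolution layer, projection back to the grid."""
    def __init__(self, channels, k=32):
        super().__init__()
        self.k = k
        self.gcn = nn.Linear(channels, channels)   # one-layer graph-convolution weights

    def forward(self, feat, score_map):
        # feat:      B x C x H x W  (e.g. Z_semantic_nonedge or Z_detail_edge)
        # score_map: B x H x W      confidence from the downscaled supervised parsing result
        b, c, h, w = feat.shape
        flat = feat.view(b, c, h * w)                                   # B x C x HW
        idx = score_map.view(b, h * w).topk(self.k, dim=-1).indices     # B x k top-k pixels
        proj = torch.zeros(b, self.k, h * w, device=feat.device)        # projection matrix P
        proj.scatter_(2, idx.unsqueeze(-1), 1.0)

        vertices = proj @ flat.transpose(1, 2)                          # B x k x C graph vertices
        adj = torch.softmax(vertices @ vertices.transpose(1, 2), dim=-1)  # assumed adjacency
        vertices = F.relu(self.gcn(adj @ vertices))                     # message passing

        back = proj.transpose(1, 2) @ vertices                          # B x HW x C, via P^T
        return feat + back.transpose(1, 2).view(b, c, h, w)             # map back to the grid
```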
Step 2.5: build the classification branch for facial emotion recognition.
For the last-layer (fifth-layer) features output by the encoder ResNet18, S = [s1, s2, ..., sC], each s_i is regarded as an image patch fed into the transformer layer; the features output by the transformer layer then pass through an MLP layer to obtain the facial emotion recognition result.
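A minimal sketch of the classification branch, assuming PyTorch: the C channel maps of the fifth-layer features are treated as transformer tokens, and an MLP predicts the emotion class. The head count, transformer depth, MLP width, and the number of emotion classes are assumptions.

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """Channel-as-token transformer followed by an MLP classifier."""
    def __init__(self, channels=512, spatial=16 * 16, num_classes=7):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(d_model=spatial, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=1)
        self.mlp = nn.Sequential(
            nn.Linear(channels * spatial, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),
        )

    def forward(self, feat):
        # feat: B x C x H x W fifth-layer features; each of the C channels is one token s_i.
        b, c, h, w = feat.shape
        tokens = feat.view(b, c, h * w)      # B x C x (H*W): token i is channel s_i
        tokens = self.transformer(tokens)    # message passing between channel tokens
        return self.mlp(tokens.flatten(1))   # emotion logits
```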
Step 3 comprises the following steps:
Step 3.1: construct the intra-task loss functions.
First, the loss function of the segmentation branch mainly comprises the loss of the supervised face parsing output and the loss of the final face parsing output; the cross-entropy loss function is used for both.
Further, the loss function of the boundary-aware branch is constructed using the cross-entropy loss function.
Further, the loss function of facial emotion recognition is constructed using the cross-entropy loss function.
Further, the total intra-task loss function combines the above segmentation, boundary, and classification losses (see the sketch below).
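A minimal sketch of the intra-task loss, assuming PyTorch; the text names the individual cross-entropy terms but not their weights, so unit weights and a plain sum are assumptions.

```python
import torch.nn.functional as F

def intra_task_loss(parse_sup, parse_out, edge_bin, edge_multi, emo_logits,
                    parse_gt, edge_bin_gt, edge_multi_gt, emo_gt):
    """Cross-entropy terms for the segmentation, boundary-aware, and classification branches."""
    l_seg = F.cross_entropy(parse_sup, parse_gt) + F.cross_entropy(parse_out, parse_gt)
    l_edge = F.cross_entropy(edge_bin, edge_bin_gt) + F.cross_entropy(edge_multi, edge_multi_gt)
    l_cls = F.cross_entropy(emo_logits, emo_gt)
    return l_seg + l_edge + l_cls   # unit weights assumed
```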
Step 3.2: construct the inter-task consistency loss functions.
Channel 0 of the multi-class face boundary prediction is kept unchanged, and from it a binary face boundary map, denoted Seg2-joint-3, is computed; Seg2-joint-3 represents the binary face boundary derived from the multi-class boundary task.
The Dice coefficient is then used to compute the task-consistency loss between the binary boundary task and the multi-class boundary task.
Further, the consistency loss between the binary boundary task, the multi-class boundary task, and the face parsing task is computed. First, the index of the maximum value is taken along the second dimension of the face parsing output. A boundary localization algorithm then assigns the value 1 to the pixels lying on a boundary and 0 to all other pixels; multiplying this boundary map with the parsing result yields the boundary maps derived from the parsing task.
From these, the binary and multi-class boundary maps implied by the parsing result are computed.
The Dice coefficient is then used to compute the task-consistency loss between the parsing task and the binary boundary task, and the consistency loss between the parsing task and the multi-class boundary task.
Further, the total inter-task consistency loss function is the sum of the above consistency terms.
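A minimal sketch of the Dice-based inter-task consistency loss, assuming PyTorch; the soft-Dice formulation, the pairing of the boundary maps, and the unit weights are assumptions based on the description above.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between two boundary maps (values expected in [0, 1])."""
    pred, target = pred.flatten(1), target.flatten(1)
    inter = (pred * target).sum(dim=1)
    return 1.0 - (2.0 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)

def inter_task_consistency(b_binary, b_from_multi, b_from_parse):
    """Consistency between the boundary maps produced by the three tasks."""
    return (dice_loss(b_binary, b_from_multi)        # binary vs. multi-class boundary task
            + dice_loss(b_from_parse, b_binary)      # parsing-derived vs. binary boundary
            + dice_loss(b_from_parse, b_from_multi)  # parsing-derived vs. multi-class boundary
            ).mean()
```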
Step 4 specifically comprises the following steps:
Step 4.1: the F1 score is introduced to evaluate the effect of face parsing and emotion recognition; it is defined as the harmonic mean of precision and recall, F1 = 2 · Precision · Recall / (Precision + Recall).
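A minimal sketch of the F1 computation for a single class; averaging over classes to obtain the mean F1 reported later is assumed.

```python
def f1_score(tp, fp, fn, eps=1e-12):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2 * precision * recall / (precision + recall + eps)
```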
Compared with the prior art, the beneficial effects of the present invention are as follows:
By building the MPENet deep learning model, the present invention realizes face parsing and facial emotion recognition. The boundary-aware branch makes the face parsing results more refined, and the dual-graph adaptive learning module establishes dependencies between different face components. At the same time, MPENet reaches 92.9 FPS on an RTX 3090 with only 11.63M model parameters, offering high real-time performance and allowing deployment on mobile and other edge devices.
Brief Description of the Drawings
Fig. 1 is the network structure diagram of MPENet.
Fig. 2 shows example results comparing MPENet with other models.
Fig. 3 shows example results of the MPENet ablation experiments.
Detailed Description of the Embodiments
The present invention is further described below in conjunction with the accompanying drawings and specific embodiments.
To address the problems encountered in face parsing and facial expression recognition, the present invention designs a new multi-task collaborative learning network for face parsing and facial emotion recognition. Specifically, MPENet consists of a shared encoder and three downstream branches (a classification branch, a segmentation branch, and an edge-aware branch). In the classification branch, a transformer module converts the features extracted by the shared encoder into embedding-level features for facial expression recognition. In the edge-aware branch, the multi-class face boundary and the binary face boundary are used to extract boundary information, helping the face parsing task localize face boundaries more precisely. In the segmentation branch, the dual-graph adaptive learning module fuses the edge and semantic information of the image to reason about the relationships between different feature regions and capture richer context, while an additional decoder serves as a supervised face parsing output, yielding a finer parsing map. Finally, a consistency learning loss is designed between the tasks so that they reinforce each other and improve the overall accuracy of the model.
Embodiment 1: preprocessing of the experimental data.
(1) Normalize the data.
(2) Crop the images to a size of 512×512.
(3) Apply data augmentation to the cropped images with random rotation and random scaling.
(4) Divide the dataset into a training set, a validation set, and a test set.
Embodiment 2: building the MPENet network model.
(1) ResNet18 is adopted as the backbone network of the encoder to extract semantic information.
(2) Build the boundary-aware branch. The second-layer features of ResNet18 first pass through the DPM and are then fused with the 2× upsampled third-layer features through the FFM. The fused features pass through the DPM again and are fused with the 4× upsampled fourth-layer features of ResNet18 through the FFM. The final fused features pass through the DPM and 4× upsampling once more, and are then fed into two Detail Heads to obtain the binary face boundary map and the multi-class face boundary map.
(3) For the fifth-layer features of the encoder ResNet18, a five-stage decoder is designed; each stage consists of a 3×3 convolutional layer and an upsampling operation. After the five decoder stages, the features are restored to the original resolution, yielding the supervised face parsing result.
(4) For the fifth-layer features of the encoder ResNet18, 8× upsampling is first performed; the feature X produced by the last DPM in the edge-aware branch and the 8× upsampled feature Y are then fed together into the DGALM, and the DGALM output passes through a Seg Head to obtain the final face parsing result.
(5) The last-layer features output by the encoder ResNet18 pass through a transformer layer and then an MLP layer to obtain the final facial emotion classification result.
Embodiment 3: training the MPENet network model.
(1) SGD is adopted as the optimization method (see the sketch below).
(2) The ResNet18 weights of the MPENet encoder are initialized with weights pre-trained on the ImageNet dataset.
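A minimal training-setup sketch for Embodiment 3, assuming PyTorch/torchvision; the learning rate, momentum, and weight decay are assumptions, not values given in the text.

```python
import torch
import torchvision

# ImageNet-pretrained ResNet18 backbone for the MPENet encoder, optimized with SGD.
backbone = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
```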
Embodiment 4: the trained MPENet network model is used to conduct experiments on the public face dataset CelebAMask_HQ, and the experimental results are evaluated.
(1) Table 1 below compares MPENet with current mainstream semantic segmentation frameworks on the CelebAMask_HQ dataset. The mean F1 score of our model reaches 85.9% for face parsing, and the mean F1 for facial emotion recognition reaches 80.04%. See Table 1 for the comparison between MPENet and other methods.
Table 1. Comparison of results between MPENet and other models
(2) Table 2 below shows the ablation experiments of MPENet on the CelebAMask_HQ dataset; it can be seen that each module of MPENet improves the model accuracy.
Table 2. Ablation experiments of MPENet
(3) Table 3 below compares the performance of MPENet with other models; it can be seen that MPENet leads in both inference speed and parsing accuracy, with an FPS of 92.9 and only 11.6M model parameters.
Table 3. Performance comparison of MPENet and other models