CN117409413B

Movatterモバイル変換

Info

Publication number: CN117409413B
Application number: CN202311720688.6A
Authority: CN
Inventors: 刘建明; 经卓勋
Original assignee: Jiangxi Normal University
Current assignee: Jiangxi Normal University
Priority date: 2023-12-14
Filing date: 2023-12-14
Publication date: 2024-04-05
Anticipated expiration: 2043-12-14
Also published as: CN117409413A

Abstract

The invention provides a small sample semantic segmentation method and a system based on background information mining, wherein the method comprises the following steps: the potential information of the background part in the small sample semantic segmentation task is mined through an offline background marking algorithm network to obtain a pseudo class data set, and the generalization capability and performance of the semantic segmentation model when facing new classes are greatly improved through combined training, so that the problem of base class bias of the model is greatly relieved. According to the small sample semantic segmentation method based on background information mining, the basic image dataset is input into the offline background marking algorithm network, prototype features are obtained through the unsupervised image segmentation algorithm sub-network and the backbone network to obtain the pseudo-type dataset in a clustering mode, and the pseudo-type dataset and the basic image dataset are subjected to joint training to perform segmentation tasks of the new-type image dataset, so that generalization capability and performance of a semantic segmentation model in the face of the new type are greatly improved, and the basic bias problem of the model is greatly relieved.

Description

Translated fromChinese

一种基于背景信息挖掘的小样本语义分割方法及系统A small sample semantic segmentation method and system based on background information mining

技术领域Technical Field

本发明涉及图像识别领域，特别涉及一种基于背景信息挖掘的小样本语义分割方法及系统。The present invention relates to the field of image recognition, and in particular to a small sample semantic segmentation method and system based on background information mining.

背景技术Background technique

随着人工智能产业的快速发展，深度学习被广泛的应用于各个行业领域之中，而在计算机视觉方面其应用尤为突出，其中的语义分割算法在目标检测、图像分类、实例分割、姿态估计这一系列图像识别任务中均起着重要作用，是图像识别领域的重点研究方向之一。With the rapid development of the artificial intelligence industry, deep learning has been widely used in various industries and fields, and its application in computer vision is particularly prominent. The semantic segmentation algorithm plays an important role in a series of image recognition tasks such as target detection, image classification, instance segmentation, and pose estimation. It is one of the key research directions in the field of image recognition.

目前存在的小样本语义分割的方法，通常都会面临“基类偏置问题”，这是因为在训练阶段使用大量基类数据，导致模型在测试阶段面对新类对象的分割效果下降，假如图像中同时存在新类和基类对象，还容易出现误分割现象，现有技术中，有的采用只更新模型网络中部分值的方法来解决这个问题，通过奇异值分解方法，分解主干网络的权重矩阵，以找出必须更新的奇异值，随后冻结其他权重参数，只更新奇异值，保持其他值不变，最后将更新后的值重新变换为模型的权重矩阵，在只更新少量参数的情况下提高了模型架构面对新类的泛化能力，有的通过添加了一个额外的基类学习器分支来精确地分割基类对象，并纠正最终的预测结果。The existing methods for semantic segmentation of small samples usually face the "base class bias problem". This is because a large amount of base class data is used in the training phase, which leads to a decrease in the segmentation effect of the model when facing new class objects in the test phase. If there are both new class and base class objects in the image, mis-segmentation is prone to occur. In the prior art, some methods use the method of only updating part of the values in the model network to solve this problem. The weight matrix of the backbone network is decomposed through the singular value decomposition method to find the singular values that must be updated, and then the other weight parameters are frozen. Only the singular values are updated, and other values are kept unchanged. Finally, the updated values are transformed back into the weight matrix of the model. The generalization ability of the model architecture in the face of new classes is improved while only a small number of parameters are updated. Some methods add an additional base class learner branch to accurately segment the base class objects and correct the final prediction results.

但目前这些方法中，不论是只更新模型的部分参数，还是通过额外的基类学习器分支来精确地分割基类对象，仍然难以缓解模型的基类偏置问题。However, among these current methods, whether only updating part of the model's parameters or accurately segmenting the base class objects through additional base class learner branches, it is still difficult to alleviate the model's base class bias problem.

发明内容Summary of the invention

基于此，本发明的目的是提供一种基于背景信息挖掘的小样本语义分割方法及系统，通过将基类图片输入离线背景标记算法网络，对小样本语义分割任务中背景部分的潜在信息进行挖掘，获取伪类数据集，再通过根据伪类数据集和原数据集进行联合训练，极大地提高了语义分割模型在面对新类时的泛化能力和性能，极大地缓解了模型的基类偏置问题。Based on this, the purpose of the present invention is to provide a small sample semantic segmentation method and system based on background information mining. By inputting the base class image into the offline background labeling algorithm network, the potential information of the background part in the small sample semantic segmentation task is mined, and a pseudo-class data set is obtained. Then, by jointly training the pseudo-class data set and the original data set, the generalization ability and performance of the semantic segmentation model when facing new classes are greatly improved, and the base class bias problem of the model is greatly alleviated.

本发明提出的基于背景信息挖掘的小样本语义分割方法，包括：The small sample semantic segmentation method based on background information mining proposed in the present invention includes:

将预先设定的基类图像数据集输入离线背景标记算法网络；Input the pre-set base class image dataset into the offline background labeling algorithm network;

通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征；Obtaining pre-segmented sub-region masks and high-level semantic features of the base class image through an unsupervised image segmentation algorithm sub-network and a backbone network to extract prototype features of the background region in the sub-region;

根据所述原型特征进行聚类，以划分多个不同的伪类，并将所述伪类制作成伪类数据集；Clustering is performed according to the prototype features to divide a plurality of different pseudo-classes, and the pseudo-classes are made into a pseudo-class data set;

根据所述伪类数据集和所述基类图像数据集对语义分割模型进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务。The semantic segmentation model is jointly trained according to the pseudo-class dataset and the base-class image dataset, so as to perform the segmentation task of the new-class image dataset through the trained semantic segmentation model.

综上，根据上述基于背景信息挖掘的小样本语义分割方法，通过将基类图片输入离线背景标记算法网络，对小样本语义分割任务中背景部分的潜在信息进行挖掘，获取伪类数据集，再通过根据伪类数据集和原数据集进行联合训练，极大地提高了语义分割模型在面对新类时的泛化能力和性能，极大地缓解了模型的基类偏置问题。具体的，将预先设定的基类图像数据集输入离线背景标记算法网络，设定基类图像中的前景区域与背景区域，再通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征，根据所述原型特征进行聚类，以划分多个不同的伪类，并将所述伪类制作成伪类数据集，因为在原数据基础上通过离线背景标记算法网络进行了数据扩充，模型的训练阶段将不只有基类信息参与，还有生成的背景伪类信息，因此模型对于新类的泛化能力得到明显提升，根据所述伪类数据集和所述基类图像数据集对语义分割模型进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务，极大地提高了语义分割模型在面对新类时的泛化能力和性能，极大地缓解了模型的基类偏置问题。In summary, according to the above-mentioned small-sample semantic segmentation method based on background information mining, by inputting the base class image into the offline background labeling algorithm network, the potential information of the background part in the small-sample semantic segmentation task is mined, and the pseudo-class dataset is obtained. Then, by jointly training the pseudo-class dataset and the original dataset, the generalization ability and performance of the semantic segmentation model when facing new classes are greatly improved, and the base class bias problem of the model is greatly alleviated. Specifically, a preset base class image data set is input into an offline background labeling algorithm network, a foreground area and a background area in the base class image are set, and then the pre-segmented sub-region mask and high-level semantic features of the base class image are obtained through an unsupervised image segmentation algorithm sub-network and a backbone network to extract prototype features of the background area in the sub-region, clustering is performed according to the prototype features to divide a plurality of different pseudo-classes, and the pseudo-classes are made into pseudo-class data sets. Because data expansion is performed through an offline background labeling algorithm network on the basis of the original data, the training phase of the model will involve not only the base class information but also the generated background pseudo-class information. Therefore, the generalization ability of the model for new classes is significantly improved. The semantic segmentation model is jointly trained according to the pseudo-class data set and the base class image data set, so that the segmentation task of the new class image data set is performed through the trained semantic segmentation model, which greatly improves the generalization ability and performance of the semantic segmentation model when facing new classes, and greatly alleviates the base class bias problem of the model.

进一步的，所述将预先设定的基类图像数据集输入离线背景标记算法网络的步骤包括：Furthermore, the step of inputting the preset base class image data set into the offline background labeling algorithm network includes:

在预先设定的基类图像数据集中选择当前的基类目标，设定所述基类图像中的基类区域为前景区域，设定所述基类图像中的非基类区域为背景区域。A current base class target is selected in a preset base class image data set, a base class area in the base class image is set as a foreground area, and a non-base class area in the base class image is set as a background area.

进一步的，所述通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征的步骤包括：Furthermore, the step of obtaining the pre-segmented sub-region mask and high-level semantic features of the base class image through the unsupervised image segmentation algorithm sub-network and the backbone network to extract the prototype features of the background region in the sub-region includes:

将预先设定后的所述基类图像数据集中的基类图像进行缩放至预设待分割图像尺寸阈值；Scaling the base class images in the preset base class image data set to a preset threshold value of the image size to be segmented;

通过离线背景标记算法网络中的无监督图像分割算法子网络对所述基类图像进行预分割，以获取多个预分割子区域掩码；Pre-segmenting the base class image by using an unsupervised image segmentation algorithm sub-network in an offline background labeling algorithm network to obtain a plurality of pre-segmented sub-region masks;

将所述基类图像的未分割原图像通过所述离线背景标记算法网络中的骨干网络，对所述未分割原图像进行上采样操作后，提取所述未分割原图像的高层语义特征。The unsegmented original image of the base class image is passed through the backbone network in the offline background labeling algorithm network, and after upsampling operation is performed on the unsegmented original image, high-level semantic features of the unsegmented original image are extracted.

进一步的，所述通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征的步骤还包括：Furthermore, the step of obtaining the pre-segmented sub-region mask and high-level semantic features of the base class image through the unsupervised image segmentation algorithm sub-network and the backbone network to extract the prototype features of the background region in the sub-region also includes:

通过基类目标掩码进行掩码取反操作，以取出当前基类目标的背景区域并抑制前景区域；Perform mask inversion operation through the base class target mask to extract the background area of the current base class target and suppress the foreground area;

根据预分割子区域掩码和高层语义特征，由以下公式进行哈达玛积计算：According to the pre-segmented sub-region mask and high-level semantic features, the Hadamard product is calculated by the following formula:

其中，为所述高层语义特征，其中/>，/>，/>分别为所述高层语义特征的高、宽、通道维数，/>代表将掩码沿着通道广播，/>代表哈达玛积，/>为经过掩码覆盖的伪掩码，其中的/>为获得的所述预分割子区域掩码的数量，/>为单幅图像背景区域的临时伪掩码标注，其中的/>，/>分别为背景掩码对应的前景基类以及背景原型特征的数量，/>为真实的背景掩码；in, is the high-level semantic feature, where/> ,/> ,/> are the height, width and channel dimensions of the high-level semantic features, respectively./> Represents broadcasting the mask along the channel, /> Represents Hadamard, /> is a pseudo mask covered by the mask, where /> is the number of the pre-segmented sub-region masks obtained,/> It is a temporary pseudo mask annotation for the background area of a single image, where /> ,/> are the number of foreground base classes and background prototype features corresponding to the background mask, respectively./> is the real background mask;

通过掩码平均池化获取原型特征。Prototype features are obtained through masked average pooling.

进一步的，所述根据所述原型特征进行聚类，以划分多个不同的伪类，并将所述伪类制作成伪类数据集的步骤包括：Furthermore, the step of clustering the prototype features to divide a plurality of different pseudo-classes and making the pseudo-classes into pseudo-class data sets includes:

对所有基类图像的背景区域进行标注，以获取伪类原型特征和伪掩码；Annotate the background areas of all base class images to obtain pseudo class prototype features and pseudo masks;

通过无监督聚类算法，将所有所述伪类原型特征进行聚类，以进行伪类划分；Clustering all the pseudo-class prototype features through an unsupervised clustering algorithm to perform pseudo-class division;

将所述背景区域的预分割子区域分类至对应伪类，并给所述伪掩码打上对应伪类标签，以制作获取伪类数据集。The pre-segmented sub-regions of the background region are classified into corresponding pseudo-classes, and the pseudo-masks are labeled with corresponding pseudo-class labels to obtain a pseudo-class dataset.

进一步的，所述根据所述伪类数据集和所述基类图像数据集对语义分割模型进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务的步骤包括：Furthermore, the step of jointly training the semantic segmentation model according to the pseudo-class dataset and the base-class image dataset to perform the segmentation task of the new-class image dataset through the trained semantic segmentation model includes:

将基类数据集和伪类数据集分别输入联合训练骨干网络；The base class dataset and the pseudo class dataset are input into the joint training backbone network respectively;

通过所述联合训练骨干网络进行特征图提取，获取支持特征图和查询特征图；Extract feature graphs through the joint training backbone network to obtain support feature graphs and query feature graphs;

通过特征丰富模块进行多尺度特征提取后，进行所述支持特征图和所述查询特征图的比较并整合；After extracting multi-scale features through a feature enrichment module, comparing and integrating the support feature graph and the query feature graph;

将整合后的特征图进行卷积并经过分类器获取最终预测结果。The integrated feature map is convolved and passed through the classifier to obtain the final prediction result.

进一步的，所述将整合后的特征图进行卷积并经过分类器获取最终预测结果的步骤之后还包括：Furthermore, after the step of convolving the integrated feature map and obtaining the final prediction result through the classifier, the following steps are further included:

通过分类器获取最终预测结果后，根据以下公式计算由基类组成的原数据的损失函数：After obtaining the final prediction result through the classifier, the loss function of the original data composed of base classes is calculated according to the following formula :

再根据以下公式计算伪类数据的损失函数：Then calculate the loss function of the pseudo-class data according to the following formula :

根据以下公式计算整体损失函数L：The overall loss functionL is calculated according to the following formula:

以上公式中，为预测的查询图像分割结果，/>为对应像素点空间位置，/>是查询图像的真实值掩码，/>和/>为伪类通过小样本分割网络后的分割预测掩码和伪类掩码，/>为超参数。In the above formula, is the predicted query image segmentation result,/> is the spatial position of the corresponding pixel point, /> is the ground truth mask of the query image,/> and/> The segmentation prediction mask and pseudo class mask after the pseudo class passes through the small sample segmentation network,/> is a hyperparameter.

本发明提出的一种基于背景信息挖掘的小样本语义分割系统，包括：The present invention proposes a small sample semantic segmentation system based on background information mining, comprising:

背景挖掘模块，用于将预先设定的基类图像数据集输入离线背景标记算法网络，通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征，根据所述原型特征进行聚类，以划分多个不同的伪类，并将所述伪类制作成伪类数据集；A background mining module is used to input a preset base class image data set into an offline background labeling algorithm network, obtain the pre-segmented sub-region mask and high-level semantic features of the base class image through an unsupervised image segmentation algorithm sub-network and a backbone network, so as to extract the prototype features of the background region in the sub-region, perform clustering according to the prototype features to divide a plurality of different pseudo-classes, and make the pseudo-classes into a pseudo-class data set;

联合训练模块，用于根据所述伪类数据集和所述基类图像数据集对语义分割模型进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务。A joint training module is used to jointly train a semantic segmentation model based on the pseudo-class dataset and the base-class image dataset, so as to perform segmentation tasks of a new-class image dataset through the trained semantic segmentation model.

本发明另一方面，还提供一种存储介质，包括所述存储介质存储一个或多个程序，该程序被执行时实现如上述的基于背景信息挖掘的小样本语义分割方法。In another aspect, the present invention further provides a storage medium, including one or more programs stored in the storage medium, which, when executed, implement the small sample semantic segmentation method based on background information mining as described above.

本发明另一方面还提供一种计算机设备，所述计算机设备包括存储器和处理器，其中：Another aspect of the present invention further provides a computer device, the computer device comprising a memory and a processor, wherein:

所述存储器用于存放计算机程序；The memory is used to store computer programs;

所述处理器用于执行所述存储器上所存放的计算机程序时，实现如上述的基于背景信息挖掘的小样本语义分割方法。When the processor is used to execute the computer program stored in the memory, the small sample semantic segmentation method based on background information mining as described above is implemented.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明第一实施例提出的基于背景信息挖掘的小样本语义分割方法的流程图；FIG1 is a flow chart of a small sample semantic segmentation method based on background information mining proposed in a first embodiment of the present invention;

图2为本发明第二实施例提出的基于背景信息挖掘的小样本语义分割方法的流程图；FIG2 is a flow chart of a small sample semantic segmentation method based on background information mining proposed in a second embodiment of the present invention;

图3为本发明第三实施例提出的基于背景信息挖掘的小样本语义分割系统的结构示意图；3 is a schematic diagram of the structure of a small sample semantic segmentation system based on background information mining proposed in a third embodiment of the present invention;

图4为本发明第一实施例提出的基于背景信息挖掘的小样本语义分割方法的无监督图像分割算法流程图。FIG4 is a flow chart of an unsupervised image segmentation algorithm of a small sample semantic segmentation method based on background information mining proposed in the first embodiment of the present invention.

如下具体实施方式将结合上述附图进一步说明本发明。The following specific implementation manner will further illustrate the present invention in conjunction with the above-mentioned drawings.

具体实施方式Detailed ways

为了便于理解本发明，下面将参照相关附图对本发明进行更全面的描述。附图中给出了本发明的若干个实施例。但是，本发明可以以许多不同的形式来实现，并不限于本文所描述的实施例。相反地，提供这些实施例的目的是使对本发明的公开内容更加透彻全面。In order to facilitate the understanding of the present invention, the present invention will be described more fully below with reference to the relevant drawings. Several embodiments of the present invention are given in the drawings. However, the present invention can be implemented in many different forms and is not limited to the embodiments described herein. On the contrary, the purpose of providing these embodiments is to make the disclosure of the present invention more thorough and comprehensive.

需要说明的是，当元件被称为“固设于”另一个元件，它可以直接在另一个元件上或者也可以存在居中的元件。当一个元件被认为是“连接”另一个元件，它可以是直接连接到另一个元件或者可能同时存在居中元件。本文所使用的术语“垂直的”、“水平的”、“左”、“右”以及类似的表述只是为了说明的目的。It should be noted that when an element is referred to as being "fixed to" another element, it may be directly on the other element or there may be a central element. When an element is considered to be "connected to" another element, it may be directly connected to the other element or there may be a central element at the same time. The terms "vertical", "horizontal", "left", "right" and similar expressions used herein are for illustrative purposes only.

除非另有定义，本文所使用的所有的技术和科学术语与属于本发明的技术领域的技术人员通常理解的含义相同。本文中在本发明的说明书中所使用的术语只是为了描述具体的实施例的目的，不是旨在于限制本发明。本文所使用的术语“及／或”包括一个或多个相关的所列项目的任意的和所有的组合。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as those commonly understood by those skilled in the art to which the present invention belongs. The terms used herein in the specification of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The term "and/or" used herein includes any and all combinations of one or more of the related listed items.

请参阅图1，所示为本发明第一实施例提出的基于背景信息挖掘的小样本语义分割方法的流程图，该种基于背景信息挖掘的小样本语义分割方法包括步骤S01至步骤S04，其中：Please refer to FIG. 1 , which is a flow chart of a small sample semantic segmentation method based on background information mining proposed in a first embodiment of the present invention. The small sample semantic segmentation method based on background information mining includes steps S01 to S04, wherein:

步骤S01：将预先设定的基类图像数据集输入离线背景标记算法网络；Step S01: inputting a preset base class image data set into an offline background labeling algorithm network;

需要说明的，本实施例中，采用数据集为PASCAL-5i，PASCAL-5i数据集中总共有20个类别，每5个类别算作一组，本实施例在训练时，选取三组15个类别作为基类数据，另外一组5个类别作为新类用于测试。It should be noted that in this embodiment, the data set used is PASCAL-5i. There are a total of 20 categories in the PASCAL-5i data set, and every 5 categories are counted as a group. During training, this embodiment selects three groups of 15 categories as base class data, and another group of 5 categories as new classes for testing.

步骤S02：通过无监督图像分割算法子网络和骨干网络获取基类图像的预分割子区域掩码和高层语义特征，以提取子区域中背景区域的原型特征；Step S02: obtaining the pre-segmented sub-region mask and high-level semantic features of the base class image through the unsupervised image segmentation algorithm sub-network and the backbone network to extract the prototype features of the background region in the sub-region;

需要说明的，本实施例中首先将图像缩放到常用的473×473的大小，通过无监督分割算法对原图像进行预分割，预分割算法所分割子区域的上限设定为10，该分割方法会让图像产生多个分割子区域，同时原图像经过骨干网络，将其上采样到同样大小提取其高层语义特征，通过掩码取反取出当前类别的背景区域，抑制特征图的前景区域，随后将取预分割得到的子区域掩码和图像特征图计算哈达玛积Hadamard，本实施例中的无监督预分割方法使用反向传播的无监督图像分割算法，首先得到缩放后的原始图像，用前景掩码取反以获得图像的背景区域，然后使用基于图的分割算法预先分割图像替代原后向传播无监督分割方法中的Mask-SLIC算法以生成分割区域及其标签，算法的具体流程步骤请参阅图4，其中表示输入原图像的像素点集合（总共N个像素点），通过无监督算法/>得到/>个原型以及对应每个超像素的像素集合/>，随后输入神经网络进行t次迭代，卷积神经网络/>将图像作为输入来生成图像特征图/>，其中/>，/>，/>表示特征图的高度、宽度和通道维数，表示为像素点集合为/>，根据特征图，如果第/>个通道取得最大值，就将标记像素为/>，对应每个像素集合/>，我们找到其中出现次数最多的标签，并将整个集合的像素都记为这个标签/>，然后使用Softmax损失函数计算模型损失，使其接近预分割的结果，最后得到每个像素的分割预测/>，该方法在预分类算法下，为语义信息相同的小区域分配相同的语义标签，随后使用神经网络模型，对输入图片进行分类，让网络输出的分类结果尽可能地靠近图像分割算法的预分类结果，最后在符合预分类的结果基础上，将具备相同语义信息的小区块进行合并，得到最终的分割效果，通过无监督分割算法得到的单幅图像背景区域的临时伪掩码标注，记为/>，其中/>，/>分别为背景伪掩码对应的前景基类以及背景特征原型的数量，通过对原数据对应基类/>所对应的真实前景掩码取反，得到真实的背景掩码/>，通过骨干网络得到原图像的高层语义特征/>，其中/>，/>，/>分别为高层语义特征图的宽、高、通道维数，在此基础上，依次将/>，/>掩码双线性插值到特征图大小，并沿着通道广播，将维数从/>变为/>，和/>做哈达玛积，其公式如下：It should be noted that in this embodiment, the image is first scaled to a commonly used size of 473×473, and the original image is pre-segmented by an unsupervised segmentation algorithm. The upper limit of the sub-regions segmented by the pre-segmentation algorithm is set to 10. This segmentation method will allow the image to generate multiple segmented sub-regions. At the same time, the original image passes through the backbone network and is upsampled to the same size to extract its high-level semantic features. The background area of the current category is extracted by mask inversion, and the foreground area of the feature map is suppressed. Then, the sub-region mask obtained by the pre-segmentation and the image feature map are taken to calculate the Hadamard product. The unsupervised pre-segmentation method in this embodiment uses an unsupervised image segmentation algorithm of back propagation. First, the scaled original image is obtained, and the foreground mask is inverted to obtain the background area of the image. Then, the image is pre-segmented using a graph-based segmentation algorithm to replace the Mask-SLIC algorithm in the original back-propagation unsupervised segmentation method to generate segmented areas and their labels. For the specific process steps of the algorithm, please refer to Figure 4, where Represents the pixel set of the input original image (a total of N pixels), through the unsupervised algorithm/> Get/> prototypes and the set of pixels corresponding to each superpixel/> , then input into the neural network for t iterations, convolutional neural network/> Take the image as input to generate the image feature map/> , where/> ,/> ,/> Represents the height, width and channel dimension of the feature map, expressed as a set of pixels as/> , according to the feature graph, if the first/> The maximum value of the channel is obtained, and the pixel is marked as /> , corresponding to each pixel set/> , we find the label that appears most often and record all pixels in the entire set as this label/> , and then use the Softmax loss function to calculate the model loss to make it close to the pre-segmentation result, and finally get the segmentation prediction for each pixel/> In this method, under the pre-classification algorithm, the same semantic labels are assigned to small areas with the same semantic information. Then, the neural network model is used to classify the input image, so that the classification result output by the network is as close as possible to the pre-classification result of the image segmentation algorithm. Finally, based on the result that meets the pre-classification, the small blocks with the same semantic information are merged to obtain the final segmentation effect. The temporary pseudo-mask annotation of the background area of a single image obtained by the unsupervised segmentation algorithm is denoted as/> , where/> ,/> They are the number of foreground base classes and background feature prototypes corresponding to the background pseudo mask, and the number of base classes corresponding to the original data is obtained by The corresponding true foreground mask is inverted to obtain the true background mask/> , obtain the high-level semantic features of the original image through the backbone network/> , where/> ,/> ,/> are the width, height and channel dimensions of the high-level semantic feature map respectively. On this basis, ,/> The mask is bilinearly interpolated to the feature map size and broadcasted along the channels, reducing the dimensionality from /> becomes/> , and/> The formula for the Hadamard product is as follows:

其中，代表将掩码沿着通道广播，/>代表哈达玛积，得到一组经过掩码覆盖的伪掩码/>，且其中/>为所获得预分割伪掩码的数量。in, Represents broadcasting the mask along the channel, /> Represents the Hadamard product, and obtains a set of pseudo masks covered by the mask/> , and where/> is the number of pre-segmentation pseudo masks obtained.

步骤S03：根据原型特征进行聚类，以划分多个不同的伪类，并将伪类制作成伪类数据集；Step S03: clustering is performed according to the prototype features to divide a plurality of different pseudo-classes, and the pseudo-classes are made into a pseudo-class data set;

需要说明的，本实施例中对每张图像的背景区域按照顺序进行标注，每张图像上面得到个伪类原型和掩码，标注为/>，一般取U等于5，保存下来，最后当标注完成，对得到的所有原型向量进行再聚类，最终将所有向量划分到/>个伪类，随后根据之前的标注对每张图像的背景子区域分类至这/>个伪类，给伪掩码打上对应的伪类标签，得到最终的伪类数据集，具体则是在获取/>后，将/>做掩码平均池化得到一张图片的背景原型集合/>，随后在全数据集的范围上运行以上过程，得到所有图像的原型集合，在/>上运行基于余弦相似度的k-means算法，最终收敛为/>个伪类，一般取100，以此为标准，将原图像的背景类别重标记为/>。It should be noted that in this embodiment, the background area of each image is marked in order, and each image is marked with pseudo-class prototype and mask, marked as /> , generally take U equal to 5, save it, and finally when the labeling is completed, re-cluster all the prototype vectors obtained, and finally divide all the vectors into/> pseudo-classes, and then the background sub-regions of each image are classified into these classes according to the previous annotations. pseudo-class, label the pseudo-mask with the corresponding pseudo-class label, and obtain the final pseudo-class dataset. Specifically, After that, / > Perform mask average pooling to obtain a background prototype set of a picture/> , and then run the above process on the entire dataset to obtain the prototype set of all images , in/> Running the k-means algorithm based on cosine similarity, it finally converges to/> Pseudo-classes, usually 100, are used as the standard to re-label the background class of the original image as/> .

步骤S04：根据伪类数据集和基类图像数据集对语义分割模型进行联合训练，以通过训练后的语义分割模型进行新类图像数据集的分割任务；Step S04: jointly training the semantic segmentation model according to the pseudo-class dataset and the base-class image dataset, so as to perform the segmentation task of the new-class image dataset through the trained semantic segmentation model;

需要说明的，本实施例中首先按照将整体数据划分为基类数据集和新类数据集/>，其中，/>数据用于训练，/>用于测试，/>中提取支持图像集/>和查询图像集，本实施例采用的是“1-Way-1-Shot”的设置方式，即支持图像集仅一个类别抽取一张图像作为支持图像的情况，查询图像集/>则在测试阶段用于测试模型表现，本实施例中使用PFENet框架，从骨干网络的中层提取特征图，通过支持特征图和查询特征图进行特征比较后，再卷积通过最后的分类器得到最后的结果，因为在使用中层特征的同时，另外用高层语义特征计算一个先验掩码prior mask作为前景的概率预测图，并且设计了一个特征丰富模块FEM，使得各个尺度的特征聚合后建立了链接，从各个尺度的特征中提取信息，最后经过分类器得到最后的预测结果，提升了框架整体的分割能力。It should be noted that in this embodiment, the overall data is first divided into base class data sets. and new class dataset/> , where /> Data is used for training, /> For testing, /> Extract the supporting image set from and the query image set This embodiment adopts the "1-Way-1-Shot" setting mode, that is, the support image set only has one category and extracts one image as the support image. The query image set/> In the testing phase, it is used to test the performance of the model. In this embodiment, the PFENet framework is used to extract feature maps from the middle layer of the backbone network. After feature comparison is performed between the support feature map and the query feature map, the final result is obtained by convolution through the final classifier. Because while using the middle-layer features, a prior mask prior mask is calculated using high-level semantic features as the probability prediction map of the foreground, and a feature enrichment module FEM is designed, so that features of each scale are aggregated and links are established, information is extracted from features of each scale, and finally the final prediction result is obtained through the classifier, thereby improving the overall segmentation capability of the framework.

请参阅图2，所示为本发明第二实施例提出的基于背景信息挖掘的小样本语义分割方法的流程图，该种基于背景信息挖掘的小样本语义分割方法包括步骤S11至步骤S15，其中：Please refer to FIG. 2 , which is a flow chart of a small sample semantic segmentation method based on background information mining proposed in a second embodiment of the present invention. The small sample semantic segmentation method based on background information mining includes steps S11 to S15, wherein:

步骤S11：在预先设定的基类图像数据集中选择当前的基类目标，并设定前景区域与背景区域；Step S11: selecting a current base class target in a preset base class image data set, and setting a foreground area and a background area;

步骤S12：将基类图像缩放后通过无监督图像分割算法子网络进行预分割，以获取多个预分割子区域掩码，再将基类图像的未分割原图像进行上采样操作后通过离线背景标记算法网络中的骨干网络提取高层语义特征；Step S12: scaling the base class image and pre-segmenting it through an unsupervised image segmentation algorithm sub-network to obtain a plurality of pre-segmented sub-region masks, then upsampling the unsegmented original image of the base class image and extracting high-level semantic features through a backbone network in an offline background labeling algorithm network;

步骤S13：根据预分割子区域掩码和高层语义特征进行哈达玛积计算后，通过掩码平均池化获取原型特征；Step S13: after calculating the Hadamard product based on the pre-segmented sub-region mask and the high-level semantic features, the prototype features are obtained by mask average pooling;

步骤S14：对所有基类图像的背景区域进行标注，以获取伪类原型特征和伪掩码，再通过无监督聚类算法进行聚类，以划分伪类并制作伪类数据集；Step S14: annotating the background areas of all base class images to obtain pseudo class prototype features and pseudo masks, and then clustering them using an unsupervised clustering algorithm to divide pseudo classes and create a pseudo class dataset;

步骤S15：根据基类数据集和伪类数据集进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务；Step S15: performing joint training according to the base class dataset and the pseudo class dataset, so as to perform the segmentation task of the new class image dataset through the trained semantic segmentation model;

需要说明的，本发明提出的离线背景标记算法OBAA（Offline BackgroundAnnotation Algorithm）应用到PFENet和BAM以及基于BAM模型改进的MSANet模型这三个小样本分割框架中，在相同设置不同基类分割的情况下其性能与未使用本发明方法的原始框架的性能对比如下：It should be noted that the offline background annotation algorithm OBAA (Offline Background Annotation Algorithm) proposed in the present invention is applied to three small sample segmentation frameworks: PFENet and BAM, and the MSANet model improved based on the BAM model. Under the same settings and different base class segmentation conditions, the performance is compared with the performance of the original framework without using the method of the present invention as follows:

表1Table 1

表2Table 2

从上表中结果可以明显看出，在两种条件下使用了背景挖掘算法进行数据扩充后的各模型，在平均交并比（MIoU）的数值上均有明显提升，由上表1可见在“1-Way-1-Shot”的设置中，使用了VGG-16作为骨干网络应用在PFENet框架上，结果表明其平均交并比相比原模型提升了1.2%，随后在使用了ResNet作为骨干网络时，平均交并比提升了0.78%，而BAM模型和MSANet在使用OBAA算法后，在PASCAL-5i数据集上的平均交并比和前景后景-交并比也有明显提升，平均交并比分别提升了0.54%和0.6%，由上表2可见，在“1-Way-5-Shot”条件下，BAM模型在使用OBAA算法后，在PASCAL-5i数据集上的平均交并比和前景后景-交并比也有明显提升，其中平均交并比提升了0.48%，前景后景-交并比提升了1.06%，在“1-Way-1-Shot”的设置条件下对COCO-20i数据集上也进行了测试，性能对比结果如下：It can be clearly seen from the results in the above table that the models after using the background mining algorithm for data expansion under the two conditions have significantly improved the average intersection over union (MIoU) values. From Table 1 above, it can be seen that in the "1-Way-1-Shot" setting, VGG-16 is used as the backbone network applied to the PFENet framework. The results show that its average intersection over union is improved by 1.2% compared with the original model. Subsequently, when ResNet is used as the backbone network, the average intersection over union is improved by 0.78%. After using the OBAA algorithm, the BAM model and MSANet have improved significantly in PASCAL-5i. The average IoU and IoU of foreground and background on the dataset have also been significantly improved, with the average IoU increasing by 0.54% and 0.6% respectively. As can be seen from Table 2 above, under the "1-Way-5-Shot" condition, the BAM model has also significantly improved the average IoU and IoU of foreground and background on the PASCAL-5i dataset after using the OBAA algorithm, with the average IoU increasing by 0.48% and the IoU of foreground and background increasing by 1.06%. The COCO-20i dataset was also tested under the "1-Way-1-Shot" setting, and the performance comparison results are as follows:

表3table 3

COCO-20i数据集整体的分割难度相比PASCAL-5i数据集更大，由上表3可见，本发明提出的方法在COCO-20i数据集上选用4种集合的情况下都获得了提升，整体的平均交并比提升了1.67%，分割效果提升更加明显。The overall segmentation difficulty of the COCO-20i dataset is greater than that of the PASCAL-5i dataset. As can be seen from Table 3 above, the method proposed in the present invention has been improved when the four sets are selected on the COCO-20i dataset. The overall average intersection-over-union ratio is improved by 1.67%, and the segmentation effect is more significantly improved.

请参阅图3，所示为本发明第三实施例提出的基于背景信息挖掘的小样本语义分割系统的结构示意图，该系统包括：Please refer to FIG3 , which is a schematic diagram of the structure of a small sample semantic segmentation system based on background information mining proposed in the third embodiment of the present invention. The system includes:

背景挖掘模块10，用于将预先设定的基类图像数据集输入离线背景标记算法网络，通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征，根据所述原型特征进行聚类，以划分多个不同的伪类，并将所述伪类制作成伪类数据集；The background mining module 10 is used to input a preset base class image data set into an offline background labeling algorithm network, obtain the pre-segmented sub-region mask and high-level semantic features of the base class image through an unsupervised image segmentation algorithm sub-network and a backbone network, so as to extract the prototype features of the background region in the sub-region, perform clustering according to the prototype features to divide a plurality of different pseudo-classes, and make the pseudo-classes into a pseudo-class data set;

联合训练模块20，用于根据所述伪类数据集和所述基类图像数据集对语义分割模型进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务。The joint training module 20 is used to jointly train the semantic segmentation model according to the pseudo-class dataset and the base-class image dataset, so as to perform the segmentation task of the new-class image dataset through the trained semantic segmentation model.

进一步的，背景挖掘模块10包括：Furthermore, the background mining module 10 includes:

特征提取单元101，用于将预先设定的基类图像数据集输入离线背景标记算法网络，通过无监督图像分割算法子网络和骨干网络获取所述基类图像的预分割子区域掩码和高层语义特征，以提取所述子区域中背景区域的原型特征；The feature extraction unit 101 is used to input a preset base class image data set into an offline background labeling algorithm network, and obtain the pre-segmented sub-region mask and high-level semantic features of the base class image through the unsupervised image segmentation algorithm sub-network and the backbone network to extract the prototype features of the background region in the sub-region;

伪类划分单元102，用于根据所述原型特征进行聚类，以划分多个不同的伪类，并将所述伪类制作成伪类数据集。The pseudo-class division unit 102 is configured to perform clustering according to the prototype features to divide the pseudo-classes into a plurality of different pseudo-classes, and to generate the pseudo-classes into a pseudo-class data set.

进一步的，联合训练模块20包括：Furthermore, the joint training module 20 includes:

联合训练单元201，用于根据所述伪类数据集和所述基类图像数据集对语义分割模型进行联合训练，以通过训练后的所述语义分割模型进行新类图像数据集的分割任务。The joint training unit 201 is used to jointly train the semantic segmentation model according to the pseudo-class dataset and the base-class image dataset, so as to perform the segmentation task of the new-class image dataset through the trained semantic segmentation model.

本发明另一方面还提出计算机存储介质，其上存储有一个或多个程序，该程序给处理器执行时实现上述的基于背景信息挖掘的小样本语义分割方法。On the other hand, the present invention further proposes a computer storage medium on which one or more programs are stored, and when the program is executed by a processor, the above-mentioned small sample semantic segmentation method based on background information mining is implemented.

本发明另一方面还提出一种计算机设备，包括存储器和处理器，其中所述存储器用于存放计算机程序，所述处理器用于执行所述存储器上所存放的计算机程序，以实现上述的基于背景信息挖掘的小样本语义分割方法。On the other hand, the present invention also proposes a computer device, including a memory and a processor, wherein the memory is used to store computer programs, and the processor is used to execute the computer programs stored in the memory to implement the above-mentioned small sample semantic segmentation method based on background information mining.

本领域技术人员可以理解，在流程图中表示或在此以其他方式描述的逻辑和/或步骤，例如，可以被认为是用于实现逻辑功能的可执行指令的定序列表，可以具体实现在任何计算机可读介质中，以供指令执行系统、装置或设备（如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统）使用，或结合这些指令执行系统、装置或设备而使用。就本说明书而言，“计算机可读介质”可以是任何可以包含存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。Those skilled in the art will appreciate that the logic and/or steps represented in the flowchart or otherwise described herein, for example, may be considered as an ordered list of executable instructions for implementing logical functions, and may be specifically implemented in any computer-readable medium for use by an instruction execution system, device or apparatus (such as a computer-based system, a system including a processor, or other system that can fetch instructions from an instruction execution system, device or apparatus and execute instructions), or in conjunction with such instruction execution systems, devices or apparatuses. For purposes of this specification, "computer-readable medium" may be any device that can contain storage, communication, propagation or transmission of a program for use by an instruction execution system, device or apparatus, or in conjunction with such instruction execution systems, devices or apparatuses.

计算机可读介质的更具体的示例（非穷尽性列表）包括以下：具有一个或多个布线的电连接部（电子装置），便携式计算机盘盒（磁装置），随机存取存储器（RAM），只读存储器（ROM），可擦除可编辑只读存储器（EPROM或闪速存储器），光纤装置，以及便携式光盘只读存储器（CDROM）。另外，计算机可读介质甚至可以是可在其上打印所述程序的纸或其他合适的介质，因为可以例如通过对纸或其他介质进行光学扫描，接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序，然后将其存储在计算机存储器中。More specific examples of computer-readable media (a non-exhaustive list) include the following: an electrical connection with one or more wires (electronic device), a portable computer disk case (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable and programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disk read-only memory (CDROM). In addition, the computer-readable medium may even be a paper or other suitable medium on which the program is printed, since the program may be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, deciphering or, if necessary, processing in another suitable manner, and then stored in a computer memory.

应当理解，本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中，多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。例如，如果用硬件来实现，和在另一实施方式中一样，可用本领域公知的下列技术中的任一项或它们的组合来实现：具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路，具有合适的组合逻辑门电路的专用集成电路，可编程门阵列（PGA），现场可编程门阵列（FPGA）等。It should be understood that the various parts of the present invention can be implemented by hardware, software, firmware or a combination thereof. In the above-mentioned embodiments, multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented by hardware, as in another embodiment, it can be implemented by any one of the following technologies known in the art or a combination thereof: a discrete logic circuit having a logic gate circuit for implementing a logic function for a data signal, a dedicated integrated circuit having a suitable combination of logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), etc.

在本说明书的描述中，参考术语“一个实施例”、“一些实施例”、 “示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不一定指的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任何的一个或多个实施例或示例中以合适的方式结合。In the description of this specification, the description with reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, the schematic representation of the above terms does not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in any one or more embodiments or examples in a suitable manner.

以上所述实施例仅表达了本发明的几种实施方式，其描述较为具体和详细，但并不能因此而理解为对本发明专利范围的限制。应当指出的是，对于本领域的普通技术人员来说，在不脱离本发明构思的前提下，还可以做出若干变形和改进，这些都属于本发明的保护范围。因此，本发明专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation methods of the present invention, and the descriptions thereof are relatively specific and detailed, but they cannot be understood as limiting the scope of the patent of the present invention. It should be pointed out that, for ordinary technicians in this field, several variations and improvements can be made without departing from the concept of the present invention, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the attached claims.