CN116129207A - A Method for Image Data Processing with Multi-Scale Channel Attention - Google Patents

A Method for Image Data Processing with Multi-Scale Channel Attention

Info

Publication number
CN116129207A
Authority
CN
China
Prior art keywords
global
input data
channel attention
local
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310414590.1A
Other languages
Chinese (zh)
Other versions
CN116129207B (en)
Inventor
刘刚
王冰冰
周杰
王磊
史魁杰
曾辉
张金烁
胡莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202310414590.1A
Publication of CN116129207A
Application granted
Publication of CN116129207B
Legal status: Active (current)
Anticipated expiration


Abstract

The invention discloses an image data processing method with multi-scale channel attention. Global features and local features are extracted from the input data so that the convolutional neural network attends to both the overall information and the local detail features of the input, alleviating the target aggregation and target occlusion problems that arise in complex scenes.

Description

Translated from Chinese

A Method for Image Data Processing with Multi-Scale Channel Attention

Technical Field

The present invention relates to the field of computer vision, and in particular to an image data processing method based on multi-scale channel attention.

Background Art

The channel attention mechanism can significantly improve the expressiveness and generalization ability of a model at low computational cost, and it is easily integrated into existing convolutional neural network architectures. Owing to these advantages, channel attention has been widely applied in deep learning tasks such as image classification, object detection, and semantic segmentation.

The essence of the channel attention mechanism is to compute a weighted average over the features of the different channels, yielding richer, more stable, and more reliable feature representations.
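To make this concrete, here is a minimal sketch of channel reweighting (in PyTorch, which is an assumption of this illustration; the mechanism itself is framework-agnostic): one scalar weight per channel scales the whole corresponding feature plane.

```python
import torch

# A feature map with batch size 1, C = 8 channels, and 32x32 spatial extent.
x = torch.randn(1, 8, 32, 32)

# One attention weight per channel, squashed into (0, 1) by a sigmoid.
w = torch.sigmoid(torch.randn(1, 8))

# Broadcast each channel weight over its H x W plane and rescale.
y = x * w.view(1, 8, 1, 1)
```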

Existing channel attention mechanisms include SE, ECA, and CA. These mechanisms attend only to the detail information in a particular local feature or to the semantic information in a global feature, never to both at once, which yields insufficiently rich feature representations along the channel dimension.

Summary of the Invention

The purpose of the present invention is to provide an image data processing method based on multi-scale channel attention.

The problem to be solved by the present invention is as follows:

an image data processing method with multi-scale channel attention is proposed, which extracts the global features and the local features of the input data so that the convolutional neural network attends to both the overall information and the local detail features of the input, thereby alleviating the target aggregation and target occlusion problems that arise in complex scenes.

The technical scheme adopted by the image data processing method with multi-scale channel attention is as follows:

A Method for Image Data Processing with Multi-Scale Channel Attention

S21: Digitize the input data (an original image or a feature map), convert the extracted features into numerical form, store them as a tensor, and normalize them so that the convolutional neural network converges faster;

S22: Perform feature extraction and feature fusion on the input data using a method that combines a global channel attention mechanism with a local channel attention mechanism;

S23: Within the global channel attention mechanism, use global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function. Global channel attention adaptively adjusts the weights of the different channels through global average pooling of the feature map followed by element-wise transformation, so that the model focuses on the more important features, improving its classification performance and robustness. The global average pooling is computed as: $g = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} x_{ij}$, where $g$ denotes the global average pooling result (computed independently for each channel) and $x$ is the input image of size W×H×C; W, H, and C denote the width, height, and number of channels of the input image, and i and j index the pixel positions along the width and height, respectively;
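As a worked instance of this pooling: for a single 2×2 channel with values $x_{11}=1$, $x_{12}=2$, $x_{21}=3$, $x_{22}=4$,

$$g = \frac{1}{2 \times 2}(1 + 2 + 3 + 4) = 2.5,$$

i.e., each channel of the W×H×C input collapses to one scalar, giving a C-dimensional channel descriptor.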

The kernel size is selected adaptively as: $k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$, where $k$ denotes the kernel size of the one-dimensional convolution, $C$ denotes the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that $k$ may only take odd values, and $\gamma$ and $b$ control the mapping between $C$ and $k$; in the present invention, $\gamma$ and $b$ are set to 2 and 1, respectively;
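As a worked instance of this selection with the stated values $\gamma = 2$ and $b = 1$: for an input with $C = 256$ channels,

$$k = \left| \frac{\log_2 256}{2} + \frac{1}{2} \right|_{\mathrm{odd}} = |4.5|_{\mathrm{odd}} = 5,$$

so the one-dimensional convolution uses a kernel of size 5; inputs with more channels thus receive proportionally wider kernels.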

The Sigmoid activation function, also known as the S-shaped growth curve, is computed as: $\sigma(x) = \frac{1}{1 + e^{-x}}$, where $x$ is the input;

S24: In the local channel attention mechanism, a multi-layer perceptron (MLP) implemented with two-dimensional convolutions is used to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them. The two-dimensional convolutions change only the number of channels of the input data: the first convolution outputs one-sixteenth as many channels as the input, and the second convolution outputs the same number of channels as the insertion point. Local channel attention thus helps the model better capture the local information in the input features;

S25: The ReLU function keeps only the positive elements and discards all negative elements by setting the corresponding activations to 0;

S26: Fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel;

S27: Compress with the Sigmoid function, which squashes any input to a value in the interval (0, 1), guaranteeing normalization;

S28: Multiply the input data and the activated data pixel by pixel to weight the different positions of the input data, so that the network attends more to both global and local features.

Further, the input data changes only its number of channels through the two-dimensional convolutions of step S24. Within the whole MLP, the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data: with shrinkage coefficient r, the feature size after shrinkage is H×W×C/r, the ReLU activation function is applied, and the feature size after expansion is H×W×C.

Further, in steps S23 and S24, the global features and the local features of the input data are extracted by global average pooling in the global channel attention mechanism and by the multi-layer perceptron MLP in the local channel attention mechanism, respectively, and in step S26 the outputs of the two mechanisms are fused, i.e., the different features undergo feature fusion, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, thereby alleviating the target aggregation and target occlusion problems that arise in complex scenes.

Beneficial effects of the present invention: the low detection accuracy and high missed-detection rate caused by heavy aggregation and severe occlusion in small-target detection in complex scenes can be further alleviated by this multi-scale channel attention image data processing method. By extracting the global features and local features of the data and fusing the different features, the method makes the convolutional neural network attend to both the overall information and the local detail features of the input data, thereby alleviating the target aggregation and target occlusion problems that arise in complex scenes.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the multi-scale channel attention image data processing method of the present invention;

Figure 2 is a schematic diagram of the rectification performed by the ReLU function in the present invention;

Figure 3 is a schematic diagram of data normalization by the sigmoid function in the present invention.

Detailed Description of the Embodiments

The present invention is further described clearly and completely below in conjunction with the accompanying drawings, but the protection scope of the present invention is not limited thereto.

Embodiment

As shown in Figures 1 to 3, an image data processing method with multi-scale channel attention comprises the following steps:

S21: Digitize the input data (an original image or a feature map), convert the extracted features into numerical form, store them as a tensor, and normalize them so that the convolutional neural network converges faster;

S22: Perform feature extraction and feature fusion on the input data using a method that combines a global channel attention mechanism with a local channel attention mechanism, as shown in Figure 1;

S23: Within the global channel attention mechanism, use global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function, as shown in the left column of Figure 1. Global channel attention adaptively adjusts the weights of the different channels through global average pooling of the feature map followed by element-wise transformation, so that the model focuses on the more important features, improving its classification performance and robustness. The global average pooling is computed as: $g = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} x_{ij}$, where $g$ denotes the global average pooling result and $x$ is the input image of size W×H×C; W, H, and C denote the width, height, and number of channels of the input image, and i and j index the pixel positions along the width and height, respectively;

The kernel size is selected adaptively as: $k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$, where $k$ denotes the kernel size of the one-dimensional convolution, $C$ denotes the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that $k$ may only take odd values, and $\gamma$ and $b$ control the mapping between $C$ and $k$; in the present invention, $\gamma$ and $b$ are set to 2 and 1, respectively;

The Sigmoid activation function, also known as the S-shaped growth curve and shown in Figure 3, is computed as: $\sigma(x) = \frac{1}{1 + e^{-x}}$, where $x$ is the input;
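A minimal sketch of this global branch, assuming a PyTorch implementation (the patent prescribes no framework; the class and variable names are illustrative): global average pooling, a 1-D convolution whose kernel size follows the adaptive-selection formula, and a Sigmoid.

```python
import math
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """Global branch sketch: GAP -> adaptive-kernel 1-D conv -> Sigmoid (step S23)."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # k = |log2(C)/gamma + b/gamma|_odd, rounded to an odd integer.
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 == 1 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W). Pool each channel to a single scalar.
        g = x.mean(dim=(2, 3))                    # (N, C)
        # Run the adaptive 1-D convolution across the channel dimension.
        g = self.conv(g.unsqueeze(1)).squeeze(1)  # (N, C)
        # Per-channel weights in (0, 1).
        return torch.sigmoid(g)
```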

S24: In the local channel attention mechanism, a multi-layer perceptron (MLP) implemented with two-dimensional convolutions is used to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them; the ReLU activation sets the output of some neurons to 0, which reduces the interdependence of parameters and mitigates overfitting. The two-dimensional convolutions change only the number of channels of the input data: the first convolution outputs one-sixteenth as many channels as the input, and the second convolution outputs the same number of channels as the insertion point, as shown in the right column of Figure 1. Local channel attention thus helps the model better capture the local information in the input features;
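A matching sketch of the local branch under the same PyTorch assumption: two 1×1 convolutions with a ReLU between them, shrinking the channels by a factor of 16 and then expanding them back (the shrinkage coefficient r discussed further below).

```python
import torch.nn as nn

class LocalChannelAttention(nn.Module):
    """Local branch sketch: 1x1 conv (C -> C/r) -> ReLU -> 1x1 conv (C/r -> C) (step S24)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            # Shrink: output channels are one-sixteenth of the input channels.
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            # Expand: restore the channel count of the insertion point.
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        # Only the channel count changes inside the MLP; H and W are untouched.
        return self.mlp(x)
```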

S25: The ReLU function keeps only the positive elements and discards all negative elements by setting the corresponding activations to 0, as shown in Figure 2;

S26: Fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel;

S27: Compress with the Sigmoid function, which squashes any input to a value in the interval (0, 1), guaranteeing normalization, as shown in Figure 1;

S28: Multiply the input data and the activated data pixel by pixel to weight the different positions of the input data, so that the network attends more to both global and local features, as shown in Figure 1.
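Assembling steps S22 through S28 under the same assumptions (the two sketch classes above are hypothetical names, not the patent's code): the branch outputs are fused by addition, squashed into (0, 1) by a Sigmoid, and multiplied element-wise with the input. Note that the global branch already applies its own Sigmoid internally, as step S23 describes.

```python
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    """Sketch of the full module: fuse global and local branches (steps S26-S28)."""

    def __init__(self, channels: int):
        super().__init__()
        self.global_att = GlobalChannelAttention(channels)
        self.local_att = LocalChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global weights (N, C) broadcast over space; the local map is (N, C, H, W).
        w_global = self.global_att(x).view(x.size(0), -1, 1, 1)
        w_local = self.local_att(x)
        # S26/S27: fuse by addition, then squash into (0, 1) with a Sigmoid.
        w = torch.sigmoid(w_global + w_local)
        # S28: pixel-wise multiplication weights every position of the input.
        return x * w

# Usage on a dummy normalized input (step S21): the output keeps the input shape.
x = torch.randn(2, 64, 32, 32)
y = MultiScaleChannelAttention(64)(x)
assert y.shape == x.shape
```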

As stated above, the input data changes only its number of channels through the two-dimensional convolutions of step S24. Within the whole MLP, the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data: with shrinkage coefficient r, the feature size after shrinkage is H×W×C/r, the ReLU activation function is applied, and the feature size after expansion is H×W×C.

In steps S23 and S24, the global features and the local features of the input data are extracted by global average pooling in the global channel attention mechanism and by the multi-layer perceptron MLP in the local channel attention mechanism, respectively. In step S26, the outputs of the two mechanisms are fused, i.e., the different features undergo feature fusion, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, thereby alleviating the target aggregation and target occlusion problems that arise in complex scenes.

The embodiments disclosed above are preferred embodiments of the present invention, but the invention is not limited to them. Those of ordinary skill in the art can readily grasp the spirit of the present invention from the above embodiments and make various extensions and changes; as long as these do not depart from the spirit of the present invention, they fall within its protection scope.

Claims (3)

Translated from Chinese

1. An image data processing method with multi-scale channel attention, characterized by comprising the following steps:

S21: digitizing the input data, i.e., an original image or a feature map, converting the extracted features into numerical form, storing them as a tensor, and normalizing them so that the convolutional neural network converges faster;

S22: performing feature extraction and feature fusion on the input data using a method that combines a global channel attention mechanism with a local channel attention mechanism;

S23: using, within the global channel attention mechanism, global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function, wherein the global average pooling is computed as $g = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} x_{ij}$, where $g$ denotes the global average pooling result, $x$ is the input image of size W×H×C, W, H, and C denote the width, height, and number of channels of the input image, and i and j index the pixel positions along the width and height, respectively;

the kernel size is selected adaptively as $k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$, where $k$ denotes the kernel size of the one-dimensional convolution, $C$ denotes the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that $k$ may only take odd values, and $\gamma$ and $b$ control the mapping between $C$ and $k$;

the Sigmoid activation function, also known as the S-shaped growth curve, is computed as $\sigma(x) = \frac{1}{1 + e^{-x}}$, where $x$ is the input;

S24: using, in the local channel attention mechanism, a multi-layer perceptron MLP implemented with two-dimensional convolutions to extract local features, the MLP consisting of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them, wherein the two-dimensional convolutions change only the number of channels of the input data, the first convolution outputting one-sixteenth as many channels as the input and the second convolution outputting the same number of channels as the insertion point;

S25: the ReLU function keeping only the positive elements and discarding all negative elements by setting the corresponding activations to 0;

S26: fusing the outputs of the global attention and the local attention, activating the fused data with the Sigmoid function to obtain the final attention weights, and then multiplying the activated data with the input data pixel by pixel;

S27: compressing with the Sigmoid function, which squashes any input to a value in the interval (0, 1), guaranteeing normalization;

S28: multiplying the input data and the activated data pixel by pixel to weight the different positions of the input data, so that global features and local features receive more attention.

2. The image data processing method with multi-scale channel attention according to claim 1, characterized in that in steps S23 and S24 the global features and the local features of the input data are extracted by global average pooling in the global channel attention mechanism and by the multi-layer perceptron MLP in the local channel attention mechanism, respectively, and in step S26 the outputs of the global channel attention mechanism of step S23 and of the local channel attention mechanism of step S24 are fused, i.e., the different features undergo feature fusion, so that the convolutional neural network attends to both the overall information and the local detail features of the input data, thereby alleviating the target aggregation and target occlusion problems that arise in complex scenes.

3. The image data processing method with multi-scale channel attention according to claim 1, characterized in that the input data changes only its number of channels through the two-dimensional convolutions of step S24, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage coefficient is r, the feature size after shrinkage is H×W×C/r, the ReLU activation function is applied, and the feature size after expansion is H×W×C.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310414590.1A | 2023-04-18 | 2023-04-18 | A Method for Image Data Processing with Multi-Scale Channel Attention (granted as CN116129207B (en))

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310414590.1A | 2023-04-18 | 2023-04-18 | A Method for Image Data Processing with Multi-Scale Channel Attention (granted as CN116129207B (en))

Publications (2)

Publication Number | Publication Date
CN116129207A | 2023-05-16
CN116129207B (en) | 2023-08-04

Family

Family ID: 86301329

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310414590.1A | A Method for Image Data Processing with Multi-Scale Channel Attention | 2023-04-18 | 2023-04-18 (Active; granted as CN116129207B (en))

Country Status (1)

Country | Link
CN (1) | CN116129207B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20180231871A1 (en)* | 2016-06-27 | 2018-08-16 | Zhejiang Gongshang University | Depth estimation method for monocular image based on multi-scale CNN and continuous CRF
CN110853051A (en)* | 2019-10-24 | 2020-02-28 | 北京航空航天大学 | Cerebrovascular image segmentation method based on multi-attention densely connected generative adversarial network
CN111489358A (en)* | 2020-03-18 | 2020-08-04 | 华中科技大学 | A 3D point cloud semantic segmentation method based on deep learning
CN112017198A (en)* | 2020-10-16 | 2020-12-01 | 湖南师范大学 | Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en)* | 2021-01-27 | 2021-05-11 | 南京邮电大学 | Expression recognition method and system based on local and global attention mechanism
CN113627295A (en)* | 2021-07-28 | 2021-11-09 | 中汽创智科技有限公司 | Image processing method, device, equipment and storage medium
CN114842553A (en)* | 2022-04-18 | 2022-08-02 | 安庆师范大学 | Behavior detection method based on residual shrinkage structure and non-local attention
CN115240201A (en)* | 2022-09-21 | 2022-10-25 | 江西师范大学 | A Chinese character generation method using Chinese character skeleton information to alleviate the problem of network model collapse
CN115761258A (en)* | 2022-11-10 | 2023-03-07 | 山西大学 | Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en)* | 2022-11-10 | 2023-03-31 | 北京工业大学 | Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ABHINAV SAGAR et al.: "DMSANet: Dual Multi Scale Attention Network", arXiv:2106.08382v2 [cs.CV], pages 1-10*
GANG LIU et al.: "Multiple Dirac Points and Hydrogenation-Induced Magnetism of Germanene Layer on Al (111) Surface", Journal of Physical Chemistry Letters, pages 4936-4942*
章予希: "Research on a voiceprint recognition model based on multi-scale feature joint attention", China Master's Theses Full-text Database, Information Science and Technology, pages 136-362*
高丹, 陈建英, 谢盈: "A-PSPNet: a PSPNet image semantic segmentation model incorporating an attention mechanism", Journal of China Academy of Electronics and Information Technology, no. 06, pages 28-33*

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116894836A (en)* | 2023-07-31 | 2023-10-17 | 浙江理工大学 | Yarn defect detection method and device based on machine vision
CN118094343A (en)* | 2024-04-23 | 2024-05-28 | 安徽大学 | Attention mechanism-based LSTM machine residual service life prediction method
CN118397281A (en)* | 2024-06-24 | 2024-07-26 | 湖南工商大学 | Image segmentation model training method, segmentation method and device based on artificial intelligence
CN119252334A (en)* | 2024-10-14 | 2025-01-03 | 山东合成生物技术有限公司 | A screening method and system for synthetic biological probiotics

Also Published As

Publication Number | Publication Date
CN116129207B (en) | 2023-08-04

Similar Documents

Publication | Title
CN116129207B (en) | A Method for Image Data Processing with Multi-Scale Channel Attention
CN114119638B (en) | Medical image segmentation method integrating multi-scale features and attention mechanisms
CN111402129B (en) | Binocular stereo matching method based on joint up-sampling convolutional neural network
CN109255755B (en) | Image super-resolution reconstruction method based on multi-column convolutional neural network
CN110852383B (en) | Target detection method and device based on attention mechanism deep learning network
WO2020056791A1 (en) | Method and apparatus for super-resolution reconstruction of multi-scale dilated convolution neural network
CN110136062B (en) | A super-resolution reconstruction method for joint semantic segmentation
CN114764868A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium
CN115222998A (en) | An image classification method
CN115082675B (en) | A transparent object image segmentation method and system
CN114170634A (en) | Gesture image feature extraction method based on DenseNet network improvement
CN111507359A (en) | An adaptive weighted fusion method for image feature pyramids
CN115171052B (en) | Crowded crowd attitude estimation method based on high-resolution context network
CN116645598A (en) | Remote sensing image semantic segmentation method based on channel attention feature fusion
CN109740552A (en) | A target tracking method based on parallel feature pyramid neural network
CN116758407A (en) | Underwater small target detection method and device based on CenterNet
CN117711023A (en) | Human body posture estimation method based on scale feature and hierarchical feature fusion
Wang et al. | Global contextual guided residual attention network for salient object detection
CN116012602A (en) | On-line positioning light-weight significance detection method
CN117830703A (en) | Image recognition method based on multi-scale feature fusion, computer device and computer-readable storage medium
CN110633706A (en) | A semantic segmentation method based on pyramid network
CN110210419A (en) | The scene recognition system and model generating method of high-resolution remote sensing image
CN114492755A (en) | Object detection model compression method based on knowledge distillation
CN118733807A (en) | Multi-type building image retrieval method and system based on convolutional multi-head attention
CN118470327A (en) | A remote sensing image semantic segmentation method, device, system, and storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
