CN118823344A - Medical image semantic segmentation method and system based on channel and spatial attention mechanism - Google Patents

Medical image semantic segmentation method and system based on channel and spatial attention mechanism

Info

Publication number
CN118823344A
CN118823344A
Authority
CN
China
Prior art keywords
module
feature map
medical image
channel
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410871458.8A
Other languages
Chinese (zh)
Inventor
张伟
李源
沈琼霞
蔡文妍
杨维明
李璋
刘国君
石鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Original Assignee
Hubei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University
Priority to CN202410871458.8A
Publication of CN118823344A
Legal status: Pending

Abstract



The present invention discloses a medical image semantic segmentation method and system based on channel and spatial attention mechanisms. The method comprises: collecting a historical medical image dataset and preprocessing it; performing semantic segmentation on the preprocessed medical images with a CSAB-UNet model, in which the convolutional and pooling layers of a U-Net network perform feature extraction and downsampling on the preprocessed data to obtain feature maps of modified size and channel count, and the deconvolutional layers of the U-Net network, together with a bridge channel attention module and a bridge spatial attention module, upsample the feature maps to restore their size and channel count; and optimizing the CSAB-UNet model with a hybrid loss function, then using the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

Description

Medical image semantic segmentation method and system based on channel and spatial attention mechanisms

Technical Field

The present invention relates to the technical fields of convolutional neural networks, neuroscience, and computing, and more particularly to a medical image semantic segmentation method and system based on channel and spatial attention mechanisms.

Background Art

The accuracy of medical image semantic segmentation algorithms is of great significance for clinical diagnosis and treatment decisions, early disease detection and prediction, scientific research and knowledge discovery, and automated, intelligent medical applications. Segmentation accuracy directly affects clinicians' diagnosis and treatment decisions: accurate results provide more comprehensive, detailed, and reliable information, helping doctors better understand pathological changes and disease progression and thus formulate more precise, individualized treatment plans. Highly accurate algorithms reduce human error and improve doctors' ability to interpret medical images.

The accuracy of medical image semantic segmentation algorithms is also of great significance for the early detection and prediction of disease. Accurately segmenting lesion regions helps doctors find tiny lesions or abnormal signals, enabling early diagnosis and intervention. In addition, accurate segmentation results provide valuable information on disease progression and prognosis, helping to formulate personalized treatment plans and predict the course of a patient's disease.

At the same time, accurately segmenting tissue structures, organ regions, or lesion regions in medical images helps researchers gain a deeper understanding of the pathological characteristics, physiological processes, and treatment effects of disease. Accurate segmentation results can be used to build quantitative anatomical and pathological feature models, promoting the depth and progress of medical research. The accuracy of medical image semantic segmentation algorithms is also the basis for automated and intelligent medical applications.

In summary, research on the accuracy of medical image semantic segmentation algorithms is of great significance: continuously improving algorithm accuracy improves the reliability and efficiency of medical image processing and provides better support for medical practice and patient health.

Although existing medical image semantic segmentation techniques have achieved some success, they still have shortcomings. First, tumors, tissues, and organs in medical images may be difficult to identify because of background interference: some channels of these images carry more task-relevant information, while other channels contain useless noise or redundant information. Second, model generalization is limited; owing to the diversity and complexity of medical images, existing models generalize poorly to new datasets or different disease types. Third, details and boundaries are difficult to handle; existing techniques still struggle with fine structures and boundary regions and are prone to inaccurate, blurred, or broken segmentations.

Summary of the Invention

In view of the above problems, the present invention provides a medical image semantic segmentation method based on channel and spatial attention mechanisms, the method comprising:

S1. Collect a historical medical image dataset and preprocess it;

S2. Build a CSAB-UNet model based on the U-Net model and train it on the preprocessed dataset. The CSAB-UNet model operates as follows: the convolutional and pooling layers of the U-Net network perform feature extraction and downsampling on the preprocessed data, yielding feature maps of modified size and channel count; the deconvolutional layers of the U-Net network upsample these feature maps to restore their size and channel count, while the downsampled feature maps whose sizes correspond to the upsampled ones pass through a bridge channel attention module and a bridge spatial attention module to retain the required semantic information;

S3. Optimize the trained CSAB-UNet model with a hybrid loss function, then use the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

Optionally, in S1, the preprocessing method specifically includes:

The collected historical medical image dataset is cleaned and annotated; the cleaning and annotation process specifically includes:

performing image format conversion, size adjustment, and color correction on the collected historical medical image dataset so that its format, resolution, and size are consistent;

removing noisy and incomplete images from the adjusted historical medical image dataset to obtain basic metadata information;

annotating the basic metadata information using an annotation tool;

applying data augmentation to the annotated data to obtain augmented data, which is stored to yield the preprocessed data.

Optionally, in S2, the bridge channel attention module includes a feature map compression module, a feature map transformation module, and a feature map recalibration module; the bridge spatial attention module includes a compression-concatenation module, a spatial attention weight activation extraction module, and a spatial pixel weighting module.

Optionally, in S3, the hybrid loss function is:

Bce_Dice_Loss = Loss_BCE + Loss_Dice

where A is the predicted segmentation result, B is the ground-truth segmentation result, p is the predicted probability that a sample belongs to class 1, and y is the ground-truth label value.

The present invention also discloses a medical image semantic segmentation system based on channel and spatial attention mechanisms, the system comprising a data collection module, a model building module, and a semantic segmentation module;

the data collection module is used to collect a historical medical image dataset and preprocess it;

the model building module is used to build a CSAB-UNet model based on the U-Net model and to train the CSAB-UNet model on the preprocessed dataset;

the semantic segmentation module is used to optimize the trained CSAB-UNet model with a hybrid loss function and to use the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

Optionally, the preprocessing process of the data collection module includes:

The collected historical medical image dataset is cleaned and annotated; the cleaning and annotation process specifically includes:

performing image format conversion, size adjustment, and color correction on the collected historical medical image dataset so that its format, resolution, and size are consistent;

removing noisy and incomplete images from the adjusted historical medical image dataset to obtain basic metadata information;

annotating the basic metadata information using an annotation tool;

applying data augmentation to the annotated data to obtain augmented data, which is stored to yield the preprocessed data.

Optionally, the bridge channel attention module of the model building module includes a feature map compression module, a feature map transformation module, and a feature map recalibration module; the bridge spatial attention module includes a compression-concatenation module, a spatial attention weight activation extraction module, and a spatial pixel weighting module.

Optionally, the hybrid loss function of the semantic segmentation module is:

Bce_Dice_Loss = Loss_BCE + Loss_Dice

where A is the predicted segmentation result, B is the ground-truth segmentation result, p is the predicted probability that a sample belongs to class 1, and y is the ground-truth label value.

Compared with the prior art, the present invention has the following beneficial effects:

The network model uses U-Net as its basic framework and replaces the original skip connections with bridge channel attention modules and bridge spatial attention modules, so that the network can adaptively acquire rich receptive field information; by fusing new features and extracting the channel and spatial information relevant to the segmentation task, the new features can be screened by importance. In addition, the experiments use multiple data augmentation methods to increase the amount of training data, making the network more robust; the hybrid loss function Bce_Dice_Loss is used during training to strengthen the model's analysis and optimization capabilities.

BRIEF DESCRIPTION OF THE DRAWINGS

To more clearly illustrate the technical solution of the present invention, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is an overall structural diagram of CSAB-UNet according to an embodiment of the present invention;

FIG. 2 is a structural diagram of the bridge channel attention module according to an embodiment of the present invention;

FIG. 3 is a structural diagram of the bridge spatial attention module according to an embodiment of the present invention.

DETAILED DESCRIPTION

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Embodiment 1

The present invention provides a medical image semantic segmentation method based on channel and spatial attention mechanisms; as shown in FIG. 1, the method comprises:

S1. Collect a historical medical image dataset and preprocess it, including data cleaning and annotation:

(1) Data collection and preprocessing: collect the required medical image datasets (e.g., CT, MRI, X-ray), check data quality to ensure consistency of image format, resolution, and size, and apply preprocessing operations such as format conversion, resizing, and color correction.

(2) Data cleaning: remove blurry, noisy, or severely occluded images; repair or remove incomplete or mislabeled images; and ensure every image has clear, accurate metadata.

(3) Data annotation: depending on the task, annotate the target regions of interest, lesion types, etc.; professional annotation tools may be used, and annotation rules must be unified to ensure consistency.

(4) Data augmentation: increase the amount of training data through transformations such as rotation, flipping, and scaling to improve the model's generalization ability.

(5) Data partitioning: divide the dataset into training, validation, and test sets, ensuring no overlap between the three sets and a distribution that is as even as possible.

(6) Data storage and management: save the data in an appropriate file format, establish a data directory structure, and record data information.
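The normalization and augmentation steps above can be sketched as follows. This is an illustrative example only, not code from the patent; the helper names `normalize` and `augment` are assumptions:

```python
import numpy as np

def normalize(img):
    """Scale image intensities to [0, 1] so inputs are consistent across scans."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def augment(img):
    """Yield the eight rotation/flip variants of an image, a simple way
    to multiply the amount of training data."""
    for k in range(4):
        rot = np.rot90(img, k)
        yield rot
        yield np.flip(rot, axis=1)
```

For non-square images, `np.rot90` changes the shape, so in practice rotations are combined with padding or cropping back to the network's input size.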

S2. Build a CSAB-UNet model based on the U-Net model and train it on the preprocessed dataset. The CSAB-UNet model operates as follows: the convolutional and pooling layers of the U-Net network perform feature extraction and downsampling on the preprocessed data, yielding feature maps of modified size and channel count; the deconvolutional layers of the U-Net network upsample these feature maps to restore their size and channel count, while the downsampled feature maps whose sizes correspond to the upsampled ones pass through a bridge channel attention module and a bridge spatial attention module to retain the required semantic information.

The U-Net network structure includes an encoder and a decoder. Encoder: convolutional and pooling layers perform feature extraction and downsampling, gradually reducing the spatial size of the feature maps while changing the channel count. Decoder: deconvolutional layers and skip connections perform upsampling, gradually restoring the size and channel count of the feature maps; the encoder feature map of the corresponding size is concatenated with the decoder feature map to retain more semantic information.
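As a rough illustration of the encoder's shape arithmetic, assuming the standard U-Net configuration (64 initial channels, channel doubling per stage, 2×2 max pooling; the patent does not state these exact numbers):

```python
def encoder_shapes(h, w, c0=64, depth=5):
    """Track (channels, height, width) through the encoder stages:
    convolutions set the channel count and 2x2 max pooling halves
    the spatial size between stages."""
    shapes, c = [], c0
    for d in range(depth):
        shapes.append((c, h, w))
        if d < depth - 1:
            h, w, c = h // 2, w // 2, c * 2
    return shapes

# For a 256x256 input the five stages are:
# [(64, 256, 256), (128, 128, 128), (256, 64, 64), (512, 32, 32), (1024, 16, 16)]
```

The decoder reverses this sequence, and each decoder stage receives the encoder feature map of matching spatial size.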

This embodiment replaces the original skip connections with bridge channel attention modules and bridge spatial attention modules, so that the network can adaptively acquire rich receptive field information: the five intermediate feature layers at different scales along the expansive path are fused into new features, and the channel and spatial information relevant to the segmentation task is extracted, facilitating importance screening of the new features.

As shown in FIG. 2, the bridge channel attention module consists of three parts: feature map compression, feature map transformation, and feature map recalibration. The module introduces a mechanism that can adaptively adjust channel attention in different scenarios. The compression part reduces the amount of computation and provides the basis for the subsequent transformation and recalibration.

The compression part applies global average pooling to squeeze the height and width of each input feature map to 1×1, discarding the spatial dimensions to reduce computation, and concatenates all pooled feature maps along the channel dimension.

The transformation part converts the global feature vector obtained by compression into an intermediate representation that will be used for the subsequent channel weight computation. A one-dimensional convolution is introduced to perform this transformation, which has the following benefits:

(1) Parameter sharing: the same parameters process values at different positions of the input, so the one-dimensional convolution has few parameters.

(2) Local correlation: it captures the local correlation of the input sequence. In the feature map transformation, the one-dimensional convolution helps the model learn local relationships between channels and thus model channel attention better.

(3) Translation invariance: if the input features are shifted in position, the feature map obtained after the one-dimensional convolution shifts correspondingly. This property makes the one-dimensional convolution insensitive to positional information and more attentive to feature content.

(4) Multi-scale modeling: one-dimensional convolution kernels of different sizes model the input over different receptive fields, enabling the capture of features at multiple scales.

This layer maps the global feature vector to a higher- or lower-dimensional intermediate representation, enhancing the model's expressiveness; the intermediate representation contains more information about inter-channel relationships and is used for the subsequent channel weight computation.

The recalibration part adjusts or transforms the original feature maps to change their representation or attributes; recalibration can be achieved by different methods depending on the task and goal. The present invention uses one one-dimensional convolution layer and one fully connected layer to establish global associations between features, so that complex inter-channel relationships can be modeled. A Sigmoid activation introduces nonlinearity and constrains the channel weights to [0, 1]; the resulting vector is split according to the channel dimensions required by each layer's output feature map, and the channel weights are multiplied with the input feature maps, recalibrating the weight of each channel. Finally, inspired by residual connections, the recalibrated feature maps are added back to the original feature maps, which enhances the accuracy and stability of the model.
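A minimal NumPy sketch of the three steps (squeeze, transform, recalibrate with a residual add) might look like the following. This is an interpretation, not the patent's implementation: the fully connected layer is omitted for brevity, and the 1-D kernel values are arbitrary placeholders rather than trained weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bridge_channel_attention(feats, kernel=np.array([0.25, 0.5, 0.25])):
    """feats: list of (C_i, H_i, W_i) feature maps from the encoder stages
    (five in the paper). Returns the same maps, recalibrated channel-wise."""
    # 1. Compression: global average pool each map to a vector, then
    #    concatenate across stages -- the "bridge" between scales.
    squeezed = np.concatenate([f.mean(axis=(1, 2)) for f in feats])
    # 2. Transformation: a shared 1-D convolution over the channel axis.
    mid = np.convolve(squeezed, kernel, mode="same")
    # 3. Recalibration: sigmoid weights in (0, 1), split back per stage,
    #    multiply with the input, and add the input back (residual).
    weights = sigmoid(mid)
    out, start = [], 0
    for f in feats:
        c = f.shape[0]
        w = weights[start:start + c].reshape(c, 1, 1)
        out.append(f * w + f)
        start += c
    return out
```

Because of the residual add, each output lies between the input and twice the input, which is what stabilizes training relative to pure multiplicative reweighting.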

As shown in FIG. 3, the bridge spatial attention module adaptively acquires feature information over different receptive fields according to the sizes of tumors, tissues, and organs in medical images, extracting useful tumor, tissue, and organ information from complex images while suppressing irrelevant background information and noise. The module fuses multi-stage, multi-scale information along the spatial axis and generates an attention map for each stage. The trained model is evaluated on the validation set by computing metrics such as accuracy, recall, and F1 score.

The module consists of three parts: compression concatenation, spatial attention weight activation extraction, and spatial pixel weighting.

Compression concatenation: as shown in the figure, the five input feature maps each pass through max pooling and average pooling, and the outputs are concatenated into a two-channel tensor; the pooling operations keep the width and height unchanged. This operation achieves the following effects:

(1) Richer feature representation: max pooling and average pooling capture different types of features; concatenating the two enriches the feature representation.

(2) Enhanced robustness: max pooling is more sensitive to noise and variation, while average pooling is relatively insensitive; combining the two makes the model more robust and better adapted to different types of input.

(3) Greater feature diversity: the resulting two-channel feature map captures the maximum and average feature information respectively; this diversity provides more information for distinguishing different feature patterns.

Spatial attention weight activation extraction: a dilated convolution operation is used, which has the following effects:

(1) Fewer parameters: convolution kernel weights are shared, which benefits large networks and resource-constrained scenarios while preventing overfitting.

(2) Enhanced feature representation: the expanded receptive field of the kernel captures image details better. By sharing the kernel across positions, the dilated convolution extracts the same feature representation over the feature map, improving feature consistency and reusability.

(3) Preserved spatial resolution: the receptive field is enlarged while the spatial resolution of the input and output remains unchanged, which is especially important for tasks that must preserve detail and positional information, such as object detection and segmentation.

Spatial pixel weighting: each pixel in the image is assigned a weight that determines its contribution to the final result, enhancing or adjusting its influence; pixel weights are computed from the pixel's position or features in the image.
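The three steps can be sketched in NumPy as follows. This is illustrative only: the shared dilated convolution kernel is an untrained placeholder, and the two pooled channels are simply summed rather than combined by a learned layer:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=2):
    """'Same'-padded single-channel 2-D convolution with a dilated kernel,
    so the receptive field grows while spatial resolution is preserved."""
    kh, kw = kernel.shape
    ph, pw = dilation * (kh - 1) // 2, dilation * (kw - 1) // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + dilation * kh:dilation, j:j + dilation * kw:dilation]
            out[i, j] = np.sum(kernel * patch)
    return out

def bridge_spatial_attention(feat, kernel=np.full((3, 3), 1 / 9.0), dilation=2):
    """feat: (C, H, W). Compress along channels by max and average pooling,
    extract a spatial attention map via a shared dilated convolution and a
    sigmoid, then weight every spatial pixel of the input."""
    mx = feat.max(axis=0)    # channel-wise max pooling, shape (H, W)
    av = feat.mean(axis=0)   # channel-wise average pooling, shape (H, W)
    s = dilated_conv2d(mx, kernel, dilation) + dilated_conv2d(av, kernel, dilation)
    w = 1.0 / (1.0 + np.exp(-s))  # per-pixel attention weights in (0, 1)
    return feat * w               # broadcast over the channel dimension
```

With dilation 2, a 3×3 kernel covers a 5×5 neighborhood without increasing the parameter count, which is the resolution-preserving receptive-field enlargement described above.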

S3. Optimize the trained CSAB-UNet model with a hybrid loss function, then use the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

In S3, the hybrid loss function is:

Bce_Dice_Loss = Loss_BCE + Loss_Dice

where A is the predicted segmentation result, B is the ground-truth segmentation result, p is the predicted probability that a sample belongs to class 1, and y is the ground-truth label value (binary classification in this experiment).
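The BCE and Dice components are not written out in the text; assuming their standard definitions (Loss_BCE = −[y·log p + (1−y)·log(1−p)] averaged over pixels, Loss_Dice = 1 − 2|A∩B| / (|A| + |B|)), the hybrid loss can be sketched as:

```python
import numpy as np

def bce_dice_loss(p, y, eps=1e-7):
    """Hybrid loss: binary cross-entropy plus (soft) Dice loss.
    p: predicted foreground probabilities, y: ground-truth binary labels."""
    p = np.clip(p, eps, 1 - eps)          # avoid log(0)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    inter = np.sum(p * y)                 # soft intersection |A ∩ B|
    dice = 1 - (2 * inter + eps) / (np.sum(p) + np.sum(y) + eps)
    return bce + dice
```

Dice loss counteracts the class imbalance caused by large background regions, since it is computed only from the foreground overlap, while BCE supplies smooth per-pixel gradients; summing the two combines their strengths.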

This embodiment uses the hybrid loss function Bce_Dice_Loss during training to strengthen the model's analysis and optimization capabilities. Given that background pixels occupy too large a proportion of the image, and considering the respective advantages and shortcomings of BCE and Dice Loss, the two are fused into the new loss function Bce_Dice_Loss, which improves the accuracy of the model's predictions. The experiments also use multiple data augmentation methods to increase the amount of training data and make the network more robust.

Embodiment 2

A medical image semantic segmentation system based on channel and spatial attention mechanisms, the system comprising a data collection module, a model building module, and a semantic segmentation module;

The data collection module collects a historical medical image dataset and preprocesses it, including data cleaning and annotation. Specifically: the collected dataset undergoes image format conversion, size adjustment, and color correction so that its format, resolution, and size are consistent; noisy and incomplete images are removed from the adjusted dataset to obtain basic metadata information; the basic metadata information is annotated with an annotation tool; and the annotated data is augmented, with the augmented data stored to yield the preprocessed data.

In this embodiment, the bridge channel attention module is similar to an ordinary channel attention module yet differs from it.

The similarities are:

(1) The same feature compression: both compress each channel's feature map to a single value. Assuming the input feature dimensions are (H, W, C), where H is the height, W the width, and C the number of channels, the feature map dimensions are transformed from (H, W, C) to (1, 1, C), from which the per-channel attention weights are extracted.

(2)加权方式相同:都将该权重和原通道特征图信息相乘进行加权。(2) The weighting method is the same: the weight is multiplied by the original channel feature map information for weighting.
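The two shared steps, squeezing to (1, 1, C) and channel-wise reweighting, can be sketched in NumPy (illustrative function names):

```python
import numpy as np

def channel_squeeze(x):
    # (H, W, C) -> (1, 1, C): collapse the spatial axes by global average
    # pooling, leaving one value per channel
    return x.mean(axis=(0, 1), keepdims=True)

def channel_reweight(x, w):
    # the weighting step shared by both modules: broadcast the (1, 1, C)
    # channel weights over every spatial position of the feature map
    return x * w
```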

不同之处在于:The differences are:

(1)特征信息来源不同:原通道注意力模块SE-Net的输入仅来自相同维度特征图信息,而桥连通道注意力模块的输入来自U-Net五组不同维度的特征图信息。(1) Different sources of feature information: The input of the original channel attention module SE-Net only comes from feature map information of the same dimension, while the input of the bridge channel attention module comes from five sets of feature map information of different dimensions of U-Net.

(2)通道注意力权重提取方式不同:原通道注意力模块SE-Net在提取通道注意力权重时是将经过特征压缩后维度为(1,1,C)的特征图通过两层全连接层,而桥连通道注意力模块是将U-Net不同维度的五层特征图维度信息分别通过全局平均池化GAP后将特征串联在一起,起到特征图信息“桥梁”作用,再将串联后的特征图经过一层一维卷积和一层全连接层提取通道注意力权重。(2) The channel attention weight extraction method is different: the original channel attention module SE-Net extracts the channel attention weight by passing the feature map with dimension (1, 1, C) after feature compression through two fully connected layers, while the bridge channel attention module concatenates the features of the five layers of feature map dimensional information of U-Net with different dimensions through global average pooling GAP, which acts as a "bridge" for feature map information. The concatenated feature map is then passed through a one-dimensional convolution layer and a fully connected layer to extract the channel attention weight.

(3)特征信息重标定方式不同:原通道注意力模块SE-Net仅将提取得到的通道注意力权重和原特征图进行简单地相乘加权,而桥连通道注意力模块将提取得到的通道注意力权重和原特征图进行相乘加权的同时与原始特征图相叠加,这样做可以避免过拟合的出现,减弱源特征图噪声的干扰,有效提高该模块训练时的稳定性。(3) The feature information is recalibrated in different ways: the original channel attention module SE-Net simply multiplies the extracted channel attention weights and the original feature map, while the bridge channel attention module multiplies the extracted channel attention weights and the original feature map and superimposes them with the original feature map. This can avoid overfitting, reduce the interference of source feature map noise, and effectively improve the stability of the module during training.

具体来看,模块由三部分组成——特征映射的压缩、特征映射的转换、特征映射的重标定。Specifically, the module consists of three parts: compression of feature maps, conversion of feature maps, and recalibration of feature maps.

特征映射的压缩部分通过全局池化操作将输入特征图的长和宽维度压缩为1×1,忽略长和宽的维度信息以降低运算量,并且将所有经过全局平均池化的特征图在通道维度上进行拼接。The compression part of the feature map compresses the length and width dimensions of the input feature map to 1×1 through a global pooling operation, ignores the length and width dimension information to reduce the amount of computation, and concatenates all feature maps that have undergone global average pooling in the channel dimension.

特征映射的转换部分是将通过特征映射压缩得到的全局特征向量转换为一个中间表示。这个中间表示将用于后续的通道权重计算。在特征映射的转换部分,会引入一个一维卷积实现特征转换,该转换具有以下作用:The conversion part of the feature map is to convert the global feature vector obtained by feature map compression into an intermediate representation. This intermediate representation will be used for subsequent channel weight calculations. In the conversion part of the feature map, a one-dimensional convolution is introduced to achieve feature conversion, which has the following effects:

(1)参数共享:利用相同参数处理输入特征图不同位置像素点。这种参数共享特性使一维卷积具有较少参数量。(1) Parameter sharing: The same parameters are used to process pixels at different positions in the input feature map. This parameter sharing feature makes the one-dimensional convolution have fewer parameters.

(2)局部关联性:捕捉输入序列局部关联性。在特征映射转换中,一维卷积帮助模型学习通道间局部关系,从而更好地建模通道注意力。(2) Local correlation: Capturing the local correlation of the input sequence. In feature map conversion, one-dimensional convolution helps the model learn the local relationship between channels, thereby better modeling channel attention.

(3)平移不变性:输入进行平移不变处理。这意味着,如果输入特征在位置上发生平移,经过一维卷积后得到的特征映射仍保持对应平移。这个性质使一维卷积对于输入位置信息不敏感,更关注特征内容信息。(3) Translation invariance: The input is translation invariant. This means that if the input feature is translated in position, the feature map obtained after one-dimensional convolution still maintains the corresponding translation. This property makes one-dimensional convolution insensitive to input position information and focuses more on feature content information.

(4)多尺度建模:采用不同大小一维卷积核在不同感受野对输入建模。这种多尺度建模能力使一维卷积能够捕捉输入图像不同尺度特征。(4) Multi-scale modeling: One-dimensional convolution kernels of different sizes are used to model the input in different receptive fields. This multi-scale modeling capability enables one-dimensional convolution to capture different scale features of the input image.

这一层将全局特征向量映射到一个更高维度或低维度的中间表示,增强模型表达能力,这个中间表示将包含更多关于通道间的关系信息,并将用于后续通道权重计算。This layer maps the global feature vector to a higher-dimensional or lower-dimensional intermediate representation to enhance the model’s expressiveness. This intermediate representation will contain more information about the relationship between channels and will be used for subsequent channel weight calculations.
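The translation property described in (3) can be checked directly with a small NumPy experiment (illustrative 3-tap kernel). Shifting the input shifts the convolution output by the same amount, and the kernel's three parameters are shared across all positions, which also illustrates (1):

```python
import numpy as np

kernel = np.array([0.25, 0.5, 0.25])   # 3 shared parameters, any input length

x = np.zeros(16)
x[4] = 1.0                             # an impulse at position 4
x_shifted = np.roll(x, 3)              # the same impulse, shifted by 3

y = np.convolve(x, kernel, mode="same")
y_shifted = np.convolve(x_shifted, kernel, mode="same")
# the response moves with the input: conv(shift(x)) == shift(conv(x))
```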

特征映射的重标定部分是将原始特征映射调整或转换的过程,以改变特征的表示形式或属性。重标定可通过不同方法和技术实现,具体取决于任务和目标。本发明采用一层一维卷积和一层全连接层建立特征之间的全局关联,能考虑特征之间全局信息,使其能更好建模复杂的通道间关系,通过应用激活函数Sigmoid引入非线性关系控制通道权重范围介于[0,1],将得到的特征图根据各层输出特征图所需通道维度进行拆分,然后将通道权重同输入特征图相乘加权,该动作重标定各通道权重,最后受到残差连接思想的启发,本发明将重标定各通道权重后得到的特征图同原始图像相叠加增强了模型准确性和稳定性。The recalibration part of the feature map is the process of adjusting or converting the original feature map to change the representation or attribute of the feature. Recalibration can be achieved through different methods and techniques, depending on the task and goal. The present invention uses a one-dimensional convolution layer and a fully connected layer to establish a global association between features, which can consider the global information between features, so that it can better model complex channel relationships. By applying the activation function Sigmoid, a nonlinear relationship is introduced to control the channel weight range between [0,1]. The obtained feature map is split according to the channel dimension required by the output feature map of each layer, and then the channel weight is multiplied and weighted with the input feature map. This action recalibrates the weights of each channel. Finally, inspired by the idea of residual connection, the present invention superimposes the feature map obtained after recalibrating the weights of each channel with the original image to enhance the accuracy and stability of the model.
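Putting the three parts together, a minimal NumPy sketch of the bridge channel attention forward pass might look as follows. The five stage sizes, the 1-D kernel and the fully connected weights are illustrative assumptions; a trained module would learn these parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def bridge_channel_attention(feats, k, W_fc):
    """feats: list of (H_i, W_i, C_i) maps from the five U-Net stages."""
    channels = [f.shape[-1] for f in feats]
    # 1) compression: GAP each stage and concatenate -> the "bridge" vector
    bridge = np.concatenate([f.mean(axis=(0, 1)) for f in feats])
    # 2) conversion: a shared 1-D convolution, then a fully connected layer
    mid = np.convolve(bridge, k, mode="same")
    weights = sigmoid(mid @ W_fc)            # every weight lies in (0, 1)
    # 3) recalibration: split per stage, multiply, and add the residual input
    splits = np.split(weights, np.cumsum(channels)[:-1])
    return [f * w + f for f, w in zip(feats, splits)]
```

The final `f * w + f` is the residual-style superposition described above, which keeps the recalibrated output anchored to the original features.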

本发明所使用的桥连空间注意力模块和空间注意力模块有相似性却有不同之处。The bridged spatial attention module used in the present invention shares similarities with the standard spatial attention module but also differs from it.

相似性在于:The similarities are:

(1)通道压缩方式相同:两者都是通过将每个特征图压缩为单一通道特征图,假设输入特征维度为(H,W,C),其中H代表高,W代表宽,C代表通道数,则变换前后特征图维度由(H,W,C)变换为(H,W,1),通过串联最大池化和平均池化操作丰富单一通道特征图的特征表示。(1) The channel compression method is the same: both compress each feature map into a single-channel feature map. Assuming the input feature dimension is (H, W, C), where H is the height, W the width and C the number of channels, the feature map dimension is transformed from (H, W, C) to (H, W, 1); the feature representation of the single-channel map is enriched by concatenating the outputs of max pooling and average pooling.

(2)加权方式相同:都将该权重和原特征图像素点相乘进行加权。(2) The weighting method is the same: the weight is multiplied by the original feature map pixel for weighting.

不同之处在于:The differences are:

(1)特征信息来源不同:原CBAM中的空间注意力模块的输入仅来自相同维度特征图信息,而桥连空间注意力模块的输入来自U-Net五组不同维度的特征图信息。(1) Different sources of feature information: The input of the spatial attention module in the original CBAM only comes from feature map information of the same dimension, while the input of the bridged spatial attention module comes from the feature map information of five groups of different dimensions of U-Net.

(2)空间注意力权重提取方式不同:原CBAM中的空间注意力模块在提取空间注意力权重时是将经过特征压缩后维度为(H,W,1)的特征图通过两层全连接层,而桥连空间注意力模块是将U-Net不同维度的五层特征图维度信息分别通过最大池化和平均池化后将特征串联在一起,起到特征图信息“桥梁”作用,再将串联后的特征图经过共享的膨胀二维卷积层提取空间注意力权重。(2) Different methods of extracting spatial attention weights: the spatial attention module in the original CBAM extracts spatial attention weights by passing the feature map of dimension (H, W, 1), obtained after feature compression, through two fully connected layers. The bridged spatial attention module instead applies max pooling and average pooling to the five U-Net feature maps of different dimensions and concatenates the results, acting as a "bridge" for feature map information; the concatenated feature map is then passed through a shared dilated two-dimensional convolutional layer to extract the spatial attention weights.

(3)特征信息重标定方式不同:原CBAM中的空间注意力模块仅将提取得到的空间注意力权重和原特征图进行简单地相乘加权,而桥连空间注意力模块将提取得到的空间注意力权重和原特征图进行相乘加权的同时与原始特征图相叠加,这样做可以避免过拟合的出现,减弱源特征图噪声的干扰,有效提高该模块训练时的稳定性。(3) The feature information is recalibrated in different ways: the spatial attention module in the original CBAM simply multiplies the extracted spatial attention weights with the original feature map, while the bridged spatial attention module multiplies the extracted spatial attention weights with the original feature map and additionally superimposes the result on the original feature map. This avoids overfitting, reduces the interference of noise in the source feature map, and effectively improves the stability of the module during training.

具体来看,模块由三部分组成——压缩拼接、空间注意力权重激活提取、空间像素加权。Specifically, the module consists of three parts: compression splicing, spatial attention weight activation extraction, and spatial pixel weighting.

压缩拼接。如图所示5个输入特征图分别通过最大、平均池化运算,拼接输出张量得到二通道张量,池化操作中宽度和高度保持不变。这一操作可达到以下效果:Compression and splicing. As shown in the figure, each of the five input feature maps passes through max pooling and average pooling, and the outputs are concatenated into a two-channel tensor; the pooling operations leave the width and height unchanged. This operation achieves the following effects:

(1)丰富特征表示:最大池化和平均池化操作可以捕捉不同类型特征。通过串联最大池化和平均池化操作丰富特征表示。(1) Enriching feature representation: Max pooling and average pooling operations can capture different types of features. The feature representation is enriched by cascading max pooling and average pooling operations.

(2)增强鲁棒性:最大池化对噪声和变化更敏感,平均池化则对噪声和变化相对不敏感。通过串联最大池化和平均池化增强模型鲁棒性,使其更好适应不同类型输入。(2) Enhanced robustness: Max pooling is more sensitive to noise and changes, while average pooling is relatively insensitive to noise and changes. By connecting max pooling and average pooling in series, the robustness of the model is enhanced, making it better adaptable to different types of inputs.

(3)提高特征多样性:得到二通道特征图分别捕捉最大和平均特征信息。这种多样性有助于提供更多信息区分不同特征模式。(3) Improve feature diversity: The obtained two-channel feature maps capture the maximum and average feature information respectively. This diversity helps to provide more information to distinguish different feature patterns.
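A minimal sketch of the compression-splicing step: channel-wise max and average pooling concatenated into a two-channel map, with the spatial size preserved (illustrative function name):

```python
import numpy as np

def squeeze_and_splice(f):
    # (H, W, C) -> (H, W, 2): max- and average-pool along the channel axis,
    # then concatenate; H and W are unchanged, as stated above
    mx = f.max(axis=-1, keepdims=True)
    av = f.mean(axis=-1, keepdims=True)
    return np.concatenate([mx, av], axis=-1)
```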

空间注意力权重激活提取。利用膨胀卷积运算,该运算过程有以下作用:Spatial attention weight activation extraction. Using dilated convolution operations, this operation process has the following effects:

(1)减少参数量:共享卷积核权重。这对于大型网络和资源受限的场景非常有益,同时防止过拟合。(1) Reduce the number of parameters: Share convolution kernel weights. This is very beneficial for large networks and resource-constrained scenarios, while preventing overfitting.

(2)增强特征表示:扩展卷积核感知范围更好捕捉图像细节。共享的膨胀卷积运算通过在不同位置共享卷积核,可在特征图上提取相同特征表示,增强了特征的一致性和可重用性。(2) Enhanced feature representation: Expanding the convolution kernel’s perception range better captures image details. The shared dilated convolution operation can extract the same feature representation on the feature map by sharing the convolution kernel at different locations, thus enhancing the consistency and reusability of the features.

(3)保持空间分辨率:可在保持输入和输出的空间分辨率不变的同时增大感受野。这对于需要保留细节和位置信息的任务(如目标检测和分割)尤为重要。(3) Maintaining spatial resolution: The receptive field can be increased while maintaining the spatial resolution of the input and output. This is particularly important for tasks that require preserving details and location information, such as object detection and segmentation.

空间像素加权:空间像素加权用于对图像中的每个像素加权,以增强或调整其在图像中的影响力,在空间像素加权中,每个像素被赋予一个权重,该权重决定该像素对最终处理结果的贡献度。像素权重根据其在图像中的位置或特征进行计算。Spatial pixel weighting: Spatial pixel weighting is used to weight each pixel in an image to enhance or adjust its influence in the image. In spatial pixel weighting, each pixel is assigned a weight that determines the contribution of the pixel to the final processing result. Pixel weights are calculated based on their position or features in the image.
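A minimal NumPy sketch of the remaining two parts: the shared dilated convolution that extracts the spatial attention weights, and the pixel weighting with residual addition. The kernel size, dilation rate and kernel weights are illustrative assumptions, not values from the patent:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def dilated_conv2d(x, kernel, dilation=2):
    # x: (H, W, Cin), kernel: (k, k, Cin); stride 1, 'same' padding.
    # A k x k kernel with dilation d covers a (d*(k-1)+1)-wide receptive
    # field while keeping the output spatial resolution unchanged.
    ksz = kernel.shape[0]
    pad = dilation * (ksz - 1) // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W))
    for i in range(ksz):
        for j in range(ksz):
            patch = xp[i * dilation:i * dilation + H, j * dilation:j * dilation + W]
            out += (patch * kernel[i, j]).sum(axis=-1)
    return out

def spatial_pixel_weighting(f, kernel):
    pooled = np.stack([f.max(-1), f.mean(-1)], axis=-1)   # (H, W, 2)
    attn = sigmoid(dilated_conv2d(pooled, kernel))        # weights in (0, 1)
    return f * attn[..., None] + f                        # weighting + residual
```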

所述模型构建模块用于基于U-Net模型构建CSAB-UNet模型,并使用预处理后的数据集对所述CSAB-UNet模型进行训练。The model building module is used to build a CSAB-UNet model based on the U-Net model, and use the preprocessed data set to train the CSAB-UNet model.

所述CSAB-UNet模型为:使用U-Net网络的卷积层和U-Net网络的池化层对预处理后的数据进行特征提取和下采样操作,得到尺寸与通道数修改后的特征图;使用U-Net网络的反卷积层、桥式通道注意模块和桥式空间注意力模块对所述特征图进行上采样,恢复所述特征图的尺寸和通道数。The CSAB-UNet model is as follows: the convolution layer of the U-Net network and the pooling layer of the U-Net network are used to perform feature extraction and downsampling operations on the preprocessed data to obtain a feature map with modified size and number of channels; the deconvolution layer of the U-Net network, the bridge channel attention module and the bridge spatial attention module are used to upsample the feature map to restore the size and number of channels of the feature map.

U-Net网络结构包括:编码器:使用卷积层和池化层进行特征提取和下采样操作,逐渐减小特征图的尺寸并增加通道数。解码器:使用反卷积层和跳跃连接(skip connection)进行上采样操作,逐渐恢复特征图的尺寸和通道数。将编码器中对应尺寸的特征图与解码器中的特征图进行连接,以保留更多的语义信息。The U-Net network structure includes: Encoder: uses convolutional layers and pooling layers for feature extraction and downsampling, gradually reducing the spatial size of the feature maps while increasing the number of channels. Decoder: uses deconvolutional layers and skip connections for upsampling, gradually restoring the size and channel number of the feature maps. The feature maps of corresponding size in the encoder are connected with those in the decoder to retain more semantic information.
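The encoder's shape progression can be illustrated with a short sketch. The 64-to-1024 channel widths below are the textbook five-level U-Net choice and are assumed here for illustration, not taken from the patent:

```python
def encoder_shapes(H, W, C, levels=5):
    """Walk the encoder: 2x2 pooling halves H and W, convolutions double C."""
    shapes = []
    for _ in range(levels):
        shapes.append((H, W, C))
        H, W, C = H // 2, W // 2, C * 2
    return shapes
```

For a 256x256 input with 64 starting channels this yields (256, 256, 64) down to (16, 16, 1024), the five stages whose feature maps the bridge attention modules fuse.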

本实施例将原来的跳跃连接都替换为桥式通道注意力模块和桥式空间注意力模块。使网络可自适应获取丰富的感受野信息,通过将扩展路径中不同尺度的五层中间特征融合成新特征,并获取分割任务所关注的通道和空间信息,便于对新特征进行重要性筛选。This embodiment replaces the original jump connections with bridge channel attention modules and bridge spatial attention modules, so that the network can adaptively obtain rich receptive field information, fuse the five-layer intermediate features of different scales in the expansion path into new features, and obtain the channel and spatial information concerned by the segmentation task, so as to facilitate the importance screening of new features.

其中,桥式通道注意力模块由三部分组成——特征映射的压缩、特征映射的转换、特征映射的重标定。该模块引入了一种可以在不同场景自适应调整通道注意力的机制。特征映射的压缩部分的作用是降低运算量并且为后续的特征映射转换和重标定提供基础。The bridge channel attention module consists of three parts: feature map compression, feature map conversion, and feature map recalibration. This module introduces a mechanism that can adaptively adjust channel attention in different scenarios. The function of the feature map compression part is to reduce the amount of calculation and provide a basis for subsequent feature map conversion and recalibration.

桥式空间注意力模块能够根据医学图像中肿瘤、组织、器官尺寸自适应地获取不同感受野的特征信息,从复杂的医学图像中提取有用的肿瘤、组织、器官信息,并抑制无关的背景信息和噪声。模块由三部分组成——压缩拼接、空间注意力权重激活提取、空间像素加权。该模块在空间轴上融合多阶段、多尺度的信息,生成各阶段注意图。The bridge spatial attention module can adaptively obtain feature information of different receptive fields according to the size of tumors, tissues, and organs in medical images, extract useful tumor, tissue, and organ information from complex medical images, and suppress irrelevant background information and noise. The module consists of three parts: compression splicing, spatial attention weight activation extraction, and spatial pixel weighting. The module fuses multi-stage and multi-scale information on the spatial axis to generate attention maps for each stage.

所述语义分割模块用于采用混合损失函数对训练后的CSAB-UNet模型进行优化,使用优化好的CSAB-UNet模型对实时的医学图像进行语义分割,得到分割结果。The semantic segmentation module is used to optimize the trained CSAB-UNet model using a hybrid loss function, and use the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

混合损失函数为:The mixed loss function is:

Bce_Dice_Loss=Loss_BCE+Loss_Dice,其中Loss_BCE=-[y·log(p)+(1-y)·log(1-p)],Loss_Dice=1-2|A∩B|/(|A|+|B|)。Bce_Dice_Loss=Loss_BCE+Loss_Dice, where Loss_BCE=-[y·log(p)+(1-y)·log(1-p)] and Loss_Dice=1-2|A∩B|/(|A|+|B|).

其中,A为预测的分割结果,B为真实的分割结果,p为某个样本预测类别为1的概率,y为真实的标签分类值(本实验为二分类)。Here, A is the predicted segmentation result, B is the ground-truth segmentation result, p is the predicted probability that a sample belongs to class 1, and y is the true label value (binary classification in this experiment).
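Under the variable definitions above, the two loss components take their standard forms and can be written out as a small NumPy sketch (the `eps` smoothing terms are implementation assumptions added for numerical stability):

```python
import numpy as np

def loss_bce(p, y, eps=1e-7):
    # binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over pixels
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())

def loss_dice(p, y, eps=1e-7):
    # soft Dice loss: 1 - 2|A ∩ B| / (|A| + |B|), with A the prediction
    # and B the ground truth, matching the variable definitions above
    inter = (p * y).sum()
    return float(1.0 - (2.0 * inter + eps) / (p.sum() + y.sum() + eps))

def bce_dice_loss(p, y):
    # the hybrid loss: pixel-wise BCE plus region-level Dice
    return loss_bce(p, y) + loss_dice(p, y)
```

BCE penalises each pixel independently while Dice scores region overlap, so the sum counteracts the class imbalance caused by large background areas, which is the motivation stated later in the text.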

综上,本实施例具体实施步骤如下所示:1.数据集选择:选择医学图像语义分割数据集ISIC2017,ISIC2018,BreCaHAD,其中包含了原始医学图像和对应的标签。In summary, the specific implementation steps of this embodiment are as follows: 1. Dataset selection: Select the medical image semantic segmentation datasets ISIC2017, ISIC2018, BreCaHAD, which contain original medical images and corresponding labels.

2.模型选择:在U-Net模型基础上用桥式通道注意力模块和桥式空间注意力模块代替跳跃连接。2. Model selection: Based on the U-Net model, the bridge channel attention module and the bridge spatial attention module are used to replace the skip connection.

3.数据预处理:对输入图像进行预处理,包括大小调整、归一化、数据增强等。利用数据增强技术如随机裁剪、翻转、旋转提高模型的泛化能力。3. Data preprocessing: Preprocess the input images, including resizing, normalization, and data augmentation. Use augmentation techniques such as random cropping, flipping, and rotation to improve the generalization ability of the model.

4.模型的训练:使用训练集训练选定模型。在训练过程中,使用GT_BceDiceLoss来度量模型输出与真实标签之间的差异,考虑收敛速度和学习能力的均衡,初始学习率默认为1e-3,batch size(每次训练使用的样本数)为8,epoch设置为300并使用优化算法AdamW更新模型参数。4. Model training: Use the training set to train the selected model. During the training process, GT_BceDiceLoss is used to measure the difference between the model output and the true label. Considering the balance between convergence speed and learning ability, the initial learning rate defaults to 1e-3, the batch size (the number of samples used in each training) is 8, the epoch is set to 300, and the optimization algorithm AdamW is used to update the model parameters.
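The AdamW update used here is the standard rule (Adam's bias-corrected moment estimates plus decoupled weight decay). A single parameter-update step can be sketched as follows, with the learning rate 1e-3 taken from the text and otherwise common default hyperparameters assumed:

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update at step t (t starts at 1)."""
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    # decoupled weight decay: applied directly to w, not mixed into the gradient
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```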

5.模型评估:使用测试集评估训练好的模型。5. Model evaluation: Use the test set to evaluate the trained model.

6.超参数调节:根据模型的评估结果,调节模型超参数,如学习率、批量大小、正则化参数等,以进一步改善模型性能。6. Hyperparameter adjustment: According to the evaluation results of the model, adjust the model hyperparameters, such as learning rate, batch size, regularization parameters, etc., to further improve the model performance.

7.可视化结果:对模型进行可视化分析,将模型预测的语义分割结果与真实标签进行对比,以检查模型准确性和边界细节。7. Visualization results: Perform visual analysis on the model and compare the semantic segmentation results predicted by the model with the true labels to check the model accuracy and boundary details.

8.模型部署:将训练好的模型应用到医学图像测试集进行推理,生成语义分割结果。8. Model deployment: Apply the trained model to the medical image test set for inference and generate semantic segmentation results.

进行消融实验对所设计的桥式通道注意力模块和桥式空间注意力模块的有效性进行了验证。采用了控制变量法,分别进行4组实验:第一组为:使用U-Net网络;第二组为:使用U-Net网络,将跳跃连接替换为桥式通道注意力模块;第三组为:使用U-Net网络,将跳跃连接替换为桥式空间注意力模块;第四组为:使用U-Net网络,将跳跃连接替换为桥式通道注意力模块,且加入桥式空间注意力模块。实验结果验证了桥式通道注意力模块可以选择对于分割任务更重要的通道信息,从而提高了模型分割能力。Ablation experiments were conducted to verify the effectiveness of the designed bridge channel attention module and bridge spatial attention module. The control variable method was used to conduct 4 groups of experiments: the first group was: using the U-Net network; the second group was: using the U-Net network, replacing the jump connection with the bridge channel attention module; the third group was: using the U-Net network, replacing the jump connection with the bridge spatial attention module; the fourth group was: using the U-Net network, replacing the jump connection with the bridge channel attention module, and adding the bridge spatial attention module. The experimental results verified that the bridge channel attention module can select channel information that is more important for the segmentation task, thereby improving the model segmentation ability.

本实施例在训练时采用混合损失函数Bce_Dice Loss,增加模型分析和优化能力。模型针对背景像素点在图像比例过大的问题,以及考虑到BCE和Dice Loss优势与不足,融合两者后的新损失函数为Bce_Dice_Loss。提升了模型预测结果的准确率。实验中还使用多种数据增强方式提升训练数据量,使网络鲁棒性更好。This embodiment uses a mixed loss function Bce_Dice Loss during training to increase the model's analysis and optimization capabilities. The model addresses the problem of too large a proportion of background pixels in the image, and considering the advantages and disadvantages of BCE and Dice Loss, the new loss function after integrating the two is Bce_Dice_Loss. This improves the accuracy of the model's prediction results. In the experiment, a variety of data enhancement methods were also used to increase the amount of training data to make the network more robust.

以上所述的实施例仅是对本发明优选方式进行的描述,并非对本发明的范围进行限定,在不脱离本发明设计精神的前提下,本领域普通技术人员对本发明的技术方案做出的各种变形和改进,均应落入本发明权利要求书确定的保护范围内。The embodiments described above are only descriptions of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Without departing from the design spirit of the present invention, various modifications and improvements made to the technical solutions of the present invention by ordinary technicians in this field should all fall within the protection scope determined by the claims of the present invention.

Claims (8)

Translated from Chinese
1. A medical image semantic segmentation method based on channel and spatial attention mechanisms, characterized in that the method comprises:

S1: collecting a historical medical image dataset and preprocessing the dataset;

S2: building a CSAB-UNet model on the basis of the U-Net model and training the CSAB-UNet model with the preprocessed dataset, wherein in the CSAB-UNet model: the convolutional layers and pooling layers of the U-Net network perform feature extraction and downsampling on the preprocessed data to obtain feature maps with modified size and channel number; the deconvolutional layers of the U-Net network upsample the feature maps to restore their size and channel number; and a bridge channel attention module and a bridge spatial attention module retain the required semantic information from the downsampled feature maps whose sizes correspond to the upsampled feature maps;

S3: optimizing the trained CSAB-UNet model with a hybrid loss function, and using the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

2. The medical image semantic segmentation method based on channel and spatial attention mechanisms according to claim 1, characterized in that in S1 the preprocessing specifically comprises: cleaning and annotating the collected historical medical image dataset, wherein the cleaning and annotation process specifically comprises: performing image format conversion, resizing and color correction on the collected historical medical image dataset so that its format, resolution and size are consistent; removing noisy and incomplete images from the adjusted dataset to obtain basic metadata information; annotating the basic metadata information with an annotation tool; and performing data augmentation on the annotated data to obtain augmented data, which is stored to yield the preprocessed data.

3. The medical image semantic segmentation method based on channel and spatial attention mechanisms according to claim 1, characterized in that in S2 the bridge channel attention module comprises a feature map compression module, a feature map conversion module and a feature map recalibration module, and the bridge spatial attention module comprises a compression-splicing module, a spatial attention weight activation extraction module and a spatial pixel weighting module.

4. The medical image semantic segmentation method based on channel and spatial attention mechanisms according to claim 1, characterized in that in S3 the hybrid loss function is: Bce_Dice_Loss = Loss_BCE + Loss_Dice, where A is the predicted segmentation result, B is the ground-truth segmentation result, p is the predicted probability that a sample belongs to class 1, and y is the true label value.

5. A medical image semantic segmentation system based on channel and spatial attention mechanisms, the system being used to implement the medical image semantic segmentation method of any one of claims 1 to 4, characterized in that the system comprises a data collection module, a model building module and a semantic segmentation module; the data collection module is used to collect historical medical image datasets and preprocess the datasets; the model building module is used to build a CSAB-UNet model based on the U-Net model and train the CSAB-UNet model with the preprocessed dataset; and the semantic segmentation module is used to optimize the trained CSAB-UNet model with a hybrid loss function and to use the optimized CSAB-UNet model to perform semantic segmentation on real-time medical images to obtain segmentation results.

6. The medical image semantic segmentation system based on channel and spatial attention mechanisms according to claim 5, characterized in that the preprocessing performed by the data collection module comprises: cleaning and annotating the collected historical medical image dataset, wherein the cleaning and annotation process specifically comprises: performing image format conversion, resizing and color correction on the collected historical medical image dataset so that its format, resolution and size are consistent; removing noisy and incomplete images from the adjusted dataset to obtain basic metadata information; annotating the basic metadata information with an annotation tool; and performing data augmentation on the annotated data to obtain augmented data, which is stored to yield the preprocessed data.

7. The medical image semantic segmentation system based on channel and spatial attention mechanisms according to claim 5, characterized in that the bridge channel attention module of the model building module comprises a feature map compression module, a feature map conversion module and a feature map recalibration module, and the bridge spatial attention module comprises a compression-splicing module, a spatial attention weight activation extraction module and a spatial pixel weighting module.

8. The medical image semantic segmentation system based on channel and spatial attention mechanisms according to claim 5, characterized in that the hybrid loss function of the semantic segmentation module is: Bce_Dice_Loss = Loss_BCE + Loss_Dice, where A is the predicted segmentation result, B is the ground-truth segmentation result, p is the predicted probability that a sample belongs to class 1, and y is the true label value.
CN202410871458.8A | 2024-07-01 | 2024-07-01 | Medical image semantic segmentation method and system based on channel and spatial attention mechanism | Pending | CN118823344A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410871458.8A | 2024-07-01 | 2024-07-01 | Medical image semantic segmentation method and system based on channel and spatial attention mechanism


Publications (1)

Publication Number | Publication Date
CN118823344A | 2024-10-22

Family

ID=93080405

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410871458.8A (Pending, CN118823344A) | Medical image semantic segmentation method and system based on channel and spatial attention mechanism | 2024-07-01 | 2024-07-01

Country Status (1)

Country | Link
CN | CN118823344A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119963567A* | 2025-01-13 | 2025-05-09 | 北京中科云影科技有限公司 | A method and system for segmenting medical image data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20220309674A1* | 2021-03-26 | 2022-09-29 | Nanjing University Of Posts And Telecommunications | Medical image segmentation method based on u-net
CN117152433A* | 2023-09-01 | 2023-12-01 | 太原理工大学 | Medical image segmentation method based on multi-scale cross-layer attention fusion network
CN117994517A* | 2024-02-07 | 2024-05-07 | 重庆理工大学 | Method capable of accurately dividing medical image
CN118212417A* | 2024-04-10 | 2024-06-18 | 重庆大学 | Medical image segmentation model based on lightweight attention module and model training method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUE ZHAO et al.: "BA-Net: Bridge Attention for Deep Convolutional Neural Networks", HTTPS://ARXIV.ORG/PDF/2112.04150, 1 June 2022 (2022-06-01), pages 1-10*

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119963567A* | 2025-01-13 | 2025-05-09 | 北京中科云影科技有限公司 | A method and system for segmenting medical image data
CN119963567B* | 2025-01-13 | 2025-09-12 | 北京中科云影科技有限公司 | A medical imaging data segmentation method and system

Similar Documents

Ni et al., "GC-Net: Global context network for medical image segmentation"
WO2024104035A1 (en): Long short-term memory self-attention model-based three-dimensional medical image segmentation method and system
CN114399510A (en): Skin lesion segmentation and classification method and system combining image and clinical metadata
CN114649092B (en): Auxiliary diagnosis method and device based on semi-supervised learning and multi-scale feature fusion
CN116596952B (en): A pathological section image segmentation detection method with multi-level lesion detection optimization
CN114565601A (en): Improved liver CT image segmentation algorithm based on DeepLabV3+
CN115661029A (en): Pulmonary nodule detection and identification system based on YOLOv5
CN115170568B (en): Automatic segmentation method and system for rectal cancer image and chemoradiotherapy response prediction system
CN117274226B (en): Enhanced UNet-based colon polyp medical image parallel diagnosis method
Wang et al., "A geometric algebra-enhanced network for skin lesion detection with diagnostic prior"
Xin et al., "Transformer guided self-adaptive network for multi-scale skin lesion image segmentation"
CN118823344A (en): Medical image semantic segmentation method and system based on channel and spatial attention mechanism
Wang et al., "Multimodal parallel attention network for medical image segmentation"
CN116485853A (en): A medical image registration method and device based on deep learning neural network
CN119180963B (en): Implementation of multi-aspect segmentation network for visible light medical image segmentation
CN119180956A (en): WMH image segmentation system of mixed scale attention
CN119131383A (en): Medical image segmentation method, device, equipment and computer-readable storage medium
CN118552563A (en): A breast ultrasound image segmentation method based on window attention semantic stream alignment
CN111755131A (en): A method and system for early screening and severity assessment of COVID-19 based on attention guidance
CN118505615A (en): CT image body composition analysis system and method based on deep learning
CN118053063A (en): SCAE-DensNet-based lung cancer disease diagnosis and classification method
Zhao et al., "VCMix-Net: A hybrid network for medical image segmentation"
Gang et al., "Segmentation algorithm for honeycomb lung CT images based on pyramidal pooling channel Transformer"
CN116977325A (en): 3DV-Net lung nodule detection method integrating attention mechanism
Li et al., "ODCS-NSNP: Optic disc and cup segmentation using deep networks enhanced by nonlinear spiking neural P systems"

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
