Technical Field
The invention belongs to the technical field of medical imaging, and in particular relates to an MR image classification method and device based on regions of interest and a Transformer.
Background Art
In recent years, deep learning methods have achieved great success in medical image classification tasks. For example, deep convolutional neural networks (CNNs) have been widely shown to have an excellent ability to learn high-level features from MR data, greatly improving the performance of brain disease diagnosis. The strong performance of CNNs on computer vision tasks is due in part to their gradually expanding receptive fields, which allow the hierarchical structure of image representations to be learned as semantics. In computer vision, capturing the visual semantics of images is usually regarded as the core idea in network design. However, CNNs also have shortcomings: they ignore long-range dependencies in images and have difficulty describing the complex relationships among brain regions in brain MR images, which leads to suboptimal results in brain MR image diagnosis tasks. The Transformer, in contrast, can capture long-range dependencies, can model adaptively through dynamically computed self-attention weights, and offers insight into what the model attends to.
The Transformer is an attention-based encoder-decoder architecture that has become the mainstream model in natural language processing (NLP). It is built entirely on the self-attention mechanism and enhances the feature representation capability of deep learning through multi-head attention. Inspired by the success of the Transformer in NLP, Dosovitskiy et al. proposed the ViT (Vision Transformer) model, which applies the Transformer directly to sequences of image patches and performs image classification well. ViT and its variants have achieved state-of-the-art performance on multiple benchmark datasets, and Transformers have become very popular across a wide variety of computer vision tasks. Recently, in medical image analysis, Transformers have also been successfully applied to tasks such as disease diagnosis, image segmentation, and image synthesis, with remarkable results. However, the Transformer's reliance on large-scale data makes it difficult to apply widely, especially in medical imaging, where datasets tend to be small and labels unreliable. At the same time, the Transformer suffers from very high computational and memory complexity when processing high-resolution 3D medical images. Therefore, combining CNN and Transformer methods for brain MR image classification is of high research value.
Summary of the Invention
Purpose of the invention: The present invention provides an MR image classification method and device based on regions of interest and a Transformer, which effectively learns deep features of the regions of interest and thereby obtains accurate classification results.
Technical solution: The MR image classification method based on regions of interest and a Transformer according to the present invention specifically includes the following steps:
(1) Learn high-level texture features of the MR image through a CNN feature learning network to obtain the feature map f_d;
(2) Feed the feature map f_d into the ROI feature extractor to obtain the deep features X of each region of interest;
(3) Using a Vision Transformer-based classifier, identify subtle structural changes of brain regions from the feature map X, capture long-range dependencies among brain regions, and obtain the corresponding classification result.
Further, the CNN feature learning network in step (1) includes six 3D convolutional layers with the same kernel size and two 3D deconvolution layers; every two convolutional layers are followed by a max pooling layer.
Further, the numbers of channels of the convolutional layers are 32, 32, 64, 64, 128, and 128 in sequence; the kernel sizes are all 3×3×3; each convolutional layer is followed by batch normalization and ReLU activation. The feature maps produced by the sixth and fourth convolutional layers are upsampled, and the feature maps output by the two deconvolution layers and the second convolutional layer are concatenated, combining representational and semantic information to obtain the multi-channel feature map f_d.
Further, both 3D deconvolution layers have 16 output channels; the first deconvolution layer has a kernel size of 5×5×5 and a stride of 4, and the second has a kernel size of 3×3×3 and a stride of 2.
Further, the max pooling layer has a kernel size of 2×2×2 and a stride of 2.
Further, the ROI feature extractor in step (2) includes a convolutional layer with a kernel size of 1×1×1 and two group convolutional layers with the same number of groups, where the number of groups is set to 95 and the numbers of channels per group are 4 and 128, respectively.
Further, step (2) is implemented as follows:
Before the feature map f_d is fed into the ROI feature extraction unit, it passes through a 1×1×1 convolutional layer to increase the number of channels to the number of regions of interest:
f_c = ReLU(Conv(f_d))    (1)
It is then fed into the ROI feature extraction unit. Specifically, the feature map f_r of the r-th region of interest is:

f_r = δ(S = r) ⊙ f_c^(r)    (2)

where δ(S = r) takes the value 100 at positions of the segmentation image S belonging to the r-th region of interest and 1 otherwise, and f_c^(r) denotes the feature map of the r-th channel. f_r is then fed in sequence into a group convolutional layer with a kernel size of 16×16×16 and a stride of 8 and a group convolutional layer with a kernel size of 19×23×19 and a stride of 1, both with 95 groups, and flattened into a 128-dimensional feature vector x_r. All ROI feature vectors are assembled into a sequence and fed into the ViT classifier:

X = [x_1, x_2, …, x_N]    (3)

where N is the number of regions of interest.
Further, step (3) is implemented as follows:
A four-layer Transformer encoder is used, the number of heads in the multi-head attention is set to 16, and a learnable embedding vector is generated and concatenated to the input sequence X.
Based on the same inventive concept, the present invention further provides a device comprising a memory and a processor, wherein the memory is configured to store a computer program capable of running on the processor, and the processor is configured to execute, when running the computer program, the steps of the above MR image classification method based on regions of interest and a Transformer.
Based on the same inventive concept, the present invention further provides a storage medium storing a computer program which, when executed by at least one processor, implements the steps of the above MR image classification method based on regions of interest and a Transformer.
Beneficial effects: Compared with the prior art, the present invention inherits the advantages of CNNs in extracting high-level features and enhancing locality as well as the advantages of Transformers in establishing long-range dependencies, and can classify brain MR images quickly and accurately.
Description of the Drawings
Figure 1 is the overall network architecture diagram of MR image classification based on regions of interest and a Transformer;
Figure 2 is a schematic diagram of the feature learning network structure in the present invention;
Figure 3 is a schematic diagram of the ROI feature extractor structure in the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Figure 1, the present invention proposes an MR image classification method based on regions of interest and a Transformer. The specific implementation process is as follows:
Step 1: Learn high-level texture features of the MR image through the CNN feature learning network to obtain the feature map f_d.
As shown in Figure 2, the specific structure of the feature learning network is as follows:
The network includes six 3D convolutional layers with kernel sizes of 3×3×3; every two convolutional layers are followed by a max pooling layer with a kernel size of 2×2×2 and a stride of 2, so as to learn more abstract feature representations and reduce the size of the feature maps. Specifically, the numbers of channels of the convolutional layers are 32, 32, 64, 64, 128, and 128 in sequence, and each convolutional layer is followed by batch normalization (BN) and ReLU activation. Afterwards, the learned feature maps are upsampled to the input image size (i.e., 164×192×164) through deconvolution. There are two 3D deconvolution layers, each with 16 output channels; the first has a kernel size of 5×5×5 and a stride of 4, and the second has a kernel size of 3×3×3 and a stride of 2. They upsample the feature maps produced by the sixth and fourth convolutional layers, respectively. Finally, the feature maps output by the two deconvolution layers and the second convolutional layer are concatenated to obtain feature maps at three scales; combining representational and semantic information yields the multi-channel feature map f_d with 64 channels (16 + 16 + 32).
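For concreteness, a minimal PyTorch sketch of this feature learning network follows. The convolution padding, the deconvolution padding/output-padding values, and all class and variable names are assumptions made for illustration; the patent specifies only the kernel sizes, strides, and channel counts.

```python
import torch
import torch.nn as nn

class FeatureLearningNet(nn.Module):
    """Sketch of the feature learning network: six 3x3x3 3D conv layers
    (channels 32, 32, 64, 64, 128, 128), a 2x2x2/stride-2 max pool after
    each pair of conv layers, and two deconvolution branches that upsample
    the deeper maps back to the input resolution before concatenation."""

    def __init__(self, in_channels: int = 1):
        super().__init__()

        def block(cin, cout):  # conv -> BN -> ReLU; 'same' padding assumed
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm3d(cout),
                nn.ReLU(inplace=True),
            )

        self.conv1, self.conv2 = block(in_channels, 32), block(32, 32)
        self.conv3, self.conv4 = block(32, 64), block(64, 64)
        self.conv5, self.conv6 = block(64, 128), block(128, 128)
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)
        # Deconv branches: stride 4 from conv6 (quarter resolution) and
        # stride 2 from conv4 (half resolution); padding values are assumed
        # so that both outputs return exactly to the input size.
        self.up6 = nn.ConvTranspose3d(128, 16, kernel_size=5, stride=4,
                                      padding=1, output_padding=1)
        self.up4 = nn.ConvTranspose3d(64, 16, kernel_size=3, stride=2,
                                      padding=1, output_padding=1)

    def forward(self, x):
        s1 = self.conv2(self.conv1(x))               # full resolution, 32 ch
        s2 = self.conv4(self.conv3(self.pool(s1)))   # half resolution, 64 ch
        s3 = self.conv6(self.conv5(self.pool(s2)))   # quarter resolution, 128 ch
        # (The text also places a pool after conv6, but its output is not
        # used for f_d, so it is omitted in this sketch.)
        # Concatenate the three scales: 16 + 16 + 32 = 64-channel f_d.
        return torch.cat([self.up6(s3), self.up4(s2), s1], dim=1)
```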
Step 2: Feed the feature map f_d into the ROI feature extractor to obtain the deep features X of each region of interest.
As shown in Figure 3, the ROI feature extractor of the present invention includes a convolutional layer with a kernel size of 1×1×1 and two group convolutional layers with the same number of groups, where the number of groups is set to 95 and the numbers of channels per group are 4 and 128, respectively. The specific structure is described below:
For the feature maps extracted by the feature learning network, a 1×1×1 convolution is first used to increase the number of channels to 95, i.e., the number of regions of interest:
f_c = ReLU(Conv(f_d))    (1)
It is then fed into the ROI feature extraction unit. Specifically, the feature map f_r of the r-th region of interest is:

f_r = δ(S = r) ⊙ f_c^(r)    (2)

where δ(S = r) takes the value 100 at positions of the segmentation image S belonging to the r-th region of interest and 1 otherwise, and f_c^(r) denotes the feature map of the r-th channel. f_r is then fed in sequence into a group convolutional layer with a kernel size of 16×16×16 and a stride of 8 and a group convolutional layer with a kernel size of 19×23×19 and a stride of 1, both with 95 groups, and flattened into a 128-dimensional feature vector x_r. All ROI feature vectors are assembled into a sequence and fed into the Vision Transformer (ViT) classifier:

X = [x_1, x_2, …, x_N]    (3)

where N is the number of regions of interest.
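A minimal PyTorch sketch of this ROI feature extractor follows, assuming the segmentation image S labels the 95 regions of interest with the integers 1 to 95 and that the group convolutions use no padding (with a 164×192×164 input, the stated kernels and strides then yield a 19×23×19 grid and finally one 128-dimensional vector per region):

```python
import torch
import torch.nn as nn

class ROIFeatureExtractor(nn.Module):
    """Sketch of the ROI feature extractor: a 1x1x1 conv lifts f_d from 64
    to 95 channels (one per region of interest), each channel is weighted
    by its ROI mask per eq. (2), and two group convolutions (95 groups,
    4 and 128 channels per group) reduce each region to a 128-d vector."""

    def __init__(self, in_channels: int = 64, num_rois: int = 95):
        super().__init__()
        self.num_rois = num_rois
        self.lift = nn.Conv3d(in_channels, num_rois, kernel_size=1)
        # 16x16x16 kernel, stride 8: 164x192x164 -> 19x23x19 (no padding).
        self.gconv1 = nn.Conv3d(num_rois, num_rois * 4, kernel_size=16,
                                stride=8, groups=num_rois)
        # 19x23x19 kernel, stride 1: collapses the grid to a single voxel.
        self.gconv2 = nn.Conv3d(num_rois * 4, num_rois * 128,
                                kernel_size=(19, 23, 19), groups=num_rois)

    def forward(self, f_d, seg):
        # f_d: (B, 64, 164, 192, 164); seg: (B, 1, 164, 192, 164) with ROI
        # labels assumed to be the integers 1..95.
        f_c = torch.relu(self.lift(f_d))                  # eq. (1)
        labels = torch.arange(1, self.num_rois + 1,
                              device=seg.device).view(1, -1, 1, 1, 1)
        # delta(S = r): 100 inside the r-th region, 1 elsewhere.
        mask = (seg == labels).float() * 99.0 + 1.0
        f_r = f_c * mask                                  # eq. (2)
        x = self.gconv2(self.gconv1(f_r))                 # (B, 95*128, 1, 1, 1)
        return x.view(x.size(0), self.num_rois, 128)      # sequence X, eq. (3)
```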
Step 3: Using the ViT-based classifier, identify subtle structural changes of brain regions from the feature map X, capture long-range dependencies among brain regions, and obtain the correct classification result.
This embodiment uses a four-layer Transformer encoder, with the number of heads in the multi-head attention set to 16, and generates a learnable embedding vector [cls] that is concatenated to the input sequence X. It acts like the [class] token in ViT: its output feature aggregates global information and therefore serves as the classification feature of the image, from which classification is performed directly by a linear classifier. Each image has a label: Alzheimer's disease patient (AD), normal control (NC), or mild cognitive impairment patient (MCI); a prediction is correct if the output classification result matches the label.
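A minimal PyTorch sketch of this classifier follows; the use of nn.TransformerEncoder and the learnable position embeddings are assumptions, as the patent specifies only the encoder depth, the number of heads, and the [cls] token:

```python
import torch
import torch.nn as nn

class ROIViTClassifier(nn.Module):
    """Sketch of the ViT-style classifier: a learnable [cls] token is
    prepended to the 95-token ROI sequence, a 4-layer Transformer encoder
    with 16 attention heads models dependencies among brain regions, and a
    linear head maps the [cls] output to class scores (AD / MCI / NC)."""

    def __init__(self, dim: int = 128, num_rois: int = 95,
                 num_classes: int = 3, depth: int = 4, heads: int = 16):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Learnable position embeddings for [cls] + ROI tokens (assumed;
        # the patent does not state how token positions are encoded).
        self.pos_embed = nn.Parameter(torch.zeros(1, num_rois + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # x: (B, 95, 128) sequence of ROI feature vectors.
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, x], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])  # classify from the [cls] output
```

Under these assumptions the three sketched modules chain end to end: `logits = ROIViTClassifier()(ROIFeatureExtractor()(FeatureLearningNet()(img), seg))` for an MR image tensor `img` of shape (B, 1, 164, 192, 164) and its ROI segmentation `seg`.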
The classification performance of the present invention on MR images was verified on the international standard ADNI dataset; Table 1 shows the classification results on Alzheimer's disease-related tasks. The classification accuracy of the method is high: it achieves a significant improvement on the mild cognitive impairment (MCI) conversion prediction task (ACC = 0.875, SEN = 0.824, SPE = 0.913, AUC = 0.937), and on the AD classification task it also obtains better results on all four metrics (ACC = 0.940, SEN = 0.942, SPE = 0.938, AUC = 0.968).
Table 1. Results of the MR image classification method based on regions of interest and Transformer

| Task | ACC | SEN | SPE | AUC |
|---|---|---|---|---|
| MCI conversion prediction | 0.875 | 0.824 | 0.913 | 0.937 |
| AD classification | 0.940 | 0.942 | 0.938 | 0.968 |
The present invention classifies MR images based on regions of interest and the Transformer structure, combining the advantages of CNNs in extracting low-level features and enhancing locality with the advantages of Transformers in establishing long-range dependencies, thereby obtaining accurate classification results.