CN117523202A - A fundus blood vessel image segmentation method based on visual attention fusion network - Google Patents


Info

Publication number
CN117523202A
Authority
CN
China
Prior art keywords: attention, blood vessel, global, convolution, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311534511.7A
Other languages
Chinese (zh)
Inventor
曹新容
李睿
丁诗峰
李佐勇
滕升华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minjiang University
Original Assignee
Minjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2023-11-17
Filing date: 2023-11-17
Publication date: 2024-02-06
Application filed by Minjiang University
Priority to CN202311534511.7A
Publication of CN117523202A
Legal status: Pending


Abstract


The invention relates to a fundus blood vessel image segmentation method based on a visual attention fusion network. The method introduces a Transformer architecture with global information perception capability and combines it with a convolutional neural network to perform multi-scale extraction of blood vessel features and fusion of global and local features, thereby achieving segmentation of retinal blood vessels. First, a convolutional attention Transformer encoder encodes the blood vessel features; at the end of the encoder, a parallel feature refinement block fuses the global and local features of the blood vessel information to achieve global information interaction for retinal blood vessels. Finally, a global spatial activation module that performs well on tiny blood vessels is introduced in the decoder, and the segmentation mask is gradually restored. Experimental results show that the proposed method improves the segmentation accuracy of retinal blood vessels and can detect and segment blood vessels well.

Description

A fundus blood vessel image segmentation method based on a visual attention fusion network

Technical Field

The present invention relates to the field of image processing technology, and in particular to a fundus blood vessel image segmentation method based on a visual attention fusion network.

Background Art

The fundus is the only part of the body where arteries, veins, and capillaries can be directly observed with the naked eye. These blood vessels reflect the dynamics of blood circulation and the state of health throughout the body, and many systemic diseases are reflected in fundus images. Morphological variation of the blood vessels is the main feature of fundus lesions. Analyzing fundus images is therefore a key step for doctors in examining and diagnosing eye diseases. However, during acquisition, fundus images often suffer from uneven brightness and contrast, noise, and blur caused by differences in illumination and equipment limitations, so the vascular structure in the image is not prominent and accurate analysis is difficult. Experienced experts can perform an initial extraction of the blood vessels in fundus images, but because of interference factors such as low contrast and noise, manual vessel segmentation is not only time-consuming and labor-intensive but also error-prone. Obtaining high-quality fundus blood vessel structures is therefore a key step in analyzing and diagnosing cardiovascular, cerebrovascular, and ophthalmic diseases.

At present, most retinal blood vessel segmentation architectures are built from convolutional neural networks, using the local feature extraction capability of the CNN to extract vessel information. However, a CNN has a notable shortcoming: it cannot model global information and therefore ignores the correlation between the current segmentation region and the features of neighboring vessels. The present invention therefore introduces a Transformer architecture with global information perception capability into the retinal vessel segmentation task and combines it with the powerful feature extraction capability of the CNN to fuse the global and local features of the vessels, thereby improving the segmentation accuracy of retinal blood vessels.

In the field of medical image processing, researchers have used atrous (dilated) convolution or enlarged convolution kernels to expand the receptive field and try to capture the relationship between the current region and neighboring pixels. Even so, such approaches still cannot fully mine global information from the entire feature map to assist the network in extracting vessel features. The Transformer has been widely used in natural language processing because of its powerful global information extraction and long-range sequence modeling capabilities. Alexey Dosovitskiy et al. migrated it to computer vision and proposed the Vision Transformer (ViT) for image processing and analysis, opening a new path in image classification. Researchers have also drawn on ViT's excellent global information perception capability to explore its performance in medical image processing and have achieved promising results.

A CNN can extract local features of retinal blood vessels, and the low computational cost and translation invariance of the convolution operation have made it widely used in computer vision, but its inherently limited receptive field prevents it from capturing long-range information. The Transformer offers a new idea: its self-attention mechanism has strong global information perception capability and can mine feature information from a global perspective. However, its large computational cost and its neglect of the two-dimensional spatial position information of the image limit the further application of the Transformer in computer vision.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a fundus blood vessel image segmentation method based on a visual attention fusion network that improves the segmentation accuracy of retinal blood vessels and can detect and segment blood vessels well.

To achieve the above purpose, the present invention adopts the following technical solution: a fundus blood vessel image segmentation method based on a visual attention fusion network, comprising a convolutional attention Transformer encoder and a parallel feature refinement block that fuses the global and local information of blood vessel features. The CA-Trans block in the Transformer encoder uses convolutional attention, which is fast to compute, has few parameters, and is suitable for high-resolution images, to encode the features. A parallel feature refinement block is introduced at the end of the Transformer encoder to emphasize the global information interaction of the vessel features. A global spatial activation module, which is friendly to the segmentation of tiny vessels, is added in the decoder stage, and a fine retinal vessel segmentation map is reconstructed by step-by-step decoding.

In a preferred embodiment, the Transformer encoder comprises one standard convolutional layer and three repeated multi-scale convolutional attention layers. The standard convolutional layer consists of two groups of 3×3 convolution with a stride of 1, DropBlock, GN, and a ReLU activation function. The CA-Trans block in the multi-scale convolutional attention layer follows the classic ViT architecture and contains a normalization layer, a multi-scale convolutional attention layer, and a feed-forward layer, which are combined with the input of the block through two residual connections to extract the vessel information in the fundus image. The output of the multi-scale convolutional attention can be expressed as:

where MCAtt denotes the multi-scale convolutional attention output, DWConv denotes depth-wise convolution, and Branch_i, i∈{0,1,2,3}, denotes the i-th branch.
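
The formula itself is not reproduced in this text. A plausible form consistent with the variable definitions above and with the encoder description later in the specification, offered here as an assumption rather than a verbatim reconstruction, is:

\mathrm{MCAtt}(F) = \mathrm{Conv}_{1\times 1}\Big(\sum_{i=0}^{3}\mathrm{Branch}_i\big(\mathrm{DWConv}(F)\big)\Big), \qquad \mathrm{Out} = \mathrm{MCAtt}(F)\otimes F

where F is the input feature map, ⊗ denotes element-wise multiplication, and Branch_0 is taken here as the identity mapping.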

In a preferred embodiment, the Transformer encoder consists of one standard convolutional layer and three multi-scale convolutional attention layers.

In a preferred embodiment, the parallel feature refinement block fuses self-attention and multi-scale convolutional attention in parallel. A self-attention mechanism is introduced to model the global information of the feature map and emphasize the importance of global information. After normalization, the input of the self-attention branch is reshaped and fed into multi-head self-attention; its output is concatenated with the output of the multi-scale convolutional attention branch, the feature information is then integrated by a 1×1 point-wise convolution, and the result is finally fed into a feed-forward layer after a residual connection with the input. The output of the parallel feature refinement block is expressed as:

where F denotes the input feature map, MHA denotes multi-head attention, DWConv denotes depth-wise convolution, and Branch_i, i∈{0,1,2,3}, denotes the i-th branch.
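
Again, the formula is not reproduced in this text. One form consistent with the description (concatenating the multi-head attention and multi-scale convolutional attention outputs, fusing them with a 1×1 convolution, adding the input as a residual, and applying a feed-forward layer), stated as an assumption, is:

\mathrm{PFR}(F) = \mathrm{FFN}\Big(F + \mathrm{Conv}_{1\times 1}\big[\,\mathrm{MHA}(F)\;\Vert\;\mathrm{MCAtt}(F)\,\big]\Big)

where ‖ denotes channel-wise concatenation and FFN is the feed-forward layer.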

In a preferred embodiment, the global spatial activation module, i.e., the GSA module, first performs global average pooling and global max pooling along the channel dimension of the input feature map to obtain two 1×H×W feature maps; a 1×1 convolution and a sigmoid function then yield a 1×H×W vessel feature probability map with globally aggregated information. A spatial activation function is then applied in the spatial domain to obtain an adaptive weighted probability map, which is finally multiplied element-wise with the input feature map to obtain the activated feature map. The formula of the spatial activation function is as follows:

where p denotes each pixel value of the input feature map.

The output of the global spatial activation module can be expressed as:

where F_Max and F_Avg denote the feature maps after global max pooling and average pooling respectively, c_{1×1} denotes a 1×1 convolution operation, δ denotes the sigmoid activation function, and F denotes the input feature map.
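
The output formula is likewise absent from this text. A form consistent with the variables just defined, given here as an assumption, is:

\mathrm{GSA}(F) = F \otimes \phi\Big(\delta\big(c_{1\times 1}([\,F_{Max};\,F_{Avg}\,])\big)\Big)

where [·;·] denotes concatenation along the channel dimension and φ denotes the spatial activation function defined above.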

In a preferred embodiment, a binary cross-entropy loss function is used to optimize the model by averaging the per-sample cross-entropy over all samples, defined as follows:

Loss = -(y log p + (1 - y) log(1 - p))

where y denotes the blood vessel label and p denotes the predicted probability that a pixel belongs to a vessel.

Compared with the prior art, the present invention has the following beneficial effects. The purpose of the present invention is to improve the vessel segmentation accuracy of fundus images by providing a fundus blood vessel image segmentation method based on visual attention fusion. The method introduces a Transformer architecture with global information perception capability and combines it with a convolutional neural network to perform multi-scale extraction of vessel features and fusion of global and local features, thereby achieving segmentation of retinal blood vessels. First, a convolutional attention Transformer encoder encodes the vessel features; at the end of the encoder, a parallel feature refinement block fuses the global and local features of the vessel information to achieve global information interaction for retinal vessels. Finally, a global spatial activation module that performs well on tiny vessels is introduced in the decoder, and the segmentation mask is gradually restored. The invention can complete the fundus vessel image segmentation task efficiently and accurately and achieves a good segmentation effect on tiny vessels.

Description of the Drawings

Figure 1 is a flow chart of the method of a preferred embodiment of the present invention.

Figure 2 is the network model architecture of a preferred embodiment of the present invention.

Figure 3 is the convolutional attention Transformer encoder of a preferred embodiment of the present invention.

Figure 4 is the parallel feature refinement block of a preferred embodiment of the present invention.

Figure 5 is the global spatial activation module of a preferred embodiment of the present invention.

Figure 6 shows the segmentation results obtained by different models on the DRIVE, CHASE_DB1, and STARE datasets for a preferred embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is further described below in conjunction with the accompanying drawings and embodiments.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this application belongs.

It should be noted that the terms used herein are for the purpose of describing particular embodiments only and are not intended to limit the exemplary embodiments of the present application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms. It should further be understood that when the terms "comprises" and/or "includes" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

In the fundus blood vessel image segmentation method based on visual attention fusion of the present invention, a convolutional attention Transformer encoder first encodes the vessel features; at the end of the encoder, a parallel feature refinement block fuses the global and local features of the vessel information to achieve global information interaction for retinal vessels. Finally, a global spatial activation module is introduced and the segmentation mask is gradually restored.

The specific implementation process of the present invention is as follows.

To segment fundus blood vessels from fundus images with complex backgrounds, as shown in Figures 1 to 6, the present invention proposes an improved U-Net network model whose structure is shown in Figure 2. First, a convolutional attention Transformer encoder encodes the vessel features; at the end of the encoder, a parallel feature refinement block fuses the global and local features of the vessel information to achieve global information interaction for retinal vessels. Finally, a global spatial activation module that performs well on tiny vessels is introduced in the decoder, and the segmentation mask is gradually restored.

1. Convolutional Attention Transformer Encoder

The encoder stage extracts features from the vessel information. The convolutional attention Transformer encoder consists of one standard convolutional layer and three repeated multi-scale convolutional attention layers. The standard convolutional layer consists of two groups of 3×3 convolution with a stride of 1, DropBlock, GN, and a ReLU activation function. In the multi-scale convolutional attention layer, the input first passes through a 1×1 convolution and a GeLU activation function and is then sent to the multi-scale convolutional attention block: a 3×3 depth-wise convolution is applied, followed by several parallel K×1 and 1×K strip depth-wise convolutions that capture multi-scale information, and finally a 1×1 point-wise convolution integrates the multi-scale information across channels as attention weights used to weight the input feature map. The CA-Trans block in the multi-scale convolutional attention layer follows the classic ViT architecture and contains a normalization layer, a multi-scale convolutional attention layer, and a feed-forward layer, which are combined with the input of the block through two residual connections. The specific structure is shown in Figure 3, and a minimal code sketch of this block is given below.
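
To make the structure concrete, the following is a minimal PyTorch sketch of the multi-scale convolutional attention and the CA-Trans block as described above. It is an illustrative sketch, not the patent's reference implementation: the strip-convolution kernel sizes (7, 11, 21), the feed-forward expansion ratio, and the use of GroupNorm as the normalization layer are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleConvAttention(nn.Module):
    """Sketch of the multi-scale convolutional attention: 1x1 conv + GeLU,
    a 3x3 depth-wise conv, parallel Kx1/1xK strip depth-wise convs, and a
    1x1 point-wise conv whose output weights the feature map."""
    def __init__(self, channels, strip_kernels=(7, 11, 21)):  # kernel sizes are assumptions
        super().__init__()
        self.proj = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.GELU())
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            )
            for k in strip_kernels
        ])
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        u = self.proj(x)                                    # 1x1 conv + GeLU
        base = self.dwconv(u)                               # 3x3 depth-wise conv (Branch_0)
        attn = base + sum(branch(base) for branch in self.branches)
        attn = self.pointwise(attn)                         # integrate multi-scale info
        return attn * u                                     # weight the feature map


class CATransBlock(nn.Module):
    """CA-Trans block in the ViT layout: norm -> multi-scale conv attention
    -> residual, then norm -> feed-forward -> residual."""
    def __init__(self, channels, expansion=4):              # expansion ratio is an assumption
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)
        self.attn = MultiScaleConvAttention(channels)
        self.norm2 = nn.GroupNorm(1, channels)
        self.ffn = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x
```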

2. Parallel Feature Refinement Block

To make full use of the global information of the feature map, a parallel feature refinement block (PFR) is designed at the end of the encoder. It differs from the CA-Trans block in the encoder in that the PFR fuses self-attention and multi-scale convolutional attention in parallel. At this stage the feature map carries high-level semantic information, but global information has been lost through the reduction in resolution, so a self-attention mechanism is introduced to model the global information of the feature map and emphasize its importance. The structure of the parallel feature refinement block is shown in Figure 4: after normalization, the input of the self-attention branch is reshaped and fed into multi-head self-attention; its output is concatenated with the output of the multi-scale convolutional attention branch, the feature information is then integrated by a 1×1 point-wise convolution, and the result is finally fed into a feed-forward layer after a residual connection with the input. Figure 4 shows the detailed structure of this module, and a corresponding code sketch follows.
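
Continuing the sketch above, a minimal PyTorch version of the parallel feature refinement block might look as follows; the number of attention heads and the feed-forward expansion ratio are assumptions, and MultiScaleConvAttention refers to the class sketched in the encoder section.

```python
import torch
import torch.nn as nn

class ParallelFeatureRefinement(nn.Module):
    """PFR sketch: a multi-head self-attention branch (global information) and
    a multi-scale convolutional attention branch (local information) run in
    parallel; their outputs are concatenated, fused by a 1x1 convolution,
    added back to the input, and passed through a feed-forward layer."""
    def __init__(self, channels, num_heads=8, expansion=4):  # heads/expansion are assumptions
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)
        self.mha = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.conv_attn = MultiScaleConvAttention(channels)   # from the encoder sketch
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.ffn = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        xn = self.norm(x)
        # Self-attention branch: reshape B,C,H,W -> B,HW,C and apply MHA
        tokens = xn.flatten(2).transpose(1, 2)
        global_feat, _ = self.mha(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        # Multi-scale convolutional attention branch
        local_feat = self.conv_attn(xn)
        # Concatenate, fuse with a 1x1 point-wise convolution, residual, feed-forward
        fused = self.fuse(torch.cat([global_feat, local_feat], dim=1))
        out = x + fused
        return out + self.ffn(out)
```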

3. Global Spatial Activation Module

An improved global spatial activation module is proposed. The module enhances the feature representation of tiny vessels from globally aggregated spatial information and suppresses irrelevant noise information through network training. The decoding stage gradually restores the feature map with high-level semantics to the original resolution; adding this module to the decoder reduces the interference of irrelevant noise information while enhancing the model's ability to identify tiny vessels.

First, global average pooling and global max pooling along the channel dimension are applied to the input feature map to obtain two 1×H×W feature maps; a 1×1 convolution and a sigmoid function then yield a 1×H×W vessel feature probability map with globally aggregated information. A spatial activation function is then applied in the spatial domain to obtain an adaptive weighted probability map, which is finally multiplied element-wise with the input feature map to obtain the activated feature map. Observation of the final vessel segmentation probability map shows that the probability values of tiny vessels are concentrated around 0.5, while the values of the background and of thicker vessels tend toward 0 and 1. To emphasize the importance of tiny vessels, a Gaussian function is used to weight the vessel probability map, and a bias term is added, so that attention is focused on the tiny vessels. The detailed structure of the global spatial activation module is shown in Figure 5, and a code sketch follows.
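
The following PyTorch sketch illustrates one way to realize the global spatial activation module just described: channel-dimension max and average pooling, a 1×1 convolution with sigmoid to produce a 1×H×W vessel probability map, a Gaussian-shaped spatial activation centred at 0.5 with an added bias term, and element-wise re-weighting of the input. The width sigma and the bias value are assumptions, since they are not given in this text.

```python
import torch
import torch.nn as nn

class GlobalSpatialActivation(nn.Module):
    """GSA sketch: aggregate channel information into a spatial probability
    map, give the largest weight to pixels whose probability is near 0.5
    (tiny vessels), and re-weight the input feature map element-wise."""
    def __init__(self, sigma=0.2, bias=1.0):   # sigma and bias are assumptions
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=1)
        self.sigma = sigma
        self.bias = bias

    def forward(self, x):
        # Channel-dimension global max and average pooling -> two 1xHxW maps
        f_max, _ = torch.max(x, dim=1, keepdim=True)
        f_avg = torch.mean(x, dim=1, keepdim=True)
        # 1x1 convolution + sigmoid -> globally aggregated vessel probability map
        prob = torch.sigmoid(self.conv(torch.cat([f_max, f_avg], dim=1)))
        # Gaussian-shaped spatial activation, maximal where prob is near 0.5
        weight = torch.exp(-(prob - 0.5) ** 2 / (2 * self.sigma ** 2)) + self.bias
        # Element-wise re-weighting of the input feature map
        return x * weight
```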

4. Loss Function

Retinal vessel segmentation is essentially a pixel-level binary classification task, so in the experiments a binary cross-entropy loss function is used to optimize the model by averaging the per-sample cross-entropy over all samples, defined as follows:

Loss = -(y log p + (1 - y) log(1 - p))

where y denotes the blood vessel label and p denotes the predicted probability that a pixel belongs to a vessel.
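
As a usage sketch, the pixel-wise binary cross-entropy above corresponds to PyTorch's built-in BCE loss; the tensor shapes and names below are illustrative only.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # mean of -(y*log(p) + (1-y)*log(1-p)) over all pixels

# pred: sigmoid output of the network, values in (0, 1); mask: vessel labels in {0, 1}
pred = torch.rand(2, 1, 64, 64)
mask = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = criterion(pred, mask)
```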

5. Experimental Data and Evaluation

To evaluate the performance of the fundus vessel image segmentation algorithm, verification experiments were conducted on the public retinal vessel segmentation datasets DRIVE and CHASE_DB1. Each prediction in the output is classified into one of four categories: TP (True Positive) denotes true positive samples, FP (False Positive) false positive samples, TN (True Negative) true negative samples, and FN (False Negative) false negative samples. Based on these four categories, the common sensitivity (SE), specificity (SP), accuracy (ACC), area under the curve (AUC), and F1 score are used as the final evaluation metrics. ACC is a comprehensive measure defined as the proportion of correctly classified samples among all samples. SE is the rate of correct segmentation as vessel, and SP is the rate of correct segmentation as non-vessel. AUC is the area under the Receiver Operating Characteristic (ROC) curve; the larger the AUC, the better the performance of the model. The F1 score balances the precision and recall of the model and can be regarded as a weighted average of the two; a larger value indicates a better model. The definitions of the above metrics are as follows:
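
The formulas themselves are not reproduced in this text; the standard definitions of these metrics in terms of TP, FP, TN, and FN, which the description above implies, are:

\mathrm{SE} = \frac{TP}{TP+FN}, \qquad \mathrm{SP} = \frac{TN}{TN+FP}, \qquad \mathrm{ACC} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad F1 = \frac{2\,TP}{2\,TP+FP+FN}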

6. Quantitative Comparison

To evaluate the segmentation effect of the model of the present invention, its segmentation results are compared with those of other fundus vessel image segmentation models. As shown in Table 1, VAT-Net achieves the highest sensitivity on all three datasets: 80.52%, 84.77%, and 82.96%, respectively. The improvement in sensitivity is attributable to the fusion of global and local information in the parallel feature refinement block, which allows more vessel pixels to be identified correctly. The quantitative data show that VAT-Net outperforms other classic segmentation methods and is well suited to the retinal vessel segmentation task.

Table 1: Quantitative segmentation results on different datasets

7. Qualitative Comparison

Figure 6 gives examples of the visual segmentation results of VAT-Net and other segmentation models on the three datasets. R2U-Net and AttU-Net improve on U-Net, but some vessel segmentation errors and vessel breaks still occur. VAT-Net, building on U-Net, introduces a multi-scale convolutional attention Transformer architecture to extract vessel feature information and fuses the global and local information of the vessel features by running self-attention and multi-scale convolutional attention in parallel, which makes the network more accurate both in overall vessel segmentation and at vessel junctions. The experimental results show that VAT-Net improves the segmentation accuracy of retinal vessels and can detect and segment vessels well.

The method of the present invention proposes a multi-scale channel fusion and spatial activation network model to segment fundus blood vessel images and improve the segmentation accuracy of retinal vessels. Specifically, a convolutional attention Transformer encoder first encodes the vessel features; at the end of the encoder, a parallel feature refinement block fuses the global and local features of the vessel information to achieve global information interaction for retinal vessels. Finally, a global spatial activation module that performs well on tiny vessels is introduced in the decoder, and the segmentation mask is gradually restored. The experimental results show that the proposed method improves the segmentation accuracy of retinal vessels and can detect and segment vessels well.

The above are preferred embodiments of the present invention. Any changes made according to the technical solution of the present invention whose resulting functional effects do not exceed the scope of the technical solution of the present invention fall within the protection scope of the present invention.

Claims (6)

1. A fundus blood vessel image segmentation method based on a visual attention fusion network, characterized by comprising a convolutional attention Transformer encoder, wherein a parallel feature refinement block realizes the fusion of the global and local information of blood vessel features; the CA-Trans block in the Transformer encoder adopts convolutional attention, which has a high calculation speed and a small number of parameters and is suitable for high-resolution images, to encode the features; a parallel feature refinement block is introduced at the end of the Transformer encoder to emphasize the global information interaction of the vessel features; a global spatial activation module friendly to the segmentation of tiny vessels is added at the decoder stage, and a fine retinal vessel segmentation map is reconstructed by step-by-step decoding.
4. The fundus blood vessel image segmentation method based on a visual attention fusion network according to claim 1, wherein the parallel feature refinement block fuses self-attention and multi-scale convolutional attention in parallel; a self-attention mechanism is introduced to model the global information of the feature map and emphasize the importance of the global information; after normalization, the input of the self-attention branch is reshaped and fed into multi-head self-attention, its output is concatenated with the output of the multi-scale convolutional attention branch, the feature information is integrated by a 1×1 point-wise convolution, and the result is finally fed into a feed-forward layer after a residual connection with the input; the output of the parallel feature refinement block is expressed as:

Priority Applications (1)

Application number: CN202311534511.7A (publication CN117523202A); priority date: 2023-11-17; filing date: 2023-11-17; title: A fundus blood vessel image segmentation method based on visual attention fusion network


Publications (1)

Publication number: CN117523202A; publication date: 2024-02-06

Family

ID=89743386

Family Applications (1)

Application number: CN202311534511.7A (pending); publication: CN117523202A (en); priority date: 2023-11-17; filing date: 2023-11-17; title: A fundus blood vessel image segmentation method based on visual attention fusion network

Country Status (1)

Country: CN; publication: CN117523202A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN118429319A (en)* | 2024-05-20 | 2024-08-02 | 爱尔眼科医院集团股份有限公司长沙爱尔眼科医院 | Retinal vascular imaging segmentation method, device, equipment and medium
CN118485673A (en)* | 2024-05-22 | 2024-08-13 | 山东大学 | Visual superpixel segmentation method and system based on ELGANet
CN118485673B (en)* | 2024-05-22 | 2025-03-07 | 山东大学 | ELGANet-based visual superpixel segmentation method and system
CN118470034A (en)* | 2024-07-09 | 2024-08-09 | 华东交通大学 | Fundus image segmentation method and system based on multi-scale feature enhancement
CN118644679A (en)* | 2024-08-14 | 2024-09-13 | 江西师范大学 | Fundus image segmentation method and device based on multi-scale cross-fusion attention
CN119228813A (en)* | 2024-09-10 | 2024-12-31 | 福建农林大学 | Blood vessel segmentation method based on improved U-shaped network
CN119228813B (en)* | 2024-09-10 | 2025-10-10 | 福建农林大学 | Blood vessel segmentation method based on improved U-shaped network
CN119478403A (en)* | 2024-10-31 | 2025-02-18 | 武汉大学 | A semantic segmentation method for large-scale remote sensing images based on ultra-long context modeling
CN119206847A (en)* | 2024-11-26 | 2024-12-27 | 杭州电子科技大学 | Implementation method of visual temporal feature network based on multi-module feature fusion
CN119625425A (en)* | 2024-12-06 | 2025-03-14 | 昆明理工大学 | A fundus image classification method and system based on dual-path neural network and dynamic weight adjustment fusion module
CN119625425B (en)* | 2024-12-06 | 2025-10-17 | 昆明理工大学 | Fundus image classification method and system based on dual-path neural network and dynamic weight adjustment fusion module
CN120198673A (en)* | 2025-05-23 | 2025-06-24 | 长春理工大学 | Retinal blood vessel extraction method based on strong feature dynamic fusion gating network


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
