Technical Field
The present invention belongs to the technical field of image detection, and in particular relates to a fundus disease detection method based on cascade model fusion.
Background
In recent years, the incidence of multiple fundus diseases has been high, and the harm they cause is serious. Color fundus photography is the simplest and most effective way to detect fundus diseases: a doctor typically captures fundus images with a professional fundus camera and then makes a manual diagnosis. Fundus examination requires specialist fundus doctors and corresponding examination equipment; such specialists are difficult to train and the equipment investment is high, while primary hospitals and physical examination centers serve large numbers of people who need fundus examinations, which makes the problem acute. Moreover, even when instruments and doctors are sufficient, the diagnosis of fundus diseases still depends heavily on each doctor's condition and experience. Therefore, with the continuous development of deep learning, artificial intelligence is gradually being introduced to assist medical diagnosis; it can provide doctors with diagnostic suggestions for multiple fundus diseases and improve diagnostic efficiency and accuracy.
However, because color fundus images differ greatly from natural images, existing artificial intelligence diagnosis methods have the following defects: (1) current deep-learning-based fundus disease diagnosis methods pay insufficient attention to key fundus structures and lack interpretability and specificity, so they provide little practical help for clinical diagnosis; (2) existing multi-disease diagnosis methods basically use a single model, which generally cannot cope with the extremely imbalanced distribution of fundus disease labels found in practice, resulting in low detection accuracy; (3) the approach common in the prior art, in which a separate model handles each disease and the models are combined, can improve detection accuracy, but it is costly, slow, and cumbersome, which hinders its widespread use.
Summary of the Invention
In view of the shortcomings of the prior art, the present invention provides a fundus disease detection method based on cascade model fusion, which can automatically extract information on important fundus structures such as the blood vessels and the optic cup and optic disc; the segmented structures can be provided to operators such as physicians to improve diagnostic efficiency, and can also supply features for further intelligent disease diagnosis.
The present invention achieves the above technical objectives through the following technical means.
A fundus disease detection method based on cascade model fusion includes the following processes:
Step 1: collect an original fundus image and input it into an image processing device;
Step 2: the image processing device preprocesses the original fundus image with a preprocessing algorithm and inputs the preprocessed image into a segmentation network;
Step 3: use the segmentation network to extract images of the optic cup and optic disc, the blood vessels, and the macular area of the fundus, and input the segmentation results into a data receiving device; the segmentation network includes an encoder module, a dilated convolution module, an attention module, and a decoder module;
Step 4: input the preprocessed image and the segmentation results received by the data receiving device into a healthy/unhealthy fundus image classifier to determine whether the fundus is healthy; if so, proceed to Step 6, otherwise proceed to Step 5;
Step 5: input the preprocessed image and the segmentation results received by the data receiving device into a fundus disease classifier to determine the fundus disease category corresponding to the fundus image;
Step 6: save the specific fundus disease diagnosis conclusion corresponding to the fundus image in a data server and return it to the interface caller for review.
Furthermore, the preprocessing method in Step 2 is as follows:
Step 2.1: convert the original fundus image into a grayscale image, in which the gray value of each pixel ranges from 0 to 255;
Step 2.2: create a mask of the same size as the grayscale image and set a threshold, denoted X; set pixels of the grayscale image whose gray values are below the threshold to false, and pixels whose gray values are above the threshold to true;
Step 2.3: perform an OR operation on the mask row by row (column by column); if the result is false, the gray values of all pixels in that row (column) are below the threshold, and the row (column) is cropped out; if the result is true, at least one pixel in that row (column) has a gray value not below the threshold, and the row (column) is retained;
Step 2.4: merge the original fundus image with the OR-processed mask to obtain a region of interest with the black background removed;
Step 2.5: normalize the image colors: apply Gaussian blur to the merged image, superimpose the blurred image onto the original fundus image with a negative weight, and shift the mean pixel color to 128.
Furthermore, in Step 3, the specific method by which the segmentation network extracts images of the optic cup and optic disc, the blood vessels, and the macular area is as follows:
Step 3.1: the encoder module uses the feature extraction part of the EfficientNet-V2 convolutional network to extract features from the preprocessed image, obtaining core encoding features and multi-stage features, which are input into the dilated convolution module and the attention module respectively;
Step 3.2: the dilated convolution module is built at the core of the segmentation network; it uses dilated convolution layers with three dilation rates to extract features from the input feature map and combines them, finally obtaining multi-receptive-field features that are input into the decoder module;
the attention module includes a spatial attention module and a channel attention module: the spatial attention module weights the different pixels of the same feature map to enhance the feature response of the lesion regions, while the channel attention module aggregates all feature maps and weights each feature map separately to selectively emphasize certain feature maps; finally, the output features of the spatial attention module and the channel attention module are fused to obtain the final weight-redistributed features, which are input into the decoder module;
Step 3.3: the decoder module generates a segmentation map of corresponding size by multi-level upsampling of the extracted features; the segmentation map is a binarized result in which 1 denotes the target region and 0 denotes the background; the decoder module finally outputs a three-channel segmentation map representing the segmentation results of the three key fundus structures, namely the optic cup and optic disc, the blood vessels, and the macular area, and the segmentation results are further input into the data receiving device.
Furthermore, the encoder module applies channel-wise squeeze and excitation at each node to enhance the response of key channel features; the feature extraction of the encoder module is divided into five stages, each of which applies a 3×3 convolution with a stride of 2 and same-size padding to the output feature map, so that the output feature map of each stage is half the size of that of the previous stage; the shallow layers of the dilated convolution module capture adjacent information of the feature map to improve the recognition of small targets, and its deep layers obtain a receptive field close to the size of the feature map to improve the recognition of large targets.
Furthermore, in Step 4, the healthy/unhealthy fundus image classifier adopts an EfficientNetV2-M network structure with an added attention module. MBConv, the core module of EfficientNet, has a channel attention mechanism but lacks a spatial attention mechanism. The attention module integrated with the baseline model is therefore placed at the end of the model: the model's pooling layer and fully connected layer are removed and replaced with AttentionNet, the features extracted by the model are fed directly into the attention module, and the mask generated by the attention module is multiplied with the original feature map, thereby obtaining spatial attention and suppressing the expression of spatially useless features.
Furthermore, the classification algorithm of the healthy/unhealthy fundus image classifier proceeds as follows:
Step 4.1: input the preprocessed image and the segmentation maps of the three key fundus structures into a Transformer model for feature extraction;
Step 4.2: use a spatial attention module to redistribute spatial weights over the feature maps extracted by the Transformer, so as to enhance the feature expression of the key regions of interest;
Step 4.3: pool the features extracted by the Transformer using generalized mean pooling, where the exponent factor p is set to Y; Y is adjusted dynamically according to the classification accuracy, takes values in the range 2 to 4, and is initialized to 3;
Step 4.4: the pooled result is output through a fully connected layer; the fully connected layer of the healthy/unhealthy fundus image classifier outputs 2 nodes, representing the probabilities that the fundus image is healthy or unhealthy.
Furthermore, in Step 5, the fundus disease classifier adopts EfficientNetV2-S with an added attention module; the design of the attention module and the way it is added are the same as for the healthy/unhealthy fundus image classifier;
the fundus disease classifier outputs a six-class result used to determine which specific fundus disease the fundus image shows; that is, the fully connected layer of the fundus disease classifier outputs 6 nodes, representing the probabilities that the fundus image shows high myopia, macular degeneration, glaucoma, vein occlusion, diabetic retinopathy, or other lesions.
The present invention has the following beneficial effects:
(1) The fundus disease detection method provided by the present invention obtains accurate structure segmentation results from an automatic fundus image segmentation network and performs diagnosis on them together with the original image, which improves the interpretability of the algorithm; the segmentation results help doctors make clinical diagnoses and quickly locate lesions.
(2) Through long-term data acquisition and analysis at multiple hospitals, the present invention obtained the actual distribution of real-world fundus datasets, in which healthy fundi account for nearly half of the samples. To fully exploit the fact that neural networks perform better on balanced datasets, the present invention cascades two classification models, which improves the balance of the training and prediction sample distributions of the healthy/unhealthy fundus image classifier and the fundus disease classifier, and effectively improves the accuracy of specific disease classification.
(3) The present invention uses only a cascade of one segmentation model and two classification models, yet achieves the accuracy of the approach in which a separate model handles each disease and the models are combined; it avoids the problem of the number of models growing as the number of diseases increases, effectively improves diagnostic efficiency, and reduces computing cost.
Brief Description of the Drawings
FIG. 1 is a flow chart of the fundus disease detection method based on cascade model fusion;
FIG. 2 is a framework diagram of the segmentation network;
FIG. 3 is a schematic diagram of the segmentation network algorithm;
FIG. 4 is a structural diagram of the dilated convolution module;
FIG. 5 is a structural diagram of the attention module;
FIG. 6 is a structural diagram of the spatial attention module;
FIG. 7 is a structural diagram of the channel attention module;
FIG. 8 is a flow chart of the preprocessing algorithm;
FIG. 9 is a flow chart of the classification algorithm;
FIG. 10 is a schematic diagram of the combination of the baseline model and the attention module;
FIG. 11 is a schematic diagram of the AttentionNet structure.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments, but the protection scope of the present invention is not limited thereto.
A fundus disease detection method based on cascade model fusion, shown in FIG. 1, includes the following processes:
Step 1: acquire an original fundus image with fundus photography equipment and input it into an image processing device, which is generally a desktop computer;
Step 2: the image processing device preprocesses the original fundus image with a preprocessing algorithm and inputs the preprocessed image into the segmentation network;
Referring to FIG. 8, the specific preprocessing method is as follows:
Step 2.1: convert the original fundus image into a grayscale image, in which the gray value of each pixel ranges from 0 to 255;
Step 2.2: create a mask of the same size as the grayscale image and set a threshold, denoted X (X is 7 in this embodiment); set pixels of the grayscale image whose gray values are below the threshold to false, and pixels whose gray values are above the threshold to true;
Step 2.3: perform an OR operation on the mask row by row (column by column); if the result is false, the gray values of all pixels in that row (column) are below the threshold, and the row (column) is cropped out; to fully retain boundary details, if the result is true, meaning that at least one pixel in that row (column) has a gray value not below the threshold, the row (column) is retained;
Step 2.4: merge the original fundus image with the OR-processed mask to obtain a region of interest with the black background removed;
Step 2.5: normalize the image colors: apply Gaussian blur to the merged image, superimpose the blurred image onto the original fundus image with a negative weight, and shift the mean pixel color to 128.
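The following is a minimal Python sketch of Steps 2.1 to 2.5, assuming OpenCV and NumPy; the embodiment fixes only the threshold X = 7 and the target mean of 128, so the blur radius and the blending weights 4/−4 are illustrative assumptions.

```python
import cv2
import numpy as np

def preprocess_fundus(img_bgr: np.ndarray, threshold: int = 7) -> np.ndarray:
    """Steps 2.1-2.5: crop the black border, then normalize the colors."""
    # Step 2.1: convert to a grayscale image (values 0-255).
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Step 2.2: mask is True where the gray value exceeds the threshold X.
    mask = gray > threshold
    # Step 2.3: OR along each row/column; keep it if any pixel is True.
    rows = mask.any(axis=1)
    cols = mask.any(axis=0)
    # Step 2.4: crop the original image to the region of interest.
    roi = img_bgr[rows][:, cols]
    # Step 2.5: Gaussian blur, negative-weight superposition, mean at 128.
    # The sigma and the 4/-4 weights are illustrative assumptions.
    blurred = cv2.GaussianBlur(roi, (0, 0), sigmaX=roi.shape[1] / 30)
    return cv2.addWeighted(roi, 4, blurred, -4, 128)
```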
Step 3: referring to FIGS. 2 and 3, a segmentation network is used to extract images of the optic cup and optic disc, the blood vessels, and the macular area of the fundus; the segmentation network includes an encoder module, a dilated convolution module, an attention module, and a decoder module.
Step 3.1: the encoder module uses the feature extraction part of the EfficientNet-V2 convolutional network to extract features from the preprocessed image, obtaining core encoding features and multi-stage features, which are input into the dilated convolution module and the attention module respectively;
The encoder module applies channel-wise squeeze and excitation at each node to enhance the response of key channel features, effectively coping with the complexity of color fundus image features; its feature extraction is divided into five stages, each of which applies a 3×3 convolution with a stride of 2 and same-size padding to the output feature map, so that the output of each stage is half the size of that of the previous stage;
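A sketch of one such encoder stage is given below, assuming that "compression and activation on the channels" refers to a squeeze-and-excitation (SE) block as used in EfficientNet; the channel widths, reduction ratio, and SiLU activation are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel squeeze-and-excitation: reweight channels by global context."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # excitation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)

class EncoderStage(nn.Module):
    """One of five stages: 3x3 conv, stride 2, 'same' padding -> half size."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
            SEBlock(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```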
Step 3.2: key fundus structures such as the optic cup and optic disc, the blood vessels, and the macular area vary in size. To improve the segmentation network's recognition accuracy for targets of different sizes, enrich the network's receptive field, and obtain features of different receptive fields, this embodiment builds the dilated convolution module shown in FIG. 4 at the core of the segmentation network. The module uses dilated convolution layers with three dilation rates to extract features from the input feature map and combines them, finally obtaining multi-receptive-field features that are input into the decoder module; the shallow layers of the module capture adjacent information of the feature map, which improves the recognition of small targets, while its deep layers obtain a receptive field close to the size of the feature map, which improves the recognition of large targets;
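A sketch of the dilated convolution module under these constraints follows; the three dilation rates (1, 2, 4) and the 1×1 fusion convolution are assumptions, since the embodiment states only that three dilation coefficients are used and their outputs combined.

```python
import torch
import torch.nn as nn

class DilatedConvModule(nn.Module):
    """Three stacked dilated convolution layers: the shallow layer captures
    adjacent information (small targets), the deep layer approaches the
    receptive field of the whole feature map (large targets); all three
    outputs are combined into multi-receptive-field features."""
    def __init__(self, channels: int, rates=(1, 2, 4)):  # rates are assumed
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = []
        for layer in self.layers:
            x = layer(x)            # receptive field grows with depth
            outputs.append(x)
        return self.fuse(torch.cat(outputs, dim=1))
```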
The attention module enhances the expression of the key features to be segmented by weighting the input feature maps spatially and channel-wise. As shown in FIGS. 5, 6, and 7, it includes a spatial attention module and a channel attention module: the spatial attention module weights the different pixels of the same feature map to enhance the feature response at effective positions (lesion regions), while the channel attention module aggregates all feature maps and weights each one separately to selectively emphasize certain feature maps; finally, the output features of the two modules are fused to obtain a better feature representation, i.e., the final weight-redistributed features, which are input into the decoder module;
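A sketch of this attention module follows, using a common formulation as an assumption for the computations shown in FIGS. 5 to 7 (channel pooling plus a convolution for the spatial branch, global pooling plus a bottleneck for the channel branch); element-wise addition is an assumed fusion rule.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weight the different pixels of a feature map (assumed formulation:
    channel-wise average/max pooling, a convolution, and a sigmoid)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        weights = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * weights          # higher response at lesion positions

class ChannelAttention(nn.Module):
    """Aggregate all feature maps and weight each one separately."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)     # selectively emphasize feature maps

class FusedAttention(nn.Module):
    """Fuse both branches into the weight-redistributed features."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialAttention()
        self.channel = ChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(x) + self.channel(x)
```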
Step 3.3: the decoder module generates a segmentation map of corresponding size by multi-level upsampling of the extracted features; the segmentation map is a binarized result in which 1 denotes the target region and 0 denotes the background. In this embodiment, the decoder module finally outputs a three-channel segmentation map representing the segmentation results of the three key fundus structures, namely the optic cup and optic disc, the blood vessels, and the macular area; the segmentation results are further input into the data receiving device.
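A sketch of such a decoder is given below; the number of upsampling levels (mirroring the five encoder stages), the channel widths, and the 0.5 binarization threshold are assumptions not fixed by the embodiment.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Multi-level upsampling to a three-channel binarized map; channels 0/1/2
    stand for the optic cup and disc, vessels, and macular area respectively."""
    def __init__(self, in_ch: int = 256, widths=(128, 64, 32, 16, 8)):
        super().__init__()
        layers, ch = [], in_ch
        for w in widths:        # five 2x levels mirror the five encoder stages
            layers += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(ch, w, 3, padding=1),
                nn.ReLU(inplace=True),
            ]
            ch = w
        self.up = nn.Sequential(*layers)
        self.head = nn.Conv2d(ch, 3, kernel_size=1)  # one channel per structure

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.up(x))
        # Binarize at inference: 1 = target region, 0 = background.
        return (torch.sigmoid(logits) > 0.5).float()
```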
Step 4: the image preprocessed by the image processing device and the segmentation results received by the data receiving device are input together into the healthy/unhealthy fundus image classifier. As for the choice of classifier, the healthy/unhealthy fundus image classifier adopts an EfficientNetV2-M network structure with an added attention module. MBConv, the core module of EfficientNet, has a channel attention mechanism but lacks a spatial attention mechanism. To further improve the network's classification performance, the attention module integrated with the base model is placed at the end of the model: the model's pooling layer and fully connected layer are removed and replaced with AttentionNet, the features extracted by the model are fed directly into the attention module, and the mask generated by the attention module is multiplied with the original feature map, thereby obtaining spatial attention and suppressing the expression of spatially useless features. This attention module can be applied to any model without redesign;
The combination of the baseline model and the attention module is shown in FIG. 10, where the Baseline Model is any model with its pooling layer and fully connected layer removed, F_E is the feature map generated by the model, and F_A is the attention gate generated by AttentionNet. GeM pooling can be regarded as a more generalized pooling operation with an exponent factor p: when p = 1 it is average pooling, and when p = ∞ it is max pooling; values of p between these extremes yield a more comprehensive effect, and experiments show that p = 3 works best on this dataset. The AttentionNet structure can be expressed as C256-C128-C64-C1-C256, where Ck denotes a convolutional layer with k convolution kernels, each of size 1×1; the stride of all five convolutional layers is 1, each is followed by a ReLU activation, and the final output is an attention gate F_A with one channel and the same spatial size as the original feature map, as shown in FIG. 11;
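The following sketches GeM pooling as defined above and AttentionNet as described. Note an ambiguity in the source: five layers (C256-C128-C64-C1-C256) are listed, yet the output is stated to be a one-channel gate; the sketch therefore keeps the C256-C128-C64-C1 part, broadcasts the one-channel gate F_A over F_E, and leaves the trailing C256 out, which is an assumption.

```python
import torch
import torch.nn as nn

class GeMPooling(nn.Module):
    """Generalized mean pooling: p = 1 is average pooling, p -> infinity
    approaches max pooling; p = 3 worked best on this dataset."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p, self.eps = p, eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ((1/N) * sum over pixels of x^p)^(1/p), computed per channel.
        return x.clamp(min=self.eps).pow(self.p).mean(dim=(-2, -1)).pow(1.0 / self.p)

class AttentionNet(nn.Module):
    """1x1 convolutions with stride 1, each followed by ReLU, producing a
    one-channel attention gate F_A of the same spatial size as F_E."""
    def __init__(self, in_ch: int):
        super().__init__()
        chans = [in_ch, 256, 128, 64, 1]   # the C256-C128-C64-C1 part
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=1, stride=1),
                       nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)

    def forward(self, f_e: torch.Tensor) -> torch.Tensor:
        f_a = self.net(f_e)    # one-channel gate, same spatial size as F_E
        return f_e * f_a       # broadcast multiply with the feature map
```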
After processing, the healthy/unhealthy fundus image classifier outputs a binary classification result indicating whether the fundus image is judged to be a healthy fundus; if so, proceed to Step 6, otherwise proceed to Step 5.
Referring to FIG. 9, the classification algorithm of the healthy/unhealthy fundus image classifier proceeds as follows:
Step 4.1: input the preprocessed image and the segmentation maps of the three key fundus structures into the Transformer model for feature extraction;
Step 4.2: use the spatial attention module to redistribute spatial weights over the feature maps extracted by the Transformer, so as to enhance the feature expression of the key regions of interest;
Step 4.3: pool the features extracted by the Transformer using generalized mean pooling, where the exponent factor p is set to Y; Y is adjusted dynamically according to the classification accuracy, takes values in the range 2 to 4, and is initialized to 3;
Step 4.4: the pooled result is output through a fully connected layer; the fully connected layer of the healthy/unhealthy fundus image classifier outputs 2 nodes, representing the probabilities that the fundus image is healthy or unhealthy.
In the above process, the Transformer's global self-attention mechanism performs well at capturing global features: it can effectively handle the complex relationships among global features in medical images and capture the connections between the different features after serialization.
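Steps 4.2 to 4.4 can be summarized in the following head sketch, reusing the SpatialAttention and GeMPooling modules from the sketches above; outputting softmax probabilities directly is an inference-time simplification.

```python
import torch
import torch.nn as nn

class HealthClassifierHead(nn.Module):
    """Steps 4.2-4.4 after the Transformer: spatial re-weighting, GeM pooling
    with p initialized to 3, and a two-node fully connected output."""
    def __init__(self, channels: int, p: float = 3.0):
        super().__init__()
        self.spatial = SpatialAttention()   # Step 4.2 (sketch above)
        self.pool = GeMPooling(p=p)         # Step 4.3, p dynamically in [2, 4]
        self.fc = nn.Linear(channels, 2)    # Step 4.4: healthy / unhealthy

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        pooled = self.pool(self.spatial(features))
        return torch.softmax(self.fc(pooled), dim=-1)   # class probabilities
```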
Step 5: the image preprocessed by the image processing device and the segmentation results received by the data receiving device are input together into the fundus disease classifier, which adopts EfficientNetV2-S with an added attention module; the design of the attention module and the way it is added are the same as for the healthy/unhealthy fundus image classifier. The fundus disease classifier outputs a six-class result used to determine which specific fundus disease the fundus image shows. Its classification algorithm is similar to that of the healthy/unhealthy fundus image classifier, except that its fully connected layer outputs 6 nodes, representing the probabilities that the fundus image shows high myopia, macular degeneration, glaucoma, vein occlusion, diabetic retinopathy, or other lesions; the specific algorithm process is therefore not repeated here.
Step 6: save the specific fundus disease diagnosis conclusion corresponding to the fundus image in the data server and return it to the interface caller for review.
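The overall cascade of Steps 2 to 6 can be sketched as follows; segment, health_clf, and disease_clf are placeholders for the trained segmentation network and the two classifiers (preprocess_fundus is the sketch above), and the assumption that health_clf returns the probability of a healthy fundus is illustrative.

```python
DISEASES = ["high myopia", "macular degeneration", "glaucoma",
            "vein occlusion", "diabetic retinopathy", "other lesions"]

def diagnose(raw_image, segment, health_clf, disease_clf):
    """Cascade inference over Steps 2-6: preprocess, segment, decide
    healthy/unhealthy, and only run the six-class disease classifier
    on unhealthy images."""
    image = preprocess_fundus(raw_image)      # Step 2 (sketch above)
    parts = segment(image)                    # Step 3: three-channel map
    p_healthy = health_clf(image, parts)      # Step 4: probability of health
    if p_healthy >= 0.5:
        return "healthy fundus"               # Step 6: conclusion saved/returned
    probs = disease_clf(image, parts)         # Step 5: six probabilities
    return DISEASES[int(probs.argmax())]
```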
The embodiments described above are preferred implementations of the present invention, but the present invention is not limited to them; any obvious improvement, substitution, or variation that a person skilled in the art can make without departing from the essence of the present invention falls within the protection scope of the present invention.