Technical Field
The invention belongs to the field of computer-aided diagnosis of breast tumors, and specifically relates to a method for classifying breast tumors in ultrasound images based on feature fusion and an attention mechanism.
Background Art
Breast cancer has become one of the most serious cancers threatening women's lives; early diagnosis of breast lesions and the differentiation of malignant from benign lesions are extremely important for patient prognosis. Breast imaging examinations currently in use generally include magnetic resonance imaging, computed tomography, mammography (breast X-ray examination), and ultrasound. Mammography and ultrasound are today the most commonly used methods for breast cancer screening and diagnosis, but X-ray examination has relatively low sensitivity and specificity and is expensive. Ultrasound, with its advantages of real-time imaging, low cost, low radiation, non-invasiveness, dynamic observation, and high repeatability, has become an important and indispensable part of clinical medical examination: it provides key imaging information for detecting clinically suspicious breast cancer and can serve as a screening method for early-stage, occult, and non-calcified breast cancer. However, ultrasound images suffer from low contrast, severe speckle noise, and blurred boundaries between tissue regions, which can easily cause confusion and undoubtedly increase the workload and diagnostic difficulty for sonographers; the influence of the various factors affecting ultrasound image quality must be understood in order to reduce the number of unnecessary biopsies and short-term follow-up recommendations. In view of the false-detection and missed-detection problems in the clinical diagnosis of breast cancer, how to use existing technical means to reconstruct, analyze, and process breast ultrasound images has become a major research focus.
With the continuous development of machine learning and deep learning, computer-aided diagnosis has been widely applied in medical image processing, and computer-aided diagnosis systems have developed rapidly. The technology uses digitized information and computers to identify medical images and make probabilistic judgments, achieving medical purposes such as localization, segmentation, and classification. Compared with manual image reading, computer-aided diagnosis systems can quickly process data in batches and are more efficient and stable while maintaining accuracy, providing radiologists with an important "second opinion" for clinical diagnosis. At present, computer-aided diagnosis systems have been widely used in the diagnostic analysis of medical images of various diseases, such as breast cancer, brain diseases, lung diseases, and liver diseases.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a method for classifying breast tumors in ultrasound images based on feature fusion and an attention mechanism. By applying segmentation based on adaptive fusion, preprocessing, and classification based on a large kernel attention mechanism to breast tumor ultrasound images, more accurate automatic multi-level classification of breast tumors is achieved. The method comprises the following steps:

Step S1: establish a BI-RADS breast ultrasound image data set and obtain the original breast tumor ultrasound images and their grading categories.

Step S2: segment the breast tumor images with an encoding-decoding convolutional neural network based on feature fusion. First, a lightweight convolutional neural network performs feature extraction in the encoding stage; the extracted feature maps are then fused by an improved adaptive feature fusion method and fed into the image decoder, which restores the mask image from the bottom up using bilinear interpolation. In addition, a regularization term that constrains the attention paid to the background region during tumor segmentation is added to improve the loss function.

Step S3: combine the mask image with the original image for preprocessing. The mask image and the original image are first processed with morphological operations, edge detection, and image filtering to obtain the tumor lesion image, edge image, and filtered image; these three decomposed images are then fused across three channels to obtain an RGB image.

Step S4: extract tumor features from the processed images and classify them. An improved lightweight Transformer network, the VAN model, is used for tumor classification. A mixed feed-forward network (Mix-FFN) module is introduced, and the large kernel attention (LKA) mechanism is applied to the segmentation-enhanced RGB images to extract local and detail features, improving the model's long-range dependency and adaptability. Finally, the tumor category is output through a Softmax classifier.
A further improvement of the technical solution of the present invention is that the breast tumor grading categories in step S1 include BI-RADS category 2, BI-RADS category 3, BI-RADS category 4A, BI-RADS category 4B, BI-RADS category 4C, and BI-RADS category 5, six tumor categories in total.
A further improvement of the technical solution of the present invention is that the specific steps for segmenting the breast tumor images in step S2 are as follows:
Step S21: adopt an encoder based on the MobileNetV3 network. With MobileNet as the encoder backbone, the lightweight MobileNetV3 performs multi-level feature extraction; a channel attention structure is introduced, and the h-swish(x) activation function replaces the ReLU activation function. The encoder extracts features through the five stages of MobileNetV3, yielding a feature pyramid with five feature levels whose feature map sizes, from top to bottom, are (240×240×16), (120×120×24), (60×60×40), (30×30×80), and (15×15×960); the feature maps extracted by the five stages of MobileNetV3 are visualized.
Step S22: adopt an improved adaptively spatial feature fusion (ASFF) method to fuse the feature pyramid extracted by MobileNetV3 for decoding. The improved ASFF method consists of two main steps: feature resizing and adaptive fusion. The feature resizing step adjusts the sizes of the five levels of feature matrices, using convolution layers, pooling layers, batch normalization layers, and activation functions to make the sizes and channel numbers of the different feature layers consistent. The adaptive fusion step defines a weight normalization function: the Softmax function yields five weight matrices, which are multiplied with the corresponding initial feature maps to obtain feature matrices of the same scale for parallel feature fusion (ADD). For ASFF, the gradient is computed as shown in Equation (1):
∂L/∂x_ij^n = w_ij^n · ∂L/∂y_ij,  n = 1, 2, …, 5  (1)

where L is the loss function, ∂L/∂y_ij is the gradient at pixel (i, j) of the fused feature map, ∂L/∂x_ij^1, …, ∂L/∂x_ij^5 are the gradients of the five levels of features, and w_ij^1, …, w_ij^5 are the weight parameters of the different feature layers.
The decoder input is the result of the improved ASFF, a feature matrix of size 15×15×960, which is used to generate the full-resolution mask image. An upsampling structure is adopted, and the upsampling uses bilinear interpolation.

Step S23: design an inside-and-outside constrained loss function. A regularization term (In and Out, IaO) that constrains the attention paid to the background region during tumor segmentation is added to improve the loss function and optimize the model. The regularization term is expressed as shown in Equation (2):
where |GT| − |GT∩S| denotes the misclassified part of the background region, referred to as the false positive (FP) part, and H×W denotes the image size. The first half of IaO represents the loss over the lesion region, and the second half represents the loss over the background. The inside-and-outside constrained loss function used to optimize the segmentation network model and prevent overfitting consists of the Dice loss (a similarity index) together with the regularization term IaO and its weight coefficient λ, expressed as shown in Equation (3):
l_ALL = l_Dice + IaO·λ  (3)
A further improvement of the technical solution of the present invention is that in step S3 the mask image is combined with the original image for preprocessing; the specific steps are as follows:
Step S31: apply morphological operations, edge detection, and image filtering to the original image and the mask image to obtain the tumor lesion image, edge image, and filtered image. The lesion image is obtained by element-wise multiplication of the mask image with the CLAHE image. The edge image is obtained by element-wise multiplication of the edge detection result of the mask image with the original image: the Canny operator first performs edge detection on the mask image; dilation and opening operations then yield a binarized edge image, which is finally pixel-multiplied with the original image to obtain the required edge image. The filtered image is obtained by filtering the original image with a bilateral filter.

Step S32: use channel fusion to merge the lesion image, edge image, and filtered image into a three-channel RGB image, as given by Equation (4):
I_f = Concat(I_a, I_b, I_c)  (4)
where Concat is the stacking function of the three single-channel image matrices, I_f is the fused RGB image, I_a is the original tumor ultrasound image, I_b is the edge-complexity image with the extracted image edges, and I_c is the bilaterally filtered image.
A further improvement of the technical solution of the present invention is that the specific operation steps of the Softmax classification in step S4 are as follows:

Step S41: improve the feed-forward network module. By introducing a 3×3 convolution into the FFN, a mixed feed-forward network (Mix-FFN) module is obtained and used for image encoding. The Mix-FFN consists of fully connected layers, a depth-wise convolution layer, the GELU activation function, and a dropout layer; its expression is shown in Equation (5):
x_out = MLP(GELU(Conv_3×3(MLP(x_in)))) + x_in  (5)
where x_in is the feature from the self-attention module.
Step S42: use large kernel attention (LKA) to extract local and detail features. LKA equivalently decomposes a large convolution kernel into three parts: a depth-wise convolution (DW-Conv), a depth-wise dilation convolution (DW-D-Conv), and a point-wise convolution (PW-Conv). By appropriately inserting gaps into the convolution kernel with a dilation coefficient a, and letting the original kernel size be k, the actual kernel size k′ of the dilated convolution kernel is expressed as shown in Equation (6):
k′ = k + (k−1)(a−1)  (6)
Through this equivalent decomposition, global image features can be represented better while relying on fewer FLOPs and parameters. The expression of LKA is shown in Equation (7):
F_out = Conv_1×1(DW-D-Conv(DW-Conv(F_in))) ⊗ F_in  (7)

where F_in is the input feature matrix and ⊗ denotes element-wise multiplication.
Step S43: finally, the Softmax classifier outputs the classification, identifying BI-RADS category 2, BI-RADS category 3, BI-RADS category 4A, BI-RADS category 4B, BI-RADS category 4C, and BI-RADS category 5.
Owing to the adoption of the above technical solutions, the technical progress achieved by the present invention is as follows:

The method for classifying breast tumors in ultrasound images based on feature fusion and an attention mechanism proposed by the present invention identifies six tumor categories: BI-RADS 2, 3, 4A, 4B, 4C, and 5 (categories 1 and 6 are the definitively benign and malignant nodules, respectively).

The feature-fusion-based encoding-decoding convolutional neural network proposed by the present invention segments breast tumor images: MobileNet serves as the backbone network to extract multi-level features, improved adaptively spatial feature fusion (ASFF) fuses the features, and bilinear interpolation decodes the feature maps. Through the extraction and fusion of multiple features, both the global information and the detail information of the image are taken into account, which helps the decoder predict the lesions in breast ultrasound images.

To make the segmentation network converge and prevent overfitting, the present invention designs a loss function composed of the Dice loss (a similarity index) and the inside-and-outside regularization term IaO, which constrains the lesion region and the background region, together with its weight coefficient λ.

The present invention proposes a segmentation-enhancement preprocessing method that combines the mask image with the original image; it preserves and enhances features of the BUS image such as edge complexity, background blur, calcification, internal lesion echo, and aspect ratio, enriches the edge information of the tumor, and realizes color conversion, preparing for the subsequent classification of breast tumor lesion regions.

The lightweight visual attention network based on the improved large kernel attention mechanism proposed by the present invention can recognize the local information and detail information of the image while retaining the long-range dependency and adaptability of the self-attention mechanism, achieving better classification while meeting the goal of a lightweight model.
Brief Description of the Drawings
Figure 1 is the flow chart of the present invention;
Figure 2 shows example images from the BI-RADS data set;
Figure 3 shows the MobileNetV3 network structure;
Figure 4 visualizes the MobileNetV3 output feature maps;
Figure 5 shows the adaptively spatial feature fusion pyramid structure;
Figure 6 shows the segmentation-enhancement preprocessing method;
Figure 7 shows the lightweight visual attention network structure;
Figure 8 is a schematic diagram of the large kernel attention (LKA) mechanism.
Detailed Description of Embodiments

The present invention is described in further detail below with reference to the embodiments:

Embodiment 1

As shown in Figure 1, a method for classifying breast tumors in ultrasound images based on feature fusion and an attention mechanism comprises the following steps:
Step S1: establish a BI-RADS breast ultrasound image data set and obtain the original breast tumor ultrasound images and their grading categories. As a specific example, the BI-RADS breast ultrasound image data set contains 7,606 images in total (1,096 of category 2, 1,862 of category 3, 1,033 of category 4A, 1,290 of category 4B, 1,329 of category 4C, and 996 of category 5); example images are shown in Figure 2. Six tumor categories are identified: BI-RADS 2, 3, 4A, 4B, 4C, and 5.
Step S2: segment the breast tumor images with an encoding-decoding convolutional neural network based on feature fusion. First, a lightweight convolutional neural network performs feature extraction in the encoding stage; the extracted feature maps are then fused by an improved adaptive feature fusion method and fed into the image decoder, which restores the mask image from the bottom up using bilinear interpolation. In addition, a regularization term that constrains the attention paid to the background region during tumor segmentation is added to improve the loss function.
First, the deeper MobileNet is adopted as the encoder backbone, with the lightweight MobileNetV3 performing multi-level feature extraction. To obtain semantic information from the input image, the encoder uses the MobileNetV3 network shown in Figure 3 for feature extraction; the five stages of MobileNetV3 yield a feature pyramid with five feature levels whose feature map sizes, from top to bottom, are (240×240×16), (120×120×24), (60×60×40), (30×30×80), and (15×15×960). The feature maps extracted by the five stages of MobileNetV3 are visualized in Figure 4, where Figures 4a) to 4e) show the output feature maps of Stage 1 through Stage 5, respectively. The convolutional neural network extracts the semantic information of the image, and the semantic information is linked to contextual information, i.e., the output of each layer comes from the input of the previous layer. Contextual information, also called contextual features, refers to the relationship between a pixel and its surrounding pixels.
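The five-level pyramid can be exposed with torchvision's feature-extraction utilities; the minimal sketch below assumes a 480×480 input (which yields the 240×240 through 15×15 sizes above) and assumes that the stage boundaries correspond to the listed torchvision layer indices; both should be checked against the actual model.

```python
import torch
from torchvision.models import mobilenet_v3_large
from torchvision.models.feature_extraction import create_feature_extractor

# Assumed layer indices for the five stage outputs of torchvision's
# mobilenet_v3_large; they match the channel widths 16/24/40/80/960 quoted above.
return_nodes = {
    "features.1": "stage1",   # 240x240x16
    "features.3": "stage2",   # 120x120x24
    "features.6": "stage3",   # 60x60x40
    "features.10": "stage4",  # 30x30x80
    "features.16": "stage5",  # 15x15x960
}

backbone = mobilenet_v3_large(weights=None)  # MobileNetV3 already uses h-swish and SE blocks
encoder = create_feature_extractor(backbone, return_nodes=return_nodes)

pyramid = encoder(torch.randn(1, 3, 480, 480))  # assumed 480x480 input resolution
for name, feat in pyramid.items():
    print(name, tuple(feat.shape))  # e.g. stage1 (1, 16, 240, 240)
```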
Second, the extracted feature maps are fused by the improved adaptive feature fusion method and fed into the image decoder, which restores the mask image from the bottom up using bilinear interpolation. The improved adaptively spatial feature fusion (ASFF) method fuses the feature pyramid extracted by MobileNetV3 for decoding. As shown in Figure 5, the improved ASFF method consists of two main steps: feature resizing and adaptive fusion, with the five-level feature pyramid of MobileNetV3 as input. First, the feature resizing step adjusts the sizes of the five levels of feature matrices, using convolution layers, pooling layers, batch normalization layers, and activation functions to make the sizes and channel numbers of the different feature layers consistent. Then, the adaptive fusion step defines a weight normalization function: the Softmax function yields five weight matrices, which are multiplied with the corresponding initial feature maps to obtain feature matrices of the same scale for parallel feature fusion (ADD). Throughout the automatic weight assignment, the key is to capture long-range dependencies directly by computing the interaction between any two feature channels, obtaining more global auxiliary information to compensate for the limited information captured by small convolution kernels, and thereby assigning more reasonable weights to all feature channels. For ASFF, the gradient is computed as shown in Equation (1):

∂L/∂x_ij^n = w_ij^n · ∂L/∂y_ij,  n = 1, 2, …, 5  (1)

where L is the loss function, ∂L/∂y_ij is the gradient at pixel (i, j) of the fused feature map, ∂L/∂x_ij^1, …, ∂L/∂x_ij^5 are the gradients of the five levels of features, and w_ij^1, …, w_ij^5 are the weight parameters of the different feature layers.
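A minimal PyTorch sketch of this five-level fusion is given below; the channel widths follow the pyramid above, while the use of adaptive average pooling for resizing and of 1×1 convolutions for the per-level weight maps are illustrative assumptions rather than the patent's exact layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFF5(nn.Module):
    """Five-level adaptively spatial feature fusion sketch: resize each level to
    the coarsest scale, align channels, then fuse with Softmax-normalized weights."""
    def __init__(self, in_channels=(16, 24, 40, 80, 960), dim=960):
        super().__init__()
        # feature resizing: conv + BN + activation align channel counts
        self.align = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, dim, 1), nn.BatchNorm2d(dim), nn.ReLU(inplace=True))
            for c in in_channels])
        # per-level scalar weight maps, normalized with Softmax across levels
        self.weight = nn.ModuleList([nn.Conv2d(dim, 1, 1) for _ in in_channels])

    def forward(self, feats):                       # feats: [stage1, ..., stage5]
        h, w = feats[-1].shape[-2:]                 # fuse at the 15x15 target scale
        aligned = [F.adaptive_avg_pool2d(m(f), (h, w))
                   for m, f in zip(self.align, feats)]
        logits = torch.cat([wm(a) for wm, a in zip(self.weight, aligned)], dim=1)
        weights = torch.softmax(logits, dim=1)      # (B, 5, h, w), sums to 1 per pixel
        fused = sum(weights[:, i:i + 1] * aligned[i] for i in range(5))  # parallel ADD
        return fused                                # 15x15x960 input to the decoder
```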
As shown in Figure 1, the decoder input is the result of the improved ASFF, a feature matrix of size 15×15×960, which is used to generate the full-resolution mask image. An upsampling structure is adopted, and the upsampling uses bilinear interpolation, whose core idea is to perform one linear interpolation in each of the two directions; it is the linear-interpolation extension of an interpolation function of two variables, and it can effectively eliminate the checkerboard effect during feature matrix recovery. Meanwhile, skip connections link the output feature matrices of the first four stages of MobileNetV3 to the results of the corresponding decoder upsampling stages, supplementing the decoding stage with the detail information of the encoding stage and eliminating the semantic gap between the upsampled and downsampled feature maps. To predict the category at each pixel position, a 1×1 convolution is added to the last layer of the decoder to preserve the output spatial size and reduce the number of channels. The 1×1 convolution generates two feature maps for predicting pixel values: one for the lesion region and one for the background region. To determine the specific category of each pixel value, a Softmax operation over the channel dimension converts the predicted values into probability values, i.e., the probabilities of the lesion region and the background; since the probabilities of the two feature maps sum to 1, knowing the pixel category in one probability map determines the other, and the pixel category can be judged by setting a threshold (usually 0.5). Finally, the category judgment result serves as the mask image of the breast tumor.
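The decoder described above can be sketched as follows, assuming the pyramid channel widths listed earlier; the 3×3 refinement convolution after each skip concatenation is an assumed detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskDecoder(nn.Module):
    """Bilinear-upsampling decoder sketch with skip links and a 1x1 prediction head."""
    def __init__(self, dim=960, skip_channels=(80, 40, 24, 16)):
        super().__init__()
        blocks, c = [], dim
        for s in skip_channels:                      # deep -> shallow skips
            blocks.append(nn.Sequential(
                nn.Conv2d(c + s, s, 3, padding=1),   # assumed 3x3 refinement conv
                nn.BatchNorm2d(s), nn.ReLU(inplace=True)))
            c = s
        self.blocks = nn.ModuleList(blocks)
        self.head = nn.Conv2d(c, 2, 1)               # 1x1 conv: lesion / background maps

    def forward(self, x, skips):                     # x: fused 15x15x960; skips: stage4..stage1
        for block, skip in zip(self.blocks, skips):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            x = block(torch.cat([x, skip], dim=1))   # skip link restores encoder detail
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        prob = torch.softmax(self.head(x), dim=1)    # channel-wise Softmax
        return (prob[:, 1:2] > 0.5).float()          # threshold 0.5 -> binary mask
```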
Finally, a regularization term that constrains the attention paid to the background region during tumor segmentation is added to improve the loss function and optimize the model. The present invention designs an inside-and-outside regularization term (In and Out, IaO) that can constrain the lesion region and the background region, expressed as shown in Equation (2):

where |GT| − |GT∩S| denotes the misclassified part of the background region, referred to as the false positive (FP) part, and H×W denotes the image size. The first half of IaO represents the loss over the lesion region, and the second half represents the loss over the background. Accordingly, the inside-and-outside constrained loss function used to optimize the segmentation network model and prevent overfitting consists of the Dice loss (a similarity index) together with the regularization term IaO and its weight coefficient λ, expressed as shown in Equation (3):
l_ALL = l_Dice + IaO·λ  (3)
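A sketch of this composite loss is given below. Because Equation (2) itself is not reproduced in the text, the IaO form used here (missed lesion pixels plus falsely flagged background pixels, each normalized by H×W) is only one plausible reading of the description above, and λ = 0.1 is an assumed value.

```python
import torch

def dice_loss(prob, gt, eps=1e-6):
    """Standard soft Dice loss; prob and gt are (B, 1, H, W) lesion maps."""
    inter = (prob * gt).sum(dim=(1, 2, 3))
    return 1 - (2 * inter + eps) / (prob.sum(dim=(1, 2, 3)) + gt.sum(dim=(1, 2, 3)) + eps)

def iao_term(prob, gt):
    """Assumed reading of IaO: lesion-side loss (ground-truth pixels missed by the
    segmentation S, i.e. |GT| - |GT ∩ S|) plus background-side loss (background
    pixels assigned to the lesion), both normalized by the image size H*W."""
    hw = prob.shape[-2] * prob.shape[-1]
    lesion_side = ((1 - prob) * gt).sum(dim=(1, 2, 3)) / hw
    background_side = (prob * (1 - gt)).sum(dim=(1, 2, 3)) / hw
    return lesion_side + background_side

def l_all(prob, gt, lam=0.1):   # lam: weight coefficient lambda (assumed value)
    return (dice_loss(prob, gt) + lam * iao_term(prob, gt)).mean()
```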
Step S3: combine the mask image with the original image for preprocessing. The mask image and the original image are first processed with morphological operations, edge detection, and image filtering to obtain the tumor lesion image, edge image, and filtered image; these three decomposed images are then fused across three channels to obtain an RGB image. The segmentation-enhancement preprocessing process is shown in Figure 6.

In the first step, the methods for obtaining the three decomposed images and their roles are as follows:

(1) Lesion image

The lesion image is obtained by element-wise multiplication of the mask image with the CLAHE image. The mask image obtained by the segmentation network is a binary image, with 0 in the background region and 1 in the lesion region. The CLAHE operation restrains anisotropic diffusion filtering and effectively enhances the texture features of the tumor; after fuzzy enhancement of the breast tumor image, the contrast between the lesion and the background region in the original image is enhanced, effectively reducing the influence of non-lesion regions on tumor classification. Finally, multiplying the mask image with the CLAHE image retains only the lesion details of the image while removing the background region.
(2) Edge image

The edge image is obtained by element-wise multiplication of the edge detection result of the mask image with the original image: the Canny operator first performs edge detection on the mask image; dilation and opening operations then yield a binarized edge image, which is finally pixel-multiplied with the original image to obtain the required edge image. Because malignant breast lesions infiltrate the surrounding tissue irregularly, the margins of malignant tumors are more complex than those of benign ones; extracting the tumor's edge image effectively highlights the edge complexity of the breast tumor.
(3) Filtered image

The filtered image is obtained by filtering the original image with a bilateral filter, which considers both the spatial information and the gray-level information of the image, smoothing the speckle noise over the whole image while preserving the original image information and helping improve image quality. The main purpose of the bilaterally filtered image is to remove tumor noise and reduce the classification model's sensitivity to the background region of the image.
In the second step, channel fusion merges the lesion image, edge image, and filtered image into a three-channel RGB image, as given by Equation (4):
I_f = Concat(I_a, I_b, I_c)  (4)
where Concat is the stacking function of the three single-channel image matrices, I_f is the fused RGB image, I_a is the original tumor ultrasound image, I_b is the edge-complexity image with the extracted image edges, and I_c is the bilaterally filtered image.
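The whole preprocessing pipeline can be sketched with OpenCV as follows, where gray is the original single-channel ultrasound image and mask is the 0/1 segmentation output; the CLAHE clip limit, Canny thresholds, morphology kernel, and bilateral-filter parameters are assumed values rather than ones taken from the patent.

```python
import cv2
import numpy as np

def fuse_channels(gray, mask):
    """Sketch of the segmentation-enhancement preprocessing.
    gray: uint8 grayscale ultrasound image; mask: uint8 binary mask (0/1)."""
    # (1) lesion image: CLAHE-enhanced image restricted to the lesion region
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
    lesion = cv2.bitwise_and(clahe, clahe, mask=mask)       # element-wise product with 0/1 mask
    # (2) edge image: Canny on the mask, dilation + opening, then multiply with original
    edges = cv2.Canny(mask * 255, 50, 150)
    kernel = np.ones((5, 5), np.uint8)
    edges = cv2.dilate(edges, kernel)
    edges = cv2.morphologyEx(edges, cv2.MORPH_OPEN, kernel)
    edge_img = cv2.bitwise_and(gray, gray, mask=edges)
    # (3) filtered image: bilateral filter smooths speckle while keeping boundaries
    filtered = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)
    # channel fusion (Equation (4)): stack the three single-channel images
    return cv2.merge([lesion, edge_img, filtered])
```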
Step S4: extract tumor features from the processed images and classify them. An improved lightweight Transformer network, the VAN model, is used for tumor classification. A mixed feed-forward network (Mix-FFN) module is introduced, and the large kernel attention (LKA) mechanism is applied to the segmentation-enhanced RGB images to extract local and detail features, improving the model's long-range dependency and adaptability; finally, the tumor category is output through a Softmax classifier.

The lightweight visual attention network (VAN) based on the large kernel attention (LKA) mechanism is improved, as shown in Figure 7. As the structure diagram of the lightweight VAN shows, for the input image, VAN first downsamples the input and uses the stride to control the downsampling rate, keeping the output of every layer within the same stage consistent in spatial resolution and channel number. The input then passes through L groups of batch normalization (BN), point-wise convolution, the GELU activation function, LKA, and the mixed feed-forward network (Mix-FFN) module, and finally through the Softmax layer, which divides the tumors into six categories. The numbers of VAN blocks in the four stages of the VAN network are 2, 2, 4, and 2, respectively. VAN has a simple four-stage structure with output spatial resolutions of H/4×W/4, H/8×W/8, H/16×W/16, and H/32×W/32, where H and W are the height and width of the input image.

The feed-forward network module is improved: by introducing a 3×3 convolution into the FFN, a mixed feed-forward network (Mix-FFN) module is obtained and used for image encoding, which introduces position information more effectively and suits the BUS images under study. The Mix-FFN consists of fully connected layers, a depth-wise convolution layer, the GELU activation function, and a dropout layer. Its expression is shown in Equation (5):
x_out = MLP(GELU(Conv_3×3(MLP(x_in)))) + x_in  (5)
where x_in is the feature from the self-attention module. The Transformer model uses positional encoding to introduce localization information, but the size of the positional encoding is fixed: when the input resolutions during training and testing are inconsistent, the positional encoding must be interpolated (padded), and zero padding loses some localization information, reducing model accuracy. Therefore, when the test-time input resolution differs from the training resolution, the position scores need to be interpolated, causing a drop in accuracy.
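A minimal PyTorch sketch of Equation (5) follows; writing the MLP layers as 1×1 convolutions, the fourfold hidden expansion, and the dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class MixFFN(nn.Module):
    """Mix-FFN sketch per Equation (5): MLP -> 3x3 depth-wise conv -> GELU -> MLP,
    plus a residual connection; the MLPs are written as 1x1 convolutions."""
    def __init__(self, dim, hidden_dim=None, drop=0.1):
        super().__init__()
        hidden_dim = hidden_dim or 4 * dim          # assumed expansion ratio
        self.fc1 = nn.Conv2d(dim, hidden_dim, 1)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1, groups=hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(hidden_dim, dim, 1)
        self.drop = nn.Dropout(drop)                # assumed dropout rate

    def forward(self, x):                           # x: (B, C, H, W) features
        out = self.fc2(self.drop(self.act(self.dwconv(self.fc1(x)))))
        return self.drop(out) + x                   # x_out = MLP(GELU(Conv3x3(MLP(x_in)))) + x_in
```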
Therefore, a new visual attention method, large kernel attention (LKA), is adopted. LKA equivalently decomposes a large convolution kernel into three parts: a depth-wise convolution (DW-Conv), a depth-wise dilation convolution (DW-D-Conv), and a point-wise convolution (PW-Conv), as shown in Figure 8. LKA combines the respective advantages of CNNs and Transformers, and the introduced attention mechanism gives the model channel-wise adaptivity. With dilation coefficient a and original kernel size k, the actual kernel size k′ of the dilated convolution kernel is given by Equation (6):
k′ = k + (k−1)(a−1)  (6)

For example, a 7×7 kernel with dilation coefficient a = 3 has an actual kernel size of 7 + 6×2 = 19.
Through this equivalent decomposition, global image features can still be represented better while relying on fewer FLOPs and parameters. The expression of LKA is shown in Equation (7):
F_out = Conv_1×1(DW-D-Conv(DW-Conv(F_in))) ⊗ F_in  (7)

where F_in is the input feature matrix and ⊗ denotes element-wise multiplication. LKA combines the advantages of the convolution and self-attention mechanisms, exploiting local receptive-field information while also taking large receptive-field information into account.
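A sketch of this decomposition in PyTorch is shown below, using the 5×5 DW-Conv plus 7×7, dilation-3 DW-D-Conv split reported for LKA in the VAN literature (by Equation (6), the dilated branch has an actual kernel size of 7 + 6×2 = 19); treating these particular kernel sizes as the patent's choice is an assumption.

```python
import torch.nn as nn

class LKA(nn.Module):
    """Large kernel attention sketch: DW-Conv -> DW-D-Conv -> PW-Conv produces an
    attention map that multiplies the input element-wise (Equation (7))."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)               # DW-Conv
        self.dwd = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)  # DW-D-Conv, k' = 19
        self.pw = nn.Conv2d(dim, dim, 1)                                      # PW-Conv

    def forward(self, f_in):
        attn = self.pw(self.dwd(self.dw(f_in)))     # attention map
        return attn * f_in                          # element-wise product with the input
```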
Finally, the Softmax classifier outputs the classification, identifying six tumor categories: BI-RADS 2, 3, 4A, 4B, 4C, and 5.

The examples described above merely describe preferred embodiments of the present invention and do not limit its scope; without departing from the design spirit of the present invention, all variations and improvements made by those of ordinary skill in the art to the technical solution of the present invention shall fall within the protection scope determined by the claims of the present invention.