CN116246654A - Breathing sound automatic classification method based on improved Swin-Transformer - Google Patents

Breathing sound automatic classification method based on improved Swin-Transformer

Info

Publication number
CN116246654A
Authority
CN
China
Prior art keywords
swin
sounds
network
improved
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310104612.4A
Other languages
Chinese (zh)
Inventor
张明辉
孙威威
孙萍
张帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University
Priority to CN202310104612.4A
Publication of CN116246654A
Legal status: Pending (current)


Abstract

The invention discloses a breath sound automatic classification method based on an improved Swin-Transformer, which belongs to the field of audio signal recognition and comprises three steps. Step A: the ICBHI 2017 dataset is prepared and divided into four classes of breath sounds: normal breath sounds, wheezes, crackles, and combined wheezes and crackles. Step B: audio signal preprocessing. First, the audio signal is downsampled and filtered with a fifth-order Butterworth band-pass filter to remove interference such as heart sounds. Second, the audio signal is intelligently padded. Finally, a spectrogram is generated by short-time Fourier transform and its black region is cropped to produce the dataset. Step C: the improved Swin-Transformer network is trained on the training data, and a confusion matrix is used to display the network's four-class breath sound prediction results. The invention provides a new improvement scheme for the automatic classification of breath sounds and helps promote the development of breath sound recognition equipment.

Description

Translated from Chinese
A Breath Sound Automatic Classification Method Based on an Improved Swin-Transformer

Technical Field

The invention belongs to the field of audio signal recognition, and specifically relates to a breath sound automatic classification method based on an improved Swin-Transformer.

Background Art

Respiratory diseases have a high incidence and a high fatality rate. According to the World Health Organization's data on the ten leading causes of death worldwide, respiratory-related diseases account for more than 10% of deaths each year, so early detection, early diagnosis, and early treatment have positive clinical significance. Auscultation of breath sounds is the most important and most convenient means of early diagnosis of respiratory diseases (especially bronchitis, pneumonia, and bronchial asthma). However, auscultation with a traditional binaural stethoscope has many limitations: (1) it must be performed by an experienced physician; (2) it is a subjective process whose outcome depends on the auscultator's experience and ability to perceive and distinguish different pathological sounds, which easily leads to missed diagnoses and misdiagnoses; (3) quantitative measurement is difficult, and examination records are hard to preserve and replay; (4) physicians wearing personal protective equipment can hardly auscultate without violating protection standards, which limits the auscultation of patients with airborne or droplet-transmitted lung diseases. Automatic analysis of breath sounds promises to overcome these limitations. However, the automatic identification and classification of abnormal breath sounds is a challenging task: breath sounds are non-stationary, abnormal breath sounds associated with the same disease vary widely, and the features of the same type of abnormal breath sound may differ between patients and may even change across the breathing cycles of a single patient. Improving the accuracy of automatic breath sound classification has therefore long been an urgent problem in this field.

Summary of the Invention

In view of the problems raised above, the object of the present invention is to propose a practical and effective breath sound automatic classification method, comprising a breath sound data preprocessing technique and a breath sound automatic classification model based on an improved Swin-Transformer.

To achieve the above object, the technical solution provided by the invention is as follows:

Step A: prepare the public ICBHI 2017 breath sound dataset, split it 6:4 into the official training and test sets, and use the audio segmentation code to divide it into four classes of breath sounds: normal breath sounds (Normal), wheezes (Wheeze), crackles (Crackle), and combined wheezes and crackles (Wheeze&Crackle). The training set contains 4142 samples and the test set 2756 samples;

Step B: audio signal preprocessing. First, downsample the audio signal to 4000 Hz and filter the downsampled data with a fifth-order Butterworth band-pass filter (100 Hz to 1800 Hz) to remove interference such as heart sounds. Second, pad the time-domain signals so that every audio clip has the same length, and apply the short-time Fourier transform to the padded data to generate spectrograms. Finally, crop the black region of each spectrogram to produce the final training and test data;

Step C: train the classification model built on the improved Swin-Transformer network with the processed dataset. Use a confusion matrix to display the network's four-class breath sound predictions, and compare the Swin-Transformer network's test accuracy with that of a convolutional neural network to evaluate the performance of the model.

The invention improves the accuracy of breath sound classification. Specifically, the invention downsamples the audio signal, filters out interference with a Butterworth band-pass filter, and applies the smart padding and black-region cropping techniques so that the spectrogram of each class of audio signal exhibits clearly distinguishable features, which are then used to train the improved Swin-Transformer classification network. Experimental results show that this design classifies breath sounds effectively.

The invention has the following technical advantages:

(1) Extracting the feature segments of the original audio data according to the time at which the features appear improves accuracy.

(2) Applying smart padding, black-region cropping, and short-time Fourier transform feature extraction to the audio data improves reliability.

(3) Applying the improved Swin-Transformer network to the automatic breath sound classification task and using a confusion matrix in network testing is more advanced.

(4) Comparing a convolutional neural network classifier with the improved Swin-Transformer model highlights that the improved Swin-Transformer classification method outperforms the convolutional approach on the breath sound classification task.

The invention is described in more detail below in conjunction with the accompanying drawings and specific embodiments.

Description of the Drawings

Figure 1 is a flow chart of the breath sound automatic classification method based on the improved Swin-Transformer;

Figure 2 is a flow chart of network training based on the improved Swin-Transformer;

Figure 3 is an overall structural framework diagram of the breath sound automatic classification method based on the improved Swin-Transformer;

Figure 4 shows the results of the improved Swin-Transformer's confusion matrix on the 2756 test samples.

Detailed Description

To explain in detail the technical content, structural features, objectives, and effects of the technical solution, the invention is further described below in conjunction with the accompanying drawings. The specific implementations described here serve only to explain the technical solution and do not limit the invention.

Figure 1 is the flow chart of the invention; with reference to Figure 1, the specific steps of the method are as follows:

Step A: prepare the public ICBHI 2017 breath sound dataset, split it 6:4 into the official training and test sets, and use the audio segmentation code to divide it into four classes of breath sounds: Normal, Wheeze, Crackle, and Wheeze&Crackle. The training set contains 4142 samples and the test set 2756 samples;

Step B: audio signal preprocessing. First, downsample the audio signal to 4000 Hz and filter the downsampled data with a fifth-order Butterworth band-pass filter (100 Hz to 1800 Hz) to remove interference such as heart sounds. Second, pad the time-domain signals so that every audio clip has the same length, and apply the short-time Fourier transform to the padded data to generate spectrograms. Finally, crop the black region of each spectrogram to produce the final training and test data;

Step C: train the classification model built on the improved Swin-Transformer network with the processed dataset. Use a confusion matrix to display the network's four-class breath sound predictions, and compare the Swin-Transformer network's test accuracy with that of a convolutional neural network to evaluate the performance of the model.

The invention improves the accuracy of breath sound classification. Specifically, the invention downsamples the audio signal, filters out interference with a Butterworth band-pass filter, and applies the smart padding and black-region cropping techniques so that the spectrogram of each class of audio signal exhibits clearly distinguishable features, which are then used to train the improved Swin-Transformer classification network. Experimental results show that this design classifies breath sounds effectively.

The specific process of audio signal processing in step A above includes:

A101. Segment the public ICBHI 2017 breath sound data into audio time slices according to the four classes of breath sounds;

A102. Split the dataset into a training set and a test set at the official 6:4 ratio and store the files by class.

In step A of the above embodiment, the database contains a total of 5.5 hours of recordings: 920 annotated audio files from 126 subjects, comprising 6898 respiratory cycles, of which 3642 are normal breath sounds, 886 contain wheezes, 1864 contain crackles, and 506 contain both wheezes and crackles. The samples are divided into four classes: Normal, Wheeze, Crackle, and Wheeze&Crackle. The recordings were collected with heterogeneous equipment, and each annotated breath sound recording lasts between 10 s and 90 s; a minimal segmentation sketch is given below.
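
Purely as an illustration of this segmentation step, the sketch below cuts each recording into labeled cycle clips. The annotation layout assumed here (one line per respiratory cycle: start time, end time, crackle flag, wheeze flag) follows the public ICBHI 2017 release; the helper names and output directory layout are hypothetical.

```python
import os
import soundfile as sf

# Class label for each (crackle flag, wheeze flag) pair in an ICBHI annotation file.
LABELS = {(0, 0): "Normal", (1, 0): "Crackle", (0, 1): "Wheeze", (1, 1): "Wheeze&Crackle"}

def segment_recording(wav_path, txt_path, out_dir):
    """Cut one ICBHI recording into per-cycle clips, saved under their class name."""
    audio, sr = sf.read(wav_path)
    base = os.path.splitext(os.path.basename(wav_path))[0]
    with open(txt_path) as f:
        for i, line in enumerate(f):
            start, end, crackle, wheeze = line.split()
            label = LABELS[(int(crackle), int(wheeze))]
            clip = audio[int(float(start) * sr):int(float(end) * sr)]
            os.makedirs(os.path.join(out_dir, label), exist_ok=True)
            sf.write(os.path.join(out_dir, label, f"{base}_{i}.wav"), clip, sr)
```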

In step B above, referring to Figure 2, the breath sound audio preprocessing comprises the following specific steps:

B101. Downsample the audio signal to 4000 Hz and filter it with a fifth-order Butterworth band-pass filter (100 Hz to 1800 Hz);

B102. Apply smart padding: copy the time-domain signal of the same cycle so that all audio clips have the same length;

B103. Apply black-region cropping: the spectrograms produced after Butterworth band-pass filtering contain black regions with no meaningful texture, and this technique cuts those regions away;

B104. Set the number of short-time Fourier transform FFT points to 256, leave the other parameters at their defaults, and generate the spectrograms.

In step B101, a band-pass filter is used to filter the audio signal. The Butterworth filter is chosen because its frequency response is maximally flat in the passband, with no ripple, and rolls off gradually to zero in the stopband; the invention therefore filters the audio signal with a fifth-order Butterworth band-pass filter, as sketched below.
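
A minimal sketch of this front end, assuming SciPy's standard filter-design routines; the cutoffs and order come from the text above, while polyphase resampling and zero-phase filtering are implementation choices, not requirements stated in the patent.

```python
from scipy.signal import butter, sosfiltfilt, resample_poly

def preprocess(audio, orig_sr, target_sr=4000, low=100.0, high=1800.0, order=5):
    """Downsample to 4000 Hz, then band-pass 100-1800 Hz with a 5th-order Butterworth."""
    if orig_sr != target_sr:
        audio = resample_poly(audio, target_sr, orig_sr)  # polyphase resampling
    # Second-order sections keep the 5th-order design numerically stable;
    # sosfiltfilt applies the filter forward and backward (zero phase).
    sos = butter(order, [low, high], btype="bandpass", fs=target_sr, output="sos")
    return sosfiltfilt(sos, audio)
```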

In step B102, smart padding is applied. In the ICBHI dataset, the length of a breathing cycle ranges from 0.2 s to 16.2 s, with an average of 2.7 s, yet network training expects inputs of uniform length. The standard remedy is to zero-pad the audio signal to a fixed size. Inspired by zero padding, the invention adopts a new smart padding scheme: the time-domain signal of the cycle is copied repeatedly until a uniform audio length is reached. A short sample, once cut, loses valuable information in an already scarce dataset, whereas a very long sample produces many repetitions and degrades classification performance; the sketch below illustrates the idea.
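
A minimal sketch, assuming the fixed target length is chosen by the practitioner (the patent does not state a value):

```python
import numpy as np

def smart_pad(clip, target_len):
    """Repeat the cycle's own samples (rather than zeros) up to a fixed length."""
    if len(clip) >= target_len:
        return clip[:target_len]                 # very long cycles are truncated
    reps = int(np.ceil(target_len / len(clip)))
    return np.tile(clip, reps)[:target_len]      # short cycles repeat themselves
```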

In step B103, black-region cropping is applied. When heat maps were used to analyze the features a convolutional neural network extracts from breath sounds, the network was found to be insensitive to the high-frequency region (1500 Hz to 1800 Hz) of the spectrogram, which is visibly black. To avoid a negative impact on performance, the black regions are selectively cut away from the high-frequency portion of these spectrograms, which keeps the network focused on the region of interest and improves classification performance; a sketch follows.
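
An illustrative sketch; the exact cutoff frequency is an assumption drawn from the 1500 Hz to 1800 Hz observation above.

```python
def crop_black_region(spec_db, freqs, cutoff_hz=1500.0):
    """Drop spectrogram rows above cutoff_hz.

    spec_db: (n_freqs, n_frames) log-magnitude spectrogram (NumPy array);
    freqs: frequency axis returned by the STFT, same length as the first axis.
    """
    keep = freqs <= cutoff_hz
    return spec_db[keep, :], freqs[keep]
```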

In step B104, the audio data are processed with the short-time Fourier transform; specifically, each segment of the time-domain signal is windowed and then Fourier-transformed. In digital filter design and spectral estimation, the choice of window function has a major impact on the quality of the result. Windowing mainly weakens the Gibbs phenomenon caused by truncating an infinite series. Depending on the window length and the number of sampling points, spectrograms divide into narrowband and wideband spectrograms: a wideband spectrogram uses a short window and has high time resolution but low frequency resolution, while a narrowband spectrogram uses a long window and has low time resolution but high frequency resolution.

Given that the spectrogram is sensitive to the window width, the short-time Fourier transform of the original signal proceeds as follows:

First, move the window to the beginning of the signal so that the center of the window function lies at t = τ0, and window the signal:

$$y(t) = x(t)\,w(t - \tau_0)$$

Then take the Fourier transform:

$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\,w(t - \tau_0)\,e^{-j\omega t}\,dt$$

This yields the spectral distribution X(ω) of the first segment. In practice the signal is a discrete sequence of points, so the Fourier transform yields a spectral sequence X[N]. For a more intuitive representation, define the function S(ω, τ) as the spectrum X(ω) obtained by transforming the signal when the window is centered at τ:

$$S(\omega, \tau) = \int_{-\infty}^{+\infty} x(t)\,w(t - \tau)\,e^{-j\omega t}\,dt$$

In the discrete case, S[ω, τ] is a two-dimensional matrix in which each column is the Fourier transform of the signal windowed at one position. The window is then moved step by step, with a step generally smaller than the window width so that adjacent windows overlap, and the operation above is repeated until the spectra of all segments from τ0 to τN have been obtained; the result is denoted S.
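
A sketch connecting these formulas to code, assuming SciPy's stft with its default Hann window; the patent itself fixes only the 256-point FFT of step B104.

```python
import numpy as np
from scipy.signal import stft

def spectrogram_db(audio, sr=4000, n_fft=256):
    """Windowed STFT (256 FFT points, as in B104); returns log-magnitude in dB."""
    freqs, times, S = stft(audio, fs=sr, nperseg=n_fft)  # S[w, tau]: the matrix S above
    return freqs, times, 20 * np.log10(np.abs(S) + 1e-10)
```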

In step C above, referring to Figure 3, the specific steps for testing the improved Swin-Transformer network include:

C101. Improve the Swin-Transformer network testing part. The unimproved testing part outputs only a single classification accuracy, which evaluates network performance too coarsely. Therefore, in the breath sound classification task, a confusion matrix is added as an evaluation index of the network: it shows the accuracy of each class of breath sound and also how well the network has been trained on the breath sound dataset.

In step C101 of the above embodiment, a confusion matrix is used to evaluate the training effect. The original network evaluation outputs only a single training/test accuracy and cannot show the prediction quality for each class, which is clearly insufficient for prediction and diagnosis in the medical field. Evaluating network performance with a confusion matrix reveals the accuracy of each class and also allows the model's sensitivity and specificity to be computed, which benefits the prediction and diagnosis of breath sound classification tasks. The confusion matrix contains the information of both the actual and the predicted classifications, and by selecting and combining this information the classification ability of the trained network can be estimated. The table below shows the structure of the confusion matrix.

Confusion matrix | Network predicts positive | Network predicts negative
Sample actually positive | TP | FN
Sample actually negative | FP | TN

The table above shows the confusion matrix for n = 2; in general, an n × n confusion matrix, where n is the number of classes, displays the predicted versus actual classification results. TP means the true class is positive and the network predicts positive; FP means the true class is negative and the network predicts positive; TN means the true class is negative and the network predicts negative; FN means the true class is positive and the network predicts negative. Prediction accuracy and classification error can be derived from this matrix: accuracy Score = (TP + TN) / (TP + FP + FN + TN), sensitivity SE = TP / (TP + FN), specificity SP = TN / (TN + FP), and F1-score = 2·TP / (2·TP + FP + FN). Sensitivity is the proportion of actually positive samples that the network predicts as positive, and specificity is the proportion of actually negative samples that the network predicts as negative. When FN or FP is 0, the class prediction divergence in the network is 1, in which case the confusion matrix misclassifies only one class; when FN and FP are equal, the divergence is 0. FN and FP are not only individually discriminative but, more importantly, complementary to each other. Figure 4 shows the results of the improved Swin-Transformer's confusion matrix on the 2756 test samples. Applying the confusion matrix to the Swin-Transformer not only makes network performance easier to interpret but also clearly displays the prediction quality for each breath sound class and for the dataset as a whole, providing a useful reference for applying the Swin-Transformer network to other classification tasks.
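
For concreteness, a minimal sketch of the binary metrics just defined:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Binary metrics computed from the confusion matrix entries defined above."""
    score = (tp + tn) / (tp + fp + fn + tn)   # accuracy (Score)
    se = tp / (tp + fn)                       # sensitivity (SE)
    sp = tn / (tn + fp)                       # specificity (SP)
    f1 = 2 * tp / (2 * tp + fp + fn)          # F1-score
    return score, se, sp, f1
```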

The invention proposes a Swin-Transformer-based breath sound classification method; the experimental configuration requirements are as follows:

The proposed architecture was implemented in Python, and all experiments were run on a desktop computer with an Intel Core i7-6800 CPU and an NVIDIA GeForce GTX 2080Ti graphics card. The configuration requirements are: the deep learning framework must be PyTorch, and the graphics card must have at least 11264 MB of memory for training, while testing imposes no GPU requirement. The dataset is the ICBHI 2017 breath sound dataset, whose original data include 3642 normal breath sound samples, 886 wheeze samples, 1864 crackle samples, and 506 samples containing both wheezes and crackles. The improved Swin-Transformer is applied to classify the dataset generated by downsampling, fifth-order Butterworth band-pass filtering, smart padding, black-region cropping, and the short-time Fourier transform.
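
A hedged sketch of such a training setup follows. A stock Swin-Tiny from the timm library, fine-tuned for four classes, stands in for the patent's improved network, whose internal modifications are not fully specified in this text; the model name, optimizer, and learning rate are illustrative assumptions.

```python
import torch
import timm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-in for the improved network: a pretrained Swin-Tiny with a 4-class head.
model = timm.create_model("swin_tiny_patch4_window7_224",
                          pretrained=True, num_classes=4).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_one_epoch(loader):
    """One pass over a DataLoader of (spectrogram image, label) batches."""
    model.train()
    for images, labels in loader:  # spectrograms resized to 224x224, 3 channels
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```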

Table 1. Comparison of four-class breath sound classification accuracy across networks


The experimental results are shown in Table 1. In the four-class prediction experiment on the ICBHI 2017 breath sound dataset, the network based on the improved Swin-Transformer achieved a classification accuracy of 57.3%, higher than the test performance of recent conventional convolutional neural networks. The invention therefore has a clear effect on improving breath sound classification accuracy.

Table 2. Comparison of confusion-matrix evaluations of four-class breath sound classification across networks


Table 2 compares the four-class breath sound evaluations of the different networks. Here SE is computed as SE = Σᵢ TP_i / [(TP_c + FN_c) + (TP_w + FN_w) + (TP_b + FN_b)], with i ∈ {Crackle, Wheeze, Crackle&Wheeze}, and SP as SP = TP_n / (TP_n + FN_n), where the subscripts denote Normal (n), Crackle (c), Wheeze (w), and Crackle&Wheeze (b); Score is the proportion of correctly predicted samples among all test samples. The table shows that the classification accuracy of the invention is clearly higher than that of the conventional convolutional neural network, demonstrating good classification performance; since this setting matches real-world scenarios, the invention is of substantial research significance.
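
A minimal sketch of this four-class evaluation, assuming a 4 × 4 confusion matrix C with rows as true classes and columns as predictions, ordered [Normal, Crackle, Wheeze, Crackle&Wheeze]:

```python
import numpy as np

def icbhi_se_sp(C):
    """SE, SP, and Score from a 4x4 confusion matrix ordered as above."""
    C = np.asarray(C, dtype=float)
    correct_abnormal = C[1, 1] + C[2, 2] + C[3, 3]   # TP_c + TP_w + TP_b
    se = correct_abnormal / C[1:, :].sum()           # over all abnormal samples
    sp = C[0, 0] / C[0, :].sum()                     # TP_n / (TP_n + FN_n)
    score = np.trace(C) / C.sum()                    # overall accuracy
    return se, sp, score
```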

The above describes only a preferred embodiment of the invention; the description is relatively detailed and specific, but it must not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make several modifications, improvements, and substitutions without departing from the concept of the invention, and these all fall within the protection scope of the invention. The protection scope of the patent is therefore defined by the appended claims.

Claims (3)

1. A breath sound automatic classification method based on an improved Swin-Transformer, comprising:
S1, preparing the public ICBHI 2017 breath sound dataset, splitting it 6:4 into the official training and test sets, and dividing it according to the audio segmentation code into four classes of breath sounds: normal breath sounds (Normal), wheezes (Wheeze), crackles (Crackle), and combined wheezes and crackles (Wheeze&Crackle);
S2, preprocessing the audio signal: first, downsampling the audio signal to 4000 Hz and filtering the downsampled data with a fifth-order Butterworth band-pass filter to remove interference; second, padding the time-domain signals so that every audio clip has the same length, and applying the short-time Fourier transform to the padded data to generate spectrograms; finally, cropping the black region of each spectrogram to generate the final training and test datasets;
S3, training the classification model built on the improved Swin-Transformer network with the processed dataset, and displaying the network's four-class breath sound predictions with a confusion matrix.
2. The improved Swin-Transformer-based breath sound automatic classification method according to claim 1, wherein the breath sound preprocessing in step S2 comprises the following step:
S201, setting the number of short-time Fourier transform FFT points to 256, leaving the other parameters at their defaults, and generating the spectrogram.
3. The improved Swin-Transformer-based breath sound automatic classification method according to claim 1, wherein the Swin-Transformer testing procedure in step S3 comprises:
S301, adding, in the modified Swin-Transformer network testing part, a confusion matrix as an evaluation index of the network in the breath sound classification task.
CN202310104612.4A (filed 2023-02-13): Breathing sound automatic classification method based on improved Swin-Transformer, published as CN116246654A (pending)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310104612.4A | 2023-02-13 | 2023-02-13 | Breathing sound automatic classification method based on improved Swin-Transformer (CN116246654A)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310104612.4A | 2023-02-13 | 2023-02-13 | Breathing sound automatic classification method based on improved Swin-Transformer (CN116246654A)

Publications (1)

Publication Number | Publication Date
CN116246654A | 2023-06-09

Family

ID=86635676

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310104612.4A | 2023-02-13 | 2023-02-13 | Breathing sound automatic classification method based on improved Swin-Transformer (CN116246654A, pending)

Country Status (1)

Country | Link
CN | CN116246654A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5143078A (en) * | 1987-08-04 | 1992-09-01 | Colin Electronics Co., Ltd. | Respiration rate monitor
CN111640439A (en) * | 2020-05-15 | 2020-09-08 | Nankai University | Deep learning-based breath sound classification method
CN111789629A (en) * | 2020-06-29 | 2020-10-20 | Central South University | An intelligent diagnosis and treatment system and method for breath sounds based on deep learning
CN114066902A (en) * | 2021-11-22 | 2022-02-18 | Anhui University | Medical image segmentation method, system and device based on convolution and transformer fusion
CN114937021A (en) * | 2022-05-31 | 2022-08-23 | Harbin Institute of Technology | Swin-Transformer-based crop disease fine-granularity classification method
CN115457983A (en) * | 2022-08-09 | 2022-12-09 | Shanghai Jiao Tong University | Breath sound classification method, system and equipment based on semi-supervised deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SUN WEIWEI et al.: "Respiratory Sound Classification Based on Swin Transformer", 2023 8th International Conference on Signal and Image Processing, 9 October 2023, pages 511-515 *
孙威威: "基于Swin Transformer的呼吸音自动分类方法研究" (Research on an automatic breath sound classification method based on Swin Transformer), 中国优秀硕士学位论文全文数据库 信息科技辑 (China Masters' Theses Full-text Database, Information Science and Technology), no. 04, 15 April 2025, pages 1-58 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117037847A (en) * | 2023-07-31 | 2023-11-10 | 深圳市万物云科技有限公司 | End-to-end community noise monitoring method and device and related components
CN117037847B (en) * | 2023-07-31 | 2024-05-03 | 深圳市万物云科技有限公司 | End-to-end community noise monitoring method and device and related components
CN118094309A (en) * | 2024-01-23 | 2024-05-28 | Guangdong University of Technology | A multi-domain collaborative self-supervised learning method for training respiratory sound classification models
CN119601036A (en) * | 2025-01-26 | 2025-03-11 | Nanchang University | A breathing sound recognition method based on event-level detection technology

Similar Documents

Publication | Title
Fahad et al. | Microscopic abnormality classification of cardiac murmurs using ANFIS and HMM
CN106725532B (en) | Depression automatic evaluation system and method based on phonetic feature and machine learning
CN116246654A (en) | Breathing sound automatic classification method based on improved Swin-Transformer
Guo et al. | DS-CNN: Dual-stream convolutional neural networks-based heart sound classification for wearable devices
Wang et al. | Phonocardiographic signal analysis method using a modified hidden Markov model
CN110731778B (en) | Method and system for recognizing breathing sound signal based on visualization
CN111493828A (en) | Sequence-to-sequence detection of sleep disorders based on fully convolutional networks
Reggiannini et al. | A flexible analysis tool for the quantitative acoustic assessment of infant cry
Baghel et al. | ALSD-Net: Automatic lung sounds diagnosis network from pulmonary signals
Ge et al. | Detection of pulmonary hypertension associated with congenital heart disease based on time-frequency domain and deep learning features
CN113436726B (en) | An automatic analysis method of lung pathological sounds based on multi-task classification
CN115486865A (en) | A heart sound classification method based on convolutional recurrent neural network
CN113449636B (en) | Automatic aortic valve stenosis severity classification method based on artificial intelligence
CN112869716B (en) | Pulse feature identification system and method based on two-channel convolutional neural network
CN115067910A (en) | A heart rate variability stress detection method, device, storage medium and system
CN114010205A (en) | A 3D attention residual deep network auxiliary analysis method for childhood epilepsy syndrome
Mondal et al. | Boundary estimation of cardiac events S1 and S2 based on Hilbert transform and adaptive thresholding approach
Sabouri et al. | Effective features in the diagnosis of cardiovascular diseases through phonocardiogram
Ge et al. | Detection of pulmonary arterial hypertension associated with congenital heart disease based on time-frequency domain and deep learning features
Hassan et al. | Automated diagnosis of pulmonary diseases using lung sound signals
Wang et al. | LungNeXt: A novel lightweight network utilizing enhanced mel-spectrogram for lung sound classification
Liu et al. | Detection of coronary artery disease using a triplet network and hybrid loss function on heart sound signal
Roslan et al. | Detection of Respiratory Diseases from Auscultated Sounds Using VGG16 with Data Augmentation
Sanjana et al. | Attention-Based CRNN Models for Identification of Respiratory Diseases from Lung Sounds
Abramov et al. | Development of algorithm for analysis of sound fragments in medical information systems

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
