CN112245728A


Info

Publication number: CN112245728A
Application number: CN202010492039.5A
Authority: CN
Inventors: 刘佳明; 李想; 范皓玥
Original assignee: Beijing University of Chemical Technology
Current assignee: Beijing University of Chemical Technology
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2021-01-22
Anticipated expiration: 2040-06-03
Also published as: CN112245728B

Abstract

Translated from Chinese

The invention discloses a method and system for identifying false-positive ventilator alarm signals based on ensemble trees, comprising the following steps: S1, data collection: collect patient monitoring data from hospital ventilators and monitors; S2, data preprocessing: handle missing values and outliers in the data set, standardize the data, and generate labeling rules for false-positive alarm signals; S3, feature extraction: rank the features with a random forest and select those with good discriminative ability; S4, false-positive alarm signal identification: build an identification method for false-positive alarm signals of ventilators and monitors. Experimental results show that the invention identifies false-positive alarm signals well and that its identification performance is robust.

Description


A method and system for identifying false-positive ventilator alarm signals based on ensemble trees

Technical Field

The invention relates to a method and system for identifying false-positive alarm signals of hospital ventilator-monitor equipment, and in particular to a method and system for identifying false-positive ventilator alarm signals based on ensemble trees.

Background

As emergency and life-support medical equipment, ventilators are widely used in modern clinical medicine, for example in treating respiratory failure of various causes, in anesthesia and respiratory management during major surgery, in respiratory support therapy, and in emergency resuscitation. Because ventilators are mainly used for high-risk patients, they are usually used together with a patient monitor. When a patient's breathing becomes abnormal or the equipment fails, the ventilator-monitor issues an alarm signal, and medical staff check the patient's condition and the operation of the equipment accordingly. Effective alarm signals help medical staff identify and handle ventilator alarms correctly and promptly, ensuring normal ventilator operation and patient safety.

In practice, however, most of the alarm signals raised by a ventilator-monitor are often false positives. Medical staff are reportedly in short supply in many developing countries, and during epidemics or emergencies in particular, false-positive ventilator alarms add to their workload. Frequent false-positive alarms cause alarm fatigue and slow the staff's response to alarm signals. When several ventilator-monitors alarm at the same time, false positives may keep patients who are truly in danger from being checked in time, so that the best opportunity for treatment is missed.

Identifying false-positive alarm signals from real patient data recorded by ventilators and monitors can reduce the workload of medical staff and improve their alertness to alarm signals. Accurate identification also allows patients who are truly at risk to receive more timely care, the ventilator alarm to be cleared, and treatment to proceed safely and effectively. There are existing studies on the causes and handling of ventilator alarms, but overall, research on identifying ventilator-monitor false-positive alarm signals with machine learning has not yet begun; most work discusses the problems that false-positive ventilator alarms cause. There is therefore currently no effective method or system for identifying ventilator-monitor false-positive alarm signals, and work on such a method is particularly important.

Therefore, a new method for identifying false-positive alarm signals of ventilator-monitors is urgently needed. The new method should meet the following technical requirements: 1) it should make the identification results more interpretable and find the vital-sign indicators that play a key role in false-positive alarm signals; 2) it should classify well, providing a method and system that effectively identify ventilator-monitor false-positive alarm signals.

Summary of the Invention

Technical problem solved by the invention: to provide a method and system for identifying false-positive ventilator alarm signals based on ensemble trees, so as to remedy the subjective judgment and poor identification performance of current approaches to ventilator-monitor false-positive alarms, and to address the problems that false-positive alarms cause, such as heavy pressure on medical staff and missed treatment opportunities.

Technical solution adopted by the invention:

The invention provides a method for identifying false-positive ventilator alarm signals based on ensemble trees, comprising the following steps:

Step 1) Data collection: collect real patient sample data monitored by several ventilators and several monitors in a hospital, and combine each monitor data sample with each ventilator data sample to form an original data set, the original data set comprising several feature data and the corresponding alarm signals;

Step 2) Data preprocessing: perform missing-value processing, outlier processing and data standardization on the feature data of the original data set of step 1), and label the alarm signals of the original data set, thereby obtaining a preprocessed data set; the labeling assigns each alarm signal a class label according to predetermined rules, the classes being true-positive alarm signal and false-positive alarm signal;

Step 3) Feature selection: screen the feature data of the preprocessed data set of step 2) using a random forest and retain the screened features; the screened feature data in the preprocessed data set, together with the corresponding alarm signal labels, form the training data set;

Step 4) False-positive alarm signal identification: use the training data set of step 3) to train the parameters of a gradient boosting decision tree classifier, obtaining a trained alarm signal label class identifier; given newly input screened feature data and the corresponding alarm signal, the identifier outputs whether the label of that alarm signal is a true-positive alarm signal or a false-positive alarm signal.

Further, in step 1:

the real ventilator-monitor monitoring data of patients are collected from the hospital at a sampling frequency of three samples per second;

the several features comprise 16 vital-sign features, namely expiratory minute volume, mean pressure, oxygen inlet pressure, inspired oxygen concentration, positive end-expiratory pressure (PEEP), spontaneous respiratory rate, respiratory rate, inspiratory tidal volume, expiratory tidal volume, peak pressure, mean invasive blood pressure, high invasive blood pressure, low invasive blood pressure, central venous pressure, blood oxygen concentration and heart rate;

each monitor data sample and each ventilator data sample are combined as follows: with the ventilator timestamps as the reference, a matching method merges each monitor data sample with the corresponding ventilator data sample into a single sample.

Further, in step 2:

the missing-value processing uses the feature-mean method: each missing value of a feature in the original data set is found and filled with the mean of that feature. The value x′_{missing(i,j)} that replaces the missing j-th sample of the i-th feature is

$$x'_{\mathrm{missing}(i,j)} = \frac{x_{i1} + x_{i2} + \dots + x_{in}}{n},$$

where x_{i1}, x_{i2}, ..., x_{in} denote the 1st, 2nd, ..., n-th samples of the i-th feature in the original data set, and n denotes the number of samples;

the outlier processing uses the three-standard-deviation method: first, any value of a feature in the original data set that exceeds the feature mean by more than three standard deviations is set to the mean plus three standard deviations; then, any value that falls below the feature mean by more than three standard deviations is set to the mean minus three standard deviations. The adjusted value x′_{outlier(i,j)} of the i-th feature of the j-th sample is

$$x'_{\mathrm{outlier}(i,j)} = \begin{cases} \mu_i + 3\sigma_i, & x_{ij} - \mu_i > 3\sigma_i \\ \mu_i - 3\sigma_i, & x_{ij} - \mu_i < -3\sigma_i \end{cases}$$

where x_{ij} denotes the value of the j-th sample of the i-th feature in the original data set, μ_i denotes the mean of the i-th feature in the original data set, and σ_i denotes the standard deviation of the i-th feature in the original data set;

the standardization uses the z-score method: each value of a feature in the original data set is replaced by its z-score. The standardized value x′_{norm(i,j)} of the i-th feature of the j-th sample is

$$x'_{\mathrm{norm}(i,j)} = \frac{x_{ij} - \mu_i}{\sigma_i},$$

where x_{ij} denotes the value of the j-th sample of the i-th feature in the original data set, μ_i denotes the mean of the i-th feature in the original data set, and σ_i denotes the standard deviation of the i-th feature in the original data set;
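The three preprocessing operations described here (mean imputation, three-standard-deviation clipping, z-score standardization) can be sketched for a single feature column as follows. This is a minimal illustration, not the patent's implementation. The function name `preprocess_feature` is hypothetical, and several details are assumptions the text does not settle: `None` marks a missing reading, the imputation mean is taken over the observed readings only, the population standard deviation is used, and the statistics are recomputed after each stage.

```python
import math


def preprocess_feature(values):
    """Mean-impute, clip to mean +/- 3 sigma, then z-score one feature column.

    `values` is a list of floats in which None marks a missing reading
    (an assumed convention, not taken from the source).
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    # Stage 1: fill missing readings with the feature mean.
    filled = [mean if v is None else v for v in values]

    # Stage 2: three-standard-deviation rule on the filled column.
    mu = sum(filled) / len(filled)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in filled) / len(filled))
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    clipped = [min(max(v, lower), upper) for v in filled]

    # Stage 3: z-score using the statistics of the clipped column.
    mu_c = sum(clipped) / len(clipped)
    sigma_c = math.sqrt(sum((v - mu_c) ** 2 for v in clipped) / len(clipped))
    if sigma_c == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in clipped]
    return [(v - mu_c) / sigma_c for v in clipped]
```

Calling `preprocess_feature` once per feature column yields columns with zero mean and unit variance, so that no single large-scale vital sign dominates the classifier.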

the labeling is implemented with predetermined labeling rules comprising: when the ventilator and the monitor alarm at the same time, the alarm signal is labeled a true-positive alarm signal; when the ventilator or the monitor alarms continuously without interruption and the alarm persists for more than 3 consecutive occurrences, the alarm signal is labeled a true-positive alarm signal; when vital-sign feature data fall outside the threshold range set on the ventilator for that feature, the alarm signal is labeled a true-positive alarm signal. The threshold ranges of the vital-sign feature data are: expiratory minute volume 0.5 to 180 L/min, tolerance ±3%; mean pressure -2 to 12 kPa, tolerance ±0.1 kPa; inspired oxygen concentration 21 to 100%, tolerance ±3%; positive end-expiratory pressure -12 to 12 kPa, tolerance ±0.05 kPa; spontaneous respiratory rate and respiratory rate 1 to 150 breaths/min, tolerance ±3%; inspiratory/expiratory tidal volume -10 to 10 L, tolerance ±3%.
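The three labeling rules can be illustrated with a short Python sketch. It is only one possible reading of the rules: the function name `label_alarms` and its parameters are hypothetical, "more than 3 times" is interpreted as an uninterrupted run of more than three consecutive alarm samples, and the threshold check is assumed to have been computed elsewhere and passed in as a list of booleans.

```python
def label_alarms(vent_alarm, mon_alarm, out_of_range, min_run=3):
    """Label each sample: 1 = true positive, 0 = false positive, None = no alarm.

    vent_alarm, mon_alarm: lists of 0/1 alarm flags, one per sample.
    out_of_range: list of bools, True when any vital sign lies outside the
    ventilator's configured threshold range (assumed precomputed).
    """
    n = len(vent_alarm)

    def long_run_mask(flags):
        # Mark samples inside an uninterrupted alarm run longer than min_run.
        mask = [False] * n
        start = 0
        while start < n:
            if not flags[start]:
                start += 1
                continue
            end = start
            while end < n and flags[end]:
                end += 1
            if end - start > min_run:
                for k in range(start, end):
                    mask[k] = True
            start = end
        return mask

    vent_run = long_run_mask(vent_alarm)
    mon_run = long_run_mask(mon_alarm)

    labels = []
    for k in range(n):
        if not (vent_alarm[k] or mon_alarm[k]):
            labels.append(None)  # no alarm at this sample
        elif ((vent_alarm[k] and mon_alarm[k]) or vent_run[k]
              or mon_run[k] or out_of_range[k]):
            labels.append(1)     # at least one true-positive rule matches
        else:
            labels.append(0)     # every remaining alarm is a false positive
    return labels
```

Alarms that satisfy none of the three rules fall through to the false-positive class, which mirrors the idea of marking true positives first and treating the remainder as false positives.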

Further, in step 3, the feature screening with a random forest and the retention of the screened features are implemented as follows:

for each feature F in the preprocessed data set, compute the information gain Gain(L, F), that is, the difference between the information entropy Entropy(L) of the alarm signal labels and the information entropy Entropy(L, F) of the alarm signal labels under feature F,

Gain(L, F) = Entropy(L) - Entropy(L, F),

if Gain(L, F) > θ, feature F is retained as a screened feature; if Gain(L, F) < θ, feature F is removed, where θ is a set threshold;

$$\mathrm{Entropy}(L) = -\sum_{i} p_i \log p_i,$$

where L denotes the alarm signal labels of the preprocessed data set, and p_i denotes the probability with which the i-th class of alarm signal label occurs in the preprocessed data set;

$$\mathrm{Entropy}(L, F) = \sum_{j=1}^{v} \frac{|L_j|}{|L|}\,\mathrm{Entropy}(L_j),$$

where L denotes the alarm signal labels of the preprocessed data set, v denotes the number of distinct values that feature F takes in the preprocessed data set, and L_j denotes the labels of the samples that take the j-th value of feature F, |L_j| being their number.
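The information-gain threshold test can be sketched in plain Python. This illustrates only the gain computation and the comparison with θ; it does not build a random forest. The function names are hypothetical, the logarithm base (2) is an assumption, and the features are assumed to be discrete, so continuous vital signs would need to be binned first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2, an assumed choice) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Gain(L, F) = Entropy(L) - Entropy(L, F) for one discrete feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature_values, labels):
        groups.setdefault(f, []).append(y)
    # Weighted entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, theta):
    """Keep the names of the features whose gain exceeds the threshold theta."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) > theta]
```

A feature that separates true-positive from false-positive labels perfectly has a gain equal to Entropy(L), while a feature unrelated to the labels has a gain near zero and is dropped.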

The screened features comprise peak pressure, heart rate, respiratory rate, spontaneous respiratory rate, expiratory tidal volume, inspiratory tidal volume, minute ventilation volume, mean pressure and positive end-expiratory pressure, and preferably comprise peak pressure, heart rate, respiratory rate and spontaneous respiratory rate.

Further, in step 4, the number of decision trees of the gradient boosting decision tree classifier is set in the range [50, 150] with a step of 10, the tree height in the range [3, 10] with a step of 1, and the number of leaf nodes in the range [5, 15] with a step of 1.
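The parameter ranges above define a finite set of candidate configurations. The sketch below enumerates them in Python. The text gives only the ranges and step sizes, so exhaustive enumeration (as in a grid search) is an assumption, and the key names `n_trees`, `max_depth` and `n_leaves` are hypothetical.

```python
from itertools import product

n_trees_options = list(range(50, 151, 10))  # 50, 60, ..., 150
max_depth_options = list(range(3, 11))      # 3, 4, ..., 10
n_leaves_options = list(range(5, 16))       # 5, 6, ..., 15

# Every combination of the three settings, one dict per candidate model.
param_grid = [
    {"n_trees": t, "max_depth": d, "n_leaves": k}
    for t, d, k in product(n_trees_options, max_depth_options, n_leaves_options)
]
```

Each entry of `param_grid` could then be used to train one candidate classifier and be scored on held-out data.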

Step 4 is implemented as follows:

41) the screened features obtained in step 3) are taken as the input feature vector space; if the alarm signal label output by the gradient boosting decision tree classifier in round m-1 is F_{m-1}(x), the loss function is L(y, F_{m-1}(x)) = y - F_{m-1}(x), where x is a sample and y is the true alarm signal label of the sample;

42) taking the partial derivative of L(y, F_{m-1}(x)) with respect to F_{m-1}(x),

$$\frac{\partial L\bigl(y, F_{m-1}(x)\bigr)}{\partial F_{m-1}(x)},$$

gives the optimization direction of the gradient boosting decision tree classifier in round m; the learning rate γ_{m-1} controls the contribution of the alarm signal label output by the classifier in round m-1, and the alarm signal label F_m(x) output by the classifier in round m is obtained by updating F_{m-1}(x) along this direction;

43) steps 41) and 42) are repeated iteratively until the difference between the alarm signal labels F_m(x) and F_{m-1}(x) output by the gradient boosting decision tree classifier in rounds m and m-1 is smaller than a set threshold, at which point the iteration stops and the trained alarm signal label class identifier is obtained;

44) given newly input screened feature data and the corresponding alarm signal, the identifier outputs whether the label of that alarm signal is a true-positive alarm signal or a false-positive alarm signal.
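For orientation, the iteration in steps 41) to 43) can be compared with the textbook form of gradient boosting. The block below is a sketch under assumptions rather than the patent's own formulas: a squared-error loss is assumed, h_m denotes the tree fitted in round m, and ε denotes the stopping threshold; h_m, ε and the exact indexing of the learning rate are not taken from the source.

```latex
% Assumed squared-error loss for a sample x with true label y
L\bigl(y, F_{m-1}(x)\bigr) = \tfrac{1}{2}\bigl(y - F_{m-1}(x)\bigr)^{2}

% The negative gradient is the residual, which the next tree h_m is fitted to
r_m(x) = -\frac{\partial L\bigl(y, F_{m-1}(x)\bigr)}{\partial F_{m-1}(x)} = y - F_{m-1}(x)

% Additive update, damped by the learning rate
F_m(x) = F_{m-1}(x) + \gamma_{m-1}\, h_m(x)

% Stopping rule of step 43)
\bigl| F_m(x) - F_{m-1}(x) \bigr| < \varepsilon
```

Under this assumption the residual y - F_{m-1}(x) is exactly the direction along which each new tree corrects the current prediction.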

The invention also provides a system for identifying false-positive ventilator alarm signals based on ensemble trees, the system comprising:

a data acquisition module, a data preprocessing module and an alarm signal label class identifier;

the data acquisition module acquires the ventilator-monitor monitoring data set input by the user and sends it to the data preprocessing module; the monitoring data set comprises several feature data and alarm signals, the several features comprising peak pressure, heart rate, respiratory rate, spontaneous respiratory rate, expiratory tidal volume, inspiratory tidal volume, minute ventilation volume, mean pressure and positive end-expiratory pressure, and preferably comprising peak pressure, heart rate, respiratory rate and spontaneous respiratory rate;

the data preprocessing module receives the monitoring data set sent by the data acquisition module, performs missing-value processing, outlier processing and data standardization on the feature data of the monitoring data set, and sends the preprocessed monitoring data set to the alarm signal label class identifier;

the alarm signal label class identifier is a trained gradient boosting decision tree classifier; it receives the preprocessed monitoring data set sent by the data preprocessing module and outputs whether the label of each alarm signal is a true-positive alarm signal or a false-positive alarm signal.

Compared with the prior art, the invention has the following advantages:

(1) The invention identifies false-positive alarm signals of hospital ventilators and monitors using a gradient boosting decision tree method. Real-time patient monitoring data are first collected from the hospital's ventilators and monitors; the data are then preprocessed and the false-positive alarm signals labeled, ensuring the completeness and validity of the data; the random forest method is next used to extract the effective features; finally, the gradient boosting decision tree method is used to identify and verify false-positive alarm signals of hospital ventilators and monitors. Experimental results show that the method has better identification performance than other advanced machine learning methods;

(2) the false-positive alarm signal identification method proposed by the invention is highly interpretable and provides a basis for finding key classification indicators;

(3) the invention has excellent classification performance; compared with other machine learning methods, it achieves the best results in accuracy, AUC and F1-score.

Description of the Drawings

The accompanying drawings, which form part of this application, provide a further understanding of the invention; the illustrative examples of the invention and their descriptions serve to explain the invention and make its advantages clearer. In the drawings:

Fig. 1 is a flowchart of the false-positive alarm signal identification method of the invention;

Fig. 2 shows the data acquisition equipment, a ventilator and a monitor;

Fig. 3 compares the F1-score of four methods, GBDT, SVM, NB and LR, where (a) uses a training-to-test set ratio of 80:20; (b) uses a ratio of 70:30; (c) uses a ratio of 60:40. GBDT denotes gradient boosting decision tree, SVM denotes support vector machine, NB denotes naive Bayes classifier, and LR denotes logistic regression.

Detailed Description

To make the objectives, technical solutions, implementation steps and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments in this part serve only to explain the invention and do not limit it, and the technical solutions formed by combining the parts of the embodiments all fall within the scope of protection of the invention.

The invention addresses the problem that, in a hospital environment, the ventilators and monitors attached to critically ill patients frequently generate false-positive alarm signals, which sharply increases the workload of medical staff. The invention proposes a method for identifying false-positive ventilator alarm signals based on ensemble trees, which identifies false-positive alarm signals and thereby relieves this workload. The method comprises the following steps: S1, data collection: collect patient monitoring data from hospital ventilators and monitors as the original data set; S2, data preprocessing: perform missing-value processing, outlier processing and standardization on the original data set, and label the alarm signals of the original data set, assigning each alarm signal a class label according to predetermined rules, namely true-positive alarm signal or false-positive alarm signal; S3, feature extraction: screen the features using a random forest, retain the screened features, and build the training data set; S4, false-positive alarm signal identification: train the parameters of a gradient boosting decision tree classifier on the training set to build an alarm signal label class identifier, which, given newly input screened feature data and the corresponding alarm signal, outputs whether the label of that alarm signal is a true-positive alarm signal or a false-positive alarm signal. The invention first collects real patient vital-sign data from hospital ventilators and monitors, preprocesses the data, performs feature selection based on a random forest, and then uses the gradient boosting decision tree method to identify false-positive alarm signals of ventilators and monitors, with experimental verification. Experimental results show that the invention identifies false-positive alarm signals well and that its identification performance is robust.

The method of the invention mainly comprises the following steps:

Further, in step 1):

Based on the actual operating state of the ventilator and the monitor, and considering that higher-resolution data reflect the patient's true state more accurately, the patient's vital-sign monitoring data are collected from hospital ventilators and monitors at a sampling frequency of three samples per second.

The collected information comprises 16 vital-sign features, namely expiratory minute volume, mean pressure, oxygen inlet pressure, inspired oxygen concentration, positive end-expiratory pressure (PEEP), spontaneous respiratory rate, respiratory rate, inspiratory tidal volume, expiratory tidal volume, peak pressure, mean invasive blood pressure, high invasive blood pressure, low invasive blood pressure, central venous pressure, blood oxygen concentration and heart rate, together with one label used to identify the alarm signal.

The set of vital-sign features is denoted X = {x_1, x_2, ..., x_16}, and the label of the alarm signal is denoted Y = {0, 1}, where 0 indicates that no alarm occurred and 1 indicates that an alarm occurred.

Because the data come from two devices, the ventilator and the monitor, the sample timestamps of the two data sources are not aligned. With the ventilator timestamps as the reference, a matching method merges each monitor data sample and each ventilator data sample into a single sample. Specifically, the ventilator samples and the monitor samples are each sorted by acquisition time, and each monitor sample is merged into the ventilator sample with the same timestamp. When the ventilator has a sample at a given time but the monitor does not, the ventilator sample at that time is deleted, and vice versa.
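The timestamp matching described above amounts to an inner join on the acquisition time. A minimal Python sketch follows. The function name `merge_by_timestamp` and the `(timestamp, feature_dict)` representation are hypothetical, and the sketch assumes that timestamps are unique within each device and directly comparable between the two devices.

```python
def merge_by_timestamp(vent_samples, monitor_samples):
    """Inner-join two sample streams on their timestamp.

    Each input is a list of (timestamp, feature_dict) tuples. A sample is
    kept only when both devices recorded something at that timestamp.
    """
    monitor_by_time = {t: feats for t, feats in monitor_samples}
    merged = []
    # Iterate in ventilator time order, the reference clock in the text.
    for t, vent_feats in sorted(vent_samples, key=lambda s: s[0]):
        mon_feats = monitor_by_time.get(t)
        if mon_feats is None:
            continue  # no monitor counterpart: drop the ventilator sample
        merged.append((t, {**vent_feats, **mon_feats}))
    return merged
```

Monitor samples without a ventilator counterpart never enter the result, so the "vice versa" case is handled implicitly.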

进一步地，在步骤2)中：Further, in step 2):

缺失值处理步骤，由于机器故障等不可避免因素，造成数据集中出现完全随机缺失数据的情况，需要对缺失数据进行处理，采用特征均值法对缺失值进行填补，计算公式如下：In the missing value processing step, due to unavoidable factors such as machine failure, completely random missing data appears in the data set. The missing data needs to be processed, and the missing value is filled by the feature mean method. The calculation formula is as follows:

其中，x_i1，x_i2，...，x_in分别表示所述原始数据集中第i个特征下的第1，2，...，n个样本，j为缺失值对应的索引，n表示样本数量。Among them, x_i1 , x_i2 , ..., x_in respectively represent the 1st, 2nd, ..., n samples under the i-th feature in the original data set, j is the index corresponding to the missing value, and n represents Number of samples.

异常值处理步骤，由于记录失误、机器异常等原因，导致数据中出现明显异的样本，为了避免数据中异常值对假阳性报警信号的识别负面影响，采用三倍标准差的方差对异常值进行处理，即将超过三倍标准差的数据调整为三倍标准差值，从而解决异常值问题，具体计算公式如下：Outlier processing steps, due to recording errors, machine abnormalities and other reasons, resulting in significantly different samples in the data, in order to avoid the negative impact of outliers in the data on the identification of false positive alarm signals, the variance of three times the standard deviation is used to analyze the outliers. Processing, that is, adjusting the data that exceeds three times the standard deviation to three times the standard deviation value, so as to solve the problem of outliers. The specific calculation formula is as follows:

其中，x_ij表示所述原始数据集中第i个特征数据下第j个样本为异常值，μ_i表示所述原始数据集中第i个特征数据的均值，σ_i表示所述原始数据集中第i个特征数据的标准差。Wherein, x_ij represents that the j-th sample under the i-th feature data in the original data set is an outlier, μ_i represents the mean value of the i-th feature data in the original data set, and σ_i represents the i-th sample in the original data set The standard deviation of the feature data.

标准化处理步骤，为了消除不同体征特征之间的量纲问题，避免分类结果过分偏向某个量纲较大特征的现象，采用z-score方法对数据特征进行了标准化处理，具体处理公式如下：In the standardization processing steps, in order to eliminate the dimension problem between different physical features and avoid the phenomenon that the classification results are overly biased to a certain feature with a larger dimension, the z-score method is used to standardize the data features. The specific processing formula is as follows:

where x_ij is the j-th sample of the i-th feature in the original data set, μ_i is the mean of the i-th feature in the original data set, and σ_i is the standard deviation of the i-th feature in the original data set.
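A short sketch of z-score standardization for one feature column follows. The population standard deviation and the handling of a constant column (mapped to zeros to avoid division by zero) are assumptions not stated in the source.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize one feature column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Assumption: a constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical tidal-volume readings.
    print(z_score([258.0, 257.0, 272.0, 232.0]))
```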

Labeling of false positive alarm signals. Labeling alarm signals requires the professional background knowledge of experts in the relevant medical fields. After thorough discussion with such experts, rules characterizing true positive alarm signals were formulated; true positive alarms are marked among all alarm signals, and the remaining alarms are automatically classified as false positive alarm signals. The labeling rules are: (1) when the ventilator and the monitor alarm simultaneously, the alarm is a true positive alarm signal; (2) when the ventilator or the monitor alarms continuously without interruption, the alarm is a true positive alarm signal; (3) when the patient's vital-sign data exceed the thresholds set on the ventilator, the alarm is a true positive alarm signal. The vital-sign features are: minute expiratory volume, flow range 0.5 to 180 L/min, allowable error ±3%; mean pressure, pressure range -2 to 12 kPa, allowable error ±0.1 kPa; inhaled oxygen concentration, range 21% to 100%, allowable error ±3%; positive end-expiratory pressure (PEEP), range ±12 kPa, allowable error ±0.05 kPa; spontaneous respiratory rate and respiratory rate, range 1 to 150 breaths/min, allowable error ±3%; inspiratory/expiratory tidal volume, range ±10 L, allowable error ±3%.
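The three labeling rules can be sketched as a single predicate over one alarm event. The `AlarmEvent` record, its field names, and the threshold dictionary layout are hypothetical; the "more than 3 consecutive alarms" cut-off for rule (2) is taken from claim 3, and the label encoding (0 for true positive, 1 for false positive) follows the description of Table 1.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmEvent:
    """Hypothetical record of one alarm and the context needed to label it."""
    ventilator_alarm: bool
    monitor_alarm: bool
    consecutive_alarms: int = 1                      # length of the uninterrupted alarm run
    vitals: dict = field(default_factory=dict)       # feature name -> measured value
    thresholds: dict = field(default_factory=dict)   # feature name -> (low, high)


def label_alarm(event, max_run=3):
    """Return 0 for a true positive alarm, 1 for a false positive alarm."""
    # Rule (1): ventilator and monitor alarm at the same time.
    simultaneous = event.ventilator_alarm and event.monitor_alarm
    # Rule (2): continuous, uninterrupted alarms (more than max_run in a row).
    sustained = event.consecutive_alarms > max_run
    # Rule (3): some vital sign lies outside the range set on the ventilator.
    out_of_range = any(
        not (event.thresholds[name][0] <= value <= event.thresholds[name][1])
        for name, value in event.vitals.items()
        if name in event.thresholds
    )
    return 0 if (simultaneous or sustained or out_of_range) else 1


if __name__ == "__main__":
    event = AlarmEvent(True, False, vitals={"fio2": 50.0},
                       thresholds={"fio2": (21.0, 100.0)})
    print(label_alarm(event))
```

Alarms that satisfy none of the three rules fall through to the false positive label, mirroring the "remaining alarms" wording above.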

Applying these rules labels every alarm signal as a true positive or a false positive, which provides the labeled data required by the subsequent supervised classification method.

Further, in step 3):

The collected raw data contain 16 vital-sign features per patient, some of which contribute little to identifying false positive alarm signals. This step therefore applies random forest feature selection to the features of the original data set, reducing the number of features without lowering identification accuracy and thereby improving computational efficiency and identification accuracy.

Feature selection with a random forest builds on the idea of information gain in decision trees. A random forest perturbs both the features and the samples of the data to generate multiple diverse decision trees. Its approach to feature selection is essentially the same as that of a single decision tree, but it gains the advantage of an ensemble of trees. The criterion can be expressed as:

Gain(L, F) = Entropy(L) - Entropy(L, F),

The information entropy of the alarm signal label information is

$$\mathrm{Entropy}(L) = -\sum_{i} p_i \log p_i$$

where L denotes the alarm signal label information of the preprocessed data set and p_i is the probability that the label of the i-th alarm signal category appears in the preprocessed data set; p_i is obtained by computing the proportion of false positive alarm signal samples among the samples of the preprocessed data set.

The information entropy of the alarm signal labels under feature F is

$$\mathrm{Entropy}(L, F) = \sum_{j=1}^{v} \frac{|L_j|}{|L|}\,\mathrm{Entropy}(L_j)$$

where L denotes the alarm signal label information of the preprocessed data set, |L_j| is the number of samples in the preprocessed data set for which feature F takes its j-th value, v is the number of distinct values of feature F in the preprocessed data set, and j is the index of the j-th value of feature F.
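To make the information gain criterion concrete, the sketch below computes Entropy(L), Entropy(L, F), and Gain(L, F) for a single discrete feature. The base-2 logarithm and the assumption that feature values are discrete (continuous vital signs would first need binning) are illustrative choices, not taken from the source.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(L): Shannon entropy of a list of class labels (base 2 assumed)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, feature_values):
    """Entropy(L, F): label entropy within each feature value, weighted by group size."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(group) / n * entropy(group) for group in groups.values())


def information_gain(labels, feature_values):
    """Gain(L, F) = Entropy(L) - Entropy(L, F)."""
    return entropy(labels) - conditional_entropy(labels, feature_values)


if __name__ == "__main__":
    y = [1, 1, 0, 0]                      # 1 = false positive alarm, 0 = true positive
    informative = ["high", "high", "low", "low"]
    uninformative = ["a", "b", "a", "b"]
    print(information_gain(y, informative), information_gain(y, uninformative))
```

A feature that separates the two label classes perfectly attains the full label entropy as its gain, while a feature independent of the labels has a gain of zero.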

The random forest method is implemented as follows:

Step 1) Input the training data of the initial feature set; compute and output the information entropy Entropy(L) of the alarm signal, which step 3) uses to compute the information gain;

Step 2) Input the training data of the initial feature set; for each tree, compute and output the conditional entropy Entropy(L, F) of the alarm signal given feature F, which step 3) uses to compute the information gain;

Step 3) Taking the results of step 1) and step 2) as input, compute the information gain Gain(L, F) of feature F in each tree and average it over the number of trees. The larger the information gain, the more important the feature is to the classification result. Set a feature selection threshold θ: when Gain(L, F) exceeds θ the feature is retained; otherwise it is removed.
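Step 3) can be sketched as averaging per-tree gains and applying the threshold θ. The input layout (a dictionary mapping each feature name to its list of per-tree gains), the function name `select_features`, and all numbers in the usage example are hypothetical.

```python
from statistics import mean


def select_features(per_tree_gain, theta):
    """Average each feature's gain over the trees and keep features above theta.

    per_tree_gain: {feature name: [gain in tree 1, gain in tree 2, ...]}
    Returns the retained feature names and the averaged gains.
    """
    averaged = {name: mean(gains) for name, gains in per_tree_gain.items()}
    kept = [name for name, gain in averaged.items() if gain > theta]
    return kept, averaged


if __name__ == "__main__":
    # Hypothetical gains from a two-tree forest.
    gains = {"peak_pressure": [0.30, 0.20], "oxygen_inlet_pressure": [0.01, 0.03]}
    kept, averaged = select_features(gains, theta=0.1)
    print(kept, averaged)
```

The source does not state a value for θ, so it is left as a parameter here.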

Further, in step 4):

Parameter initialization of the gradient boosting decision tree. The main parameters affecting the classification performance of the gradient boosting decision tree are the number of trees, the tree height, and the number of leaf nodes. For the gradient boosting decision tree classifier, the number of decision trees is set in the range [50, 150] with a step of 10, the tree height in the range [3, 10] with a step of 1, and the number of leaf nodes in the range [5, 15] with a step of 1.
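The stated ranges define a search grid that can be enumerated directly. The sketch below assumes the three parameters are combined exhaustively (a full grid search), which the source does not state; the dictionary keys are illustrative names.

```python
from itertools import product

N_TREES = range(50, 151, 10)   # [50, 150], step 10
TREE_HEIGHT = range(3, 11)     # [3, 10], step 1
N_LEAVES = range(5, 16)        # [5, 15], step 1


def parameter_grid():
    """Yield every combination of the three GBDT parameters."""
    for n_trees, height, leaves in product(N_TREES, TREE_HEIGHT, N_LEAVES):
        yield {"n_trees": n_trees, "max_height": height, "max_leaves": leaves}


if __name__ == "__main__":
    grid = list(parameter_grid())
    print(len(grid), grid[0], grid[-1])
```

Under this assumption the grid contains 11 × 8 × 11 = 968 candidate configurations.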

The feature subset obtained in step 3 serves as the input feature vector space of this step. Gradient boosting aims to reduce the residual left by the previous round. Suppose the classification result of the classifier in round (m-1) is F_{m-1}(x); the loss function is then defined as L(y, F_{m-1}(x)) = y - F_{m-1}(x), where x is a sample and y is the true alarm signal value of the sample. The gradient boosting decision tree takes the partial derivative of the loss function L(y, F_{m-1}(x)) with respect to the predicted value F_{m-1}(x_i),

$$\frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}$$

to obtain the optimization direction of the next round's decision tree, and uses the learning rate γ_m to control how much each round's decision tree contributes to the classification result. The classifier result of round m can then be expressed as

$$F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^{n} \frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}$$

These steps are repeated iteratively until the difference between the alarm signal label category F_m(x) identified by the gradient boosting decision tree classifier in round m and the category F_{m-1}(x) identified in round m-1 falls below a set threshold. Iteration then stops, training of the gradient boosting decision tree classifier is complete, and the trained alarm signal label information category identifier is obtained. Here F_m(x_i) denotes the label predicted for the i-th sample in the m-th iteration, y_i denotes the true label of sample i, and L(y_i, F_m(x_i)) is the loss function, that is, the error between the true label value and the predicted label value.
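The boosting loop and its stopping rule can be illustrated with a deliberately small sketch. It is not the classifier of the invention: it assumes a squared-error loss (so the negative gradient equals the residual y - F_{m-1}(x)), a single numeric feature, one-split regression stumps in place of full decision trees, and a constant learning rate. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals on a single feature."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical: fall back to a constant
        m = sum(residuals) / len(residuals)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, learning_rate=0.5, max_rounds=100, tol=1e-4):
    """Boost stumps until the change between rounds m-1 and m drops below tol."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(max_rounds):
        # Negative gradient of the assumed loss 1/2 * (y - F)^2 is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        updates = [learning_rate * stump(x) for x in xs]
        preds = [p + u for p, u in zip(preds, updates)]
        stumps.append(stump)
        if max(abs(u) for u in updates) < tol:  # F_m and F_{m-1} nearly equal
            break

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, len(stumps)


if __name__ == "__main__":
    # Hypothetical one-feature data: label 1 = false positive alarm.
    predict, rounds = gradient_boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
    print(rounds, predict(2), predict(5))
```

Thresholding the returned score at 0.5 gives a class label in this toy setting; a production classifier would use a classification loss and multi-feature trees.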

Given newly input screened feature data and the corresponding alarm signal, the identifier outputs the label category of that alarm signal: true positive alarm signal or false positive alarm signal.

A hospital ventilator-monitor false positive alarm signal identification system of the present invention comprises:

Data acquisition module: acquires the ventilator-monitor monitoring data set input by the user and sends it to the data preprocessing module. The monitoring data set comprises several feature data and alarm signals; the features include peak pressure, heart rate, respiratory rate, spontaneous respiratory rate, expiratory tidal volume, inspiratory tidal volume, minute respiratory volume, mean pressure, and positive end-expiratory pressure (PEEP), and are preferably peak pressure, heart rate, respiratory rate, and spontaneous respiratory rate;

Data preprocessing module: receives the monitoring data set sent by the data acquisition module, performs missing-value handling, outlier handling, and standardization on the feature data in the monitoring data set, and sends the preprocessed monitoring data set to the alarm signal label information category identifier;

Alarm signal label information category identifier: a trained gradient boosting decision tree classifier that receives the preprocessed monitoring data set sent by the data preprocessing module and outputs the category of the alarm signal label information as a true positive alarm signal or a false positive alarm signal.

To verify the performance of the method in identifying ventilator-monitor false positive alarm signals, an empirical experiment was carried out. Real patient monitoring data were collected from multiple ventilators and monitors in a hospital intensive care unit; the ventilator-monitor setup is shown in Figure 2. The data set contains 15,006 records of patient vital signs and alarm signals.

To evaluate performance, three commonly used machine learning classification methods were selected as baselines for comparison with the method proposed by the present invention: logistic regression (LR), support vector machine (SVM), and naive Bayes classifier (NB). The metrics used to judge false positive alarm signal identification performance are accuracy, Type I error rate, Type II error rate, AUC, and F1-score. The experimental procedure is shown in Figure 1.
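Four of the five metrics follow directly from the confusion matrix, as sketched below. Treating the false positive alarm class (y = 1) as the positive class is an assumption, chosen so that the Type II error rate measures missed false positive alarms, in line with the discussion of the results. AUC is omitted because it requires ranked scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, Type I error, Type II error, and F1 from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Type I error: negatives wrongly flagged as positive.
        "type_1_error": fp / (fp + tn) if (fp + tn) else 0.0,
        # Type II error: positives that were missed.
        "type_2_error": fn / (fn + tp) if (fn + tp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = false positive alarm, 0 = true positive alarm.
    print(classification_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```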

To give an intuitive view of the data collected from the hospital ventilators and monitors, sample records are shown in Table 1:

Table 1: Sample of the monitoring information data set

| x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8.1 | 7.2 | 56.6 | 50 | 5.4 | 29.9 | 29.9 | 258 | 291 | 15.5 | 79 | 111 | 65 | 30 | 99 | 111 | 1 |
| 7.9 | 7.5 | 56.6 | 50 | 5.4 | 30 | 30 | 257 | 260 | 15.6 | 80 | 112 | 66 | 19 | 100 | 109 | 1 |
| 7.6 | 7.6 | 56.6 | 50 | 5.3 | 30.1 | 30.1 | 272 | 292 | 15 | 80 | 111 | 66 | 19 | 100 | 108 | 1 |
| 7.7 | 7.5 | 56.6 | 50 | 5 | 30.1 | 30.1 | 232 | 230 | 15.4 | 78 | 108 | 65 | 20 | 100 | 107 | 1 |
| 6.7 | 7.8 | 56.6 | 50 | 5.3 | 30.3 | 30.3 | 254 | 288 | 15.2 | 78 | 109 | 65 | 21 | 99 | 105 | 1 |
| 8 | 7.4 | 56.6 | 50 | 5.6 | 29.5 | 29.5 | 255 | 254 | 15.6 | 79 | 110 | 65 | 19 | 100 | 109 | 0 |
| 7.7 | 7.5 | 56.6 | 50 | 5.3 | 28 | 28 | 237 | 257 | 15.3 | 78 | 109 | 64 | 19 | 98 | 110 | 0 |
| 7.2 | 7.7 | 56.6 | 50 | 5.1 | 26.5 | 26.5 | 300 | 297 | 15.3 | 78 | 109 | 64 | 19 | 100 | 109 | 0 |
| 6.9 | 7.5 | 56.6 | 50 | 5.7 | 25.5 | 25.5 | 275 | 256 | 15.4 | 78 | 109 | 64 | 19 | 100 | 109 | 0 |
| 7.3 | 7.3 | 56.6 | 50 | 5.4 | 27.9 | 27.9 | 240 | 255 | 15.6 | 78 | 110 | 64 | 19 | 100 | 109 | 0 |

In Table 1, x1 to x16 denote the patient's 16 vital-sign features (in order: minute expiratory volume, mean pressure, oxygen inlet pressure, inhaled oxygen concentration, positive end-expiratory pressure (PEEP), spontaneous respiratory rate, respiratory rate, inspiratory tidal volume, expiratory tidal volume, peak pressure, mean invasive blood pressure, invasive blood pressure high value, invasive blood pressure low value, central venous pressure, blood oxygen concentration, and heart rate), and y denotes the alarm signal type (y = 1 indicates a false positive alarm signal, y = 0 indicates a true positive alarm signal).

To avoid the randomness of a single run, each experiment was repeated 30 times with random sampling, using test-set proportions of 20%, 30%, and 40% (training:test ratios of 80:20, 70:30, and 60:40). The mean and variance over the 30 runs are used to judge the performance of each method. The results of the method proposed by the present invention and of the comparison methods are listed in Tables 2 to 4:
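The repeated random-split protocol can be sketched as follows. The `evaluate` callback (which would train a model on the training indices and return a score on the test indices), the fixed seed, and the use of the population variance are assumptions for illustration.

```python
import random
from statistics import mean, pvariance


def repeated_holdout(n_samples, test_fraction, evaluate, repeats=30, seed=0):
    """Run `evaluate(train_idx, test_idx)` on `repeats` random splits.

    Returns the mean and variance of the scores across the runs.
    """
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    scores = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores), pvariance(scores)


if __name__ == "__main__":
    # Placeholder scorer: reports the realized test share instead of a model score.
    share = lambda train, test: len(test) / (len(train) + len(test))
    for fraction in (0.2, 0.3, 0.4):
        print(fraction, repeated_holdout(15006, fraction, share))
```

In an actual experiment the placeholder scorer would be replaced by a function that fits the classifier and returns one of the evaluation metrics.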

Table 2: Performance comparison of the GBDT method and the comparison methods (LR, SVM, and NB) (training set:test set = 80:20)

The classification results in Table 2 show that the gradient boosting decision tree (GBDT) achieved the best identification of false positive alarm signals. Compared with the three comparison methods, it achieved very good results in accuracy, Type II error rate, AUC, and F1-score, with an AUC of 97.6%. GBDT's Type I error rate is higher than that of logistic regression and the support vector machine, but the present invention targets the identification of false positive alarm signals, and it is the Type II error rate that reflects how successfully a method identifies them. Logistic regression performs very well on the Type I error rate, yet its Type II error rate is very high at 56.4%, which indicates that it identifies false positive alarm signals poorly. In addition, AUC and F1-score are composite measures of how well the two classes of signals are separated, and GBDT is far ahead of the other methods on both. The F1-scores of the four models are compared in Figure 3(a). The method proposed by the present invention is therefore highly effective at identifying false positive alarm signals.

Table 3: Performance comparison of the GBDT method and the comparison methods (LR, SVM, and NB) (training set:test set = 70:30)

| Metrics | GBDT | LR | SVM | NB |
|---|---|---|---|---|
| AUC | 0.972 (0.02) | 0.691 (0.03) | 0.837 (0.03) | 0.837 (0.03) |
| Accuracy | 0.997 (0.00) | 0.989 (0.00) | 0.994 (0.00) | 0.941 (0.01) |
| Type I Error | 0.002 (0.00) | 0.000 (0.00) | 0.000 (0.00) | 0.060 (0.01) |
| Type II Error | 0.054 (0.04) | 0.619 (0.06) | 0.325 (0.06) | 0.012 (0.01) |
| F1-Score | 0.914 (0.04) | 0.547 (0.06) | 0.794 (0.04) | 0.364 (0.03) |

Table 3 shows the identification performance of each method under a different training:test ratio; the overall results are consistent with those in Table 2. This indicates that the proposed gradient boosting decision tree method is robust in identifying false positive alarm signals. The F1-scores of the four models are compared in Figure 3(b).

Table 4: Performance comparison of the GBDT method and the comparison methods (LR, SVM, and NB) (training set:test set = 60:40)

Table 4 shows the same pattern as Tables 2 and 3. Because the training set is smaller, the identification performance of all methods drops overall, but this does not change the overall conclusion that the gradient boosting decision tree identifies false positive alarm signals best. The variances over the 30 runs, given in parentheses, also show that the gradient boosting decision tree performs very stably, without large fluctuations. Figure 3(c) further compares the F1-scores of the four methods (GBDT, LR, SVM, and NB).

The ensemble-tree-based method of the present invention for identifying ventilator false positive alarm signals first collects real patient vital-sign data from hospital ventilators and monitors, preprocesses the data, performs feature selection based on a random forest, and then applies the gradient boosting decision tree method to identify false positive alarm signals of ventilators and monitors. The method was verified experimentally. The experimental results show that the present invention identifies false positive alarm signals with excellent performance and that its identification results are robust.

The above embodiments are provided only to describe the present invention and are not intended to limit its scope. The scope of the present invention is defined by the appended claims. All equivalent replacements and modifications made without departing from the spirit and principle of the present invention fall within the scope of the present invention.

Claims


1. A method for identifying ventilator false positive alarm signals based on an ensemble tree, characterized by comprising the following steps:

2. The method for identifying ventilator false positive alarm signals based on an ensemble tree according to claim 1, characterized in that, in step 1:

3. The method for identifying ventilator false positive alarm signals based on an ensemble tree according to claim 1, characterized in that, in step 2:

where x_i1, x_i2, ..., x_in denote the 1st, 2nd, ..., n-th samples of the i-th feature in the original data set, and n is the number of samples;

The labeling is implemented as follows. The predetermined labeling rules include: when the ventilator and the monitor alarm simultaneously, the label information of the alarm signal is a true positive alarm signal; when the ventilator or the monitor alarms continuously without interruption and the alarm persists for more than 3 consecutive times, the label information of the alarm signal is a true positive alarm signal; when the vital-sign feature data exceed the threshold range of the vital-sign feature data set on the ventilator, the label information of the alarm signal is a true positive alarm signal. The threshold ranges of the vital-sign feature data include: minute expiratory volume 0.5 to 180 L/min, allowable error ±3%; mean pressure -2 to 12 kPa, allowable error ±0.1 kPa; inhaled oxygen concentration 21% to 100%, allowable error ±3%; positive end-expiratory pressure (PEEP) -12 to 12 kPa, allowable error ±0.05 kPa; spontaneous respiratory rate and respiratory rate 1 to 150 breaths/min, allowable error ±3%; inspiratory/expiratory tidal volume -10 to 10 L, allowable error ±3%.

4. The method for identifying ventilator false positive alarm signals based on an ensemble tree according to claim 1, characterized in that, in step 3, the feature screening with a random forest and the retention of the screened features are implemented as follows:

For each feature F in the preprocessed data set, compute the information gain Gain(L, F), defined as the difference between the information entropy Entropy(L) of the alarm signal label information and the information entropy Entropy(L, F) of the alarm signal labels under feature F,

Gain(L, F) = Entropy(L) - Entropy(L, F),

5. The method for identifying ventilator false positive alarm signals based on an ensemble tree according to claim 1, characterized in that, in step 3, the screened features include peak pressure, heart rate, respiratory rate, spontaneous respiratory rate, expiratory tidal volume, inspiratory tidal volume, minute respiratory volume, mean pressure, and positive end-expiratory pressure (PEEP).

6. The method for identifying ventilator false positive alarm signals based on an ensemble tree according to claim 1, characterized in that, in step 4, for the gradient boosting decision tree classifier, the number of decision trees is set in the range [50, 150] with a step of 10, the tree height in the range [3, 10] with a step of 1, and the number of leaf nodes in the range [5, 15] with a step of 1.

7. The method for identifying ventilator false positive alarm signals based on an ensemble tree according to claim 1, characterized in that step 4 is implemented as follows:

41) Take the screened features obtained in step 3) as the input feature vector space. If the alarm signal label information output by the gradient boosting decision tree classifier in round m-1 is F_{m-1}(x), the loss function is L(y, F_{m-1}(x)) = y - F_{m-1}(x), where x is a sample and y is the true alarm signal label information of that sample;

42) Take the partial derivative of L(y, F_{m-1}(x)) with respect to F_{m-1}(x)

8. A system for identifying ventilator false positive alarm signals based on an ensemble tree, characterized in that the system comprises:

The data acquisition module acquires the ventilator-monitor monitoring data set input by the user and sends it to the data preprocessing module. The monitoring data set comprises several feature data and alarm signals; the features include peak pressure, heart rate, respiratory rate, spontaneous respiratory rate, expiratory tidal volume, inspiratory tidal volume, minute respiratory volume, mean pressure, and positive end-expiratory pressure (PEEP);

The alarm signal label information category identifier is a trained gradient boosting decision tree classifier that receives the preprocessed monitoring data set sent by the data preprocessing module and outputs the category of the alarm signal label information as a true positive alarm signal or a false positive alarm signal.