Technical Field
The present invention relates to the technical field of medical data processing, and more specifically to a method for constructing a model that predicts postoperative complications in patients with gastrointestinal tumors.
Background Art
Constructing a prediction model for postoperative complications in patients with gastrointestinal tumors means extracting, from clinical data collected before, during, and after surgery, the key features related to postoperative complications by means of statistical methods, machine learning algorithms, and time series analysis, and feeding these features into a prediction model that estimates which complications a patient is likely to develop after surgery. By analyzing data from a large number of patients, such a model aims to identify high-risk patients and support clinical decision-making, thereby improving the quality of postoperative management, reducing the incidence of complications, and improving patient prognosis. For example, the invention published as CN114822821A, "A nomogram model and construction method for predicting the probability of postoperative complications in patients with gastrointestinal tumors," selects factors significantly correlated with postoperative complications from patients' clinical characteristic data and builds corresponding nomogram models. Two nomogram models are provided, predicting the probability of complications within 7 days and within 30 days after surgery respectively, so that high-risk patients can be identified and treated early. The models offer high prediction accuracy and discrimination, are simple, intuitive, and easy to apply clinically, help physicians anticipate postoperative complications in gastrointestinal tumor patients, and facilitate individualized management of gastrointestinal tumor disease.
When constructing such a prediction model, the processing of the time series data contained in the comprehensive patient data and the extraction of features from it are key steps. Time series data carry a large amount of information about how a patient's health changes after surgery, and this information is of great significance for predicting the occurrence of complications.
During feature extraction from time series data, if the reliability of the extracted features cannot be evaluated accurately, the selected features may behave unstably in the model. This instability leads to an unreasonable dependence of the prediction model on those features, which in turn degrades the model's overall predictive performance and reliability. In addition, existing methods lack systematic analysis tools for effectively screening and validating key time-dependent features in the face of diverse and complex data. This deficiency not only lowers the quality of feature selection but also weakens the robustness and reliability of the model in practical applications.
To solve the above problems, the following technical solution is provided.
Summary of the Invention
To overcome the above-mentioned defects of the prior art, embodiments of the present invention provide a method for constructing a model that predicts postoperative complications in patients with gastrointestinal tumors, so as to solve the problems raised in the background art.
To achieve the above object, the present invention provides the following technical solution:
A method for constructing a model to predict postoperative complications in patients with gastrointestinal tumors, comprising the following steps:
S1: Collect comprehensive data of patients with gastrointestinal tumors, including preoperative data, intraoperative data, and postoperative data;
S2: Standardize the collected comprehensive data and build a comprehensive dataset;
S3: Extract key time-dependent features from the comprehensive dataset using statistical methods and time series analysis techniques; introduce a gating mechanism from neural networks to dynamically adjust the weights of the key time-dependent features at different time points;
S4: Resample the comprehensive dataset with the Bootstrap technique, extract the key time-dependent features repeatedly, and evaluate their stability by combining KL divergence and JS divergence;
S5: Introduce a change point detection method to evaluate whether the time dependence of the key time-dependent features changes significantly over time;
S6: Based on steps S4 and S5, comprehensively determine whether the credibility of the extracted key time-dependent features meets the required standard.
In a preferred embodiment, standardizing the collected comprehensive data of gastrointestinal tumor patients and building a comprehensive dataset specifically comprises:
integrating the preoperative, intraoperative, and postoperative data into a unified database;
converting all data into a unified format; unifying the units of numerical data; identifying missing values in the data and filling them in;
encoding categorical data; normalizing numerical data;
compiling the processed data into the final comprehensive dataset for subsequent model training and analysis.
In a preferred embodiment, extracting key time-dependent features from the comprehensive dataset using statistical methods and time series analysis techniques, and introducing a gating mechanism from neural networks to dynamically adjust the weights of the key time-dependent features at different time points, specifically comprises:
S301: applying an autoregressive moving average model to extract basic time-dependent features from the comprehensive dataset;
S302: using Fourier transform analysis to identify periodic and trend characteristics in the comprehensive data;
S303: using gated recurrent units to process and optimize the key time-dependent features of the time series data:
the update gate of the gated recurrent unit is expressed as $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$; the reset gate of the gated recurrent unit is expressed as $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$; where $z_t$ and $r_t$ are the update gate and reset gate respectively, $h_{t-1}$ is the hidden state at the previous time step, $x_t$ is the current input, $W_z$ and $W_r$ are the weight matrices of the update gate and reset gate respectively, and $\sigma$ is the Sigmoid activation function;
using the Adam optimizer to train the GRU model;
S304: integrating an attention mechanism into the neural network and dynamically adjusting the weights of the key time-dependent features at each time point according to the model's performance during training:
weights are assigned according to the importance of each time step, specifically $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$; where $\mathrm{Attention}(Q, K, V)$ computes the weighted sum of the elements of the input sequence, $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $\mathrm{softmax}$ is the Softmax activation function, $QK^{T}$ is the dot product of the query matrix $Q$ and the transpose of the key matrix $K$, and $d_k$ is the dimension of the key matrix $K$.
In a preferred embodiment, resampling the comprehensive dataset with the Bootstrap technique, extracting the key time-dependent features repeatedly, and evaluating their stability by combining KL divergence and JS divergence specifically comprises:
S401: applying the Bootstrap technique to resample the comprehensive dataset multiple times: determining the number of resampling rounds, and for each round generating a sample set by randomly drawing multiple data points from the original comprehensive dataset;
S402: extracting the key time-dependent features from each resampled sample set: applying the predefined statistical methods and time series analysis techniques to extract the key time-dependent features in each resampled sample set;
S403: estimating the probability distribution of the features extracted after each resampling round and building the feature distribution of each round:
common methods for estimating the probability distribution of the extracted key time-dependent features include kernel density estimation and histograms;
the feature distribution of each resampled sample set is constructed by estimating the probability distribution of the features in that sample set;
S404: using KL divergence to compute the difference between the feature distribution of each resampled sample set and that of the original data, as a measure of feature consistency; using JS divergence to evaluate the similarity between the feature distributions of the resampled sample sets and the original data, as a measure of feature divergence;
S405: combining KL divergence and JS divergence to compute a dynamic Bootstrap consistency-and-divergence measure that evaluates the stability of the key time-dependent features.
In a preferred embodiment, step S404 is specifically as follows:
the difference between each resampled sample set and the feature distribution of the original data is computed with the KL divergence $D_{KL}(P \,\|\, Q_b) = \sum_{i} P(x_i)\,\log\frac{P(x_i)}{Q_b(x_i)}$;
where $D_{KL}(P \,\|\, Q_b)$ is the KL divergence, $P(x_i)$ is the probability of the feature value $x_i$ in the original data, $Q_b(x_i)$ is the probability of the corresponding feature value in the $b$-th resampling round, and $x_i$ is the $i$-th value of the key time-dependent feature;
the similarity between the resampled samples and the feature distribution of the original data is evaluated with the JS divergence $D_{JS}(P \,\|\, Q_b) = \tfrac{1}{2} D_{KL}(P \,\|\, M) + \tfrac{1}{2} D_{KL}(Q_b \,\|\, M)$; where $M = \tfrac{1}{2}(P + Q_b)$;
$D_{JS}(P \,\|\, Q_b)$ is the JS divergence, $D_{KL}(P \,\|\, M)$ is the KL divergence between the original feature distribution $P$ and the mean distribution $M$, $D_{KL}(Q_b \,\|\, M)$ is the KL divergence between the feature distribution $Q_b$ of the $b$-th resampling round and the mean distribution $M$, $P$ is the feature distribution of the original data, and $Q_b$ is the feature distribution of the $b$-th resampling round.
In a preferred embodiment, step S405 is specifically: by jointly considering KL divergence and JS divergence, the dynamic Bootstrap consistency-and-divergence measure of a key time-dependent feature is computed as $S = \frac{1}{B}\sum_{b=1}^{B}\bigl[D_{KL}(P \,\|\, Q_b) + D_{JS}(P \,\|\, Q_b)\bigr]$; where $S$ is the stability measure of the feature and $B$ is the total number of resampling rounds.
In a preferred embodiment, introducing a change point detection method to evaluate whether the time dependence of the key time-dependent features changes significantly over time specifically comprises:
S501: arranging the key time-dependent features along the time dimension to construct a time series;
S502: using Bayesian change point detection to identify potential change points in the time series: defining the prior distribution of the Bayesian model; the model uses the posterior probability to evaluate the likelihood that each time point is a change point, and the Bayesian posterior probabilities are computed with a dynamic programming algorithm to identify potential change points in the time series;
S503: dividing the time series into windows and using sliding-window computations to examine the stability of the features in different periods: setting a window size and dividing the time series into multiple subsequences by window; within each window, computing the local mean and variance of the key time-dependent features to evaluate their stability over that period;
S504: performing significance tests on the detected potential change points to evaluate whether they are statistically significant;
S505: combining the change point detection results to evaluate whether the time dependence of the key time-dependent features has changed significantly: integrating the change points verified by the significance tests into the time series analysis results; evaluating whether the time dependence of the key time-dependent features changes significantly at the detected change points; and, based on the results of the change point detection and the time-dependence evaluation, judging whether the time dependence of the key time-dependent features remains stable across the entire time series.
In a preferred embodiment, comprehensively determining, based on steps S4 and S5, whether the credibility of the extracted key time-dependent features meets the required standard is specifically:
obtaining the judgment of whether the time dependence of the key time-dependent features changes significantly over time;
setting a feature-stability evaluation threshold for the stability measure and comparing the stability measure of each feature with this threshold:
when the stability measure of a feature is less than or equal to the feature-stability evaluation threshold, the stability of the key time-dependent feature is judged to be normal; when the stability measure is greater than the threshold, its stability is judged to be poor;
when the time dependence of a key time-dependent feature has not changed significantly over time and its stability is normal, the credibility of the extracted key time-dependent feature is judged to meet the standard; otherwise, its credibility is judged not to meet the standard.
Technical effects and advantages of the model construction method of the present invention for predicting postoperative complications in patients with gastrointestinal tumors:
1. By integrating and analyzing the preoperative, intraoperative, and postoperative data of patients with gastrointestinal tumors, the method provides a solid foundation for accurately predicting postoperative complications. Standardized processing ensures the consistency and comparability of the data and improves the accuracy and efficiency of subsequent analysis. Statistical methods and time series analysis techniques, combined with a gating mechanism from neural networks, deepen the analysis of the data and improve the model's sensitivity and adaptability to changes over time. Together these steps ensure that the key features extracted from complex medical data truly reflect a patient's health status and postoperative risk, thereby substantially improving the accuracy and practicality of the prediction model.
2. The introduction of the Bootstrap technique and change point detection further improves the precision and credibility of the analysis of key time-dependent features. The Bootstrap technique, by resampling the dataset many times and pairing it with KL divergence and JS divergence analysis, provides a principled way to quantify feature stability; it identifies features that remain consistent under different conditions and thus screens out more reliable predictive indicators. Change point detection allows the model to identify key points of change in the data and to evaluate the time dependence of the features precisely. The application of these techniques significantly strengthens the adaptability and predictive power of the model in real clinical environments, helping medical professionals with postoperative risk management and treatment decisions, with the aim of reducing patients' complication risk and improving their postoperative recovery.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the model construction method of the present invention for predicting postoperative complications in patients with gastrointestinal tumors.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment
FIG. 1 shows the model construction method of the present invention for predicting postoperative complications in patients with gastrointestinal tumors, which comprises the following steps:
S1: Collect comprehensive data of patients with gastrointestinal tumors, including preoperative data, intraoperative data, and postoperative data.
S2: Standardize the collected comprehensive data and build a comprehensive dataset.
S3: Extract key time-dependent features from the comprehensive dataset using statistical methods and time series analysis techniques; introduce a gating mechanism from neural networks to dynamically adjust the weights of the key time-dependent features at different time points.
S4: Resample the comprehensive dataset with the Bootstrap technique, extract the key time-dependent features repeatedly, and evaluate their stability by combining KL divergence and JS divergence.
S5: Introduce a change point detection method to evaluate whether the time dependence of the key time-dependent features changes significantly over time.
S6: Based on steps S4 and S5, comprehensively determine whether the credibility of the extracted key time-dependent features meets the required standard.
Collecting comprehensive data of patients with gastrointestinal tumors, including preoperative, intraoperative, and postoperative data, specifically includes:
Preoperative data collection includes:
Basic information: the patient's age, sex, weight, height, ethnicity, and family medical history.
Medical record information: past medical history, medication records, allergy history, etc.
Laboratory test results: blood tests, biochemical indicators, tumor markers, etc.
Imaging data: CT, MRI, X-ray, and ultrasound results, in particular imaging of the gastrointestinal region.
Pathology report: histological type of the tumor, degree of differentiation, TNM staging, etc.
Preoperative data are acquired through:
the electronic medical record system: medical history and basic information are extracted from the hospital's electronic medical record system;
the laboratory information system: test results for biochemical indicators and tumor markers are obtained from the laboratory information system;
the picture archiving and communication system (PACS): tumor-related imaging data are exported from the PACS.
Intraoperative data collection includes:
Surgical details: date of surgery, duration, and type of surgery (e.g., resection, anastomosis).
Anesthesia information: type of anesthesia, medication, and the anesthesiologist's assessment report.
Monitoring data: intraoperative vital signs such as heart rate, blood pressure, and body temperature.
Surgical records: key events and decisions during the operation, such as blood loss and fluid input and output.
Intraoperative data are acquired through:
the operating room recording system, which provides the surgical details and records;
the anesthesia information management system, from which anesthesia-related data are extracted;
physiological monitoring devices, from which intraoperative monitoring data are exported directly.
Postoperative data collection includes:
Recovery status: length of stay in the postoperative recovery room and transfer to the intensive care unit (ICU).
Postoperative complications: the occurrence of all known complications, such as infection, bleeding, and intestinal obstruction.
Follow-up data: results of re-examinations during follow-up, including laboratory tests and imaging.
Vital signs: postoperative monitoring data such as body temperature, heart rate, and blood pressure.
Postoperative data are acquired through:
the ward management system, which provides postoperative recovery status and vital-sign data;
follow-up records, i.e., data obtained through regular follow-up, including repeat laboratory and imaging results.
These methods ensure that the collected data are comprehensive and detailed and cover every stage of the patient's treatment, thereby providing strong data support for building the prediction model.
The collected comprehensive data of gastrointestinal tumor patients are standardized to build the comprehensive dataset, specifically:
Preoperative, intraoperative, and postoperative data are integrated into a single unified database to ensure consistency of data sources.
All data are converted into a unified format; for example, all dates are unified to YYYY-MM-DD: for date and time data, conversion rules transform every input into the international standard date format ISO 8601 (YYYY-MM-DD) to eliminate regional differences.
Numerical data are converted to consistent units, for example converting weight from pounds to kilograms: a unit-conversion module is integrated into the database import process; it automatically recognizes the unit of each value and converts it according to preset formulas (e.g., 1 pound = 0.453592 kilograms). A manual correction function allows data administrators to fix automatic conversion errors when they are found.
Missing values in the data are identified and filled using interpolation or mean values: advanced statistical imputation methods such as multiple imputation or the K-nearest-neighbors (K-NN) algorithm automatically identify and fill missing values. Outlier detection algorithms, such as the standard-deviation-based Z-score method or the quartile-based IQR method, automatically flag and correct outliers in the data.
Categorical data are encoded, for example sex and surgery type are converted with one-hot encoding: machine learning libraries such as Python's Pandas or Scikit-learn are used to apply one-hot encoding to categorical variables such as sex and surgery type, converting them into a numerical format suitable for the algorithms.
Numerical data are normalized with Min-Max or Z-score standardization: Min-Max normalization scales numerical data to the [0, 1] interval, or Z-score standardization transforms the data into a distribution with zero mean and unit variance. A periodic review mechanism ensures that normalization does not distort the statistical characteristics of the data.
The processed data are compiled into the final comprehensive dataset for subsequent model training and analysis.
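As an illustration of the preprocessing steps above, the following is a minimal sketch in Python using Pandas and Scikit-learn. The column names (surgery_date, weight_lb, sex, surgery_type, wbc_count) are hypothetical placeholders, not fields defined by the invention.

```python
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw table; real column names depend on the hospital systems.
raw = pd.DataFrame({
    "surgery_date": ["01/05/2023", "02/14/2023"],
    "weight_lb": [154.0, None],
    "sex": ["M", "F"],
    "surgery_type": ["resection", "anastomosis"],
    "wbc_count": [6.2, 11.8],
})

# Unify dates to ISO 8601 (YYYY-MM-DD).
raw["surgery_date"] = pd.to_datetime(raw["surgery_date"], format="%m/%d/%Y").dt.strftime("%Y-%m-%d")

# Unit conversion: pounds -> kilograms (1 lb = 0.453592 kg).
raw["weight_kg"] = raw.pop("weight_lb") * 0.453592

# One-hot encode categorical variables.
encoded = pd.get_dummies(raw, columns=["sex", "surgery_type"])

# K-NN imputation of missing numeric values, then Min-Max scaling to [0, 1].
numeric_cols = ["weight_kg", "wbc_count"]
encoded[numeric_cols] = KNNImputer(n_neighbors=1).fit_transform(encoded[numeric_cols])
encoded[numeric_cols] = MinMaxScaler().fit_transform(encoded[numeric_cols])

print(encoded)
```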
Key time-dependent features are features that exhibit significant correlation or dependence as time changes; they are essential for understanding and predicting patterns and trends in time series data. During data analysis and model building, these features help identify important time-related dynamics in the data, such as trends, periodic variation, and seasonal patterns. For example, in a model that predicts postoperative complications in gastrointestinal tumor patients, key time-dependent features may include changes in physiological parameters during postoperative recovery (such as the patterns of change in heart rate and blood pressure) or time series changes in laboratory results (such as white blood cell count and hemoglobin level). Because they exhibit a temporal correlation with the development of complications, these features become key indicators in the model and support prediction and treatment decisions.
Statistical methods and time series analysis techniques are used to extract key time-dependent features from the comprehensive dataset, and a gating mechanism from neural networks is introduced to dynamically adjust the weights of the key time-dependent features at different time points, specifically including:
S301: Apply an autoregressive moving average model to extract basic time-dependent features from the comprehensive dataset:
The autoregressive moving average (ARMA) model, which consists of an autoregressive (AR) part and a moving average (MA) part, is selected to analyze and predict linear dependencies in the time series data.
The order of the AR part and the order of the MA part are chosen by analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Specifically, the ACF is used to identify the MA component and the PACF to identify the AR component; the order is chosen at the lag where the ACF or PACF drops off sharply.
To ensure stationarity of the time series, the original data are differenced. The order of differencing is determined by a unit-root test on the series (such as the Dickey-Fuller test), ensuring that trend components are removed from the data.
After the orders of the AR and MA parts are selected, the ARMA model is fitted by maximum likelihood estimation (MLE) to obtain the optimal values of the autoregressive and moving-average coefficients.
The validity of the model is verified by checking the autocorrelation of the model residuals to ensure that they are white noise. If the residuals show significant autocorrelation, the model is likely insufficient to capture the dependencies in the data, and the orders of the AR and MA parts need to be adjusted.
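A minimal sketch of this ARMA workflow is shown below, using statsmodels as one possible implementation; the synthetic series stands in for a real patient time series, and the order (1, d, 1) is an illustrative choice rather than the result of a full ACF/PACF analysis.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

# Synthetic stand-in for a postoperative time series (e.g., a daily lab value).
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + 10.0

# Dickey-Fuller test: difference once if the series is non-stationary.
d = 0 if adfuller(series)[1] < 0.05 else 1
stationary = np.diff(series, n=d) if d else series

# Inspect ACF/PACF values to guide the choice of MA and AR orders.
print("ACF:", acf(stationary, nlags=5))
print("PACF:", pacf(stationary, nlags=5))

# Fit ARMA(p, q) on the differenced series via ARIMA(p, d, q); MLE is the default.
model = ARIMA(series, order=(1, d, 1)).fit()

# Residual diagnostics: Ljung-Box p-values well above 0.05 suggest white-noise residuals.
print(acorr_ljungbox(model.resid, lags=[10]))
```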
S302: Use Fourier transform analysis to identify periodic and trend features in the comprehensive data:
The Fourier transform converts a time-domain signal into a frequency-domain signal.
The differenced data are windowed (e.g., with a Hamming window) to reduce spectral leakage. The window function is chosen to balance frequency resolution against sidelobe suppression.
In the spectrum, periodic features appear as pronounced peaks in the frequency domain. Significant periodicities in the data are determined by identifying these peaks and computing the corresponding periods. Low-frequency components of the spectrum usually reflect long-term trends, so trend features of the time series can be extracted by analyzing them. The extracted periodic and trend features are transformed back to the time domain and compared with the original data to confirm that they are consistent with the patterns actually present.
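The following sketch illustrates this step with NumPy: a Hamming window is applied, the spectrum is computed with the FFT, and the dominant peak is converted back to a period. The sampling interval of one measurement per day and the weekly rhythm in the synthetic signal are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(120)                                                    # e.g., 120 daily measurements
signal = np.sin(2 * np.pi * t / 7) + 0.3 * rng.normal(size=t.size)   # weekly rhythm plus noise

# Hamming window to reduce spectral leakage.
windowed = signal * np.hamming(signal.size)

# One-sided amplitude spectrum; sample spacing d = 1 day.
spectrum = np.abs(np.fft.rfft(windowed))
freqs = np.fft.rfftfreq(signal.size, d=1.0)

# Dominant non-zero frequency and its period (index 0 is the DC/trend component).
peak = 1 + np.argmax(spectrum[1:])
print(f"dominant period ~ {1.0 / freqs[peak]:.1f} days")
```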
S303: Use gated recurrent units to process and optimize the key time-dependent features of the time series data:
A gated recurrent unit (GRU) is an improved recurrent neural network (RNN) whose main advantage is that it manages the flow and memory of information through gating, avoiding the vanishing-gradient problem in training on long sequences. The update gate of the GRU is expressed as $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$ and the reset gate as $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$, where $z_t$ and $r_t$ are the update gate and reset gate respectively, $h_{t-1}$ is the hidden state at the previous time step, $x_t$ is the current input, $W_z$ and $W_r$ are the weight matrices of the update gate and reset gate respectively, and $\sigma$ is the Sigmoid activation function, whose output lies between 0 and 1 and indicates the degree of activation of the gate.
The time-dependent features extracted by ARMA and the Fourier transform are used as inputs to the GRU model. The dataset is split into a training set and a validation set; the training set is used to adjust the model weights and the validation set to evaluate the model's generalization ability.
The GRU model is trained with the Adam optimizer, which adapts the learning rate; its update rule is $\theta_{t+1} = \theta_t - \alpha\,\dfrac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$, where:
$\alpha$ is the learning rate, i.e., the step size of each parameter update, usually a preset hyperparameter that controls the magnitude of the update;
$\hat{m}_t$ is the bias-corrected first-moment estimate, the momentum term of the gradient (an exponentially weighted moving average of the gradient);
$\hat{v}_t$ is the bias-corrected second-moment estimate, the momentum term of the squared gradient (an exponentially weighted moving average of the squared gradient);
$\epsilon$ is a smoothing term, usually a very small value (such as $10^{-8}$), which prevents division by zero and ensures numerical stability;
$\theta_t$ is the parameter vector of the model at iteration $t$, i.e., the current parameter values;
$\theta_{t+1}$ is the parameter vector after iteration $t$, i.e., the updated parameter values.
Cross-validation is performed on the validation set to evaluate model performance, focusing on the model's ability to capture dependencies in the time series data. The number of GRU layers and hidden units is adjusted to optimize performance.
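A minimal sketch of training a GRU on windows of the extracted features with the Adam optimizer is given below, using PyTorch as one possible framework; the framework choice, the window length of 14 time steps, the 8 input features, and the binary complication label are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, features)
        _, h_n = self.gru(x)                     # h_n: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)    # one logit per sequence

# Synthetic stand-in: 64 patients, 14 time steps, 8 time-dependent features.
x = torch.randn(64, 14, 8)
y = torch.randint(0, 2, (64,)).float()           # hypothetical complication label

model = GRUClassifier(n_features=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-8)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()   # Adam update: theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)
```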
S304: Integrate an attention mechanism into the neural network and dynamically adjust the weights of the key time-dependent features at each time point according to the model's performance during training:
The attention mechanism allows the model, when processing sequence data, to assign weights according to the importance of each time step. The specific formula is $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V$, where:
$\mathrm{Attention}(Q, K, V)$ is the key operation of the attention mechanism and computes the weighted sum of the elements of the input sequence;
$Q$ is the query matrix, representing the hidden state or features of the current time step being queried;
$K$ is the key matrix, representing the hidden states or features at all time points against which the query matrix $Q$ is matched;
$V$ is the value matrix, i.e., the matrix of values corresponding to the key matrix $K$, used to compute the attention output;
$\mathrm{softmax}$ is the Softmax activation function, which converts the matching scores into a probability distribution used to determine the weight of each time point;
$QK^{T}$ is the dot product of the query matrix $Q$ and the transpose of the key matrix $K$, representing the similarity scores between queries and keys;
$d_k$ is the dimension of the key matrix $K$, used to normalize the dot product and prevent vanishing gradients caused by excessively large values.
The weight parameters of the attention mechanism are adjusted in real time according to the change of the loss function during training, so that the model can focus on the more critical time steps.
The attention mechanism is integrated into the GRU network so that the weights of each time point can be adjusted dynamically when processing the key time-dependent features. During training, a cross-entropy loss function is used to evaluate the model's performance at different time steps and to verify the effectiveness of the attention mechanism.
Based on the cross-validation results, the hyperparameters of the attention mechanism (such as the number of attention heads and their dimensions) are further tuned to maximize the model's predictive performance. By comparing the model's performance under different weight configurations, it is verified whether introducing the attention mechanism improves the handling of time-dependent features and ensures prediction accuracy at the key time points.
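The scaled dot-product attention described above can be sketched as a plain function over the GRU hidden states; treating the GRU outputs as queries, keys, and values simultaneously (self-attention) is an illustrative choice here, not a requirement of the invention.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)             # per-time-step weights summing to 1
    return weights @ v, weights

# Self-attention over GRU outputs: (batch, time, hidden).
gru_outputs = torch.randn(4, 14, 32)
context, attn_weights = scaled_dot_product_attention(gru_outputs, gru_outputs, gru_outputs)
print(context.shape, attn_weights.shape)            # (4, 14, 32) (4, 14, 14)
```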
The Bootstrap technique is used to resample the comprehensive dataset; the key time-dependent features are extracted repeatedly and their stability is evaluated by combining KL divergence and JS divergence, specifically including:
S401: Apply the Bootstrap technique to resample the comprehensive dataset multiple times, generating different sample sets that capture the variability in the data:
Bootstrap is a statistical resampling technique that generates new sample sets by repeated random sampling from the original dataset in order to estimate the distribution of data features and their variability. Each resampling draws the same number of data points at random from the original dataset, with replacement, so each newly generated sample set differs from the original dataset while keeping the same size.
In practice, the number of resampling rounds is determined first (usually several hundred to several thousand). For each round, the sample set is generated by randomly drawing multiple data points from the original comprehensive dataset.
Repeating this process produces a collection of sample sets, each representing a variant of the original data. The purpose is to capture the variability in the data so that the stability of the features can be assessed in the subsequent steps.
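A minimal sketch of the resampling step with NumPy is given below; the number of rounds B = 500 and the shape of the data matrix are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_samples(data: np.ndarray, n_rounds: int = 500):
    """Draw n_rounds resamples of the same size as `data`, with replacement."""
    n = data.shape[0]
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)   # sampling with replacement
        yield data[idx]

# Example: resample a matrix of per-patient feature vectors.
data = rng.normal(size=(200, 5))
resamples = list(bootstrap_samples(data, n_rounds=500))
print(len(resamples), resamples[0].shape)  # 500 (200, 5)
```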
S402: Extract the key time-dependent features from each resampled sample set, ensuring that the features reflect the temporal correlation of the data:
In each resampled sample set, the predefined statistical methods and time series analysis techniques (such as the ARMA model and the Fourier transform) are applied to extract the key time-dependent features. For example, an autoregressive model can be used in each sample set to identify important lag effects, or the Fourier transform can be used to identify periodic features.
It is verified that the extracted features reflect temporal correlation by examining how the features behave at different times and how well they match the overall pattern of the time series. This ensures that the extracted features retain their time dependence across the different resampled sample sets.
S403: Estimate the probability distribution of the features extracted after each resampling round and build the feature distribution of each round:
After each resampling round, the probability distribution of the extracted key time-dependent features is estimated. Common methods include kernel density estimation (KDE) and histograms. Kernel density estimation is a non-parametric method that smoothly estimates the probability distribution of the data without assuming a specific distributional form.
The feature distribution of each resampled sample set is constructed by estimating the probability distribution of the features in that set. These distribution functions describe the possible values of the features in each resampled sample set and their probabilities, providing the basis for the subsequent stability evaluation.
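Kernel density estimation of a feature's distribution can be sketched with SciPy as follows; evaluating the density on a fixed shared grid and normalizing it to sum to one are implementation choices made here so that the original and resampled distributions are directly comparable.

```python
import numpy as np
from scipy.stats import gaussian_kde

def feature_distribution(values: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """KDE of a 1-D feature, evaluated on a shared grid and normalized to sum to 1."""
    density = gaussian_kde(values)(grid)
    return density / density.sum()

rng = np.random.default_rng(0)
original = rng.normal(loc=0.0, scale=1.0, size=200)               # feature values in the original data
grid = np.linspace(original.min() - 1, original.max() + 1, 256)
p = feature_distribution(original, grid)                           # P(x_i) on the grid
```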
S404: Use KL divergence to compute the difference between the feature distribution of each resampled sample set and that of the original data, measuring feature consistency; use JS divergence to evaluate the similarity between the resampled and original feature distributions, measuring feature divergence:
The Kullback-Leibler divergence (KL divergence) is an asymmetric measure of the difference between two probability distributions. The formula is $D_{KL}(P \,\|\, Q_b) = \sum_{i} P(x_i)\,\log\dfrac{P(x_i)}{Q_b(x_i)}$, where:
$D_{KL}(P \,\|\, Q_b)$ is the KL divergence, which measures the difference between the feature distribution $P$ of the original data and the feature distribution $Q_b$ of the $b$-th resampling round; KL divergence is asymmetric, i.e., $D_{KL}(P \,\|\, Q_b) \neq D_{KL}(Q_b \,\|\, P)$;
$P(x_i)$ is the probability of the feature value $x_i$ in the original data;
$Q_b(x_i)$ is the probability of the corresponding feature value $x_i$ in the $b$-th resampled sample set;
$x_i$ is the $i$-th value of the key time-dependent feature, typically its value at a particular time point;
$\log$ is the logarithm, used to compute the ratio between the original and resampled feature distributions at $x_i$.
The consistency of the features during resampling is measured by computing the KL divergence between the feature distribution of each resampling round and that of the original data. The smaller the KL divergence, the smaller the difference between the resampled and original samples and the more stable the feature.
The Jensen-Shannon divergence (JS divergence) is a symmetric measure of the similarity between two probability distributions. The formula is $D_{JS}(P \,\|\, Q_b) = \tfrac{1}{2} D_{KL}(P \,\|\, M) + \tfrac{1}{2} D_{KL}(Q_b \,\|\, M)$, where $M = \tfrac{1}{2}(P + Q_b)$:
$D_{JS}(P \,\|\, Q_b)$ is the JS divergence, which measures the similarity between the feature distribution $P$ of the original data and the feature distribution $Q_b$ of the $b$-th resampling round; JS divergence is symmetric, i.e., $D_{JS}(P \,\|\, Q_b) = D_{JS}(Q_b \,\|\, P)$;
$D_{KL}(P \,\|\, M)$ is the KL divergence between the original feature distribution $P$ and the mean distribution $M$, measuring the difference between $P$ and $M$;
$D_{KL}(Q_b \,\|\, M)$ is the KL divergence between the feature distribution $Q_b$ of the $b$-th resampling round and the mean distribution $M$, measuring the difference between $Q_b$ and $M$;
$P$ is the feature distribution of the original data, and $Q_b$ is the feature distribution of the $b$-th resampling round.
The similarity between the resampled samples and the original data is evaluated by computing the JS divergence. The smaller the JS divergence, the more similar the two distributions are, i.e., the resampled features keep a distribution close to that of the original data.
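The two divergences can be computed directly from the KDE-based distributions of the previous sketch, as in the following NumPy snippet; the small constant added to avoid taking the logarithm of zero is a numerical safeguard, not part of the stated formulas.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """D_KL(P || Q) = sum_i P(x_i) * log(P(x_i) / Q(x_i))."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """D_JS(P || Q) = 0.5 * D_KL(P || M) + 0.5 * D_KL(Q || M), with M = 0.5 * (P + Q)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```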
S405: Combine KL divergence and JS divergence to compute the dynamic Bootstrap consistency-and-divergence measure used to evaluate the stability of the key time-dependent features:
By jointly considering the KL divergence and the JS divergence, the dynamic Bootstrap consistency-and-divergence measure of a key time-dependent feature is computed as $S = \dfrac{1}{B}\sum_{b=1}^{B}\bigl[D_{KL}(P \,\|\, Q_b) + D_{JS}(P \,\|\, Q_b)\bigr]$, where:
$S$ is the stability measure of the feature, i.e., the dynamic Bootstrap consistency-and-divergence measure used to evaluate the stability of the key time-dependent feature; combining the KL and JS divergences analyzes how stable the feature remains across the different resampling rounds;
$B$ is the total number of resampling rounds, i.e., how many distinct sample sets were generated when applying the Bootstrap technique.
The stability of the key time-dependent features is evaluated with this stability measure. If the stability measure of a key time-dependent feature is close to zero, the feature maintains high consistency and similarity across the different resampling rounds and is strongly stable; otherwise, the stability of the feature is poor.
That is, the larger the stability measure of a feature, the worse the stability of that key time-dependent feature.
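Putting the previous pieces together, the sketch below averages the combined KL and JS divergences over B resampling rounds, reusing the helper functions feature_distribution, kl_divergence, and js_divergence from the sketches above. This unweighted averaging mirrors the description in the text, but the exact aggregation used by the invention is not specified, so it should be read as an assumption.

```python
import numpy as np

def stability_measure(original: np.ndarray, resamples, grid: np.ndarray) -> float:
    """S = (1/B) * sum_b [ D_KL(P || Q_b) + D_JS(P || Q_b) ]  (assumed aggregation)."""
    p = feature_distribution(original, grid)     # helper from the KDE sketch above
    scores = []
    for sample in resamples:                     # each Q_b comes from one bootstrap round
        q = feature_distribution(sample, grid)
        scores.append(kl_divergence(p, q) + js_divergence(p, q))
    return float(np.mean(scores))
```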
A change point detection method is introduced to evaluate whether the time dependence of the key time-dependent features changes significantly over time, specifically including:
S501: Construct the key time-dependent features as a time series along the time dimension, ensuring that the basic data for the time-dependence analysis are complete:
First, the key time-dependent features extracted in the previous steps are arranged in chronological order to form a time series dataset.
The data from all time points are integrated into one complete time series, ensuring that there are no gaps and that the data are ordered continuously in time. Interpolation is used to fill any missing values and guarantee continuity and completeness.
If the time points are unevenly spaced, the time axis is standardized so that the series has a uniform time step for subsequent analysis.
S502: Use Bayesian change point detection to identify potential change points in the time series:
Bayesian change point detection is a method based on Bayesian statistics for identifying points of structural change (change points) in time series data. It detects change points by computing, for each time point in the series, the probability that a change occurs there.
The prior distribution $P(\tau)$ of the Bayesian model is defined, where $\tau$ denotes a possible change point location. The model uses the posterior probability $P(\tau \mid D)$ to evaluate the likelihood of each time point being a change point, where $D$ denotes the observed time series data. In Bayesian statistics, $P(\cdot)$ denotes a probability distribution or probability density.
The Bayesian posterior probabilities are computed with a dynamic programming algorithm to identify potential change points in the time series. This step ensures a comprehensive analysis of all possible change points in the series.
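As an illustration of locating change points in such a feature series, the sketch below uses the ruptures library's penalized search (Pelt). This is an offline change point detector used here as a stand-in for the Bayesian dynamic-programming procedure described above, not the invention's exact algorithm; the penalty value and the position of the simulated mean shift are illustrative.

```python
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(3)
# Synthetic feature series with a mean shift at t = 60.
series = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(2.0, 1.0, 60)])

# Penalized change point search over a least-squares cost; returns candidate break indices.
algo = rpt.Pelt(model="l2").fit(series.reshape(-1, 1))
breakpoints = algo.predict(pen=10)
print(breakpoints)   # e.g., [60, 120] -- the last index marks the end of the series
```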
S503: Divide the time series into windows and use sliding-window computations to examine the stability of the features in different periods:
A sliding window is an analysis technique in which a fixed-length window is moved across the time series to capture changes in local features.
The window size is set and the time series is divided into multiple subsequences by window. The window size is chosen according to the characteristics of the data and the required precision of the analysis; usually a rule of thumb is used or the best window size is determined by cross-validation.
Within each window, local statistics of the key time-dependent features, such as the mean and variance, are computed to evaluate the stability of the features in different periods. This local analysis captures short-term changes that a global analysis might miss.
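The sliding-window statistics can be computed with pandas rolling aggregations, as sketched below; the 7-step window is an illustrative choice, and the series continues the synthetic example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
series = pd.Series(np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(2.0, 1.0, 60)]))

window = 7                                    # illustrative window size
local_mean = series.rolling(window).mean()    # local mean per window
local_var = series.rolling(window).var()      # local variance per window
print(pd.DataFrame({"mean": local_mean, "var": local_var}).dropna().head())
```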
S504: Perform significance tests on the detected potential change points to evaluate whether they are statistically significant and to avoid false detections:
To avoid mistaking random fluctuations for change points, the detected change points must be tested for significance. Common methods include the t-test, the chi-square test, and likelihood-ratio-based tests.
The data in the period containing a detected change point are compared with the data in the preceding and following periods, and a significance test is used to check whether the change point is statistically significant. For example, a t-test can compare the difference in means between the periods before and after the change point, or a likelihood-ratio test can evaluate the significance of the difference in model fit before and after the change point.
If the p-value of the significance test is below the preset significance level (e.g., 0.05), the change point is considered statistically significant; otherwise it is regarded as a false detection caused by noise and is discarded.
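A minimal sketch of this check with SciPy's two-sample t-test is shown below, comparing the segments before and after a candidate change point; the candidate index continues the synthetic example above, and the use of Welch's variant (unequal variances) is an implementation choice.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(2.0, 1.0, 60)])

candidate = 60                                               # candidate change point from the detector
before, after = series[:candidate], series[candidate:]

stat, p_value = ttest_ind(before, after, equal_var=False)   # Welch's t-test on the two segments
significant = p_value < 0.05
print(f"p = {p_value:.4f}, significant change point: {significant}")
```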
S505: Combine the change point detection results to evaluate whether the time dependence of the key time-dependent features has changed significantly:
The change points validated by the significance tests are integrated into the time series analysis results as the basis for judging changes in time dependence.
It is evaluated whether the time dependence of the key time-dependent features changes significantly at the detected change points. Specifically, the feature mean, variance, or other statistics before and after a change point can be compared to determine whether the feature's time dependence changed significantly there.
Based on the results of the change point detection and the time-dependence evaluation, it is judged whether the time dependence of the key time-dependent features remains stable across the entire time series. If significant changes in time dependence are found, their possible causes and their impact on the model need to be analyzed further.
基于步骤S4和步骤S5综合判断提取的关键时间依赖特征可信度是否达标,具体为:Based on step S4 and step S5, it is comprehensively judged whether the credibility of the extracted key time-dependent features meets the standard, specifically:
通过将稳定性测度与设定的阈值进行比较,可以量化特征的稳定性水平;同时,评估时间依赖性是否发生显著改变,可以判断特征在不同时间段内的表现一致性。这两者结合,可以综合评估关键时间依赖特征的可信度,确保最终用于模型的特征具有较高的可靠性。By comparing the stability measure with the set threshold, the stability level of the feature can be quantified; at the same time, by evaluating whether the time dependency has changed significantly, the consistency of the feature performance in different time periods can be determined. The combination of the two can comprehensively evaluate the credibility of key time-dependent features and ensure that the features ultimately used in the model have high reliability.
获取关键时间依赖特征的时间依赖性是否随时间发生显著改变的判断结果。Obtain the judgment result of whether the time dependence of the key time-dependent feature changes significantly over time.
设定特征的稳定性测度对应的特征稳定评估阈值,将特征的稳定性测度与特征稳定评估阈值进行比较:Set the feature stability assessment threshold corresponding to the feature stability measure, and compare the feature stability measure with the feature stability assessment threshold:
当特征的稳定性测度小于等于特征稳定评估阈值时,则判定关键时间依赖特征的稳定性正常。When the stability measure of a feature is less than or equal to the feature stability assessment threshold, the stability of the key time-dependent feature is determined to be normal.
当特征的稳定性测度大于特征稳定评估阈值时,则判定关键时间依赖特征的稳定性差。When the stability measure of a feature is greater than the feature stability assessment threshold, the stability of the critical time-dependent feature is determined to be poor.
当关键时间依赖特征的时间依赖性未随时间发生显著改变,且关键时间依赖特征的稳定性正常时,则判定提取的关键时间依赖特征可信度达标;否则,则判定提取的关键时间依赖特征可信度不达标。When the time dependency of the key time dependent feature does not change significantly over time and the stability of the key time dependent feature is normal, the credibility of the extracted key time dependent feature is determined to be up to standard; otherwise, the credibility of the extracted key time dependent feature is determined to be not up to standard.
将稳定性和时间依赖性两个重要因素结合起来,通过设置明确的评估标准和阈值,实现对特征的综合评估。这种方法不仅提升了评估的精确性和严谨性,还通过多维度的分析,增强了对特征的全面理解,避免了单一指标可能带来的偏差。Combining the two important factors of stability and time dependence, and setting clear evaluation criteria and thresholds, a comprehensive evaluation of features can be achieved. This method not only improves the accuracy and rigor of the evaluation, but also enhances the comprehensive understanding of the features through multi-dimensional analysis, avoiding the deviation that may be caused by a single indicator.
其中,特征稳定评估阈值的设定应基于实际应用场景中的经验数据或通过交叉验证方法来确定。具体来说,可以对历史数据进行统计分析,计算出特征稳定性测度的分布情况,并选择一个合理的分位数(如第95百分位数)作为阈值。此外,可以通过多次实验调整阈值,以确保在实际应用中能够有效区分稳定和不稳定的特征,从而提高模型的整体表现。Among them, the setting of the feature stability assessment threshold should be based on empirical data in actual application scenarios or determined through cross-validation methods. Specifically, historical data can be statistically analyzed to calculate the distribution of feature stability measures and select a reasonable quantile (such as the 95th percentile) as the threshold. In addition, the threshold can be adjusted through multiple experiments to ensure that stable and unstable features can be effectively distinguished in actual applications, thereby improving the overall performance of the model.
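作为示意性草图，下述代码按照上文建议，对历史数据中特征稳定性测度的分布取分位数（如第95百分位数）作为特征稳定评估阈值；historical_measures为本示例假设的历史稳定性测度数组，分位数取值可在多次实验或交叉验证中调整。As an illustrative sketch, the following code follows the suggestion above and takes a quantile (e.g., the 95th percentile) of the distribution of feature stability measures in historical data as the feature stability assessment threshold; historical_measures is a hypothetical array of historical stability measures assumed for this example, and the quantile can be tuned over repeated experiments or cross-validation.

```python
import numpy as np

def stability_threshold_from_history(historical_measures, quantile=0.95):
    """Pick the given quantile (default: 95th percentile) of historical
    stability measures as the feature stability assessment threshold."""
    measures = np.asarray(historical_measures, dtype=float)
    return float(np.quantile(measures, quantile))

# Hypothetical usage: threshold = stability_threshold_from_history(past_measures, 0.95)
```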
上述公式均是去量纲取其数值计算，公式是由采集大量数据进行软件模拟得到的最接近真实情况的一个公式，公式中的预设参数以及阈值选取由本领域的技术人员根据实际情况进行设置。The above formulas are all dimensionless and operate on numerical values; each formula is obtained by collecting a large amount of data and performing software simulation so as to approximate the real situation as closely as possible, and the preset parameters and thresholds in the formulas are set by those skilled in the art according to actual conditions.
上述实施例，可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时，上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络，或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线（例如红外、微波等）方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质（例如，软盘、硬盘、磁带）、光介质（例如，DVD），或者半导体介质。半导体介质可以是固态硬盘。The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium, or may be transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless (e.g., infrared, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that contains one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的模块及算法步骤,能够以电子硬件,或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art will appreciate that the modules and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and modules described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其他的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或模块的间接耦合或通信连接,可以是电性,机械或其他的形式。In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the modules is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or modules, which can be electrical, mechanical or other forms.
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,既可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, and may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
所述功能如果以软件功能模块的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art who is familiar with the present technical field can easily think of changes or substitutions within the technical scope disclosed in the present application, which should be included in the protection scope of the present application. Therefore, the protection scope of the present application should be based on the protection scope of the claims.
最后:以上所述仅为本发明的优选实施例而已,并不用于限制本发明,凡在本发明的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。Finally: The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention should be included in the protection scope of the present invention.