CN118469045A


Publication number: CN118469045A
Application number: CN202410910534.1A
Authority: CN
Inventors: 张玉芝; 梁少杰; 陈建兵; 王盟; 贾明涛
Original assignee: Shijiazhuang Tiedao University
Current assignee: Shijiazhuang Tiedao University
Priority date: 2024-07-09
Filing date: 2024-07-09
Publication date: 2024-08-09
Anticipated expiration: 2044-07-09
Also published as: CN118469045B

Description

Permafrost upper limit prediction method, device, equipment and storage medium

Technical Field

The present application belongs to the field of data processing technology, and in particular relates to a method, device, equipment and storage medium for predicting the upper limit of permafrost.

Background Art

The Qinghai-Tibet corridor is the most convenient overland transportation corridor to the inland and is regarded as the "lifeline" of Tibet. Most sections of the road are built on permafrost.

The construction and operation of highways in permafrost regions face major challenges, chiefly the adverse effects of permafrost on roadbeds. Permafrost is a special type of soil that exhibits frost heave and thaw settlement under climatic and seasonal changes. In particular, a lowering of the permafrost upper limit poses a serious deformation threat to the roadbed and directly affects the smoothness of the road. Accurate prediction of the permafrost upper limit is therefore crucial for managing roadbed distress.

Traditional prediction methods may be limited by data complexity, temporal variation and nonlinear relationships, which can lead to large deviations in the prediction results and prevent accurate prediction of the permafrost upper limit.

Summary of the Invention

The embodiments of the present application provide a method, device, equipment and storage medium for predicting the upper limit of permafrost, so as to predict the permafrost upper limit accurately.

The present application is implemented through the following technical solutions:

In a first aspect, an embodiment of the present application provides a method for predicting the upper limit of permafrost, comprising:

Obtaining historical permafrost upper limit data of a target roadbed, and historical environmental data that affects the historical permafrost upper limit data, to obtain a training set and a test set.

Training a permafrost upper limit prediction model based on the training set and the test set, wherein the permafrost upper limit prediction model is a prediction model formed by combining a random forest algorithm, an extreme gradient boosting algorithm and a long short-term memory network.

Predicting the permafrost upper limit of the target roadbed in a target period based on environmental data that affects the permafrost upper limit in the target period and on the trained permafrost upper limit prediction model, wherein the target period is any future time period.

In combination with the first aspect, in some possible implementations, training the permafrost upper limit prediction model based on the training set and the test set includes:

Randomly dividing the training set through cross-validation to obtain multiple groups of training data and multiple groups of corresponding validation data.

Taking the random forest algorithm, the extreme gradient boosting algorithm and the long short-term memory network each as a target algorithm, and performing parameter initialization, parameter optimization and parameter correction on each target algorithm to obtain the final parameter set of each target algorithm.

Calculating, based on the multiple groups of validation data, the weight of the random forest algorithm, the weight of the extreme gradient boosting algorithm and the weight of the long short-term memory network.

Obtaining the trained permafrost upper limit prediction model based on the final parameter set of the random forest algorithm, the final parameter set of the extreme gradient boosting algorithm, the final parameter set of the long short-term memory network, and the weights of the three algorithms.

Performing parameter initialization, parameter optimization and parameter correction on each target algorithm to obtain the final parameter set of each target algorithm includes:

Initializing the parameters of each target algorithm.

Optimizing the parameters of each target algorithm by the grid method, based on the multiple groups of training data and the multiple groups of corresponding validation data, to obtain the optimal parameter set of each target algorithm.

Selecting the mean square error as the loss function of each target algorithm; calculating the error value of each target algorithm based on the test set, the optimal parameter set of each target algorithm and the target algorithm itself; and correcting the optimal parameter set of each target algorithm based on its error value until the optimal parameter set no longer changes or the number of training iterations reaches a first preset value, thereby obtaining the final parameter set of each target algorithm.

In combination with the first aspect, in some possible implementations, calculating the weight of the random forest algorithm, the weight of the extreme gradient boosting algorithm and the weight of the long short-term memory network based on the multiple groups of validation data includes:

Predicting multiple first predicted values based on the historical environmental data in the multiple groups of validation data and the random forest algorithm.

Predicting multiple second predicted values based on the historical environmental data in the multiple groups of validation data and the extreme gradient boosting algorithm.

Predicting multiple third predicted values based on the historical environmental data in the multiple groups of validation data and the long short-term memory network.

Calculating the weight of the random forest algorithm, the weight of the extreme gradient boosting algorithm and the weight of the long short-term memory network based on the minimum absolute error linear weighting method, the multiple first predicted values, the multiple second predicted values and the multiple third predicted values.
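The weighting step above can be sketched in a few lines. This is a minimal illustration, not the application's implementation: it assumes the weights are non-negative and sum to one, and it searches a coarse grid over those weights for the combination with the smallest total absolute error. The function name `lad_weights`, the `step` parameter and the toy data are hypothetical.

```python
def lad_weights(p_rf, p_xgb, p_lstm, y_true, step=0.05):
    """Search weights on a grid (assumed: w >= 0, sum = 1) for the least absolute error."""
    n = round(1 / step)
    best_w, best_err = None, float("inf")
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b                      # third weight follows from the sum-to-one assumption
            w = (a / n, b / n, c / n)
            err = sum(
                abs(w[0] * p1 + w[1] * p2 + w[2] * p3 - y)
                for p1, p2, p3, y in zip(p_rf, p_xgb, p_lstm, y_true)
            )
            if err < best_err:
                best_w, best_err = w, err
    return best_w, best_err


# Toy validation data: the second model is exact, the others are biased.
y_val = [1.0, 2.0, 3.0]
weights, total_abs_err = lad_weights(
    [v + 1.0 for v in y_val],   # first predicted values (random forest)
    list(y_val),                # second predicted values (XGBoost)
    [v + 2.0 for v in y_val],   # third predicted values (LSTM)
    y_val,
)
```

A grid search is used here only because it needs no external solver; a linear-programming formulation would be the usual choice for larger problems.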

In combination with the first aspect, in some possible implementations, calculating the weights of the three algorithms based on the minimum absolute error linear weighting method, the multiple first predicted values, the multiple second predicted values and the multiple third predicted values includes:

Calculating the weight of the random forest algorithm, the weight of the extreme gradient boosting algorithm and the weight of the long short-term memory network based on the minimum absolute error linear weighting method, the multiple first predicted values, the multiple second predicted values and the multiple third predicted values, in combination with a first formula.

The first formula is not reproduced in this text. Its symbols denote, in order: the i-th first predicted value; the i-th second predicted value; the i-th third predicted value; the weight of the random forest algorithm; the weight of the extreme gradient boosting algorithm; the weight of the long short-term memory network; and the true value of the historical permafrost upper limit in the i-th validation datum.
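One plausible reading of the first formula, consistent with the quantities listed above, is a least-absolute-error fit of the three weights over the N validation samples. The symbol names, the sum-to-one constraint and the non-negativity constraint are assumptions made here for illustration; they are not taken from the source text.

```latex
\min_{w_1, w_2, w_3} \; \sum_{i=1}^{N}
  \left| w_1 \hat{y}^{(1)}_i + w_2 \hat{y}^{(2)}_i + w_3 \hat{y}^{(3)}_i - y_i \right|
\quad \text{s.t.} \quad w_1 + w_2 + w_3 = 1, \;\; w_1, w_2, w_3 \ge 0
```

Here $\hat{y}^{(1)}_i$, $\hat{y}^{(2)}_i$ and $\hat{y}^{(3)}_i$ stand for the i-th first, second and third predicted values, $w_1$, $w_2$ and $w_3$ for the weights of the random forest algorithm, the extreme gradient boosting algorithm and the long short-term memory network, and $y_i$ for the true value in the i-th validation datum.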

In combination with the first aspect, in some possible implementations, obtaining the historical permafrost upper limit data of the target roadbed and the historical environmental data that affects the historical permafrost upper limit data, to obtain the training set and the test set, includes:

Obtaining the historical permafrost upper limit data of the target roadbed, and multiple types of historical environmental data for the location of the target roadbed.

Calculating the correlation coefficient between the historical permafrost upper limit and each type of historical environmental data to obtain multiple correlation coefficients.

Determining the types of environmental data whose correlation coefficients exceed a first threshold as the historical environmental data that affects the historical permafrost upper limit data.

Matching the historical permafrost upper limit data of the target roadbed with the historical environmental data that affects it, one to one by time, to obtain a first data set.

Dividing the first data set according to a preset ratio to obtain the training set and the test set.
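The alignment and split steps above can be sketched as follows. This is only an illustration under stated assumptions: records are keyed by a sortable timestamp, the split is chronological (the text does not say whether the division is chronological or random), and the 4:1 default ratio is borrowed from the later embodiments. The function name `align_and_split` and its parameters are hypothetical.

```python
def align_and_split(upper_limit, env, train_parts=4, test_parts=1):
    """Pair records that share a timestamp, then split them by a preset ratio.

    upper_limit: {timestamp: permafrost upper limit value}
    env:         {timestamp: [environmental feature values]}
    """
    common = sorted(set(upper_limit) & set(env))           # keep timestamps present in both
    rows = [(t, env[t], upper_limit[t]) for t in common]   # one-to-one match by time
    cut = len(rows) * train_parts // (train_parts + test_parts)
    return rows[:cut], rows[cut:]                          # earlier rows train, later rows test
```

For example, ten matched records with the default ratio give eight training rows and two test rows.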

In combination with the first aspect, in some possible implementations, calculating the correlation coefficient between the historical permafrost upper limit and each type of historical environmental data to obtain multiple correlation coefficients includes:

Calculating, in combination with a second formula, the correlation coefficient between the historical permafrost upper limit and each type of historical environmental data to obtain multiple correlation coefficients.

The second formula is not reproduced in this text. Its symbols denote, in order: the correlation coefficient between the historical permafrost upper limit and a given type of historical environmental data; the number of time series numbers; the value of that type of historical environmental data at the i-th time series number; the value of the historical permafrost upper limit at the i-th time series number; the mean of all values of that type of historical environmental data; and the mean of all values of the historical permafrost upper limit.
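As a hedged illustration of this screening step, the sketch below computes a rank-based (Spearman) coefficient, which later embodiments name as the coefficient used. It applies the mean-centered product form $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$ to the ranks of both series, then keeps the environmental data types whose coefficient exceeds a threshold. The function names, the use of the absolute value and the threshold are assumptions, not details from the application.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Mean-centered product-moment correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman coefficient: the product-moment correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_features(env, target, threshold):
    """Keep the data types whose |coefficient| exceeds the threshold (assumed rule)."""
    return [name for name, series in env.items()
            if abs(spearman(series, target)) > threshold]
```

Because the coefficient works on ranks, a monotonic but nonlinear relationship such as `y = x ** 2` on positive `x` still scores 1.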

In combination with the first aspect, in some possible implementations, obtaining the historical permafrost upper limit data of the target roadbed and the multiple types of historical environmental data for the location of the target roadbed includes:

Obtaining raw data of the historical permafrost upper limit and raw data of the historical environmental data.

Performing data cleaning, data filling, data resampling and data normalization on the raw data of the historical permafrost upper limit and the raw data of the historical environmental data, to obtain the historical permafrost upper limit of the target roadbed and the multiple types of historical environmental data for the location of the target roadbed.

In a second aspect, an embodiment of the present application provides a permafrost upper limit prediction device, comprising:

A data acquisition module, configured to obtain historical permafrost upper limit data of a target roadbed, and historical environmental data that affects the historical permafrost upper limit data, to obtain a training set and a test set.

A model training module, configured to train a permafrost upper limit prediction model based on the training set and the test set, wherein the permafrost upper limit prediction model is a prediction model formed by combining a random forest algorithm, an extreme gradient boosting algorithm and a long short-term memory network.

A result output module, configured to predict the permafrost upper limit of the target roadbed in a target period based on environmental data that affects the permafrost upper limit in the target period and on the trained permafrost upper limit prediction model, wherein the target period is any future time period.

In a third aspect, an embodiment of the present application provides a terminal device, comprising a processor and a memory, the memory being used to store a computer program, wherein the processor, when executing the computer program, implements the permafrost upper limit prediction method according to any implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the permafrost upper limit prediction method according to any implementation of the first aspect.

It can be understood that, for the beneficial effects of the second to fourth aspects, reference may be made to the relevant description of the first aspect, which is not repeated here.

Compared with the prior art, the embodiments of the present application have the following beneficial effects:

The permafrost upper limit prediction model of the present application uses the random forest algorithm (Random Forest, RF), the extreme gradient boosting algorithm (eXtreme Gradient Boosting, XGBoost) and the long short-term memory network (Long Short-Term Memory, LSTM). The random forest algorithm builds multiple decision trees and trains them on random subsets of the training set and the test set, which lets it handle the uncertainty and complexity in the permafrost upper limit data. The long short-term memory network captures the temporal correlation in the training and test data and accounts for the relationship between the permafrost upper limit and time. The extreme gradient boosting algorithm further improves the performance of the prediction model through gradient boosting, making it better suited to accurate prediction under different environmental conditions. The factors affecting the permafrost upper limit are considered more comprehensively, and the prediction model is trained on these factors. The model can therefore account for data complexity, data uncertainty, temporal variation and nonlinear relationships, giving higher prediction efficiency and more accurate results.

In addition, different roadbeds are located in different places, so a traditional fixed model for predicting the permafrost upper limit carries a certain error. The prediction model of the present application is instead built from the historical data of the target roadbed and therefore fits that roadbed more closely, which makes the prediction more accurate. The model can also be used when the data set is small: training it with cross-validation and the grid method alleviates the overfitting that occurs with little data and ultimately improves the prediction accuracy of the permafrost upper limit significantly.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present specification.

Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a permafrost upper limit prediction method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the process by which the random forest algorithm produces a predicted value, provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a comparative analysis of the evaluation results of multiple prediction methods, provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a permafrost upper limit prediction device provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Detailed Description

In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present application.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or combinations thereof.

It should also be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

In addition, in the description of this specification and the appended claims, the terms "first", "second", "third" and the like are used only to distinguish the descriptions and should not be understood as indicating or implying relative importance.

References in this specification to "one embodiment" or "some embodiments" mean that a particular feature, structure or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments" and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless specifically emphasized otherwise.

An embodiment of the present application provides a method for predicting the upper limit of permafrost. FIG. 1 is a schematic flow chart of the method provided by an embodiment of the present application. With reference to FIG. 1, the method is described in detail as follows:

Step 101: obtain historical permafrost upper limit data of the target roadbed, and historical environmental data that affects the historical permafrost upper limit data, to obtain a training set and a test set.

Exemplarily, obtaining the historical permafrost upper limit data of the target roadbed and the historical environmental data that affects it, to obtain the training set and the test set, may include:

In some specific embodiments, the preset ratio is generally 4:1. For greater adaptability, the preset ratio can be modified to suit the actual situation.

In some specific embodiments, calculating the correlation coefficient between the historical permafrost upper limit and each type of historical environmental data makes it possible to screen out the environmental factors related to the permafrost upper limit at the location of the target roadbed and to discard irrelevant data, which reduces the amount of computation and speeds up the prediction of the permafrost upper limit.

Exemplarily, calculating the correlation coefficient between the historical permafrost upper limit and each type of historical environmental data to obtain multiple correlation coefficients may include:

The second formula is:

In some specific embodiments, the correlation coefficient is the Spearman correlation coefficient. The Spearman correlation coefficient is based on ranks; it is a non-parametric statistic that measures the degree of correlation between two variables and is suitable for analyzing nonlinear relationships. It is therefore better suited to quantifying the nonlinear correlation between the permafrost upper limit and the environmental data.

Exemplarily, obtaining the historical permafrost upper limit data of the target roadbed and the multiple types of historical environmental data for the location of the target roadbed may include:

In some specific embodiments, data cleaning proceeds as follows. The data collected by monitoring sensors may be affected by environmental interference and equipment problems, producing isolated abnormal data points and noisy data. Such abnormal data impairs the accuracy and timeliness of permafrost upper limit prediction and stability assessment, so the data must be cleaned. To clean the data collected by the monitoring sensors along the railway, the 3σ criterion is used for denoising, which removes abnormal data and noise and ensures the quality of the data and the reliability of the analysis results.
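A minimal sketch of 3σ cleaning, assuming the mean and standard deviation are computed over the whole series and that rejected points are marked as missing rather than dropped. Both choices, and the name `three_sigma_clean`, are assumptions for illustration.

```python
from statistics import mean, pstdev


def three_sigma_clean(values):
    """Replace points farther than 3 standard deviations from the mean with None."""
    mu = mean(values)
    sigma = pstdev(values)          # population standard deviation of the series
    return [v if abs(v - mu) <= 3 * sigma else None for v in values]
```

Marking outliers as `None` keeps the series aligned in time, so a later filling step can repair them. Note that a single extreme point in a very short series inflates the standard deviation and may not be rejected.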

In some specific embodiments, data filling proceeds as follows. During data processing, the raw data may contain missing or abnormal values, which must be filled to keep the data continuous. For strongly continuous data such as ground temperature and meteorological data, missing values can be filled by forward filling. Because such data changes little over short periods, forward filling is simple to apply, keeps the filled values accurate and avoids introducing large errors.
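Forward filling can be sketched as below, assuming missing values are represented by `None`. The function name is hypothetical.

```python
def forward_fill(values):
    """Replace each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)     # stays None until the first observation appears
    return filled
```

A gap at the very start of the series has no earlier value to copy, so it remains `None` and needs separate handling.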

In some specific embodiments, data resampling proceeds as follows. The ground temperature data is sampled twice a month, which differs from the sampling frequency of the air temperature data, so the data must be resampled and converted to a common frequency. One approach is to average the air temperature data within each period, converting it to two values per month. The ground temperature and air temperature data then share the same frequency for subsequent analysis and comparison.
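The averaging approach can be sketched as follows, assuming daily air temperature readings and assuming the two periods of a month are days 1 to 15 and day 16 to month end. The period boundaries and the function name are assumptions; the text only states that the target frequency is twice a month.

```python
from collections import defaultdict
from datetime import date


def half_month_means(daily):
    """Average (date, value) readings into two periods per month.

    Returns {(year, month, half): mean}, where half is 1 or 2.
    """
    buckets = defaultdict(list)
    for d, v in daily:
        half = 1 if d.day <= 15 else 2      # assumed period boundary
        buckets[(d.year, d.month, half)].append(v)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}
```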

In some specific embodiments, data normalization proceeds as follows. The value ranges of the continuous features (such as daily mean air temperature, surface temperature, surface heat flux and permafrost upper limit) differ greatly, so these features are normalized to speed up model convergence. Normalization maps the value ranges of the different features onto similar scales, which helps the model learn and converge faster and improves training efficiency and accuracy.
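The text does not name a normalization scheme. The sketch below assumes min-max scaling to the range [0, 1], a common choice, and also returns the bounds so that predictions can be mapped back to physical units. The function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; return the scaled list and the (min, max) bounds."""
    lo, hi = min(values), max(values)
    if hi == lo:                              # constant feature: avoid division by zero
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def denormalize(scaled, bounds):
    """Map scaled values back to the original range."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]
```

In practice the bounds would be computed on the training set only and reused for the test set, so that no information leaks from the test data.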

In some specific embodiments, data cleaning, data filling, data resampling and data normalization make the data used in subsequent calculations more accurate, which in turn allows the future permafrost upper limit to be predicted accurately.

Step 102: train a permafrost upper limit prediction model based on the training set and the test set, wherein the permafrost upper limit prediction model is a prediction model formed by combining a random forest algorithm, an extreme gradient boosting algorithm and a long short-term memory network.

Exemplarily, step 102 may include:

Randomly dividing the training set through cross-validation to obtain multiple groups of training data and multiple groups of corresponding validation data. The cross-validation is five-fold: the training set is divided into five equal parts, four of which serve as training data and one as validation data. Because any four of the parts can serve as training data, five groups of training data and corresponding validation data are obtained.

Taking the random forest algorithm, the extreme gradient boosting algorithm and the long short-term memory network each as a target algorithm, and performing parameter initialization, parameter optimization and parameter correction on each target algorithm to obtain the final parameter set of each target algorithm. The parameter initialization step has no required order relative to the cross-validation process above, so in practice it can be performed either before or after cross-validation.

Obtaining the trained permafrost upper limit prediction model based on the final parameter set of the random forest algorithm, the final parameter set of the extreme gradient boosting algorithm, the final parameter set of the long short-term memory network, and the weights of the three algorithms.
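Once the three tuned models and their weights are available, combining them can be sketched as a weighted sum of their predictions, in line with the linear weighting method named in the summary. The dictionary layout, the model keys and the function name below are hypothetical.

```python
def ensemble_predict(preds, weights):
    """Combine per-model predictions as a weighted sum.

    preds:   {model name: list of predictions}, all lists the same length
    weights: {model name: weight}
    """
    names = list(weights)
    n = len(preds[names[0]])
    return [sum(weights[k] * preds[k][i] for k in names) for i in range(n)]


# Toy example with assumed weights and predictions.
combined = ensemble_predict(
    {"rf": [2.0, 4.0], "xgb": [4.0, 8.0], "lstm": [0.0, 4.0]},
    {"rf": 0.5, "xgb": 0.25, "lstm": 0.25},
)
```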

对每个目标算法进行参数初始化、参数寻优和参数修正，得到每个目标算法最终的参数集合，可以包括：Perform parameter initialization, parameter optimization and parameter correction for each target algorithm to obtain the final parameter set of each target algorithm, which may include:

通过交叉验证和网格法对多年冻土上限预测模型进行训练，交叉验证能够生成足够多的训练数据和验证数据保证训练数据的充足，而网格法能够让模型进行充分的寻优，保证模型的精度，因此缓解了在数据集数据较少的情况下多年冻土上限预测模型出现过拟合现象的可能性，最终显著提升多年冻土上限的预测精度。The permafrost upper limit prediction model is trained through cross-validation and grid method. Cross-validation can generate enough training data and verification data to ensure the sufficiency of training data, while the grid method can enable the model to fully optimize and ensure the accuracy of the model. Therefore, it alleviates the possibility of overfitting of the permafrost upper limit prediction model when the data set is small, and ultimately significantly improves the prediction accuracy of the permafrost upper limit.
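The combination of cross-validation and the grid method can be illustrated with a small sketch. It assumes that "grid method" means an exhaustive search over a hyperparameter grid, and it uses a toy one-dimensional k-nearest-neighbour regressor in place of the three target algorithms; all names (`knn_predict`, `cv_mse`, `grid_search`) and the synthetic data are hypothetical.

```python
import itertools
import random

def knn_predict(train, x, k):
    """Toy regressor: average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_mse(data, k, n_folds=5):
    """Mean validation MSE over n_folds folds for a given hyperparameter k."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, val in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in val]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

def grid_search(data, param_grid):
    """Evaluate every combination in param_grid and keep the lowest CV error."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_mse(data, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic noisy samples, shuffled so the folds are random.
rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.05)) for x in range(50)]
rng.shuffle(data)
best_params, best_score = grid_search(data, {"k": [1, 3, 5, 9], "n_folds": [5]})
```

The same loop structure would apply to each target algorithm; only the model being fitted and the contents of the grid would change.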

In some specific embodiments, the sample sequence input to the three algorithms is $(x_{m,1}, x_{m,2}, \ldots, x_{m,n}, y_m)$, where $m$ is the time series index (representing different moments), $n$ is the number of environmental factors affecting the permafrost upper limit, and $y_m$ is the true value of the permafrost upper limit at the $m$-th time index. The error of each target algorithm is computed from this true value and the predicted value calculated from the environmental data.

In some specific embodiments, step 102 may include the following steps:

Step 1: Construct the training set and the test set. Both are drawn from the historical permafrost upper limit data of the target roadbed and the multiple types of historical environmental data at the location of the target roadbed. The ratio of training set data to test set data is 4:1. The training samples are then input, with the j-th training sample in the training set serving as the input sequence.

Step 2: Initialize all parameters of each algorithm.

Step 3: Through cross-validation, randomly divide the training set into five equal parts, so that in each split 80% of the data is training data and the remaining 20% is validation data. Optimize the parameters by the grid method. Taking the LSTM as an example, five sets of optimal parameters are first obtained by training on the five groups of training data, and the best parameter set is then selected using the validation data.

Step 4: Select the mean squared error (MSE) as the loss function of the LSTM, and compute the LSTM error value by backpropagation using the test set data, so that the LSTM parameters can subsequently be adjusted according to this error.

Step 5: Iterate (adjusting the LSTM parameters to minimize the MSE) until the LSTM parameters no longer change or the number of training iterations reaches a preset value. To keep training efficient, the range of the hyperparameter grid is progressively narrowed so that the LSTM converges to its optimum quickly, and the LSTM parameters are then output. RF and XGBoost obtain their final parameters by the same process as the LSTM. The final parameters of the three algorithms are listed in Table 1; they yield the optimal permafrost upper limit prediction model, from which the prediction results are calculated.

Table 1 Model parameter configuration

Step 6: Using a linear weighted combination based on the minimum absolute error, calculate from the RF, LSTM and XGBoost predicted values and the corresponding true values (the permafrost upper limit) the share of each algorithm's prediction in the overall model prediction. Compared with the conventional approach of fixing the proportions directly, weights computed this way are better adapted to the target roadbed and therefore give more accurate results.
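Step 5 above mentions progressively narrowing the range of the hyperparameter grid. The application does not spell out the narrowing rule, so the following is only one plausible reading: a coarse-to-fine search that re-centres a smaller grid on the best point of the previous round. The function name `refine_grid_search` and the quadratic stand-in for a validation loss are assumptions for illustration.

```python
def refine_grid_search(loss, low, high, points=5, rounds=4):
    """Coarse-to-fine search over one hyperparameter in [low, high]."""
    best = low
    for _ in range(rounds):
        step = (high - low) / (points - 1)
        grid = [low + i * step for i in range(points)]
        best = min(grid, key=loss)              # best grid point this round
        # Shrink the range to one grid step either side of the current best.
        low, high = max(low, best - step), min(high, best + step)
    return best

# Hypothetical validation loss with its minimum at 0.03 (e.g. a learning rate).
best = refine_grid_search(lambda v: (v - 0.03) ** 2, 0.0, 1.0)
```

Each round evaluates only five candidates, so four rounds cost 20 evaluations while resolving the optimum far more finely than a single five-point grid over the full range.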

In some specific embodiments, the random forest algorithm is an ensemble machine learning algorithm for classification and regression tasks, built on bootstrap resampling. It involves the following main steps:

Data preparation: Prepare a dataset for training and testing the model. The dataset should contain the features and the corresponding target variable. Features are the attributes used to predict the target variable, and the target variable is the value to be predicted by regression. The dataset is usually divided into a training set, used to train the model, and a test set, used to evaluate its performance.

Building the random forest: In the Scikit-learn library, the RandomForestRegressor class is used to build a random forest regression model. Hyperparameters control the behaviour of the forest, such as the number of decision trees, the feature selection method and the way each tree is grown. They are adjusted according to the actual problem and requirements.

Model training: Train the random forest regression model on the training set. The model builds multiple decision trees from the samples and target values in the training set, performing feature selection and splitting in each tree.

Prediction: Use the trained random forest regression model to predict the samples in the test set. The model averages, or weighted-averages, the predictions of the individual decision trees to obtain the final regression result.

Model evaluation: Evaluate the model by comparing its predictions with the true target values. Regression metrics such as the mean squared error (MSE), the mean absolute error (MAE) and the coefficient of determination (R-squared) can be used to assess accuracy and generalization ability.

Model tuning: Based on the evaluation results, tune the random forest regression model, for example by increasing or decreasing the number of decision trees or by adjusting the feature selection method or the tree growth settings, in order to improve performance.

Model application: After evaluation and tuning, the trained random forest regression model can be used for actual prediction. The prediction process is shown in Figure 2. Multiple sample sets D₁, D₂, D₃ and so on are first drawn at random from the training samples, a prediction is made from each, and the predictions are averaged to give the prediction result for the input sample.
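The sample-then-average process in the model application step can be sketched with the standard library alone. This is a simplified illustration of bootstrap aggregation, not the Scikit-learn implementation: each "tree" is a depth-1 regression stump on a single feature, and the per-split random feature selection of a real random forest is omitted. The names `fit_stump` and `fit_bagged` and the toy data are hypothetical.

```python
import random

def fit_stump(samples):
    """Fit a depth-1 regression tree on (x, y) pairs: one threshold, two leaf means."""
    best = None
    xs = sorted({x for x, _ in samples})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2                       # candidate split between neighbours
        left = [y for x, y in samples if x <= t]
        right = [y for x, y in samples if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                            # all x identical: predict the mean
        mean = sum(y for _, y in samples) / len(samples)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def fit_bagged(samples, n_trees=25, seed=0):
    """Train n_trees stumps on bootstrap sets D_1, D_2, ... and average them."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(samples) for _ in samples]   # sample with replacement
        trees.append(fit_stump(boot))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

# Toy step-shaped target: low values for x < 5, high values otherwise.
data = [(x, 1.0 if x < 5 else 3.0) for x in range(10)]
model = fit_bagged(data)
```

Calling `model(0)` and `model(9)` returns averages that sit near the low and high plateaus respectively, because most bootstrap stumps recover the step between them.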

In some specific embodiments, the first step of the long short-term memory network passes through the forget gate $f_t$, which determines which information the memory cell discards from its previous state. $f_t$ is given by formula (1):

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{1}$$

where $f_t$ is the forget gate and takes values in [0, 1], $\sigma$ is the logistic sigmoid function, $W_f$ and $U_f$ are adjustable weight matrices, $b_f$ is the bias vector, and $h_{t-1}$ is the hidden state of the previous time step. $x_t$ is the current input (that is, the historical environmental data in the multiple groups of validation data).

The next step determines which information is added to the memory cell as an update. The sigmoid function in the input gate $i_t$ decides which values to update, and a tanh layer generates the candidate update vector $\tilde{c}_t$. $i_t$ and $\tilde{c}_t$ are computed by formulas (2) and (3):

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{2}$$

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{3}$$

where $i_t$ is the input gate and takes values in (0, 1); $W_i$, $U_i$ and $b_i$ are the learnable parameters defined for the input gate; and $W_c$, $U_c$ and $b_c$ are another set of learnable parameters.

Once the information to discard and to keep has been decided, the cell state $c_t$ is updated according to formula (4):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{4}$$

where $\odot$ denotes element-wise multiplication, the forget gate $f_t$ determines which information stored in the previous cell state $c_{t-1}$ is forgotten, and the input gate $i_t$ determines which new information from the candidate vector $\tilde{c}_t$ is added to the cell state $c_t$.

The last step computes the output gate $o_t$, which determines the hidden state $h_t$. The output gate $o_t$ is computed by a sigmoid function, and the output $h_t$ (that is, the third predicted value) is obtained by multiplying $\tanh(c_t)$ by $o_t$, as in equations (5) and (6):

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{5}$$

$$h_t = o_t \odot \tanh\left(c_t\right) \tag{6}$$

where $o_t$ is a vector with values in (0, 1), and $W_o$, $U_o$ and $b_o$ are the three learnable parameters defined for the output gate.
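The gate computations described above follow the standard LSTM cell, so a single time step can be sketched as below. For readability the sketch uses scalar inputs and states instead of vectors and matrices; the parameter dictionary layout, the function name `lstm_step` and the label (5) for the output gate are assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for scalar input/state; p maps gate name -> (W, U, b)."""
    def pre(gate):
        W, U, b = p[gate]
        return W * x_t + U * h_prev + b
    f_t = sigmoid(pre("f"))                 # (1) forget gate
    i_t = sigmoid(pre("i"))                 # (2) input gate
    c_tilde = math.tanh(pre("c"))           # (3) candidate update
    c_t = f_t * c_prev + i_t * c_tilde      # (4) new cell state
    o_t = sigmoid(pre("o"))                 # (5) output gate
    h_t = o_t * math.tanh(c_t)              # (6) hidden state / output
    return h_t, c_t

# With all parameters zero every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
params = {gate: (0.0, 0.0, 0.0) for gate in "fico"}
h1, c1 = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=params)
```

In a trained network the same step is applied along the time series, with `h_t` and `c_t` fed back in as `h_prev` and `c_prev` for the next time index.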

In some specific embodiments, XGBoost (extreme gradient boosting) is a widely used machine learning algorithm for classification and regression tasks. It extends the gradient boosting algorithm and is designed to improve performance and speed on large-scale datasets. Regression trees serve as the base learners; an optimal solution is sought for each regression tree, so that the loss function is optimized step by step:

$$\hat{y}_i = \sum_{j=1}^{k} f_j(x_i) = \hat{y}_i^{(k-1)} + f_k(x_i)$$

where $\hat{y}_i$ is the predicted value of the $i$-th sample (that is, the second predicted value); $f_j$ is the $j$-th regression tree and $k$ is the number of learners; $x_i$ is the feature vector of the $i$-th sample (that is, the historical environmental data in the multiple groups of validation data); $\hat{y}_i^{(k-1)}$ is the accumulated result of the first $k-1$ regression trees; and $f_k(x_i)$ is the $k$-th regression tree, which is the one currently being optimized.

In the XGBoost algorithm, the objective function consists of two parts. One is an arbitrary differentiable loss function, which controls the empirical risk of the model. The other is the regularization term $\Omega$, which controls model complexity and hence the structural risk of the current tree:

$$Obj = \sum_{i=1}^{N} l\left(y_i, \hat{y}_i\right) + \Omega(f_k), \qquad \Omega(f_k) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where $\hat{y}_i$ is the predicted value of the $i$-th sample (that is, the second predicted value); $y_i$ is the true value (that is, the actual permafrost upper limit associated with the historical environmental data in the $i$-th group of validation data); $N$ is the number of samples; $l$ is the loss function; $\Omega$ is the regularization term; $\gamma$ and $\lambda$ are hyperparameters controlling the penalty strength; $T$ is the number of leaf nodes of the current regression tree; and $w_j$ is the value of the $j$-th leaf node.

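The objective function is rewritten from a sum over samples into a sum over leaf nodes, and the second-order Taylor expansion is substituted into it, giving:

$$Obj^{(k)} \approx \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2\right] + \gamma T$$

where $g_i = \partial_{\hat{y}_i^{(k-1)}}\, l\left(y_i, \hat{y}_i^{(k-1)}\right)$, $h_i = \partial^2_{\hat{y}_i^{(k-1)}}\, l\left(y_i, \hat{y}_i^{(k-1)}\right)$, $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, with $I_j$ the set of samples falling in leaf $j$, $w_j$ the value of leaf $j$, $T$ the number of leaves, and $\lambda$ and $\gamma$ the regularization hyperparameters.

Minimizing the objective function gives the leaf node prediction value $w_j^* = -\dfrac{G_j}{H_j + \lambda}$, and the minimized objective function is:

$$Obj^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T$$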

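How a node is split is the key question in growing a decision tree. The XGBoost algorithm adopts a greedy strategy: for each candidate split of a leaf node it computes the structure score gain and branches at the point with the largest gain:

$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^2}{H_L + H_R + \lambda}\right] - \gamma$$

where $\dfrac{G_L^2}{H_L + \lambda}$ is the structure score of the left node, $\dfrac{G_R^2}{H_R + \lambda}$ is the structure score of the right node, and $\dfrac{(G_L + G_R)^2}{H_L + H_R + \lambda}$ is the structure score of the parent node. Here $G$ and $H$ denote the sums of the first- and second-order derivatives of the loss over the samples in the respective node.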
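The leaf value and split gain used by the greedy strategy can be checked numerically. The sketch below follows the standard XGBoost formulation (optimal leaf value $-G/(H+\lambda)$ and gain as half the difference of structure scores minus $\gamma$); the function names and the four-sample toy data are assumptions for illustration, and the squared-error loss $\tfrac{1}{2}(y-\hat{y})^2$ is chosen so that $g_i = \hat{y}_i - y_i$ and $h_i = 1$.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf value for summed gradients G and summed Hessians H."""
    return -G / (H + lam)

def structure_score(G, H, lam):
    """Structure score G^2 / (H + lambda) of a single node."""
    return G * G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain from splitting a parent node into left and right children."""
    parent = structure_score(G_L + G_R, H_L + H_R, lam)
    children = structure_score(G_L, H_L, lam) + structure_score(G_R, H_R, lam)
    return 0.5 * (children - parent) - gamma

# Toy data: current prediction is 2.0 everywhere; true values form two groups.
y = [1.0, 1.0, 3.0, 3.0]
y_hat = [2.0] * 4
g = [p - t for p, t in zip(y_hat, y)]    # first-order gradients: [1, 1, -1, -1]
h = [1.0] * 4                            # second-order gradients for squared error

# Candidate split: first two samples go left, last two go right.
gain = split_gain(sum(g[:2]), sum(h[:2]), sum(g[2:]), sum(h[2:]), lam=1.0, gamma=0.0)
w_left = leaf_weight(sum(g[:2]), sum(h[:2]), lam=1.0)
```

Here the gain is positive (4/3), so the split would be accepted, and the left leaf value of -2/3 moves the prediction for the first two samples from 2.0 towards their true value of 1.0.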

Exemplarily, calculating the weight of the random forest algorithm, the weight of the extreme gradient boosting algorithm and the weight of the long short-term memory network based on the multiple groups of validation data may include:

Exemplarily, calculating the weight of the random forest algorithm, the weight of the extreme gradient boosting algorithm and the weight of the long short-term memory network based on the minimum absolute error linear weighting method, the multiple first predicted values, the multiple second predicted values and the multiple third predicted values may include:

The first formula may be:

In some specific embodiments, the weights are calculated from the minimum absolute error linear weighting method and the multiple first, second and third predicted values. Weights derived in this way reflect the relationships within the data instead of being assigned arbitrarily, which makes them more reasonable and helps ensure the accuracy of the permafrost upper limit prediction.
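The first formula is not reproduced in this text, so the following is only one plausible reading of "minimum absolute error linear weighting": choose non-negative weights that sum to one and minimize the sum of absolute errors of the combined prediction on the validation data. The sketch finds such weights by a coarse search over the weight simplex; the function names and the numeric values are illustrative assumptions, not data from the application.

```python
def combine(weights, preds):
    """Linear weighted combination of several prediction series."""
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*preds)]

def sum_abs_error(weights, preds, y_true):
    """Sum of absolute errors of the combined prediction."""
    return sum(abs(c - t) for c, t in zip(combine(weights, preds), y_true))

def fit_weights(preds, y_true, steps=20):
    """Search weights (w1, w2, w3) >= 0 with w1 + w2 + w3 = 1 on a regular grid."""
    best_w, best_err = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            err = sum_abs_error(w, preds, y_true)
            if err < best_err:
                best_w, best_err = w, err
    return best_w, best_err

# Made-up validation values standing in for true and predicted upper limits.
y_true = [2.0, 2.2, 2.5, 2.4]
rf_pred = [2.1, 2.3, 2.4, 2.5]      # first predicted values (RF)
xgb_pred = [1.9, 2.1, 2.6, 2.3]     # second predicted values (XGBoost)
lstm_pred = [2.0, 2.4, 2.5, 2.2]    # third predicted values (LSTM)
weights, err = fit_weights([rf_pred, xgb_pred, lstm_pred], y_true)
```

Because the grid includes the corners (1, 0, 0), (0, 1, 0) and (0, 0, 1), the combined error can never exceed that of the best single algorithm on the same validation data. The same minimization could also be posed as a linear programme, which would avoid the grid resolution limit.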

Step 103: Based on the environmental data affecting the permafrost upper limit in the target period and on the trained permafrost upper limit prediction model, predict the permafrost upper limit of the target roadbed in the target period, where the target period is any future time period.

To demonstrate the accuracy of the predictions of this scheme, Figure 3 shows that the results of the prediction model constructed by this scheme are the closest to the actual values. Four methods of predicting the permafrost upper limit (RF alone, XGBoost alone, LSTM alone, and the prediction model of this scheme) were also evaluated by mean squared error (MSE), mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE) and the coefficient of determination (R²). The evaluation results are given in Table 2 and show that, across these evaluation metrics, this scheme is more accurate than the other three methods.
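The five evaluation metrics named above have standard definitions, sketched here with the standard library. The function name `regression_metrics` and the three-point example are assumptions for illustration; the numbers are not results from Table 2.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (in percent) and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the true value, so it requires every y_true to be non-zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

m = regression_metrics([2.0, 4.0, 6.0], [2.5, 4.0, 5.5])
```

Lower MSE, RMSE, MAE and MAPE and an R² closer to 1 indicate a better fit, which is the sense in which the four methods are compared.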

Table 2 Model evaluation metrics

In machine learning, combined models are popular for their strong performance. However, such models are often criticized as "black boxes", because the relationship between the input variables and the prediction target is hard to explain. To address this, the SHAP (SHapley Additive exPlanations) explainer is used to clarify the specific contribution of each feature to the prediction. SHAP provides a way to compute an additive importance score for each feature, reflecting how much each factor matters in the prediction of the final combined model. SHAP is a post hoc explanation method whose core idea is to evaluate the marginal contribution of a feature to the model output. A positive contribution value means the feature pushes the prediction upward; a negative value means the feature lowers the predicted value. The main advantage of SHAP values is that they reveal the influence of each feature for every individual sample and also show whether that influence is positive or negative. This improves the interpretability of the model and helps researchers and practitioners understand and trust their machine learning models.
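The marginal-contribution idea behind SHAP can be made concrete with exact Shapley values on a tiny model. This sketch does not use the `shap` package; it enumerates every feature coalition directly, which is only feasible for a handful of features, and it replaces absent features by a baseline value. The function name, the linear toy model and the baseline are assumptions for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))   # marginal contribution
        phi.append(total)
    return phi

# Hypothetical linear model of three input features.
model = lambda v: 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]
phi = shapley_values(model, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model each value reduces to the coefficient times the feature's deviation from the baseline, so the second feature receives a negative contribution while the other two are positive, and the contributions sum to the difference between the prediction and the baseline prediction.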

In studying the permafrost upper limit prediction model, the permafrost upper limit is set as the target parameter so that the importance values of different features are comparable and interpretable. Processing and analyzing the feature importance yields an importance ranking of the variables that affect the prediction result. In one specific embodiment, the overall importance analysis shows that the daily mean air temperature has the greatest influence on the prediction of the permafrost upper limit, followed by the surface heat flux, while the contribution of the surface temperature is relatively small. The per-sample analysis carried out with SHAP further shows that the monitored values of daily mean air temperature and surface heat flux make a relatively large positive contribution to the prediction, whereas the surface temperature samples contribute comparatively little. This analysis also provides an indirect check on whether the correlation coefficient calculation is accurate, which further supports the reliability of the prediction results.

In the permafrost upper limit prediction method described above, the prediction model uses the random forest algorithm, the extreme gradient boosting algorithm and the long short-term memory network. The random forest algorithm builds multiple decision trees, each trained on a random subset of the training set and test set, and so handles the uncertainty and complexity in the permafrost upper limit data effectively. The long short-term memory network captures the temporal correlation in the training set and test set data and accounts for the relationship between the permafrost upper limit and changes over time. The extreme gradient boosting algorithm further improves the performance of the prediction model through gradient boosting, making it better suited to accurate prediction under different environmental conditions. The method considers the factors affecting the permafrost upper limit more comprehensively and trains the prediction model on these factors. The resulting model accounts for the complexity and uncertainty of the data, temporal changes and nonlinear relationships, so prediction is more efficient and the results are more accurate.

It should be understood that the serial numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process is determined by its function and internal logic, and does not limit the implementation of the embodiments of the present application in any way.

Corresponding to the permafrost upper limit prediction method described in the above embodiments, FIG. 4 shows a structural block diagram of the permafrost upper limit prediction device provided by an embodiment of the present application. For ease of explanation, only the parts related to the embodiment of the present application are shown.

Referring to FIG. 4, the permafrost upper limit prediction device in the embodiment of the present application may include:

A data acquisition module 201, configured to acquire the historical permafrost upper limit data of the target roadbed and the historical environmental data affecting the historical permafrost upper limit data, and to obtain a training set and a test set.

A model training module 202, configured to train the permafrost upper limit prediction model on the training set and the test set, where the permafrost upper limit prediction model is a prediction model formed by combining the random forest algorithm, the extreme gradient boosting algorithm and the long short-term memory network.

A result output module 203, configured to predict the permafrost upper limit of the target roadbed in the target period based on the environmental data affecting the permafrost upper limit in the target period and on the trained permafrost upper limit prediction model, where the target period is any future time period.

Exemplarily, the model training module 202 may be configured to:

Exemplarily, the model training module 202 may further be configured to:

The first formula may be:

Exemplarily, the data acquisition module 201 may be configured to:

Divide the first data set according to a preset ratio to obtain the training set and the test set.
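Dividing a data set by a preset ratio can be sketched as follows, using the 4:1 ratio given in the method embodiment. Whether the division is random or chronological is not stated in this passage, so the `shuffle` flag, the function name `split_by_ratio` and the seed are assumptions; setting `shuffle=False` keeps the original time order.

```python
import random

def split_by_ratio(samples, train_parts=4, test_parts=1, shuffle=True, seed=0):
    """Split samples into (training set, test set) at train_parts:test_parts."""
    items = list(samples)
    if shuffle:
        random.Random(seed).shuffle(items)      # random assignment to the two sets
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]

train_set, test_set = split_by_ratio(range(100))
```

With 100 samples this yields 80 training samples and 20 test samples, and every sample lands in exactly one of the two sets.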

Exemplarily, the data acquisition module 201 may further be configured to:

The second formula may be:

It should be noted that the information exchange and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; they are not repeated here.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is given only as an example. In practice, the functions may be assigned to different functional units or modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units and modules in the above system, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.

An embodiment of the present application further provides a terminal device. Referring to FIG. 5, the terminal device 300 may include at least one processor 310 and a memory 320. The memory 320 stores a computer program 321, and the processor 310 calls and runs the computer program 321 stored in the memory 320 to implement the steps of any of the above method embodiments, for example steps 101 to 103 of the embodiment shown in FIG. 1. Alternatively, when executing the computer program, the processor 310 implements the functions of the modules/units in the above device embodiments, for example the functions of the modules shown in FIG. 4.

Exemplarily, the computer program 321 may be divided into one or more modules/units, which are stored in the memory 320 and executed by the processor 310 to carry out the present application. The one or more modules/units may be a series of computer program segments capable of performing specific functions, the segments describing the execution of the computer program in the terminal device 300.

Those skilled in the art will understand that FIG. 5 is merely an example of a terminal device and does not limit it. The terminal device may include more or fewer components than shown, may combine certain components, or may use different components, such as input/output devices, network access devices and buses.

The processor 310 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 320 may be an internal storage unit of the terminal device, or an external storage device of the terminal device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card. The memory 320 stores the computer program and the other programs and data required by the terminal device. The memory 320 may also temporarily store data that has been output or is about to be output.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses and so on. For ease of representation, the bus in the drawings of the present application is not limited to a single bus or a single type of bus.

The permafrost upper limit prediction method provided by the embodiments of the present application may be applied to terminal devices such as computers, wearable devices, vehicle-mounted devices, tablet computers, notebook computers and netbooks. The embodiments of the present application place no restriction on the specific type of terminal device.

本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现可实现上述多年冻土上限预测方法各个实施例中的步骤。An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps in each embodiment of the above-mentioned permafrost upper limit prediction method can be implemented.

本申请实施例提供了一种计算机程序产品，当计算机程序产品在移动终端上运行时，使得移动终端执行时实现可实现上述多年冻土上限预测方法各个实施例中的步骤。An embodiment of the present application provides a computer program product. When the computer program product is run on a mobile terminal, the mobile terminal can implement the steps in each embodiment of the above-mentioned permafrost upper limit prediction method when executing the computer program product.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请实现上述实施例方法中的全部或部分流程，可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于一计算机可读存储介质中，该计算机程序在被处理器执行时，可实现上述各个方法实施例的步骤。其中，所述计算机程序包括计算机程序代码，所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质至少可以包括：能够将计算机程序代码携带到拍照装置/终端设备的任何实体或装置、记录介质、计算机存储器、只读存储器（ROM，Read-Only Memory）、随机存取存储器（RAM，RandomAccess Memory）、电载波信号、电信信号以及软件分发介质。例如U盘、移动硬盘、磁碟或者光盘等。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the processes in the above-mentioned embodiment method, which can be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, the steps of the above-mentioned method embodiments can be implemented. Among them, the computer program includes computer program code, and the computer program code can be in source code form, object code form, executable file or some intermediate form. The computer-readable medium may at least include: any entity or device that can carry the computer program code to the camera device/terminal device, recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, RandomAccess Memory), electrical carrier signal, telecommunication signal and software distribution medium. For example, a USB flash drive, a mobile hard disk, a disk or an optical disk.

The descriptions of the embodiments above each have their own emphasis. For any part that is not described or recorded in detail in one embodiment, refer to the relevant descriptions in the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.

It should be understood that the apparatus/network device and method disclosed in the embodiments of the present application may be implemented in other ways. The apparatus/network device embodiments described above are merely illustrative. For example, the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The embodiments described above are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, and that some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and all of them shall fall within the protection scope of the present application.