CN117977568A


Info

Publication number: CN117977568A
Application number: CN202410049336.0A
Authority: CN
Inventors: 李丹; 张远航; 孙光帆; 杨保华; 王奇; 缪书唯; 李振兴; 刘颂凯
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2020-10-13
Filing date: 2020-10-13
Publication date: 2024-05-03
Also published as: CN112232561B; CN112232561A

Abstract

The invention discloses a power load forecasting method based on nested LSTM and quantile calculation. The method comprises: collecting load power and influencing-factor data for a plurality of sample days to form a data set; establishing a nested LSTM model and pre-training each quantile LSTM in the nested LSTM model to obtain a set of weight and bias parameters; training the nested LSTM model as a whole and fine-tuning the weight and bias parameters during training to determine the optimal weight and bias parameters of the nested LSTM model; inputting the validation set into the trained nested LSTM model and selecting the optimal hyperparameters of the model according to the validation error; and inputting the test samples into the nested LSTM model with the optimal hyperparameters and denormalizing the prediction results output by the model. By using the nested LSTM model for quantile regression forecasting of the power load, the invention makes the probability distribution of the predicted load more reasonable and avoids crossing between quantile predictions.

Description


Power load forecasting method based on nested LSTM and quantile calculation

Technical Field

The present invention belongs to the field of power load forecasting, and in particular relates to a power load forecasting method based on nested LSTM and quantile calculation.

Background

Short-term power load forecasting is the basis for the safe and economical operation of power systems. It provides important information for power system planning and operation, energy trading, unit start-up and shutdown, and economic dispatch. Improving the accuracy of load forecasting helps raise the utilization of power equipment and minimize energy waste.

At present, probabilistic load forecasting methods mainly include interval estimation, kernel density estimation and quantile regression. The first two estimate the probability distribution mainly from parametric statistics of point-forecast errors, whereas quantile regression directly describes the relationship between the response variable and the explanatory variables at different quantiles, and has therefore become a focus of the probabilistic load forecasting literature in recent years. However, the quantile predictions of quantile regression can cross one another, which makes the results unreasonable.

Most probabilistic load forecasting methods combine a machine learning algorithm with quantile regression to build a quantile model. Traditional machine learning algorithms, however, often rely on feature engineering to process the data. Compared with them, deep neural networks have proven more effective for short-term load forecasting on large data sets. Long short-term memory (LSTM) neural networks in particular, shown in Figure 2, are widely used because they adapt well to time-series data.

Therefore, a short-term probabilistic power load forecasting method based on quantile regression with a nested LSTM neural network is studied.

Summary of the Invention

The technical problem addressed by the present invention is that the quantile predictions of existing quantile regression methods for power load can cross one another, which makes the results unreasonable.

The purpose of the present invention is to solve this problem by providing a power load forecasting method based on nested LSTM and quantile calculation. The method combines the robustness and memory characteristics of LSTM with the probabilistic forecasting capability of quantile regression, takes the inherent properties of the quantiles of the predicted load into account, and adds a combination layer that enforces the constraint relationship between quantile predictions. The resulting nested LSTM, i.e. a constrained parallel long short-term memory network (CP-LSTM), performs quantile regression forecasting of the power load, so that the predicted load probability distribution is more reasonable and crossing between quantile predictions is avoided.

The technical solution of the present invention is a power load forecasting method based on nested LSTM and quantile calculation, comprising the following steps:

Step 1: Collect load power and influencing-factor data for multiple sample days to form a data set, and divide it into a training set, a validation set and a test set;

Step 2: Establish a nested LSTM model and set the model hyperparameters; using a parallel training method, pre-train the parallel LSTM at each quantile in the nested LSTM model to obtain the global parameter set {W(τ_i), b(τ_i)}_opt;

Step 3: Use the global parameter set {W(τ_i), b(τ_i)}_opt as the initial parameters of the nested LSTM model, train the nested LSTM model as a whole, fine-tune the weight and bias parameters during training, and determine the optimal weight and bias parameters of the nested LSTM model;

Step 4: Input the validation set into the trained nested LSTM model and select the best hyperparameters of the model according to the validation error;

Step 5: Input the test samples into the nested LSTM model with the best hyperparameters and denormalize the prediction results output by the model to obtain multiple quantile predictions of the load at each time point of the forecast day;

Step 6: From the multiple quantiles of the predicted load obtained in step 5, calculate the probability density curve at each forecast point.

Preferably, step 1 further comprises normalizing each type of data in the data set so that the data variables lie in the interval [-1, 1].

Specifically, for each sample day, 96 load power values are collected at 15-minute intervals from 0:00 to 24:00. The 96 load power values of the day before the forecast day, together with the 24 hourly temperatures and the sub-region rainfall of the forecast day, form the multidimensional input feature vector, and the 96 load quantiles of the forecast day form the output vector. The input variable is X_d = [T_d, R_d]. The temperature is T_d = [T₁, T₂, …, T₂₄]_d, where T_i, i ∈ {1, 2, …, 24}, is the temperature measured at hour i. The rainfall is R_d = [R₁, R₂, …, R_M]_d, where R_j, j ∈ {1, 2, …, M}, is the rainfall in the j-th sub-region of the forecast area. Here d ∈ {1, 2, …, D}, D is the total number of historical sample days, and M is the number of sub-regions in the forecast area.

In step 2, the model hyperparameters include the number of neurons m, the time window length l of the samples, the number of computing nodes n, and the penalty parameters λ₁ and λ₂.

Preferably, the parallel training is implemented through distributed GPU computing. The training set is divided equally into multiple subsets that are distributed to the nodes of the computing system, and each computing node processes a different subset, which reduces the total training time of the neural network. The parameter sets obtained by the individual nodes are combined with a gradient descent formula to compute a new global weight set, which is then distributed back to every node of the computing system.

In this update, Z_φ = {W, b}^(φ) is the global parameter set obtained in the φ-th training iteration, ΔZ_φ,j is the parameter gradient of the j-th computing node in the φ-th training iteration, n is the total number of computing nodes, and the aggregated gradients are weighted by a scaling coefficient.
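The update formula itself is not reproduced in this text. A plausible form, reconstructed only from the symbol definitions given here, is the averaged-gradient update below. The symbol ξ for the scaling coefficient and the sign convention are assumptions and are not taken from the source.

```latex
Z_{\varphi+1} = Z_{\varphi} - \frac{\xi}{n} \sum_{j=1}^{n} \Delta Z_{\varphi,j}
```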

In step 3, the weight and bias parameters are fine-tuned with a gradient descent algorithm according to the loss function.

Preferably, the probability density curve at each forecast point is calculated by Gaussian kernel density estimation.

Preferably, in step 1 the data set is divided into a training set, a validation set and a test set in the ratio 8:1:1.

Preferably, the prediction results of step 5 are assessed with an evaluation index that accounts for the quantile constraint relationship and thus measures quantile crossing. By the inherent property of quantiles, the quantile predictions at time t should be non-decreasing in the quantile level: the prediction at a lower quantile should not exceed the prediction at a higher quantile.
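Written out, the ordering that the quantile predictions are expected to satisfy is the usual non-crossing condition. The notation \hat{q}_t^{\tau_i} for the prediction at quantile τ_i and time t, and r for the number of quantiles, are assumptions introduced here because the original expression is not reproduced in this text.

```latex
\hat{q}_t^{\tau_1} \le \hat{q}_t^{\tau_2} \le \dots \le \hat{q}_t^{\tau_r},
\qquad 0 < \tau_1 < \tau_2 < \dots < \tau_r < 1
```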

The index that accounts for the quantile constraint relationship is defined as follows.

X_CS is the value of the evaluation index that accounts for the quantile constraint relationship. It is computed from the predicted values at each quantile τ_i at each time t, where N is the total number of test time points, v_t,i is the constraint violation function, and θ = τ_i+1 - τ_i is the step between adjacent quantiles, which is a constant. When adjacent quantile predictions satisfy the constraint, v_t,i is 0; when they violate it, v_t,i is the positive difference between the adjacent quantile predictions and reflects the degree of violation. The coefficient 2θ/N normalizes the squared quantile constraint errors, so X_CS is the normalized root mean square of v_t,i over all test samples and all pairs of adjacent quantiles. X_CS can therefore be used to quantify quantile crossing.
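The defining formula is not reproduced in this text. One form consistent with the description (a violation term that is zero when the ordering holds, a coefficient 2θ/N on the squared violations, and a square root) is sketched below. The notation \hat{q}_t^{\tau_i} and the number of quantiles r are assumptions, and the exact expression in the original filing may differ.

```latex
v_{t,i} = \max\!\left(0,\ \hat{q}_t^{\tau_i} - \hat{q}_t^{\tau_{i+1}}\right),
\qquad
X_{CS} = \sqrt{\frac{2\theta}{N} \sum_{t=1}^{N} \sum_{i=1}^{r-1} v_{t,i}^{2}}
```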

When the probabilistic forecast evaluation indices X_QS and X_CS are both low, the predicted quantiles perform better. The present invention combines the two into a comprehensive evaluation index X_QCS.

Compared with the prior art, the beneficial effects of the present invention include:

1) The present invention uses a nested LSTM model for quantile regression forecasting of the power load, which makes the predicted load probability distribution more reasonable and avoids crossing between quantile predictions.

2) A parallel training method pre-trains the LSTM at each quantile in the nested LSTM model, and the resulting weight and bias parameter sets serve as the initial parameters of the nested LSTM model. Overall training then fine-tunes the weights and biases to obtain the optimal parameters of the nested LSTM model, which makes model forecasting more efficient and yields accurate point forecasts.

3) The evaluation index proposed in the present invention, which accounts for the quantile constraint relationship, can be used to evaluate quantile crossing.

Brief Description of the Drawings

The present invention is further described below with reference to the accompanying drawings and embodiments.

Figure 1 is a schematic flow chart of the probabilistic power load forecasting method of the embodiment.

Figure 2 is a schematic diagram of the LSTM structure.

Figure 3 is a schematic diagram of the structure of the nested LSTM model of the embodiment.

Figure 4 is a schematic diagram of the parallel training of the embodiment.

Figure 5 is a schematic diagram of the training process of the parallel LSTM.

Figure 6 compares the evaluation index X_CS on the test-set sample days for the different forecasting models in the embodiment.

Detailed Description

As shown in Figure 1, the power load forecasting method based on nested LSTM and quantile calculation comprises the following steps:

Step 1: Collect load data, temperature data and rainfall at 15-minute intervals for an actual region from January 1, 2016 to June 30, 2017 to form a data set, and divide it into a training set, a validation set and a test set in the ratio 8:1:1. The input variable X_d = [T_d, R_d] includes the 24 hourly temperatures of the forecast day, T_d = [T₁, T₂, …, T₂₄]_d, and the rainfall of the M sub-regions, R_d = [R₁, R₂, …, R_M]_d. Because the different types of data differ considerably in magnitude, they are normalized to the interval [-1, 1]. The normalization maps each sample value to a normalized input value using the maximum and minimum values of the N samples.
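The normalization formula is not reproduced in this text. A standard min-max mapping onto [-1, 1], x' = 2(x - x_min)/(x_max - x_min) - 1, is consistent with the description. The sketch below assumes that form; the function names are illustrative and do not come from the source.

```python
def normalize(values):
    """Map a list of floats onto [-1, 1] by min-max scaling (assumed form).

    Returns the scaled values together with the minimum and maximum,
    which are needed later to denormalize the model output.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in values], lo, hi
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
    return scaled, lo, hi


def denormalize(scaled, lo, hi):
    """Invert normalize(): map values in [-1, 1] back to the original scale."""
    return [(s + 1.0) * (hi - lo) / 2.0 + lo for s in scaled]
```

The same minimum and maximum would be reused in step 5, where the predicted quantiles are mapped back to load units.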

Step 2: Establish the nested LSTM model. As shown in Figure 3, the nested LSTM model includes an input layer, a hidden layer, an output layer and a regression layer, and the hidden layer contains multiple quantile long short-term memory network models (Q-LSTM).

Set the model hyperparameters, including the number of neurons m, the sample time window length l, the number of computing nodes n, and the penalty parameters λ₁ and λ₂. In the embodiment, m is 200, the time window length l is 6, λ₁ is 1, λ₂ is 20, and the data set covers 547 sample days.

Using the parallel training method, pre-train the parallel LSTM at each quantile in the nested LSTM model: the training set is divided into n equal subsets, and the network is trained in parallel on the corresponding n computing nodes.

As shown in Figure 4, data-parallel training of the neural network is implemented through distributed GPU computing. The training set is divided equally into multiple subsets that are distributed to the nodes of the computing system, and each computing node processes a different subset, which reduces the total training time of the neural network. Each node obtains a set of model parameters by training on its data subset. The parameter sets obtained by the individual nodes are combined with a gradient descent formula to compute a new global weight set, which is then distributed back to every node of the computing system.

In this update, Z_φ = {W, b}^(φ) is the global parameter set obtained in the φ-th training iteration, ΔZ_φ,j is the parameter gradient of the j-th computing node in the φ-th training iteration, n is the total number of computing nodes, and the aggregated gradients are weighted by a scaling coefficient that plays a role similar to a learning rate.
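A minimal sketch of one aggregation round follows, assuming the global update subtracts the scaled mean of the per-node gradients. Parameters are represented as flat lists of floats, and the names `aggregate_step` and `scale` are illustrative rather than taken from the source.

```python
def aggregate_step(global_params, node_gradients, scale):
    """Return the new global parameters after one data-parallel round.

    global_params:  flat list of floats (all weights and biases)
    node_gradients: one gradient list per computing node, same length
    scale:          scaling coefficient, playing the role of a learning rate
    """
    n = len(node_gradients)
    if n == 0:
        raise ValueError("at least one computing node is required")
    new_params = []
    for k, z in enumerate(global_params):
        # Average the k-th gradient component over all nodes.
        mean_grad = sum(g[k] for g in node_gradients) / n
        new_params.append(z - scale * mean_grad)
    return new_params
```

After each round the returned list would be broadcast back to every node, so that all nodes start the next iteration from the same parameters.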

As shown in Figure 5, each node trains its own Q-LSTM model. The training process is as follows:

(1) Input the initial weights W_0(τ_i) and initial biases b_0(τ_i);

(2) Compute the current values of the LSTM input gate, forget gate, output gate, candidate memory cell, new memory state and hidden-layer state.

Given the current input x_t and the hidden-layer state h_t-1 and memory state C_t-1 of the previous time step, the gates, the candidate memory cell, the new memory state C_t and the hidden-layer state h_t are computed in turn.

Here W_i, W_f, W_o and W_c are the corresponding weight matrices, and b_i, b_f, b_o and b_c are the corresponding bias vectors; σ( ) and tanh( ) are the sigmoid and hyperbolic tangent activation functions, respectively. The final output of the output layer is computed from the hidden-layer state h_t.

Here W_S is the connection weight matrix between the hidden layer and the output layer, and b_S is the corresponding bias vector.
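The gate equations are not reproduced in this text. The standard LSTM recurrences below are consistent with the symbols defined here (W_i, W_f, W_o, W_c, b_i, b_f, b_o, b_c, W_S, b_S). The concatenated input [h_{t-1}, x_t], the candidate symbol \tilde{C}_t and the output symbol \hat{y}_t are assumed notation, and the original filing may arrange the weights differently.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t) \\
\hat{y}_t &= W_S h_t + b_S
\end{aligned}
```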

(3) Using the loss function and gradient descent, compute the gradients of the loss with respect to the hidden-layer state and the memory state, and from them compute the gradients of each weight and bias. The loss function is defined over the following parameter sets:

W(τ_i) = {W_f(τ_i), W_i(τ_i), W_c(τ_i), W_o(τ_i), W_S(τ_i)}
b(τ_i) = {b_f(τ_i), b_i(τ_i), b_c(τ_i), b_o(τ_i), b_S(τ_i)}

These are, respectively, the set of all weight matrices and the set of all bias vectors of the LSTM neural network at quantile τ_i. λ₁ is the regularization penalty parameter that prevents overfitting during training, and the loss is built on the check function of quantile regression, evaluated at an argument a.
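The definition of the check function is not reproduced in this text. In quantile regression the check function (also called the pinball loss) has the standard form below; the symbol ρ_τ is assumed notation.

```latex
\rho_{\tau}(a) =
\begin{cases}
\tau\, a, & a \ge 0 \\
(\tau - 1)\, a, & a < 0
\end{cases}
```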

Two gradient functions are defined: the derivative of the loss function with respect to the hidden-layer state, and the derivative of the loss function with respect to the memory state.

1) Gradients of the hidden-to-output-layer parameters: these are obtained by differentiating with respect to the connection weight matrix W_S between the hidden layer and the output layer and with respect to the bias vector b_S.

2) From the two gradient functions, compute the gradients of the forget gate, input gate, candidate memory cell and output gate parameters.

(4) Update the weights and biases with the gradient descent update rule, where η is the learning rate and W_* and b_* denote the respective weight matrices and bias vectors.

Repeat steps (2) to (4) until the convergence condition is met, which gives the optimal model parameters {W(τ_i), b(τ_i)}_opt.

Step 3: Use the obtained weight and bias parameter set {W(τ_i), b(τ_i)}_opt as the initial parameters of the nested LSTM model, train the nested LSTM model as a whole, and fine-tune {W(τ_i), b(τ_i)}_r to determine the optimal weight and bias parameters of the CP-LSTM short-term probabilistic load forecasting model. To obtain the optimal parameters of the nested LSTM model, gradient descent is applied on the training set to search for the model parameters {W(τ_i), b(τ_i)}_opt that minimize the loss function. The nested LSTM model is trained in the same way as the Q-LSTM, except for the loss function and its gradients.

In the loss function of the nested LSTM model, λ₂ is the penalty parameter for violating the constraint. The corresponding gradients change accordingly and are expressed in terms of a vector u_i.

The gradients of the forget gate, input gate, memory cell, candidate memory cell and output gate parameters are computed in the same way as in the gradient-computation step of the Q-LSTM training described above.
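The loss formula of the nested model is not reproduced in this text. The sketch below only illustrates the general idea of adding a crossing penalty, weighted by λ₂, to the summed pinball losses. The squared-hinge form of the penalty, the omission of the λ₁ regularization term, and all function names are assumptions rather than the patented formula.

```python
def pinball(tau, y, q):
    """Check-function loss of prediction q for target y at quantile tau."""
    a = y - q
    return tau * a if a >= 0 else (tau - 1.0) * a


def nested_loss(taus, y_true, q_pred, lam2):
    """Summed pinball loss plus a penalty on crossing adjacent quantiles.

    taus:   increasing quantile levels, e.g. [0.1, 0.5, 0.9]
    y_true: list of targets, one per time step
    q_pred: q_pred[t][i] is the prediction at time t for quantile taus[i]
    lam2:   weight of the crossing penalty (assumed role of the second penalty parameter)
    """
    fit = 0.0
    penalty = 0.0
    for y, row in zip(y_true, q_pred):
        for tau, q in zip(taus, row):
            fit += pinball(tau, y, q)
        for lower, upper in zip(row, row[1:]):
            # Positive only when a lower quantile exceeds the next higher one.
            penalty += max(0.0, lower - upper) ** 2
    return fit + lam2 * penalty
```

With a large penalty weight, such as the value 20 used for λ₂ in the embodiment, a crossing pair of predictions dominates the loss, which pushes training toward ordered quantiles.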

Step 4: Input the validation set into the nested LSTM model trained in step 3 and select the best hyperparameters according to the validation error.

In the embodiment, 10% of the 547 days of sample data is used for validation, and the best hyperparameters are selected according to the error between the final output and the true values.

Step 5: Input the test samples into the nested LSTM model with the best hyperparameters to obtain the outputs, convert the outputs back to their original scale (denormalization), and compare the forecasts with the actual values. Because quantile predictions should satisfy the quantile constraint, the present invention proposes, on the basis of the commonly used probabilistic forecast evaluation index Quantile Score (QS), an evaluation index that accounts for the quantile constraint relationship, the Constraint Score (CS). By the inherent property of quantiles, the quantile predictions at time t should be non-decreasing in the quantile level, and the proposed index is built on this constraint.

X_CS is the value of the evaluation index that accounts for the quantile constraint relationship. It is computed from the predicted values at each quantile τ_i at each time t, where N is the total number of test time points, v_t,i is the constraint violation function, and θ = τ_i+1 - τ_i is the step between adjacent quantiles, which is a constant. When adjacent quantile predictions satisfy the constraint, v_t,i is 0; when they violate it, v_t,i is the positive difference between the adjacent quantile predictions and reflects the degree of violation. The coefficient 2θ/N normalizes the squared quantile constraint errors, so X_CS is the normalized root mean square of v_t,i over all test samples and all pairs of adjacent quantiles. X_CS can therefore be used to quantify quantile crossing.
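A short sketch of how such a constraint score could be computed follows. It assumes the index is the square root of 2θ/N times the sum of squared violations, which is one reading of the description and not a confirmed formula. The function names are illustrative, and `violation_share` corresponds only loosely to the proportion f reported in the embodiment.

```python
import math


def constraint_score(q_pred, theta):
    """Assumed form of X_CS: sqrt(2*theta/N * sum of squared violations).

    q_pred: q_pred[t][i] is the prediction at time t for the i-th quantile,
            with quantile levels in increasing order
    theta:  constant step between adjacent quantile levels
    """
    n = len(q_pred)
    if n == 0:
        raise ValueError("no test samples")
    total = 0.0
    for row in q_pred:
        for lower, upper in zip(row, row[1:]):
            v = max(0.0, lower - upper)  # zero when the ordering holds
            total += v * v
    return math.sqrt(2.0 * theta * total / n)


def violation_share(q_pred):
    """Fraction of time steps with at least one crossing pair."""
    crossed = sum(
        1 for row in q_pred
        if any(lower > upper for lower, upper in zip(row, row[1:]))
    )
    return crossed / len(q_pred)
```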

When X_QS and X_CS are both low, the predicted quantiles perform better. The present invention combines the two into a comprehensive evaluation index X_QCS.

In addition, the reliability index of the prediction interval (PI), namely the PI coverage probability (PICP) together with its deviation index, and the sharpness index, namely the PI normalized root-mean-square width (PINRW), are also important indicators for evaluating probabilistic forecasts.

The commonly used probabilistic forecast evaluation index X_QS is built on the pinball loss at each quantile τ_i, where y_t is the actual power load at time t, the prediction is taken at quantile τ_i at time t, and N is the total number of test time points.
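The X_QS formula is not reproduced in this text. The sketch below assumes the common convention of averaging the pinball loss over all test time points and all quantiles; the averaging convention and the function name are assumptions.

```python
def quantile_score(taus, y_true, q_pred):
    """Mean pinball loss over all time steps and quantile levels.

    taus:   quantile levels, e.g. [0.1, 0.5, 0.9]
    y_true: actual loads, one per test time point
    q_pred: q_pred[t][i] is the prediction at time t for quantile taus[i]
    """
    if not y_true or not taus:
        raise ValueError("need at least one sample and one quantile")
    total = 0.0
    for y, row in zip(y_true, q_pred):
        for tau, q in zip(taus, row):
            a = y - q
            total += tau * a if a >= 0 else (tau - 1.0) * a
    return total / (len(y_true) * len(taus))
```

Lower values indicate quantile predictions that sit closer to the observed load at the intended probability levels.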

Reliability index X_PICP: here ε^α is the number of actual values that fall within the prediction interval at confidence level 1-α.

Coverage probability deviation index X_Dev: the deviation of the actual PI coverage probability (PICP) from its nominal value, the PI nominal confidence (PINC).
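The two formulas are not reproduced in this text. The sketch below assumes that the coverage probability is the share of actual values inside the interval (ε^α divided by the number of test points) and that the deviation is the absolute difference from the nominal level 1-α. Both conventions and the function names are assumptions.

```python
def picp(y_true, lower, upper):
    """Share of actual values that fall inside [lower_t, upper_t]."""
    if not y_true:
        raise ValueError("no test samples")
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


def coverage_deviation(y_true, lower, upper, alpha):
    """Absolute gap between empirical coverage and the nominal level 1 - alpha."""
    return abs(picp(y_true, lower, upper) - (1.0 - alpha))
```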

Sharpness index X_PINRW: X_PINRW^α is the normalized root-mean-square width of the prediction interval at confidence level 1-α, U_t^α and L_t^α are the upper and lower limits of the prediction interval of the t-th test sample at confidence level 1-α, and R is the difference between the maximum and minimum load in the test set.
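The formula is not reproduced in this text. Going by the symbol definitions, the sketch below assumes the usual form: the root mean square of the interval widths U_t - L_t divided by the load range R. The function name is illustrative.

```python
import math


def pinrw(lower, upper, load_range):
    """Normalized root-mean-square width of the prediction intervals.

    lower, upper: interval limits per test sample at one confidence level
    load_range:   R, the maximum minus the minimum load in the test set
    """
    n = len(lower)
    if n == 0 or load_range <= 0:
        raise ValueError("need samples and a positive load range")
    mean_sq = sum((u - l) ** 2 for l, u in zip(lower, upper)) / n
    return math.sqrt(mean_sq) / load_range
```

Narrower intervals give a smaller value, so this index is read together with the coverage index: a sharp interval is only useful if it still covers the actual load often enough.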

Step 6: From the multiple quantiles of the predicted load obtained in step 5, calculate the probability density curve at each forecast point by Gaussian kernel density estimation. The Gaussian kernel density estimation method follows the one disclosed in the article "Short-term power load probability density forecasting based on Yeo-Johnson transformation quantile regression and Gaussian kernel function", published in the journal Energy in 2018.
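A minimal sketch of Gaussian kernel density estimation over the predicted quantiles of a single forecast point follows. It treats the quantile predictions as equally weighted samples and picks the bandwidth with the normal reference rule h = 1.06 σ n^(-1/5). Both choices are assumptions made here and are not necessarily those of the cited article.

```python
import math


def reference_bandwidth(samples):
    """Normal reference rule of thumb: h = 1.06 * std * n^(-1/5)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return 1.06 * math.sqrt(var) * n ** (-0.2)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density built from Gaussian kernels."""
    h = bandwidth if bandwidth is not None else reference_bandwidth(samples)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density
```

For example, passing the denormalized quantile predictions of one 15-minute point to `gaussian_kde` gives a callable that can be evaluated on a grid of load values to draw the density curve.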

In the embodiment, a 15-minute load data set for an actual region from January 1, 2016 to June 30, 2017 is used, and day-ahead probabilistic load forecasting is performed with the method of the present invention. To verify the forecasting performance of the nested LSTM model, it is compared with the linear quantile regression model L-QR, the quantile neural network bQRNN with the parametric rectified linear activation function RCLU, QRNN, and Q-LSTM without the combination layer. The evaluation indices of the probabilistic forecasts of each model are compared in Table 1 and Table 2. Table 1 lists the training time T_train, the commonly used probabilistic forecast evaluation index X_QS, the index X_CS that accounts for the quantile constraint relationship, the comprehensive evaluation index X_QCS, the sharpness index X_PINRW at the 50% and 90% confidence levels, and the proportion f of samples that violate the adjacent-quantile constraint. Table 2 compares the reliability index X_PICP and the deviation index X_Dev at different confidence levels, where X_AD and X_MD are the mean and maximum of X_Dev over the confidence levels, respectively.

Figure 6 and Table 1 show that the X_CS index of the nested LSTM, i.e. CP-LSTM, is clearly lower than that of the other methods on the vast majority of sample days. Over the whole test set, the X_CS index of CP-LSTM is only 27.28% of that of Q-LSTM, and the proportion f of samples that violate the constraint is 16.3% lower than for Q-LSTM, while the X_QS index, which reflects forecast accuracy, does not change noticeably. CP-LSTM can therefore effectively avoid quantile crossing and improve the reasonableness of the predicted quantiles without reducing forecast accuracy.

Table 1 Comparison of the evaluation indices of each model

Table 2 Comparison of X_PICP and X_Dev of each model

The protection scope of the present invention is not limited to the above. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of their technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.