Background
Load prediction is an important link in the power industry. Short-term load prediction plays an important supporting role in guaranteeing the safe, stable, economical, and efficient operation of the power grid, and is the basis for unit start-stop plans and dispatching plans. Load prediction is also a core technology for the consumption of photovoltaic and wind energy: particularly in areas rich in renewable energy sources, low load prediction accuracy not only damages the economic benefits of a photovoltaic base or wind farm, but also affects the power quality and reliability of the whole grid. It is therefore of great significance for energy management, energy conservation, and emission reduction.
Short-term load prediction methods can be broadly classified into two categories: statistical methods and machine learning methods. Statistical methods mainly comprise time-series methods, regression analysis, and the like. Machine learning methods mainly include support vector regression (SVR), decision tree models, and the like; among the decision tree models, algorithms commonly applied to power load prediction include the random forest algorithm and gradient boosting regression trees (GBRT).
In recent years, deep learning methods have been widely applied in the field of power load prediction due to their strong mapping capability in high-dimensional, nonlinear, complex systems; deep learning models such as deep belief networks (DBN), long short-term memory networks (LSTM), and convolutional neural networks (CNN) have become popular models in the field. Deep learning methods can learn abstract features in massive data layer by layer through a multilayer network, and can achieve higher-precision load prediction by capturing the nonlinear relationship between the load and influencing factors such as weather and the economy.
The LSTM is a recurrent neural network structure that connects the hidden-layer neurons across the sequence to form sequence feedback. It effectively solves the vanishing-gradient problem that may arise in the original RNN, shows good performance in learning the dynamic time-series characteristics of data, and is widely applied in many sequence recognition scenarios such as speech recognition and text prediction. The load exhibits a fairly obvious periodic pattern on short time scales, and prediction should account for its time-series correlation; an LSTM model with a memory function can therefore effectively learn the regularity in a historical load sequence, and has been applied to load prediction in recent years. For example, when the LSTM network is applied to short-term load prediction and compared with a multi-layer BP network, the prediction performance of the LSTM network is far better than that of the latter; similarly, applying the LSTM network to regional ultra-short-term load prediction and comparing with traditional algorithms demonstrates its superior performance.
On the other hand, the load prediction performance is closely related to the selection of input features, their time-series relevance, and so on, and the degree to which the input feature sequences influence load volatility and uncertainty needs to be considered. Neural networks based on the Attention mechanism have been applied successfully in fields such as natural language processing and computer vision; by assigning different probability weights to the neurons of a hidden layer, the mechanism lets the hidden layer focus on the more critical information, a property that can be exploited in load prediction to mine the relevance of input feature sequences. For example, combining the Attention mechanism with an LSTM network for ultra-short-term load prediction mines the correlation between historical load data and the load at the predicted point; as another example, a power market load prediction method based on Attention-LSTM has been proposed that focuses on the correlation between electricity price and the load sequence. However, most of the above studies use only a shallow LSTM network, failing to exploit the feature extraction advantages of a deep neural network, and do not consider the probabilistic characteristics of the load: the prediction output is a deterministic load curve, which can hardly represent the probability distribution of the load.
Disclosure of Invention
The invention provides a short-term load quantile probability forecasting method based on a deep Attention-LSTM network, aiming at solving the problems of volatility and uncertainty in short-term load forecasting and the difficulty of ensuring the reliability of load probability forecasting.
In order to solve the above technical problems, the invention adopts the following technical scheme: a short-term load quantile probability forecasting method based on a deep Attention-LSTM network, wherein:
firstly, designing a short-term probability load prediction model based on a deep Attention-LSTM network;
secondly, selecting load influence factors to construct input and output characteristic vectors, and performing data preprocessing on the original data to obtain a training set, a verification set and a test set;
and finally, applying a short-term probability load forecasting model to forecast the short-term probability load to obtain a forecasting result.
Preferably, the short-term probability load prediction model based on the deep Attention-LSTM network comprises two LSTM hidden layers and an Attention layer: the load time-series characteristics of the input vectors are first extracted through the LSTM network; the Attention layer then calculates the feature weights of the sequence input according to the current input vector, and the weight vector and the current input vector are combined to obtain the output value at the predicted point.
Preferably, the calculation formulas of the LSTM are as follows:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)

c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

o_t = σ(W_o x_t + U_o h_{t-1} + b_o)

h_t = o_t ⊙ tanh(c_t)

wherein: x_t and h_t are respectively the input vector and the hidden-layer state quantity at time t; i, f, c, and o denote the corresponding input gate, forgetting gate, candidate update gate, and output gate, respectively; W, U, and b are the corresponding weight coefficient matrices and bias vectors; and σ and tanh are respectively the sigmoid and hyperbolic tangent activation functions.
Preferably, the equations of the Attention mechanism connected to the LSTM are as follows:

u_{it} = tanh(W_w h_{2i} + b_w)

α_i = exp(u_{it}ᵀ u_w) / Σ_i exp(u_{it}ᵀ u_w)

C = Σ_i α_i h_{2i}

In the formulas, the Attention layer first passes the output information h_{2i} of the LSTM hidden layer through a nonlinear transformation to obtain the implicit representation u_{it}; a dot product is then taken with the attention mechanism matrix u_w and normalized using softmax, finally yielding the weight coefficients α_i, which act together with the LSTM top-layer output to produce the output vector C.

W_w, b_w, and u_w are the parameters to be trained, initialized randomly.
Preferably, the pinball loss function is used to obtain the quantile information corresponding to the load probability distribution, wherein the probabilistic characteristics of the load are obtained by solving for multiple quantiles, i.e.

P(y_i < ŷ_i^(p)) = p

where ŷ_i^(p) is the load quantile, p ∈ [0,1] with n_q the number of quantiles considered, and P is the probability that the load value y_i is lower than the quantile ŷ_i^(p).

The quantile information described by the pinball loss function is defined as

L_p = (1/N) Σ_{i=1}^{N} [ p (y_i − ŷ_i^(p)) s(y_i ≥ ŷ_i^(p)) + (1 − p)(ŷ_i^(p) − y_i) s(y_i < ŷ_i^(p)) ]

where N is the number of samples; s is an indicator function equal to 1 only when its condition is met, and 0 otherwise; p is the probability corresponding to the quantile; and ŷ_i^(p) is the calculated quantile. When p = 0.5, the quantile sought is the median, and the pinball loss is equivalent to the mean absolute error.

The training process of the established neural network is solved as an unconstrained optimization problem that minimizes the loss function, i.e.

ω_p* = argmin_{ω_p} L_p(ω_p)

where ω_p are the parameters to be trained and learned when the quantile probability is p.
Preferably, in the data preprocessing, missing values and "NAN" abnormal values not exceeding 20% of the data volume are filled with the mean of the adjacent load data; if consecutive abnormal data occur, the search continues outward on both sides until non-missing values are found.
Preferably, because different input features have different dimensions, the raw feature data x(i) is normalized to limit the input to the range [0,1]. The normalized input data x_1(i) is

x_1(i) = (x(i) − x_min) / (x_max − x_min)

where x_max and x_min are respectively the maximum and minimum values of the raw input data.
Preferably, the training set, validation set, and test set are obtained by dividing the whole data set in a ratio of 70%, 15%, and 15%.
By adopting the technical scheme, the invention has the following beneficial effects:
1) Aiming at the technical difficulties of volatility and uncertainty in short-term load prediction, the invention fully considers the load time-series characteristics, introduces an Attention mechanism on the basis of a deep LSTM, and establishes a short-term load prediction model based on a deep Attention-LSTM network.
2) Aiming at the problem of load probability prediction, the invention establishes a quantile prediction method corresponding to the load probability distribution, adopts the pinball loss function as a reliability measurement standard, and measures the reliability of load probability prediction under specific quantile conditions.
The following detailed description of the present invention will be provided in conjunction with the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention first designs a short-term probability load prediction model based on a deep Attention-LSTM network; second, load influence factors (including weather, holidays, the economy, and other factors) are selected to construct input and output feature vectors, and data preprocessing is performed on the raw data to obtain a training set, a validation set, and a test set; the established short-term probability load prediction model is then applied to short-term probability load prediction to obtain the prediction result.
Specific embodiments of the respective steps are as follows.
Short-term probability load prediction model design based on the deep Attention-LSTM network
LSTM network architecture
The LSTM (long short-term memory) network is an improved RNN: its memory parameters are limited to the interval [0,1], which prevents memories from distant moments from causing exponential explosion of the output and effectively solves the vanishing-gradient problem that the RNN cannot handle, so that historical information can be fully utilized. This gives the LSTM strong adaptability in time-series data analysis. The basic structure of the LSTM cell is shown in Fig. 1.
In the figure: x_t and h_t are respectively the input vector and the hidden-layer state quantity (the output information of the hidden layer) at time t; i, f, c, and o denote the corresponding input gate, forgetting gate, candidate update gate, and output gate; W, U, and b are the corresponding weight coefficient matrices and bias vectors; and σ and tanh are respectively the sigmoid and hyperbolic tangent activation functions.
The calculation formula for LSTM can be expressed as follows:
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)

c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

o_t = σ(W_o x_t + U_o h_{t-1} + b_o)

h_t = o_t ⊙ tanh(c_t)
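For concreteness, the following is a minimal NumPy sketch of a single LSTM cell step implementing the formulas above; the dimensions and the random initialization are illustrative assumptions, not the trained parameters of the model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by gate name."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forgetting gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate update
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate
    c_t = f_t * c_prev + i_t * c_hat                          # cell state update
    h_t = o_t * np.tanh(c_t)                                  # hidden-layer output
    return h_t, c_t

# Illustrative sizes: 6 input features, 100 hidden units (as in the first layer).
rng = np.random.default_rng(0)
n_in, n_hid = 6, 100
W = {g: rng.standard_normal((n_hid, n_in)) * 0.1 for g in "fico"}
U = {g: rng.standard_normal((n_hid, n_hid)) * 0.1 for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
```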
The Attention mechanism
The Attention mechanism mimics the attention mechanism of the human brain, i.e., the property that at a particular moment attention is focused on a particular place while attention to everything else is reduced or ignored. The load sequence is time-ordered, and the load level at the current moment is related to the inputs at different historical moments. The invention therefore introduces an Attention mechanism into the LSTM model: by assigning different weight values to the inputs at different time-sequence positions, influence factors with strong relevance are highlighted, helping the model achieve a more accurate prediction while adding essentially no computation or storage overhead.
The structure of the Attention mechanism connected to the LSTM is shown in Fig. 2, and its calculation formulas are as follows:

u_{it} = tanh(W_w h_{2i} + b_w)

α_i = exp(u_{it}ᵀ u_w) / Σ_i exp(u_{it}ᵀ u_w)

C = Σ_i α_i h_{2i}

In the formulas, the Attention layer first passes the output information h_{2i} of the LSTM hidden layer through a nonlinear transformation to obtain the implicit representation u_{it}; a dot product is then taken with the attention mechanism matrix u_w and normalized with softmax, finally yielding the weight coefficients α_i, which act together with the LSTM top-layer output to produce the output vector C.

W_w, b_w, and u_w are the parameters to be trained, initialized randomly. In essence, the Attention layer extracts high-dimensional structural information from the time sequence, i.e., it finds a hyperplane in a high-dimensional space such that the load data to be predicted fall on that hyperplane.
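The following NumPy sketch illustrates the Attention computation above; the sequence length and the dimensions are illustrative assumptions.

```python
import numpy as np

def attention(H, W_w, b_w, u_w):
    """H: (T, d) sequence of top-layer LSTM outputs h_2i.
    Returns the output vector C and the weights alpha."""
    U = np.tanh(H @ W_w.T + b_w)        # u_it = tanh(W_w h_2i + b_w), shape (T, d_a)
    scores = U @ u_w                    # dot product with the attention matrix u_w
    scores -= scores.max()              # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax weights alpha_i
    C = alpha @ H                       # weighted sum of h_2i -> output vector C
    return C, alpha

rng = np.random.default_rng(1)
T, d, d_a = 96, 50, 50                  # assumed sequence length and dimensions
H = rng.standard_normal((T, d))
C, alpha = attention(H, rng.standard_normal((d_a, d)), np.zeros(d_a),
                     rng.standard_normal(d_a))
```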
Design of the overall model structure
The overall structure of the deep Attention-LSTM-based load prediction model is shown in Fig. 3; it comprises two LSTM hidden layers and one Attention layer. The input vector first passes through the LSTM network to extract load time-series characteristics; the Attention layer then calculates the feature weights of the sequence input according to the current input vector, and the weight vector and the current input vector are combined to obtain the output value at the predicted point.
The network construction process is shown as Algorithm 1.
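As an illustration of this construction, the sketch below builds the two-LSTM-plus-Attention network in Keras. The layer sizes follow the 96-100-50-20-1 structure described in the experiments below, and the exact wiring of the Attention layer is an assumption for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

class AttentionLayer(layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W_w = layers.Dense(units, activation="tanh")  # u_it = tanh(W_w h + b_w)
        self.u_w = layers.Dense(1, use_bias=False)         # score = u_it . u_w

    def call(self, h_seq):
        scores = self.u_w(self.W_w(h_seq))                 # (batch, T, 1)
        alpha = tf.nn.softmax(scores, axis=1)              # softmax over time steps
        return tf.reduce_sum(alpha * h_seq, axis=1)        # output vector C

def build_model(seq_len=96, n_features=6):
    inp = layers.Input(shape=(seq_len, n_features))
    h = layers.LSTM(100, return_sequences=True)(inp)       # first LSTM hidden layer
    h = layers.LSTM(50, return_sequences=True)(h)          # second LSTM hidden layer
    c = AttentionLayer(20)(h)                              # attention over the sequence
    out = layers.Dense(1)(c)                               # single-point load forecast
    return Model(inp, out)

model = build_model()
model.summary()
```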
Training method and probability load prediction index
The load prediction model established by the invention is implemented on the TensorFlow framework in the Python language. The training procedure is given in Fig. 4: after the data set is preprocessed, the data are divided into a training set, a validation set, and a test set. The network is built and randomly initialized so that all connection weights and thresholds follow a Gaussian distribution with mean 0 and variance 1. The network parameters are then updated through loop iterations; the training optimization algorithm adopts the Adam optimizer, and the learning rate is set to decay exponentially, i.e. lr = lr_0 e^(−kt), where the initial learning rate lr_0 and the decay coefficient k are adjustable hyper-parameters and t is the iteration number (epoch). This setting slows down training in the later stages so as to find a better optimum.
During training, the validation set is used to monitor the generalization performance of the model; training stops when the validation loss essentially ceases to decrease. Both the model weights at the end of training and the model weights with the smallest validation error during training are recorded, and the set with the better generalization capability is selected as the final trained model.
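A sketch of this training setup, under assumed values for lr_0, k, and the patience threshold, might look as follows in TensorFlow/Keras.

```python
import numpy as np
import tensorflow as tf

lr0, k = 1e-3, 0.05  # initial learning rate and decay coefficient (assumed values)

def schedule(epoch, lr):
    # lr = lr0 * e^(-k*t), with t the epoch index
    return float(lr0 * np.exp(-k * epoch))

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(schedule),
    # stop when the validation loss essentially ceases to decrease, and
    # keep the weights with the smallest validation error
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=50,
                                     restore_best_weights=True),
]

# `model` is the network from the construction sketch above; "mae" stands in
# here for the pinball loss sketched in the next subsection.
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr0), loss="mae")
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=500, callbacks=callbacks)
```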
In order to better describe the uncertainty of load fluctuation, indexes commonly used in deterministic prediction such as MAPE and RMSE are not adopted as the loss function; instead, the pinball loss function is adopted to obtain the quantile information corresponding to the load probability distribution.
The probabilistic characteristics of the load are obtained by solving for multiple quantiles, i.e.

P(y_i < ŷ_i^(p)) = p

where ŷ_i^(p) is the load quantile, p ∈ [0,1] with n_q the number of quantiles considered, and P is the probability that the load value y_i is lower than the quantile ŷ_i^(p).

The quantile information described by the pinball loss function is defined as

L_p = (1/N) Σ_{i=1}^{N} [ p (y_i − ŷ_i^(p)) s(y_i ≥ ŷ_i^(p)) + (1 − p)(ŷ_i^(p) − y_i) s(y_i < ŷ_i^(p)) ]

where N is the number of samples; s is an indicator function equal to 1 only when its condition is met, and 0 otherwise; p is the probability corresponding to the quantile; and ŷ_i^(p) is the desired quantile. In particular, when p = 0.5 the quantile sought is the median, and the pinball loss is equivalent to the mean absolute error (MAE). The training process of the established neural network is solved as an unconstrained optimization problem that minimizes the loss function, i.e.

ω_p* = argmin_{ω_p} L_p(ω_p)

where ω_p are the parameters to be trained and learned when the quantile probability is p.
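A minimal TensorFlow sketch of the pinball loss at quantile probability p, usable as a Keras loss function, is given below; training one model per quantile probability then yields the parameter sets ω_p. The quantile values in the usage comment are illustrative.

```python
import tensorflow as tf

def pinball_loss(p):
    """Return a Keras-compatible loss computing the pinball loss at quantile p."""
    def loss(y_true, y_pred):
        e = y_true - y_pred
        # p*(y - q) when y >= q;  (1 - p)*(q - y) when y < q
        return tf.reduce_mean(tf.maximum(p * e, (p - 1.0) * e))
    return loss

# One model is trained per quantile probability, giving the parameters omega_p:
# for p in (0.2, 0.35, 0.5, 0.65, 0.8):
#     model.compile(optimizer="adam", loss=pinball_loss(p))
#     model.fit(...)
```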
Data pre-processing
Experimental data description and input-output selection
The experimental data come from the load data of the power distribution management system of a city in Zhejiang from 2018-06-16 to 2019-06-16, with a data step of 15 minutes, together with the meteorological data provided by the regional meteorological bureau, consisting of the average, maximum, and minimum temperatures with a data step of 24 hours. The invention assumes that the meteorological data remain unchanged within a day and extends them periodically. The training set, validation set, and test set are obtained by dividing the whole data set in a ratio of 70%, 15%, and 15%. Finally, the model input data consist of the load, date type, time value, and the weather factors formed by the average, minimum, and maximum temperatures of the day preceding the predicted point, with an input sequence length of 96; the output data is the single-point predicted load, with an output length of 1.
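For illustration, the following sketch shows one way the input/output pairs described above might be assembled; the column layout and the exact window alignment are assumptions.

```python
import numpy as np

def make_windows(features, load, seq_len=96):
    """features: (n, 6) array of [load, date type, time value, avg/min/max temp];
    load: (n,) target series. Returns X: (m, seq_len, 6) and y: (m,)."""
    X, y = [], []
    for t in range(seq_len, len(load)):
        X.append(features[t - seq_len:t])  # previous 96 steps (one day at 15 min)
        y.append(load[t])                  # single-point predicted load
    return np.asarray(X), np.asarray(y)
```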
Data pre-processing
Missing values and deviating abnormal values that may exist in the raw data are removed and filled: missing values and "NAN" abnormal values not exceeding 20% of the data volume are filled with the mean of the adjacent load data, and if consecutive abnormal data occur, the search continues outward on both sides until non-missing values are found.
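A sketch of this gap-filling rule, assuming anomalies have already been marked as NaN, might look as follows.

```python
import numpy as np

def fill_gaps(x):
    """Fill NaN entries with the mean of the nearest valid neighbours,
    searching outward past runs of consecutive gaps."""
    x = np.asarray(x, dtype=float)
    for t in np.flatnonzero(np.isnan(x)):
        lo, hi = t - 1, t + 1
        while lo >= 0 and np.isnan(x[lo]):      # extend left past consecutive gaps
            lo -= 1
        while hi < len(x) and np.isnan(x[hi]):  # extend right past consecutive gaps
            hi += 1
        neighbours = [x[j] for j in (lo, hi) if 0 <= j < len(x)]
        if neighbours:
            x[t] = float(np.mean(neighbours))
    return x
```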
Because different input features have different dimensions, the raw feature data x(i) is normalized to limit the input to the range [0,1], thereby improving the prediction accuracy and the convergence rate. The normalized input data x_1(i) is

x_1(i) = (x(i) − x_min) / (x_max − x_min)

where x_max and x_min are respectively the maximum and minimum values of the raw input data.
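A minimal sketch of this min-max normalization, applied column-wise to the feature matrix:

```python
import numpy as np

def min_max_normalize(x):
    """Scale each feature column of x to [0, 1]: x1 = (x - xmin) / (xmax - xmin)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)
```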
Short-term load prediction implementation and results
In order to verify the performance of the deep Attention-LSTM model, an LSTM with a fully connected layer (LSTM+FC), SVR, and GBRT networks are used as the comparison group. LSTM+FC is implemented on the TensorFlow framework and the other two on the Sklearn library; the pinball function is used as the loss function, and the hyper-parameters are determined by random search. The LSTM+FC node counts and hyper-parameters are the same as those of the proposed model, with 100, 50, and 20 neurons in the respective hidden layers, i.e. a 96-100-50-20-1 network structure; the SVR structure is 96-50-20-1 with an RBF kernel; the maximum depth of the GBRT decision trees is set empirically to 10, and the description of the other parameters is omitted. The iteration convergence condition is the same for the training of each algorithm: training stops when the decrease of the loss function is less than the threshold for 50 consecutive iterations.
The prediction accuracy and program running time are then compared. A day-ahead load prediction experiment is carried out with June 10 to June 16 as the test interval, and the pinball loss values on the test set are tabulated for quantile probabilities in the range 0.2–0.8 for the deep Attention-LSTM, LSTM+FC, SVR, and GBRT networks. In particular, when the quantile probability p = 0.5, the comparison between the real load curve and the prediction results of each method shows that, although the prediction accuracy of the deep Attention-LSTM network established by the invention varies across prediction dates, its overall accuracy is clearly superior to that of the traditional SVR and GBRT and slightly better than that of LSTM+FC; its test error is smaller than that of the comparison algorithms at all quantile probabilities, maintaining its accuracy advantage. Regarding load probability prediction, the prediction results of the deep Attention-LSTM algorithm under different quantile probability settings show that every quantile effectively captures the load trend and the distribution is concentrated; the interval formed by the group of quantile load prediction curves covers the actual load values well, providing a reliable basis for dispatching operation.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that the invention is not limited thereto, and may be embodied in other forms without departing from the spirit or essential characteristics thereof. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.