Medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman network

Technical Field
The invention belongs to the technical field of information, and particularly relates to a medium-long term runoff forecasting method based on mutual information, kernel principal component analysis and Elman network.
Background
Accurate medium- and long-term runoff forecasting is an important precondition for the comprehensive development and utilization, scientific management and optimized scheduling of water resources.
At present, the commonly used methods for medium- and long-term runoff forecasting are statistical, i.e. forecasting is realized by finding the statistical relationship between the forecast object and the forecasting factors. Applying statistical methods to medium- and long-term runoff forecasting involves three key aspects:
(1) initial selection of a forecasting factor: for the initial selection of the forecasting factors, the currently common method is a linear correlation analysis method (such as Pearson correlation analysis, Spearman correlation analysis, etc.), that is, a factor with a high correlation coefficient is selected as the forecasting factor by calculating the correlation coefficient between meteorological hydrological data (including remote correlation factor, local correlation factor, etc.) and the historical runoff sequence;
(2) denoising and redundancy removal of the initially selected factors: the method commonly used at present is principal component analysis (PCA). When factors are screened by correlation analysis, many of the selected factors are highly correlated with one another; the factor time series therefore exhibit strong multicollinearity and also contain a certain amount of noise, so the initially selected factors need to be denoised and de-redundant. PCA replaces a larger number of original indexes with a smaller number of comprehensive indexes that reflect as much of the useful information of the original indexes as possible and are mutually orthogonal;
(3) establishing an optimal mathematical relationship between the forecasting object and the forecasting factor: for the establishment of the optimal mathematical relationship between the forecast object and the forecast factor, the currently common models include multiple regression, random forest, artificial neural network, support vector machine and the like.
The existing medium-long term runoff forecasting method based on statistics has the following three problems:
(1) the hydrological process is complex, and besides a linear relation, a certain nonlinear relation also exists between the forecasting factors and the forecasting objects. The linear correlation analysis method for the initial selection of the forecasting factors can only describe the linear relation among the variables and cannot reflect the nonlinear relation among the variables;
(2) the PCA used for noise reduction and redundancy elimination of the initially selected factors is essentially a linear mapping method, and the obtained principal components are generated by linear mapping. This approach ignores correlations in the data above second order, so the extracted principal components are not optimal;
(3) some of the models used to establish the optimal mathematical relationship between the forecast object and the forecasting factors are in practice linear fits and cannot reflect the nonlinear relationship between the two. Compared with other models, the artificial neural network is widely applied to medium- and long-term runoff forecasting because of its good robustness and strong nonlinear mapping and self-learning capabilities, but the uncertainty of the neural network model parameters affects the forecasting accuracy to a certain extent, and the forecasting results differ from run to run.
Disclosure of Invention
The invention aims to solve the problems in the traditional statistical forecasting method and provide a medium-long term runoff forecasting method capable of overcoming the problems, so that the stability and the precision of forecasting are improved.
The invention provides a medium- and long-term runoff forecasting method based on normalized mutual information (NMI), kernel principal component analysis (KPCA) and an Elman neural network, which specifically comprises the following steps:
step 1: data pre-processing
1.1 Collecting historical runoff data of the area to be predicted and meteorological-hydrological data that can serve as forecasting factors. Commonly used meteorological-hydrological data include indexes such as atmospheric circulation characteristics, high-altitude pressure fields and sea surface temperature; the 74 circulation characteristic indexes or the 130 new atmospheric monitoring indexes provided by the National Climate Center essentially cover the commonly used indexes, and preliminary forecasting factors can be selected directly from them.
1.2 Considering that the influence of meteorological factors on runoff is lagged, a one-to-one correspondence is established between the index time series of the preceding year and the runoff time series of the area to be predicted. For example, with 130 indexes and the annual runoff of 2017 as the forecast target, the historical annual runoff data cover 1960-2016 and the index data are monthly values of the respective preceding years. The correspondence between one index and the runoff is explained below; the correspondences for the other indexes are analogous. The correspondence is as follows:
TABLE 1 corresponding relationship between certain index time series and annual runoff time series
Step 2: forecasting factor primary selection based on standard mutual information
2.1 dividing the index time sequence and the annual runoff time sequence into two parts, wherein one part is used as a training sample of the neural network, and the other part is used as a test sample of the trained neural network. For example, the data of the previous 50 years is used as a training sample, and the data of the next 5 years is used as a test sample.
2.2 Calculating mutual information. For the training sample data, the mutual information between each index time series and the corresponding runoff time series is calculated respectively. Taking the data in Table 1 as an example, the mutual information between column 1 of the table and each of the remaining columns is calculated. The mutual information MI is calculated as follows:

MI(X, Y) = Σ_{i=1}^{n} Σ_{j=1}^{n} p(x_i, y_j) · log[ p(x_i, y_j) / ( p(x_i) · p(y_j) ) ]   (1)

where X = (x_1, x_2, x_3, ..., x_n)^T is the runoff time series, Y = (y_1, y_2, y_3, ..., y_n)^T is the index time series, p(x_i, y_j) is the joint distribution law of X and Y, and p(x_i), p(y_j) are the marginal distribution laws of X and Y respectively.
2.3 Calculating standard mutual information. The mutual information is normalized, i.e. the MI value of step 2.2 is mapped to between 0 and 1 using entropy as the denominator. The standard mutual information NMI is calculated as follows:

NMI(X, Y) = MI(X, Y) / sqrt( H(X) · H(Y) )   (2)

where H(X) and H(Y) are the entropies of X and Y respectively, calculated as:

H(X) = -Σ_{i=1}^{n} p(x_i) log p(x_i)   (3)

H(Y) = -Σ_{j=1}^{n} p(y_j) log p(y_j)   (4)
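Formulas (1)-(4) can be sketched in Python with histogram-based estimates of the distribution laws; the bin count and the square-root entropy denominator in the NMI are illustrative assumptions of this sketch:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution; empty bins are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def nmi(x, y, bins=10):
    """Standard mutual information of two series per formulas (1)-(4)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint distribution law p(x_i, y_j)
    px = pxy.sum(axis=1)                 # marginal law p(x_i)
    py = pxy.sum(axis=0)                 # marginal law p(y_j)
    hx, hy = entropy(px), entropy(py)
    mi = hx + hy - entropy(pxy.ravel())  # identity equivalent to formula (1)
    return mi / np.sqrt(hx * hy)         # formula (2), assumed sqrt denominator

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(round(nmi(x, x), 3))                 # identical series -> 1.0
print(nmi(x, rng.normal(size=500)) < 0.5)  # independent series -> True
```

The identity MI = H(X) + H(Y) - H(X, Y) used here is algebraically the same sum as formula (1); binning is only one of several ways to estimate the distribution laws from finite samples.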
2.4 Standard mutual information Significance Test (Significance Test). The bootstrap method is adopted to carry out the inspection of the standard mutual information, and comprises the following steps:
2.4.1 calculating the standard mutual information NMI value of the original runoff time sequence and the index time sequence;
2.4.2 randomly shuffling the two corresponding time series simultaneously K times (generally K = 100), calculating the NMI value after each shuffle and arranging the values in descending order;
2.4.3 taking the quantile of the shuffled NMI values at a given probability as the NMI threshold corresponding to that significance level;
2.4.4 if the NMI value of the original time series is greater than the NMI threshold at a certain significance level (generally 95%), the two sets of data are considered significantly correlated.
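Steps 2.4.1-2.4.4 above can be sketched as a shuffle test; the compact `_nmi` helper is an assumed histogram-based implementation standing in for the NMI routine of step 2.3:

```python
import numpy as np

def _nmi(x, y, bins=8):
    """Compact histogram-based NMI used only for this illustration."""
    j, _, _ = np.histogram2d(x, y, bins=bins)
    p = j / j.sum()
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    px, py = p.sum(axis=1), p.sum(axis=0)
    return (h(px) + h(py) - h(p.ravel())) / np.sqrt(h(px) * h(py))

def nmi_significant(x, y, K=100, level=0.95, seed=0):
    """Shuffle both series K times (2.4.2), take the `level` quantile of the
    shuffled NMI values as the threshold (2.4.3), then compare (2.4.4)."""
    rng = np.random.default_rng(seed)
    observed = _nmi(x, y)                                    # step 2.4.1
    shuffled = [_nmi(rng.permutation(x), rng.permutation(y)) for _ in range(K)]
    threshold = np.quantile(shuffled, level)
    return observed > threshold, threshold

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 2 * x + 0.1 * rng.normal(size=300)   # strongly dependent pair
sig, thr = nmi_significant(x, y)
print(bool(sig))                         # True
```

Shuffling destroys any dependence between the two series, so the shuffled NMI values approximate the distribution of NMI under the null hypothesis of independence.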
2.5 Selecting the indexes that pass the significance test and whose standard mutual information is greater than a given threshold (generally 0.9; the appropriate value varies with the length of the time series and can be adjusted) as the initially selected forecasting factors.
Step 3: performing kernel principal component analysis to extract principal components
3.1 The initially selected forecasting factor data are standardized by z-score as follows:

y* = (y - μ) / σ   (5)

where y* is the z-score standardized datum, y is one item in the initially selected forecasting factor data, μ is the mean of the time series containing y, and σ is the standard deviation of that time series.
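Formula (5) is a one-liner in Python; a minimal sketch that also checks each standardized series ends up with zero mean and unit standard deviation:

```python
import numpy as np

def zscore(y):
    """z-score standardization per formula (5): (y - mean) / std."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()

s = zscore([2.0, 4.0, 6.0, 8.0])
print(round(float(s.mean()), 6), round(float(s.std()), 6))   # 0.0 1.0
```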
3.2 Calculating the kernel matrix K of the forecasting factors initially selected in step 2.5. K is an n × n matrix whose element in row i, column j is

K_{i,j} = k(y_i, y_j)   (6)

where y_i and y_j are column vectors representing the z-score standardized time series of different initially selected forecasting factors, and k is a kernel function. Commonly used kernel functions include:
① Linear Kernel: k(x, y) = x^T y   (7)

② Polynomial Kernel: k(x, y) = (x^T y + c)^p   (8)

③ Radial Basis Function (RBF) Kernel: k(x, y) = exp( -‖x - y‖² / (2b²) )   (9)

④ Sigmoid Kernel: k(x, y) = tanh( u · x^T y + ξ )   (10)

In formulas (8), (9) and (10), b, c, p, u and ξ are all constants, the parameters of the respective kernel functions.
3.3 Calculating the centered kernel matrix. The centered kernel matrix is denoted K_c, an n × n matrix calculated as follows:
Kc=K-J·K-K·J+J·K·J (11)
In formula (11), J is an n × n matrix in which every element equals 1/n:

J = (1/n) · 1_n 1_n^T   (12)
3.4 Computing the eigenvalues and eigenvectors of the centered kernel matrix K_c, arranging the eigenvalues in descending order and adjusting the order of the eigenvectors accordingly. The sorted eigenvalue matrix Λ and eigenvector matrix U are

Λ = diag(λ_1, λ_2, ..., λ_n), U = (u_1, u_2, ..., u_n), λ_1 ≥ λ_2 ≥ ... ≥ λ_n   (13)

3.5 Computing the normalized eigenvector matrix A = (a_1, a_2, ..., a_n), where a_i = u_i / sqrt(λ_i).
3.6 Extracting principal components; the principal component matrix is an n × n square matrix, and typically the first 2 to 3 principal components are extracted as forecasting factors. The i-th principal component is calculated as

KPC_i = K_c · a_i   (14)

where KPC_i = (kpc_{i1}, kpc_{i2}, ..., kpc_{in})^T and K_c is the centered kernel matrix obtained in step 3.3.
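Steps 3.2-3.6 can be sketched end-to-end in Python. The RBF kernel of formula (9) is evaluated here between the yearly factor vectors, which yields principal component sequences of the sample length (as in the embodiment below); the kernel width `b` and the random data standing in for the z-scored factor series are illustrative assumptions:

```python
import numpy as np

def kpca_components(Y, n_components=2, b=20.0):
    """KPCA sketch: RBF kernel matrix (6)/(9), centering (11)-(12),
    sorted eigen-decomposition (13), eigenvector scaling, projection (14)."""
    n = Y.shape[0]
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * b ** 2))        # RBF kernel, formula (9)
    J = np.full((n, n), 1.0 / n)          # J: every element 1/n, formula (12)
    Kc = K - J @ K - K @ J + J @ K @ J    # centering, formula (11)
    lam, U = np.linalg.eigh(Kc)           # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]         # re-sort descending, step 3.4
    lam, U = lam[order], U[:, order]
    A = U[:, :n_components] / np.sqrt(lam[:n_components])  # a_i = u_i/sqrt(l_i)
    return Kc @ A                         # columns are KPC_i, formula (14)

rng = np.random.default_rng(0)
Y = rng.normal(size=(52, 205))   # 52 years x 205 standardized factor values
pcs = kpca_components(Y)
print(pcs.shape)                 # (52, 2)
```

As in the embodiment, the training and test years would be stacked into one matrix before KPCA so that both are projected with the same kernel and eigenvectors.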
Step 4: construction of the Elman neural network model
4.1 To construct the Elman network model, the network structure (i.e. the number of nodes in each layer) is determined first. The structure of the Elman network is shown in Figure 2 of the specification. The nodes of each layer are determined as follows: the number of input layer (Input Layer) nodes equals the number of forecasting factors; the number of output layer (Output Layer) nodes equals the number of forecast objects; the number of context layer (Context Layer) nodes equals the number of hidden layer (Hidden Layer) nodes; the number of hidden layer nodes has an important influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it. A practical choice is trial and error, i.e., trying different numbers of hidden layer nodes and observing the forecasting effect of the network.
4.2 To construct the Elman network model, a training algorithm also needs to be determined. The invention adopts a back-propagation algorithm and a gradient descent algorithm with momentum term and adaptive learning rate to update the network weights. The weight update formula is

Δω_k = -η · ∂E/∂ω + α · Δω_{k-1}   (15)

where E is the cost function (Cost Function), for which the invention adopts the mean squared error (MSE); ω is the weight matrix of the Elman neural network; Δω_k is the weight change at the k-th update; η is the learning rate (Learning Rate); and α is the momentum constant (Momentum Constant), 0 ≤ α < 1, taken as α = 0.9 in the invention. The learning rate is updated at each iteration as follows:
η(k)=η(k-1)(1+c·cosθ) (16)
where c is a constant, taken as 0.2 in the invention, and θ is the angle between the steepest descent direction and the last weight change Δω_{k-1}.
Step 5: single-model prediction of runoff
5.1 The principal component factor sequences extracted in step 3.6 and the historical runoff sequence of the area to be predicted are normalized and then divided into training and test samples as in step 2.1. The normalization formula is

z = (z_max - z_min) · (q - q_min) / (q_max - q_min) + z_min   (17)

where z is the normalized datum, z_max = 1, z_min = -1, z ∈ [-1, 1]; q is one item of the original runoff sequence or principal component sequence; q_min is the minimum of the sequence containing q, and q_max is its maximum.
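Formula (17) and its inverse (needed in step 5.3 for de-normalization) in Python; the sample runoff values are illustrative:

```python
import numpy as np

def normalize(q, z_min=-1.0, z_max=1.0):
    """Map a sequence into [z_min, z_max] per formula (17); also return the
    (q_min, q_max) pair needed later for de-normalization."""
    q = np.asarray(q, dtype=float)
    q_min, q_max = q.min(), q.max()
    z = (z_max - z_min) * (q - q_min) / (q_max - q_min) + z_min
    return z, (q_min, q_max)

def denormalize(z, bounds, z_min=-1.0, z_max=1.0):
    """Invert formula (17) to recover runoff in physical units (step 5.3)."""
    q_min, q_max = bounds
    return (np.asarray(z) - z_min) / (z_max - z_min) * (q_max - q_min) + q_min

runoff = [820.0, 950.0, 1100.0, 760.0]
z, bounds = normalize(runoff)
print(z.min(), z.max())                                   # -1.0 1.0
print(bool(np.allclose(denormalize(z, bounds), runoff)))  # True
```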
5.2 The factor data in the training sample are used as the network input and the historical runoff data in the training sample as the network output, and supervised learning training of the network is performed.
5.3 For the trained network, the factors in the test sample are used as the network input to test its prediction effect. The test result is de-normalized to obtain the predicted runoff value.
5.4 The mean absolute percentage error (MAPE), relative error (RE), maximum relative error (MRE) and qualification rate (QR) are used as forecast evaluation indexes, calculated as follows:

MAPE = (1/J) Σ_{i=1}^{J} | (x̂_i - x_i) / x_i | × 100%   (18)

RE_i = (x̂_i - x_i) / x_i × 100%   (19)

MRE = max_i | RE_i |   (20)

QR = T_qual / T_total × 100%   (21)

In formulas (18), (19) and (20), x̂_i is the runoff value predicted for the test period, x_i is the corresponding actual runoff value, and J is the number of test samples.
In formula (21), T_qual is the number of qualified forecasts and T_total is the total number of forecasts. According to the scheme for evaluating medium- and long-term runoff forecasting precision in the Standard for Hydrological Information and Hydrological Forecasting (GB/T 22482-2008), the invention regards a forecast whose maximum relative error is smaller than 20% as qualified.
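Formulas (18)-(21) in Python; the sample forecast/observation pairs are illustrative:

```python
import numpy as np

def forecast_scores(pred, obs, qualified_mre=20.0):
    """MAPE (18), relative errors RE (19) and MRE (20), all in percent,
    plus a pass flag per the MRE < 20% qualification rule."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    re = (pred - obs) / obs * 100.0   # formula (19)
    mape = np.mean(np.abs(re))        # formula (18)
    mre = np.max(np.abs(re))          # formula (20)
    return mape, re, mre, mre < qualified_mre

def qualification_rate(qualified_flags):
    """QR per formula (21): share of qualified forecasts, in percent."""
    flags = np.asarray(qualified_flags, bool)
    return flags.sum() / flags.size * 100.0

mape, re, mre, ok = forecast_scores([980.0, 1040.0, 870.0],
                                    [1000.0, 1000.0, 900.0])
print(round(mape, 2), round(mre, 2), bool(ok))          # 3.11 4.0 True
print(qualification_rate([True, True, False, True]))    # 75.0
```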
Step 6: ensemble prediction of runoff
In step 4, the gradient descent search of the Elman network weight space is realized with a back-propagation algorithm, iteratively reducing the error between the actual historical runoff values and the network predictions. However, the error surface may contain multiple different local minima, so the gradient descent search of the Elman network weight space may settle at a local minimum point rather than the global minimum point. Therefore, even though each trained Elman network has the same structure, the connection weight parameters of the models differ, which causes differences between the prediction results of the single Elman network models. To reduce the deviation of the prediction results caused by this model parameter uncertainty, the invention performs single-model prediction of the runoff multiple times and takes the average of the results as the final prediction result.
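The ensemble step above can be sketched as follows; `single_run` is a toy stand-in (an assumption of this sketch) for one full train-and-forecast run of the Elman model with a fresh random initialization:

```python
import numpy as np

def ensemble_forecast(train_fn, n_models=100):
    """Run n_models independently initialized single-model forecasts and
    average them to damp the spread caused by different local minima."""
    runs = np.array([train_fn(seed) for seed in range(n_models)])
    return runs.mean(axis=0), runs.std(axis=0)

# Toy stand-in: each "trained model" returns the truth plus init-dependent noise.
true_runoff = np.array([900.0, 950.0, 1010.0, 880.0, 940.0])
single_run = lambda seed: true_runoff + np.random.default_rng(seed).normal(0, 30, 5)
mean_fc, spread = ensemble_forecast(single_run)
print(bool(np.max(np.abs(mean_fc - true_runoff)) < 30))   # True
```

Averaging n independent runs shrinks the standard deviation of the initialization-driven error by roughly a factor of sqrt(n), which is the rationale for step 6.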
Compared with the prior art, the invention has the advantages that:
(1) the initial selection of forecasting factors based on standard mutual information can reflect both the linear and the nonlinear relations among variables, so the selected factors are more representative, overcoming the defects of traditional factor screening based on linear correlation analysis;
(2) kernel principal component analysis (KPCA) is a nonlinear extension of principal component analysis (PCA): the original vectors are mapped into a high-dimensional feature space F by a mapping function φ, and PCA is performed in F. Data that are not linearly separable in the original space become almost linearly separable in the high-dimensional feature space, so the principal components extracted by performing PCA there are more representative. The KPCA-based feature extraction method therefore greatly improves the ability to process nonlinear data and has clear advantages over traditional PCA-based feature extraction. In addition, the principal components extracted by KPCA are mutually orthogonal, and the data are denoised and de-redundant, which helps prevent overfitting of the neural network and improves its generalization capability;
(3) the artificial neural network has good robustness and strong nonlinear mapping and self-learning capabilities, and can well mine the internal relation between the forecasting factors and the forecast objects. The Elman neural network selected by the invention is a typical dynamic recurrent network; compared with a common feed-forward neural network (such as a BP neural network), it additionally has a context layer. The context layer records the state of the previous network iteration and feeds it back as input to the current iteration, which makes the Elman network better suited to forecasting time-series data. In addition, the neural network has the problem of parameter uncertainty; to reduce the uncertainty of prediction, a multi-model ensemble forecasting method is adopted, improving the forecasting precision;
(4) the NMI for factor initial selection, the KPCA for principal component extraction and the Elman neural network for runoff forecasting all have the capacity of processing nonlinear data except linear data, and the three methods are combined together, so that the limitation of the traditional method can be overcome, and the stability and accuracy of forecasting can be improved.
Drawings
FIG. 1 is a general flow diagram of the present invention;
fig. 2 is a block diagram of the Elman network.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
FIG. 1 is the overall flow chart of the present invention. Taking the prediction of the annual average runoff of the Jinping-I hydropower station reservoir as an example, the method is divided into six steps according to the flow chart, as follows:
step 1: data pre-processing
1.1 Collecting historical runoff data of the area to be predicted and meteorological-hydrological data that can serve as forecasting factors; commonly used meteorological-hydrological data include indexes such as atmospheric circulation characteristics, high-altitude pressure fields and sea surface temperature. The data adopted by this embodiment comprise the annual average runoff of the Jinping-I hydropower station reservoir section for 1960-2011 and the 74 circulation characteristic indexes for 1959-2010.
1.2 Since the annual average runoff is forecast, factors cannot be selected from the same year, and the influence of meteorological factors on runoff is lagged; therefore, following Table 1, a one-to-one correspondence is established between the annual average runoff of the Jinping-I hydropower station (1960-2011) and the 74 atmospheric circulation indexes of the preceding year (1959-2010). The correspondence between one atmospheric circulation index time series and the runoff time series is shown in Table 2; the other indexes are similar.
TABLE 2 corresponding relationship between a certain atmospheric circulation index time series and a runoff time series
Step 2: forecasting factor primary selection based on standard mutual information
2.1 dividing the index time sequence and the annual runoff time sequence into two parts, wherein one part is used as a training sample of the neural network, and the other part is used as a test sample of the trained neural network. This example uses the data of the first 47 years as training samples and the data of the last 5 years as test samples.
2.2 Calculating the mutual information MI. For the training sample data, the mutual information between the monthly time series of each index and the corresponding runoff time series is calculated respectively. For this example, the mutual information between the annual average runoff series in column 1 of Table 2 and the index time series in the remaining columns is calculated according to equation (1). Note that, to allow the forecasting effect to be checked reliably, only the training sample data are used to calculate the mutual information and screen the preliminary forecasting factors; the test sample data must not be included.
2.3 calculate the normalized mutual information NMI, i.e. map the MI values calculated in step 2.2 between 0 and 1 with (2), (3) and (4).
2.4 Significance Test of mutual information (Significance Test). In this embodiment, the bootstrap method is adopted to perform mutual information verification, and includes the following steps:
2.4.1 calculating the standard mutual information NMI value of the original runoff time sequence and the index time sequence;
2.4.2 randomly shuffling the two time series simultaneously 100 times, calculating the NMI value after each shuffle and arranging the values in descending order;
2.4.3 taking the probability quantile of the NMI in the sequential arrangement as the NMI threshold value corresponding to the significance level of the probability;
2.4.4 if the NMI value of the original time series is greater than the NMI threshold at the chosen significance level (95% in this embodiment), the two groups of data are considered significantly correlated.
2.5 Selecting the indexes that pass the significance test and whose standard mutual information exceeds a given threshold (0.9 in this embodiment) as the initially selected forecasting factors. In this embodiment, 205 indexes have standard mutual information greater than 0.9; the first 20 are listed below:
TABLE 3 first 20 Primary selected predictor factors
| Initially selected factor | NMI | MI |
| Sunspot number, August | 0.988375 | 5.426929 |
| Sunspot number, April | 0.988375 | 5.426929 |
| Sunspot number, July | 0.988375 | 5.426929 |
| Sunspot number, October | 0.988375 | 5.426929 |
| Sunspot number, December | 0.988375 | 5.426929 |
| Sunspot number, February | 0.98444 | 5.384376 |
| Sunspot number, September | 0.98444 | 5.384376 |
| Sunspot number, November | 0.98444 | 5.384376 |
| Sunspot number, January | 0.98444 | 5.384376 |
| Sunspot number, March | 0.98444 | 5.384376 |
| Sunspot number, May | 0.98444 | 5.384376 |
| Northern Hemisphere subtropical high intensity index (5E-360), August | 0.980474 | 5.341823 |
| Northern Hemisphere polar vortex area index (zone 5, 0-360), March | 0.980474 | 5.341823 |
| North Africa-Atlantic-North America subtropical high intensity index (110W-60E), June | 0.976477 | 5.299270 |
| Northern Hemisphere subtropical high intensity index (5E-360), June | 0.976291 | 5.256717 |
| Northern Hemisphere subtropical high intensity index (5E-360), April | 0.972448 | 5.256717 |
| North Africa-Atlantic-North America subtropical high intensity index (110W-60E), July | 0.972448 | 5.256717 |
| North Africa-Atlantic-North America subtropical high intensity index (110W-60E), September | 0.972448 | 5.256717 |
| Sunspot number, June | 0.972448 | 5.256717 |
| Pacific subtropical high intensity index (110E-115W), June | 0.970919 | 5.240655 |
Step 3: performing kernel principal component analysis and selecting principal components as forecasting factors. The 205 factor sequences selected in step 2.5 often exhibit multicollinearity among themselves. Multicollinear forecasting factors enlarge the weight matrix of the neural network, and the repeated information and noise they carry directly affect its training speed and generalization capability, so feature extraction, noise reduction and redundancy removal are required. In this embodiment, the radial basis kernel function is selected as the kernel function for kernel principal component analysis; the principal components are calculated according to formulas (5), (6), (9), (11), (12), (13) and (14) and arranged in descending order of variance contribution rate. The variance contribution rates of the first 5 extracted principal components are shown in Table 4, and the data of the corresponding first 5 principal components are shown in Table 5.
TABLE 4 variance contribution ratio of first 5 principal components
| Principal component | Principal component _1 | Principal component _2 | Principal component _3 | Principal component _4 | Principal component _5 |
| Variance contribution rate | 25.7% | 6.9% | 5.6% | 5.1% | 3.9% |
TABLE 5 first 5 principal Components extracted by KPCA
In this embodiment, a trial-and-error method is used to determine which principal components are selected as forecasting factors. Repeated tests show that the test-period forecasting effect is best when the first two principal components are selected, so the first two principal components are finally chosen as the forecasting factors. Note that, to keep the criteria used for extracting principal components consistent between the training and test samples, the training sample sequence and the test sample sequence must be combined before performing KPCA. In this embodiment, the training sample sequence has length 47 and the test sample sequence has length 5, so the combined sequence has length 52, and the principal component sequences extracted in Table 5 therefore have length 52.
Step 4: construction of the Elman neural network model
4.1 the Elman network model is constructed, and the network structure (namely the number of nodes of each layer of the network) is determined firstly. The method for determining each layer of nodes of the network comprises the following steps:
(1) the number of nodes of the Input Layer (Input Layer) is equal to the number of the prediction factors. In the embodiment, the first two main components are selected as the forecasting factors, so that the number of nodes of the input layer of the Elman neural network is 2;
(2) the number of output layer nodes is equal to the number of forecast objects, in this embodiment, the annual average runoff is subjected to single value forecast, so the number of output layer nodes is 1;
(3) the number of context layer nodes is equal to the number of hidden layer nodes;
(4) the number of hidden layer nodes has an important influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it; a practical choice is trial and error, i.e., trying different numbers of hidden layer nodes and observing the forecasting effect. In this embodiment, because KPCA has already been used to denoise and de-redundant the factor data and the obtained principal components are mutually orthogonal, overfitting of the neural network is effectively prevented: with 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 15 hidden layer nodes, the relative error of the test-period forecasts stays within 20%, and the network is very stable with good generalization capability. Through repeated experiments, when the number of hidden layer nodes is 10, the maximum relative error of the test-period forecasts falls to 15%, so the number of hidden layer nodes is determined to be 10.
4.2 To construct the Elman network model, a training algorithm also needs to be determined. In this embodiment, the network weights are updated using a back-propagation algorithm and a gradient descent algorithm with momentum term and adaptive learning rate. The weight update formulas are given in equations (15) and (16).
Step 5: single-model prediction of runoff
5.1 The principal component factor sequences extracted in step 3 and the historical runoff sequence of the area to be predicted are normalized according to formula (17) and then divided into training and test samples as in step 2.1. In this embodiment, the data of the two principal component sequences selected in step 3 and the runoff data of the Jinping-I hydropower station for the first 47 years are used as training samples, and the data of the last 5 years as test samples.
5.2 The factor data in the training sample are used as the network input and the historical runoff data in the training sample as the network output, and supervised learning training of the network is performed. The learning process can be summarized as follows:
(1) initializing the connection weights between the layers of the network with a random function, and setting the allowed error ε of the cost function (Cost Function); the cost function of this embodiment is the mean squared error (MSE);
(2) inputting a learning sample to the network, calculating the value E of the mean squared error function by the algorithm, and updating the connection weights between the layers of the network according to E;
(3) if E is still greater than the allowed error ε, returning to step (2); otherwise learning ends and the network output is calculated.
5.3 For the trained network, the factor data in the test sample are used as the network input to test its prediction effect. The test result is de-normalized to obtain the predicted runoff value.
5.4 The mean absolute percentage error (MAPE), relative error (RE), maximum relative error (MRE) and qualification rate (QR) are used as forecast evaluation indexes, calculated according to formulas (18), (19), (20) and (21). To verify the generalization ability and forecast stability of the network model, this embodiment performs 100 single-model forecasts; the results show that the maximum relative error of every test-period forecast is within 16% and the qualification rate reaches 100%, indicating that the network model used in the invention has good generalization ability and forecast stability. The error statistics of the first 5 test-period forecasts are shown in Table 6.
TABLE 6 Single model test period prediction error statistics
Step 6: ensemble prediction of runoff
To reduce the deviation of the prediction results caused by model parameter uncertainty, the method performs single-model prediction of the runoff multiple times and takes the average of the results as the final prediction result. In this embodiment, the average of the 100 forecast results is taken as the final forecast result.
The above description is only an example of the present invention and is not intended to limit the present invention. All equivalents which come within the spirit of the invention are therefore intended to be embraced therein. Details not described herein are well within the skill of those in the art.