Medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman network

Technical Field
The invention belongs to the technical field of information, and particularly relates to a medium-long term runoff forecasting method based on mutual information, kernel principal component analysis and Elman network.
Background
Accurate medium- and long-term runoff forecasting is an important precondition for the comprehensive development and utilization, scientific management and optimized scheduling of water resources.
At present, the commonly used methods for medium- and long-term runoff forecasting are statistical, i.e. forecasting is realized by finding the statistical relationship between the forecast object and the forecasting factors. Applying statistical methods to medium- and long-term runoff forecasting involves three key aspects:
(1) initial selection of a forecasting factor: for the initial selection of the forecasting factors, the currently common method is a linear correlation analysis method (such as Pearson correlation analysis, Spearman correlation analysis, etc.), that is, a factor with a high correlation coefficient is selected as the forecasting factor by calculating the correlation coefficient between meteorological hydrological data (including remote correlation factor, local correlation factor, etc.) and the historical runoff sequence;
(2) denoising and redundancy removal of the initially selected factors: the method commonly used at present is principal component analysis (PCA). When factors are screened by correlation analysis, many of the selected factors are highly correlated with one another; the factor time series therefore exhibit strong multicollinearity and also contain a certain amount of noise, so the initially selected factors need to be denoised and de-redundant. PCA replaces a larger number of original indexes with a smaller number of comprehensive indexes that reflect as much of the useful information of the original indexes as possible and are mutually orthogonal;
(3) establishing an optimal mathematical relationship between the forecasting object and the forecasting factor: for the establishment of the optimal mathematical relationship between the forecast object and the forecast factor, the currently common models include multiple regression, random forest, artificial neural network, support vector machine and the like.
The existing medium-long term runoff forecasting method based on statistics has the following three problems:
(1) the hydrological process is complex, and besides a linear relation, a certain nonlinear relation also exists between the forecasting factors and the forecasting objects. The linear correlation analysis method for the initial selection of the forecasting factors can only describe the linear relation among the variables and cannot reflect the nonlinear relation among the variables;
(2) the PCA used for noise reduction and redundancy elimination of the initially selected factors is essentially a linear mapping method, and the obtained principal components are generated by linear mapping. This approach ignores correlations in the data above second order, so the extracted principal components are not optimal;
(3) some of the models used to establish the optimal mathematical relationship between the forecast object and the forecasting factors are in practice linear fits and cannot reflect the nonlinear relationship between the two. Compared with other models, the artificial neural network is widely applied to medium- and long-term runoff forecasting because of its good robustness and strong nonlinear mapping and self-learning capabilities, but the uncertainty of the neural network model parameters affects the forecasting accuracy to a certain extent, and the forecasting results differ from run to run.
Disclosure of Invention
The invention aims to solve the problems in the traditional statistical forecasting method and provide a medium-long term runoff forecasting method capable of overcoming the problems, so that the stability and the precision of forecasting are improved.
The invention provides a medium- and long-term runoff forecasting method based on normalized mutual information (NMI), kernel principal component analysis (KPCA) and an Elman neural network, which specifically comprises the following steps:
step 1: data pre-processing
1.1 Collecting historical runoff data of the area to be predicted and meteorological-hydrological data that can serve as forecasting factors. Commonly used meteorological-hydrological data include indexes such as atmospheric circulation characteristics, high-altitude pressure fields and sea surface temperature; the 74 circulation characteristic indexes or the 130 new atmospheric monitoring indexes provided by the National Climate Center essentially cover the commonly used indexes, and preliminary forecasting factors can be selected directly from them.
1.2 Considering that the influence of meteorological factors on runoff is lagged, a one-to-one correspondence is established between the index time series of the preceding year and the runoff time series of the area to be predicted. For example, with 130 indexes and the annual runoff of 2017 as the forecast target, the historical annual runoff data cover 1960-2016 and the index data are monthly values of the respective preceding years. The correspondence between one index and the runoff is explained below; the correspondences for the other indexes are analogous. The correspondence is as follows:
TABLE 1 corresponding relationship between certain index time series and annual runoff time series
Step 2: forecasting factor primary selection based on standard mutual information
2.1 dividing the index time sequence and the annual runoff time sequence into two parts, wherein one part is used as a training sample of the neural network, and the other part is used as a test sample of the trained neural network. For example, the data of the previous 50 years is used as a training sample, and the data of the next 5 years is used as a test sample.
2.2 Calculating mutual information. For the training sample data, the mutual information between each index time series and the corresponding runoff time series is calculated respectively. Taking the data in Table 1 as an example, the mutual information between column 1 of the table and each of the remaining columns is calculated. The mutual information MI is calculated as follows:

MI(X, Y) = Σ_{i=1}^{n} Σ_{j=1}^{n} p(x_i, y_j) · log[ p(x_i, y_j) / ( p(x_i) · p(y_j) ) ]   (1)

where X = (x_1, x_2, x_3, ..., x_n)^T is the runoff time series, Y = (y_1, y_2, y_3, ..., y_n)^T is the index time series, p(x_i, y_j) is the joint distribution law of X and Y, and p(x_i), p(y_j) are the marginal distribution laws of X and Y respectively.
2.3 Calculating standard mutual information. The mutual information is normalized, i.e. the MI value of step 2.2 is mapped to between 0 and 1 using entropy as the denominator. The standard mutual information NMI is calculated as follows:

NMI(X, Y) = MI(X, Y) / sqrt( H(X) · H(Y) )   (2)

where H(X) and H(Y) are the entropies of X and Y respectively, calculated as:

H(X) = -Σ_{i=1}^{n} p(x_i) log p(x_i)   (3)

H(Y) = -Σ_{j=1}^{n} p(y_j) log p(y_j)   (4)
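Formulas (1)-(4) can be sketched in Python with histogram-based estimates of the distribution laws; the bin count and the square-root entropy denominator in the NMI are illustrative assumptions of this sketch:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution; empty bins are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def nmi(x, y, bins=10):
    """Standard mutual information of two series per formulas (1)-(4)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint distribution law p(x_i, y_j)
    px = pxy.sum(axis=1)                 # marginal law p(x_i)
    py = pxy.sum(axis=0)                 # marginal law p(y_j)
    hx, hy = entropy(px), entropy(py)
    mi = hx + hy - entropy(pxy.ravel())  # identity equivalent to formula (1)
    return mi / np.sqrt(hx * hy)         # formula (2), assumed sqrt denominator

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(round(nmi(x, x), 3))                 # identical series -> 1.0
print(nmi(x, rng.normal(size=500)) < 0.5)  # independent series -> True
```

The identity MI = H(X) + H(Y) - H(X, Y) used here is algebraically the same sum as formula (1); binning is only one of several ways to estimate the distribution laws from finite samples.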
2.4 Standard mutual information Significance Test (Significance Test). The bootstrap method is adopted to carry out the inspection of the standard mutual information, and comprises the following steps:
2.4.1 calculating the standard mutual information NMI value of the original runoff time sequence and the index time sequence;
2.4.2 randomly shuffling the two corresponding time series simultaneously K times (generally K = 100), calculating the NMI value after each shuffle and arranging the values in descending order;
2.4.3 taking the quantile of the shuffled NMI values at a given probability as the NMI threshold corresponding to that significance level;
2.4.4 if the NMI value of the original time series is greater than the NMI threshold at a certain significance level (generally 95%), the two sets of data are considered significantly correlated.
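Steps 2.4.1-2.4.4 above can be sketched as a shuffle test; the compact `_nmi` helper is an assumed histogram-based implementation standing in for the NMI routine of step 2.3:

```python
import numpy as np

def _nmi(x, y, bins=8):
    """Compact histogram-based NMI used only for this illustration."""
    j, _, _ = np.histogram2d(x, y, bins=bins)
    p = j / j.sum()
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    px, py = p.sum(axis=1), p.sum(axis=0)
    return (h(px) + h(py) - h(p.ravel())) / np.sqrt(h(px) * h(py))

def nmi_significant(x, y, K=100, level=0.95, seed=0):
    """Shuffle both series K times (2.4.2), take the `level` quantile of the
    shuffled NMI values as the threshold (2.4.3), then compare (2.4.4)."""
    rng = np.random.default_rng(seed)
    observed = _nmi(x, y)                                    # step 2.4.1
    shuffled = [_nmi(rng.permutation(x), rng.permutation(y)) for _ in range(K)]
    threshold = np.quantile(shuffled, level)
    return observed > threshold, threshold

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 2 * x + 0.1 * rng.normal(size=300)   # strongly dependent pair
sig, thr = nmi_significant(x, y)
print(bool(sig))                         # True
```

Shuffling destroys any dependence between the two series, so the shuffled NMI values approximate the distribution of NMI under the null hypothesis of independence.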
2.5 Selecting the indexes that pass the significance test and whose standard mutual information is greater than a given threshold (generally 0.9; the appropriate value varies with the length of the time series and can be adjusted) as the initially selected forecasting factors.
Step 3: performing kernel principal component analysis to extract principal components
3.1 The initially selected forecasting factor data are standardized by z-score as follows:

y* = (y - μ) / σ   (5)

where y* is the z-score standardized datum, y is one item in the initially selected forecasting factor data, μ is the mean of the time series containing y, and σ is the standard deviation of that time series.
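Formula (5) is a one-liner in Python; a minimal sketch that also checks each standardized series ends up with zero mean and unit standard deviation:

```python
import numpy as np

def zscore(y):
    """z-score standardization per formula (5): (y - mean) / std."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()

s = zscore([2.0, 4.0, 6.0, 8.0])
print(round(float(s.mean()), 6), round(float(s.std()), 6))   # 0.0 1.0
```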
3.2 Calculating the kernel matrix K of the forecasting factors initially selected in step 2.5. K is an n × n matrix whose element in row i, column j is

K_{i,j} = k(y_i, y_j)   (6)

where y_i and y_j are column vectors representing the z-score standardized time series of different initially selected forecasting factors, and k is a kernel function. Commonly used kernel functions include:
① Linear Kernel: k(x, y) = x^T y   (7)

② Polynomial Kernel: k(x, y) = (x^T y + c)^p   (8)

③ Radial Basis Function (RBF) Kernel: k(x, y) = exp( -‖x - y‖² / (2b²) )   (9)

④ Sigmoid Kernel: k(x, y) = tanh( u · x^T y + ξ )   (10)

In formulas (8), (9) and (10), b, c, p, u and ξ are all constants, the parameters of the respective kernel functions.
3.3 Calculating the centered kernel matrix. The centered kernel matrix is denoted K_c, an n × n matrix calculated as follows:
Kc=K-J·K-K·J+J·K·J (11)
In formula (11), J is an n × n matrix in which every element equals 1/n:

J = (1/n) · 1_n 1_n^T   (12)
3.4 Computing the eigenvalues and eigenvectors of the centered kernel matrix K_c, arranging the eigenvalues in descending order and adjusting the order of the eigenvectors accordingly. The sorted eigenvalue matrix Λ and eigenvector matrix U are

Λ = diag(λ_1, λ_2, ..., λ_n), U = (u_1, u_2, ..., u_n), λ_1 ≥ λ_2 ≥ ... ≥ λ_n   (13)

3.5 Computing the normalized eigenvector matrix A = (a_1, a_2, ..., a_n), where a_i = u_i / sqrt(λ_i).
3.6 Extracting principal components; the principal component matrix is an n × n square matrix, and typically the first 2 to 3 principal components are extracted as forecasting factors. The i-th principal component is calculated as

KPC_i = K_c · a_i   (14)

where KPC_i = (kpc_{i1}, kpc_{i2}, ..., kpc_{in})^T and K_c is the centered kernel matrix obtained in step 3.3.
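Steps 3.2-3.6 can be sketched end-to-end in Python. The RBF kernel of formula (9) is evaluated here between the yearly factor vectors, which yields principal component sequences of the sample length (as in the embodiment below); the kernel width `b` and the random data standing in for the z-scored factor series are illustrative assumptions:

```python
import numpy as np

def kpca_components(Y, n_components=2, b=20.0):
    """KPCA sketch: RBF kernel matrix (6)/(9), centering (11)-(12),
    sorted eigen-decomposition (13), eigenvector scaling, projection (14)."""
    n = Y.shape[0]
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * b ** 2))        # RBF kernel, formula (9)
    J = np.full((n, n), 1.0 / n)          # J: every element 1/n, formula (12)
    Kc = K - J @ K - K @ J + J @ K @ J    # centering, formula (11)
    lam, U = np.linalg.eigh(Kc)           # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]         # re-sort descending, step 3.4
    lam, U = lam[order], U[:, order]
    A = U[:, :n_components] / np.sqrt(lam[:n_components])  # a_i = u_i/sqrt(l_i)
    return Kc @ A                         # columns are KPC_i, formula (14)

rng = np.random.default_rng(0)
Y = rng.normal(size=(52, 205))   # 52 years x 205 standardized factor values
pcs = kpca_components(Y)
print(pcs.shape)                 # (52, 2)
```

As in the embodiment, the training and test years would be stacked into one matrix before KPCA so that both are projected with the same kernel and eigenvectors.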
Step 4: construction of the Elman neural network model
4.1 To construct the Elman network model, the network structure (i.e. the number of nodes in each layer) is determined first. The structure of the Elman network is shown in Figure 2 of the specification. The nodes of each layer are determined as follows: the number of input layer (Input Layer) nodes equals the number of forecasting factors; the number of output layer (Output Layer) nodes equals the number of forecast objects; the number of context layer (Context Layer) nodes equals the number of hidden layer (Hidden Layer) nodes; the number of hidden layer nodes has an important influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it. A practical choice is trial and error, i.e., trying different numbers of hidden layer nodes and observing the forecasting effect of the network.
4.2 To construct the Elman network model, a training algorithm also needs to be determined. The invention adopts a back-propagation algorithm and a gradient descent algorithm with momentum term and adaptive learning rate to update the network weights. The weight update formula is

Δω_k = -η · ∂E/∂ω + α · Δω_{k-1}   (15)

where E is the cost function (Cost Function), for which the invention adopts the mean squared error (MSE); ω is the weight matrix of the Elman neural network; Δω_k is the weight change at the k-th update; η is the learning rate (Learning Rate); and α is the momentum constant (Momentum Constant), 0 ≤ α < 1, taken as α = 0.9 in the invention. The learning rate is updated at each iteration as follows:
η(k)=η(k-1)(1+c·cosθ) (16)
where c is a constant, taken as 0.2 in the invention, and θ is the angle between the steepest descent direction and the last weight change Δω_{k-1}.
Step 5: single-model prediction of runoff
5.1 The principal component factor sequences extracted in step 3.6 and the historical runoff sequence of the area to be predicted are normalized and then divided into training and test samples as in step 2.1. The normalization formula is

z = (z_max - z_min) · (q - q_min) / (q_max - q_min) + z_min   (17)

where z is the normalized datum, z_max = 1, z_min = -1, z ∈ [-1, 1]; q is one item of the original runoff sequence or principal component sequence; q_min is the minimum of the sequence containing q, and q_max is its maximum.
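Formula (17) and its inverse (needed in step 5.3 for de-normalization) in Python; the sample runoff values are illustrative:

```python
import numpy as np

def normalize(q, z_min=-1.0, z_max=1.0):
    """Map a sequence into [z_min, z_max] per formula (17); also return the
    (q_min, q_max) pair needed later for de-normalization."""
    q = np.asarray(q, dtype=float)
    q_min, q_max = q.min(), q.max()
    z = (z_max - z_min) * (q - q_min) / (q_max - q_min) + z_min
    return z, (q_min, q_max)

def denormalize(z, bounds, z_min=-1.0, z_max=1.0):
    """Invert formula (17) to recover runoff in physical units (step 5.3)."""
    q_min, q_max = bounds
    return (np.asarray(z) - z_min) / (z_max - z_min) * (q_max - q_min) + q_min

runoff = [820.0, 950.0, 1100.0, 760.0]
z, bounds = normalize(runoff)
print(z.min(), z.max())                                   # -1.0 1.0
print(bool(np.allclose(denormalize(z, bounds), runoff)))  # True
```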
5.2 The factor data in the training sample are used as the network input and the historical runoff data in the training sample as the network output, and supervised learning training of the network is performed.
5.3 For the trained network, the factors in the test sample are used as the network input to test its prediction effect. The test result is de-normalized to obtain the predicted runoff value.
5.4 The mean absolute percentage error (MAPE), relative error (RE), maximum relative error (MRE) and qualification rate (QR) are used as forecast evaluation indexes, calculated as follows:

MAPE = (1/J) Σ_{i=1}^{J} | (x̂_i - x_i) / x_i | × 100%   (18)

RE_i = (x̂_i - x_i) / x_i × 100%   (19)

MRE = max_i | RE_i |   (20)

QR = T_qual / T_total × 100%   (21)

In formulas (18), (19) and (20), x̂_i is the runoff value predicted for the test period, x_i is the corresponding actual runoff value, and J is the number of test samples.
In formula (21), T_qual is the number of qualified forecasts and T_total is the total number of forecasts. According to the scheme for evaluating medium- and long-term runoff forecasting precision in the Standard for Hydrological Information and Hydrological Forecasting (GB/T 22482-2008), the invention regards a forecast whose maximum relative error is smaller than 20% as qualified.
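Formulas (18)-(21) in Python; the sample forecast/observation pairs are illustrative:

```python
import numpy as np

def forecast_scores(pred, obs, qualified_mre=20.0):
    """MAPE (18), relative errors RE (19) and MRE (20), all in percent,
    plus a pass flag per the MRE < 20% qualification rule."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    re = (pred - obs) / obs * 100.0   # formula (19)
    mape = np.mean(np.abs(re))        # formula (18)
    mre = np.max(np.abs(re))          # formula (20)
    return mape, re, mre, mre < qualified_mre

def qualification_rate(qualified_flags):
    """QR per formula (21): share of qualified forecasts, in percent."""
    flags = np.asarray(qualified_flags, bool)
    return flags.sum() / flags.size * 100.0

mape, re, mre, ok = forecast_scores([980.0, 1040.0, 870.0],
                                    [1000.0, 1000.0, 900.0])
print(round(mape, 2), round(mre, 2), bool(ok))          # 3.11 4.0 True
print(qualification_rate([True, True, False, True]))    # 75.0
```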
Step 6: ensemble prediction of runoff
In step 4, the gradient descent search of the Elman network weight space is realized with a back-propagation algorithm, iteratively reducing the error between the actual historical runoff values and the network predictions. However, the error surface may contain multiple different local minima, so the gradient descent search of the Elman network weight space may settle at a local minimum point rather than the global minimum point. Therefore, even though each trained Elman network has the same structure, the connection weight parameters of the models differ, which causes differences between the prediction results of the single Elman network models. To reduce the deviation of the prediction results caused by this model parameter uncertainty, the invention performs single-model prediction of the runoff multiple times and takes the average of the results as the final prediction result.
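The ensemble step above can be sketched as follows; `single_run` is a toy stand-in (an assumption of this sketch) for one full train-and-forecast run of the Elman model with a fresh random initialization:

```python
import numpy as np

def ensemble_forecast(train_fn, n_models=100):
    """Run n_models independently initialized single-model forecasts and
    average them to damp the spread caused by different local minima."""
    runs = np.array([train_fn(seed) for seed in range(n_models)])
    return runs.mean(axis=0), runs.std(axis=0)

# Toy stand-in: each "trained model" returns the truth plus init-dependent noise.
true_runoff = np.array([900.0, 950.0, 1010.0, 880.0, 940.0])
single_run = lambda seed: true_runoff + np.random.default_rng(seed).normal(0, 30, 5)
mean_fc, spread = ensemble_forecast(single_run)
print(bool(np.max(np.abs(mean_fc - true_runoff)) < 30))   # True
```

Averaging n independent runs shrinks the standard deviation of the initialization-driven error by roughly a factor of sqrt(n), which is the rationale for step 6.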
Compared with the prior art, the invention has the advantages that:
(1) the initial selection of forecasting factors based on standard mutual information can reflect both the linear and the nonlinear relations among variables, so the selected factors are more representative, overcoming the defects of traditional factor screening based on linear correlation analysis;
(2) kernel principal component analysis (KPCA) is a nonlinear extension of principal component analysis (PCA): the original vectors are mapped into a high-dimensional feature space F by a mapping function φ, and PCA is performed in F. Data that are not linearly separable in the original space become almost linearly separable in the high-dimensional feature space, so the principal components extracted by performing PCA there are more representative. The KPCA-based feature extraction method therefore greatly improves the ability to process nonlinear data and has clear advantages over traditional PCA-based feature extraction. In addition, the principal components extracted by KPCA are mutually orthogonal, and the data are denoised and de-redundant, which helps prevent overfitting of the neural network and improves its generalization capability;
(3) the artificial neural network has good robustness and strong nonlinear mapping and self-learning capabilities, and can well mine the internal relation between the forecasting factors and the forecast objects. The Elman neural network selected by the invention is a typical dynamic recurrent network; compared with a common feed-forward neural network (such as a BP neural network), it additionally has a context layer. The context layer records the state of the previous network iteration and feeds it back as input to the current iteration, which makes the Elman network better suited to forecasting time-series data. In addition, the neural network has the problem of parameter uncertainty; to reduce the uncertainty of prediction, a multi-model ensemble forecasting method is adopted, improving the forecasting precision;
(4) the NMI for factor initial selection, the KPCA for principal component extraction and the Elman neural network for runoff forecasting all have the capacity of processing nonlinear data except linear data, and the three methods are combined together, so that the limitation of the traditional method can be overcome, and the stability and accuracy of forecasting can be improved.
Drawings
FIG. 1 is a general flow diagram of the present invention;
fig. 2 is a block diagram of the Elman network.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
FIG. 1 is the overall flow chart of the present invention. Taking the prediction of the annual average runoff of the Jinping-I hydropower station reservoir as an example, the method is divided into six steps according to the flow chart, as follows:
step 1: data pre-processing
1.1 Collecting historical runoff data of the area to be predicted and meteorological-hydrological data that can serve as forecasting factors; commonly used meteorological-hydrological data include indexes such as atmospheric circulation characteristics, high-altitude pressure fields and sea surface temperature. The data adopted by this embodiment comprise the annual average runoff of the Jinping-I hydropower station reservoir section for 1960-2011 and the 74 circulation characteristic indexes for 1959-2010.
1.2 Since the annual average runoff is forecast, factors cannot be selected from the same year, and the influence of meteorological factors on runoff is lagged; therefore, following Table 1, a one-to-one correspondence is established between the annual average runoff of the Jinping-I hydropower station (1960-2011) and the 74 atmospheric circulation indexes of the preceding year (1959-2010). The correspondence between one atmospheric circulation index time series and the runoff time series is shown in Table 2; the other indexes are similar.
TABLE 2 corresponding relationship between a certain atmospheric circulation index time series and a runoff time series
Step 2: forecasting factor primary selection based on standard mutual information
2.1 dividing the index time sequence and the annual runoff time sequence into two parts, wherein one part is used as a training sample of the neural network, and the other part is used as a test sample of the trained neural network. This example uses the data of the first 47 years as training samples and the data of the last 5 years as test samples.
2.2 Calculating the mutual information MI. For the training sample data, the mutual information between the monthly time series of each index and the corresponding runoff time series is calculated respectively. For this example, the mutual information between the annual average runoff series in column 1 of Table 2 and the index time series in the remaining columns is calculated according to equation (1). Note that, to allow the forecasting effect to be checked reliably, only the training sample data are used to calculate the mutual information and screen the preliminary forecasting factors; the test sample data must not be included.
2.3 calculate the normalized mutual information NMI, i.e. map the MI values calculated in step 2.2 between 0 and 1 with (2), (3) and (4).
2.4 Significance Test of mutual information (Significance Test). In this embodiment, the bootstrap method is adopted to perform mutual information verification, and includes the following steps:
2.4.1 calculating the standard mutual information NMI value of the original runoff time sequence and the index time sequence;
2.4.2 randomly shuffling the two time series simultaneously 100 times, calculating the NMI value after each shuffle and arranging the values in descending order;
2.4.3 taking the probability quantile of the NMI in the sequential arrangement as the NMI threshold value corresponding to the significance level of the probability;
2.4.4 if the NMI value of the original time series is greater than the NMI threshold at the chosen significance level (95% in this embodiment), the two groups of data are considered significantly correlated.
2.5 Selecting the indexes that pass the significance test and whose standard mutual information exceeds a given threshold (0.9 in this embodiment) as the initially selected forecasting factors. In this embodiment, 205 indexes have standard mutual information greater than 0.9; the first 20 are listed below:
TABLE 3 first 20 Primary selected predictor factors
| Initially selected factor | NMI | MI |
| Sunspot number, August | 0.988375 | 5.426929 |
| Sunspot number, April | 0.988375 | 5.426929 |
| Sunspot number, July | 0.988375 | 5.426929 |
| Sunspot number, October | 0.988375 | 5.426929 |
| Sunspot number, December | 0.988375 | 5.426929 |
| Sunspot number, February | 0.98444 | 5.384376 |
| Sunspot number, September | 0.98444 | 5.384376 |
| Sunspot number, November | 0.98444 | 5.384376 |
| Sunspot number, January | 0.98444 | 5.384376 |
| Sunspot number, March | 0.98444 | 5.384376 |
| Sunspot number, May | 0.98444 | 5.384376 |
| Northern Hemisphere subtropical high intensity index (5E-360), August | 0.980474 | 5.341823 |
| Northern Hemisphere polar vortex area index (zone 5, 0-360), March | 0.980474 | 5.341823 |
| North Africa-Atlantic-North America subtropical high intensity index (110W-60E), June | 0.976477 | 5.299270 |
| Northern Hemisphere subtropical high intensity index (5E-360), June | 0.976291 | 5.256717 |
| Northern Hemisphere subtropical high intensity index (5E-360), April | 0.972448 | 5.256717 |
| North Africa-Atlantic-North America subtropical high intensity index (110W-60E), July | 0.972448 | 5.256717 |
| North Africa-Atlantic-North America subtropical high intensity index (110W-60E), September | 0.972448 | 5.256717 |
| Sunspot number, June | 0.972448 | 5.256717 |
| Pacific subtropical high intensity index (110E-115W), June | 0.970919 | 5.240655 |
Step 3: performing kernel principal component analysis and selecting principal components as forecasting factors. The 205 factor sequences selected in step 2.5 often exhibit multicollinearity among themselves. Multicollinear forecasting factors enlarge the weight matrix of the neural network, and the repeated information and noise they carry directly affect its training speed and generalization capability, so feature extraction, noise reduction and redundancy removal are required. In this embodiment, the radial basis kernel function is selected as the kernel function for kernel principal component analysis; the principal components are calculated according to formulas (5), (6), (9), (11), (12), (13) and (14) and arranged in descending order of variance contribution rate. The variance contribution rates of the first 5 extracted principal components are shown in Table 4, and the data of the corresponding first 5 principal components are shown in Table 5.
TABLE 4 variance contribution ratio of first 5 principal components
| Principal component | Principal component _1 | Principal component _2 | Principal component _3 | Principal component _4 | Principal component _5 |
| Variance contribution rate | 25.7% | 6.9% | 5.6% | 5.1% | 3.9% |
TABLE 5 first 5 principal Components extracted by KPCA
In this embodiment, a trial-and-error method is used to determine which principal components are selected as forecasting factors. Repeated tests show that the test-period forecasting effect is best when the first two principal components are selected, so the first two principal components are finally chosen as the forecasting factors. Note that, to keep the criteria used for extracting principal components consistent between the training and test samples, the training sample sequence and the test sample sequence must be combined before performing KPCA. In this embodiment, the training sample sequence has length 47 and the test sample sequence has length 5, so the combined sequence has length 52, and the principal component sequences extracted in Table 5 therefore have length 52.
Step 4: construction of the Elman neural network model
4.1 the Elman network model is constructed, and the network structure (namely the number of nodes of each layer of the network) is determined firstly. The method for determining each layer of nodes of the network comprises the following steps:
(1) the number of nodes of the Input Layer (Input Layer) is equal to the number of the prediction factors. In the embodiment, the first two main components are selected as the forecasting factors, so that the number of nodes of the input layer of the Elman neural network is 2;
(2) the number of output layer nodes is equal to the number of forecast objects, in this embodiment, the annual average runoff is subjected to single value forecast, so the number of output layer nodes is 1;
(3) the number of context layer nodes is equal to the number of hidden layer nodes;
(4) the number of hidden layer nodes has an important influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it; a practical choice is trial and error, i.e., trying different numbers of hidden layer nodes and observing the forecasting effect. In this embodiment, because KPCA has already been used to denoise and de-redundant the factor data and the obtained principal components are mutually orthogonal, overfitting of the neural network is effectively prevented: with 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 15 hidden layer nodes, the relative error of the test-period forecasts stays within 20%, and the network is very stable with good generalization capability. Through repeated experiments, when the number of hidden layer nodes is 10, the maximum relative error of the test-period forecasts falls to 15%, so the number of hidden layer nodes is determined to be 10.
4.2 To construct the Elman network model, a training algorithm also needs to be determined. In this embodiment, the network weights are updated using a back-propagation algorithm and a gradient descent algorithm with momentum term and adaptive learning rate. The weight update formulas are given in equations (15) and (16).
Step 5: single-model prediction of runoff
5.1 The principal component factor sequences extracted in step 3 and the historical runoff sequence of the area to be predicted are normalized according to formula (17) and then divided into training and test samples as in step 2.1. In this embodiment, the data of the two principal component sequences selected in step 3 and the runoff data of the Jinping-I hydropower station for the first 47 years are used as training samples, and the data of the last 5 years as test samples.
5.2 The factor data in the training sample are used as the network input and the historical runoff data in the training sample as the network output, and supervised learning training of the network is performed. The learning process can be summarized as follows:
(1) initializing the connection weights between the layers of the network with a random function, and setting the allowed error ε of the cost function (Cost Function); the cost function of this embodiment is the mean squared error (MSE);
(2) inputting a learning sample to the network, calculating the value E of the mean squared error function by the algorithm, and updating the connection weights between the layers of the network according to E;
(3) if E is still greater than the allowed error ε, returning to step (2); otherwise learning ends and the network output is calculated.
5.3 For the trained network, the factor data in the test sample are used as the network input to test its prediction effect. The test result is de-normalized to obtain the predicted runoff value.
5.4 The mean absolute percentage error (MAPE), relative error (RE), maximum relative error (MRE) and qualification rate (QR) are used as forecast evaluation indexes, calculated according to formulas (18), (19), (20) and (21). To verify the generalization ability and forecast stability of the network model, this embodiment performs 100 single-model forecasts; the results show that the maximum relative error of every test-period forecast is within 16% and the qualification rate reaches 100%, indicating that the network model used in the invention has good generalization ability and forecast stability. The error statistics of the first 5 test-period forecasts are shown in Table 6.
TABLE 6 Single model test period prediction error statistics
Step 6: ensemble prediction of runoff
To reduce the deviation of the prediction results caused by model parameter uncertainty, the method performs single-model prediction of the runoff multiple times and takes the average of the results as the final prediction result. In this embodiment, the average of the 100 forecast results is taken as the final forecast result.
The above description is only an example of the present invention and is not intended to limit the present invention. All equivalents which come within the spirit of the invention are therefore intended to be embraced therein. Details not described herein are well within the skill of those in the art.