Disclosure of Invention
The invention aims to solve the following technical problems: numerical lightning calculation has high complexity and low computational efficiency; multi-scale, multi-resolution thunderstorms cannot be effectively identified; and existing deep learning methods have poor long-term prediction performance and complex networks, producing "false alarms" (strong meteorological element features are expressed without lightning occurring) and "misses" (lightning occurs with weak meteorological element features). To solve these technical problems, the invention adopts the following technical scheme: a lightning automatic prediction method based on a predicted high-abstraction feature network, comprising the following steps:
Step 1, determining input data as 6 types of meteorological element data and dividing data sets, namely determining input meteorological elements related to thunder and lightning, downloading the data sets and dividing the data sets into training and testing;
Step 2, preprocessing input data, namely firstly, gridding data, then solving gradients, carrying out normalization processing, and converting label data into a form represented by 0 and 1;
step 3, extracting abstract features of 6 types of meteorological elements, namely constructing an encoder with double-layer convolution and excitation functions and 1 pooling operation, and continuously performing downsampling on the 6 types of meteorological elements to extract high abstract features;
Step 4, fusing the abstract features of the 6-class weather elements, namely adaptively fusing the abstract features of the 6-class weather elements in a mode of giving weight to the 6-class weather elements;
Step 5, constructing an abstract feature prediction network, namely predicting the high abstract feature within 6 hours by adopting a prediction network of a multi-GRU unit network series stacked structure;
Step 6, restoring the predicted high-level abstract features, namely restoring the high-level abstract features output by the prediction network by using a constructed decoder;
Step 7, training a prediction network, namely initializing network parameters of the network, calculating total loss of network prediction and labels according to a designed loss function, gradually updating the network parameters according to the size of the loss by utilizing a designed optimization mode, and finally selecting an optimal prediction network by utilizing test data;
Step 8, calibrating the network output and outputting a final lightning prediction result, namely checking and correcting the output of the prediction network by using a deviation corrector;
Step 9, outputting a lightning prediction result within X1 hours (X1 is 6 in the invention).
Step 1 comprises the following steps:
Step 1-1, determining that input data are 6 types of meteorological element data at intervals of 1 hour, wherein the 6 types of meteorological element data are temperature T, horizontal wind speed U, vertical wind speed V, specific humidity Q, radar echo intensity R and aerosol concentration A respectively, acquiring the input data of the last 10 years to form a data set, and the used lightning labels are in one-to-one correspondence with the data set in time and space;
and step 1-2, dividing the data set by adopting a leave-out method.
The step 1-2 comprises the following steps:
Step 1-2-1, randomly extracting 1 year of data from the data set to serve as a test set, and taking the rest 9 years of data as a training set;
step 1-2-2, using a test set to test the prediction accuracy of the network after updating network parameters each time and record results, and then putting the test set into the whole data set again and returning to the step 1-2-1 again for division;
Step 1-2-3, repeating steps 1-2-1 to 1-2-2 until every yearly sample of the 6 types of meteorological element data has served as the test set exactly once.
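The hold-out scheme of steps 1-2-1 to 1-2-3 can be sketched in Python as follows; the concrete year range is a placeholder, since the invention only specifies a 10-year data set:

```python
import random

def leave_one_year_out(years, seed=0):
    """Yield (train_years, test_year) pairs so that every year is the
    held-out test set exactly once, in a shuffled order (illustrative
    sketch of steps 1-2-1 to 1-2-3)."""
    order = list(years)
    random.Random(seed).shuffle(order)
    for test_year in order:
        train_years = [y for y in years if y != test_year]
        yield train_years, test_year

years = list(range(2014, 2024))  # hypothetical 10-year data set
splits = list(leave_one_year_out(years))
```

Each split keeps 9 years for training and holds out 1 year for testing, and the loop over all splits realizes "every sample is the test set exactly once".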
Step 2 comprises the following steps:
step 2-1, gridding data, namely gridding the 6 types of meteorological element data and the corresponding label data according to actual needs by adopting the nearest-neighbour method;
Step 2-2, converting the meshed lightning data into a representation form of 0 and 1, wherein 0 represents no lightning and 1 represents lightning;
step 2-3, calculating the gradient values corresponding to the gridded data of the 6 types of meteorological elements:

Grad u = (∂u/∂x, ∂u/∂y, ∂u/∂z)

wherein Grad u is the gradient value of a meteorological element field u, and ∂u/∂x, ∂u/∂y and ∂u/∂z are the partial derivatives of u with respect to x, y and z respectively.
Step 2-4, normalizing the gradient values to [0, 1], wherein the gridded gradient data values of the 6 types of meteorological elements are normalized by the max-min normalization method, calculated as:

f'ij = (fij − minf) / (maxf − minf)

wherein f'ij is the max-min-normalized gradient value of the meteorological element at grid point (i, j); i is the row index of the grid point within the identification area, with minimum value 1 and maximum value equal to the total number of grid rows into which the identification area is divided; j is the column index, with minimum value 1 and maximum value equal to the total number of grid columns; fij is the original, un-normalized gradient value at grid point (i, j); maxf represents the maximum and minf the minimum gradient value before normalization.
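Steps 2-3 and 2-4 can be illustrated with the following Python sketch; the finite-difference scheme (forward differences on a 2-D grid, edges clamped) is an assumption, since the invention does not fix a discretization:

```python
def gradient_magnitude(grid):
    """Finite-difference gradient magnitude of a 2-D field, a simple
    stand-in for the partial derivatives of step 2-3 (forward
    differences, edge rows/columns clamped)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dx = grid[min(i + 1, rows - 1)][j] - grid[i][j]
            dy = grid[i][min(j + 1, cols - 1)] - grid[i][j]
            out[i][j] = (dx * dx + dy * dy) ** 0.5
    return out

def min_max_normalize(grid):
    """Normalize a 2-D grid of gradient values to [0, 1] with the
    max-min formula of step 2-4: (f - minf) / (maxf - minf)."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant field
    return [[(v - lo) / span for v in row] for row in grid]

field = [[float(i * j) for j in range(4)] for i in range(4)]  # toy element field
norm = min_max_normalize(gradient_magnitude(field))
```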
Step 3 comprises the following steps:
Step 3-1, constructing an encoder, namely the structural unit of the encoder adopts two ordinary convolution operations and activation operations connected in series, wherein the ReLU function is selected as the activation function, the convolution kernel of the convolution operation is set to 3 × 3, the stride is set to 2, and the padding is set to 1;
Step 3-2, extracting the abstract features of the 6 types of meteorological elements with the encoder. The encoder is formed by connecting Y1 structural units in series (Y1 is 4 in the invention); the number of structural units depends on the size of the lightning prediction area and the number of grid points after gridding. If the prediction area is small and the number of grid points is small, the encoder may use 3 structural units in series; similarly, if the prediction area is large and the number of grid points is large, the number of structural units connected in series is increased. Each structural unit reduces the shape and size of the input data to 1/2 of the original, so after passing through an encoder with 4 structural units the size of the input data becomes one sixteenth of the original, yielding the abstract features of the 6 types of meteorological elements. Finally, the high-level abstract feature data of the 6 types of meteorological elements are input into the adaptive fusion network for fusion;
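The downsampling behaviour of step 3-2 can be illustrated as follows; the sketch shows only the spatial halving (here by 2 × 2 max pooling), while the convolution and activation operations of the structural unit are omitted:

```python
def max_pool_2x2(grid):
    """One 2x2 max-pooling step: halves each spatial dimension, as one
    encoder structural unit does in step 3-2 (convolutions omitted)."""
    return [[max(grid[i][j], grid[i][j + 1],
                 grid[i + 1][j], grid[i + 1][j + 1])
             for j in range(0, len(grid[0]), 2)]
            for i in range(0, len(grid), 2)]

grid = [[float(i + j) for j in range(64)] for i in range(64)]  # toy 64x64 field
for _ in range(4):  # Y1 = 4 structural units in series
    grid = max_pool_2x2(grid)
```

After the 4 units the 64 × 64 input has shrunk to 4 × 4, i.e. one sixteenth of the original size per dimension, as stated in the text.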
Step 4 comprises:
Fusing the abstract features of the 6 types of meteorological elements, wherein the abstract features are fused by an adaptive fusion method, expressed mathematically as:

yij = Σc wc · fij^c, c = 1, …, n

wherein yij represents the result after adaptive fusion at grid point (i, j), n represents the total number of input abstract feature classes (n is 6), wc is the class weight, and fij^c is the abstract feature data of the c-th meteorological element class, the classes being temperature, horizontal wind speed, vertical wind speed, specific humidity, radar echo intensity and aerosol concentration.
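The adaptive fusion of step 4 can be sketched as follows; the softmax normalization of the class weights is an assumption, since the invention only states that the 6 classes are given weights:

```python
import math

def adaptive_fuse(features, scores):
    """Weighted fusion of step 4: each element class gets a weight;
    here the weights are softmax-normalized scores so they sum to 1
    (the softmax choice is an assumption of this sketch)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [[sum(w * f[i][j] for w, f in zip(weights, features))
              for j in range(len(features[0][0]))]
             for i in range(len(features[0]))]
    return fused, weights

# six toy 2x2 abstract-feature maps, one per meteorological element class
features = [[[float(c)] * 2 for _ in range(2)] for c in range(6)]
fused, weights = adaptive_fuse(features, [0.0] * 6)  # equal scores -> equal weights
```

With equal scores every class receives weight 1/6, and the fused value at each grid point is the plain average of the six feature maps.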
Step 5 comprises the steps of:
step 5-1, the predicted high-abstraction feature network adopts a gating loop unit (Gated Recurrent Unit, GRU) network, wherein the calculation of an update gate and a reset gate rest in the GRU network is as follows:
update=σ(w1[ht-1,xt]+b2),
rest=σ(w2[ht-1,xt]+b2);
wherein σ is the sigmoid function, whose output lies between 0 and 1 and controls how much new content is written to memory and how much previously memorized content is retained; w1 and w2 are weight parameters acting on the hidden state of the previous moment; ht-1 is the hidden state at the previous moment; xt is the input at the current moment; and b2 is a bias parameter.
And 5-2, the structure of the predicted high-abstraction feature network adopts a serial stacking structure: n GRU unit networks (n being the number of abstract features) are stacked in series in six groups, and each group of GRU networks outputs the predicted abstract features for 1 hour.
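One step of the GRU unit of step 5-1 can be sketched as follows; the candidate state and final interpolation are the standard GRU equations, added here so the cell actually produces a new hidden state (the invention writes out only the update and reset gates), and all weight values are placeholders:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(h_prev, x_t, w1, w2, wh, b1, b2, bh):
    """One scalar GRU step. update/rest follow step 5-1,
    sigma(w.[h_{t-1}, x_t] + b); candidate state and interpolation
    are the standard GRU equations."""
    update = sigmoid(w1[0] * h_prev + w1[1] * x_t + b1)   # update gate
    rest   = sigmoid(w2[0] * h_prev + w2[1] * x_t + b2)   # reset gate
    h_tilde = math.tanh(wh[0] * (rest * h_prev) + wh[1] * x_t + bh)
    return (1.0 - update) * h_prev + update * h_tilde

h = 0.0
for x in [0.5, -0.2, 0.8]:  # a toy 3-step input sequence
    h = gru_step(h, x, (0.1, 0.4), (0.2, 0.3), (0.5, 0.6), 0.0, 0.0, 0.0)
```

Because the new state is a convex combination of the previous state and a tanh candidate, the hidden state stays bounded in (-1, 1).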
Step 6 comprises the steps of:
step 6-1, constructing a decoder, namely adopting the same double-layer serial convolution and activation operations as the encoder, with the convolution kernel of the convolution operation set to 3 × 3, the stride set to 2 and the padding set to 1, and the ReLU function selected as the activation function; finally, unlike the encoder, a deconvolution operation is performed, with the kernel size set to 2 × 2, the stride set to 2 and the padding set to 1;
And 6-2, restoring the predicted high-level abstract features with the decoder, wherein the number of structural units of the decoder is the same as that of the encoder; each structural unit enlarges the shape and size of the input data to 2 times the original, so after passing through the decoder with Y1 structural units the data return to their original size.
Step 7 comprises the steps of:
Step 7-1, initializing the network parameters of the predicted high-abstraction feature network by the Kaiming initialization method, the variance of a network parameter w being calculated as:

var(w) = 2 / Ml-1

wherein Ml-1 is the number of neurons in layer l-1 of the predicted high-abstraction feature network, calculated by multiplying the square of the kernel size by the number of channels and adding the number of biases (equal to the number of channels); in this method Ml-1 = 3 × 3 × 1 × 6 + 6 = 60. The parameters are then initialized with a uniform distribution on the interval [-r, r], where r = sqrt(6 / Ml-1).
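The Kaiming initialization of step 7-1 can be sketched as follows; the uniform bound r = sqrt(6 / Ml-1) follows from the target variance 2 / Ml-1, since a uniform distribution on [-r, r] has variance r² / 3:

```python
import math
import random

def kaiming_uniform(fan_in, n, seed=0):
    """Kaiming initialization for ReLU layers: target variance
    2 / fan_in, realized by uniform draws on [-r, r] with
    r = sqrt(6 / fan_in)."""
    r = math.sqrt(6.0 / fan_in)
    rng = random.Random(seed)
    return [rng.uniform(-r, r) for _ in range(n)], r

fan_in = 3 * 3 * 1 * 6 + 6  # = 60, the count given in step 7-1
params, r = kaiming_uniform(fan_in, 1000)
```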
And 7-2, learning and optimizing the network parameters with the small-batch gradient descent method, calculated as:

L(wi, b; B) = (1/K) Σi∈B loss(Yi, Y)

wherein L(wi, B) is the small-batch loss, wi is a network parameter, b is a bias term, Yi is a prediction result, and Y is the real label. A linear scaling rule is adopted for the batch size K, which is calculated according to the relation:

K = N * I / E

wherein N is the number of training samples, I is the number of iterations, and E is the number of training epochs;
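The small-batch gradient descent of step 7-2 can be illustrated on a toy one-parameter model; the squared loss, learning rate and data are placeholders (the real network uses the dual loss of step 7-5):

```python
import random

def minibatch_gd(xs, ys, batch_size, epochs, lr=0.01, seed=0):
    """Small-batch gradient descent of step 7-2 on the one-parameter
    model y = w * x with squared loss, purely to illustrate the
    batching scheme."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # gradient of the mean squared error over the mini-batch
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

xs = [float(i) for i in range(1, 9)]
ys = [2.0 * x for x in xs]  # true slope is 2
w = minibatch_gd(xs, ys, batch_size=4, epochs=50)
```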
Step 7-3, adjusting the learning rate, wherein the learning rate is adjusted following the improvement of the AdaGrad algorithm in Zeiler 2012 (AdaDelta) by means of an exponentially decaying moving average of the squared gradient, calculated as follows:

First, the exponentially decaying moving average Gt of the squared gradient gt is calculated at each iteration:

Gt = β · Gt-1 + (1 − β) · gt ⊙ gt

wherein t is the current moment, ⊙ is the element-wise product, gτ is the gradient at the τ-th iteration, and β is the decay rate, taken as 0.9 in this method.

Next, the exponentially decaying moving average of the squared parameter-update difference Δθ is calculated at each parameter update:

ΔXt² = β · ΔXt-1² + (1 − β) · Δθt ⊙ Δθt

Finally, the parameter update difference is calculated as:

Δθt = − ( sqrt(ΔXt-1² + ε) / sqrt(Gt + ε) ) ⊙ gt

wherein ΔXt-1² is the exponentially decaying moving average of the squared update difference Δθ, Gt is the exponentially decaying moving average of the squared gradient gt at each iteration, and ε is a very small constant set to maintain numerical stability, typically taking values between 1e-7 and 1e-10; this method takes ε = 1e-8.
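The update rule of step 7-3 can be sketched for a single scalar parameter as follows, with β = 0.9 and ε = 1e-8 as stated in the text; the toy objective θ² is a placeholder:

```python
def adadelta_step(g, state, beta=0.9, eps=1e-8):
    """One AdaDelta update (Zeiler 2012), matching step 7-3:
    G_t    = beta*G_{t-1} + (1-beta)*g^2
    dtheta = -sqrt(D_{t-1} + eps) / sqrt(G_t + eps) * g
    D_t    = beta*D_{t-1} + (1-beta)*dtheta^2
    state is the pair (G, D) of accumulators for one scalar parameter."""
    G, D = state
    G = beta * G + (1.0 - beta) * g * g
    dtheta = -((D + eps) ** 0.5) / ((G + eps) ** 0.5) * g
    D = beta * D + (1.0 - beta) * dtheta * dtheta
    return dtheta, (G, D)

# minimize the toy objective f(theta) = theta^2, starting from theta = 1.0
theta, state = 1.0, (0.0, 0.0)
for _ in range(500):
    grad = 2.0 * theta
    step, state = adadelta_step(grad, state)
    theta += step
```

Note that no global learning rate appears: the step size is set by the ratio of the two running averages, which is the point of the AdaDelta improvement over AdaGrad.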
Step 7-4, layer-by-layer normalization, namely after the activation function of each layer of the predicted high-abstraction feature network, the local response normalization method of Krizhevsky et al. is applied to locally normalize adjacent feature maps, the purpose being to balance and constrain the activity values of adjacent neurons and enhance the generalization of the network, expressed as:

Y'p = Yp / ( k + α · Σj (Yj)² )^β

wherein Y'p is the local normalization result of the adjacent feature maps, Y is the output feature map space of the convolution layer, Yp is the p-th output feature map, Yj are the output feature maps in the local window adjacent to Yp, and n, k, α and β are hyper-parameters, n being the size of the local normalization feature window; this method adopts the values in AlexNet, n = 5, k = 2, α = 10^-4 and β = 0.75;
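The local response normalization of step 7-4 can be sketched at one spatial position as follows, with the AlexNet hyper-parameters n = 5, k = 2, α = 10^-4, β = 0.75; the activation values are placeholders:

```python
def local_response_norm(channels, p, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Local response normalization across feature maps (step 7-4):
    the activation of map p is divided by
    (k + alpha * sum of squares over the n neighbouring maps) ** beta.
    `channels` holds the scalar activations at one spatial position,
    one per feature map."""
    half = n // 2
    lo, hi = max(0, p - half), min(len(channels), p + half + 1)
    denom = (k + alpha * sum(c * c for c in channels[lo:hi])) ** beta
    return channels[p] / denom

acts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy activations of 6 feature maps
normed = [local_response_norm(acts, p) for p in range(len(acts))]
```

Since the denominator is always greater than 1 (k = 2), every activation is damped, and maps with strongly active neighbours are damped more.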
In step 7-5, the loss is calculated with a dual loss function. The first loss function adopts the Dice loss DLoss proposed by Milletari et al. in 2016, with the mathematical formula:

DLoss = 1 − (2 · |y ∩ ŷ| + smooth) / (|y| + |ŷ| + smooth)

wherein y represents the recognition result, ŷ represents the sample label, |y ∩ ŷ| represents the number of intersecting grid points between y and ŷ, |y| and |ŷ| respectively represent the numbers of grid points in y and ŷ, and smooth is a smoothing parameter (set to 0.1 in the invention). The second loss function adopts the two-class cross-entropy loss L, with the mathematical formula:

L = −(1/N) Σi [ yi · log(pi) + (1 − yi) · log(1 − pi) ]

wherein N is the total number of training samples in a batch input to the network, yi is the class to which the i-th sample belongs, and pi is the predicted value of the i-th sample, a probability value.
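The dual loss of step 7-5 can be sketched on flattened binary grids as follows; the sample values are placeholders:

```python
import math

def dice_loss(pred, label, smooth=0.1):
    """Dice loss of step 7-5 on binary 0/1 grids (flattened lists):
    1 - (2*|intersection| + smooth) / (|pred| + |label| + smooth)."""
    inter = sum(p * t for p, t in zip(pred, label))
    return 1.0 - (2.0 * inter + smooth) / (sum(pred) + sum(label) + smooth)

def bce_loss(probs, labels):
    """Two-class cross entropy over a batch:
    -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    n = len(probs)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels)) / n

pred  = [1, 0, 1, 1]             # toy predicted lightning grid
label = [1, 0, 0, 1]             # toy label grid
d = dice_loss(pred, label)
b = bce_loss([0.9, 0.1, 0.8, 0.95], label)
```

A perfect prediction drives the Dice loss to 0, and confident correct probabilities drive the cross entropy towards 0; the total training loss combines both terms.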
Step 7-6, regularizing the network with the elastic net regularization method, which adds the l1 and l2 penalties simultaneously, expressed mathematically as:

θ* = argθ min Σ L(y, f(x; θ)) + λ1 · l1(θ) + λ2 · l2(θ)

wherein θ* is the result of regularization of the network parameters, argθ min is the parameter combination minimizing the loss, L(·) is the loss function, f(·) is the neural network to be learned, θ is a network parameter, l is a norm function, and λ1 and λ2 are the coefficients of the two regularization terms l1(θ) and l2(θ) respectively.
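The elastic net penalty of step 7-6 can be sketched as follows; the coefficient values λ1 = λ2 = 0.01 and the parameter vector are placeholders:

```python
def elastic_net_penalty(theta, lam1=0.01, lam2=0.01):
    """Elastic-net regularization term of step 7-6: the l1 and
    (squared) l2 norms of the parameter vector, added with
    coefficients lambda1 and lambda2."""
    l1 = sum(abs(w) for w in theta)
    l2 = sum(w * w for w in theta)
    return lam1 * l1 + lam2 * l2

def regularized_loss(data_loss, theta):
    """Total objective: data loss plus the elastic-net penalty."""
    return data_loss + elastic_net_penalty(theta)

total = regularized_loss(0.5, [1.0, -2.0, 0.5])
```

The l1 term encourages sparse parameters while the l2 term shrinks large ones; the elastic net combines both effects in one penalty.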
Step 8 comprises the steps of:
8-1, determining a deviation corrector structure, wherein the deviation corrector comprises a predictor, a discriminator and a logic arithmetic unit, and the predictor and the discriminator are respectively divided into three types of water vapor content, vertical wind speed and electric field intensity;
8-2, determining the input of the deviation corrector, wherein the input comprises 4 types: the first type is the water vapour content within 6 hours, which can be measured by instruments such as a microwave radiometer; the second type is the vertical wind speed within 6 hours; the third type is the electric field intensity within 6 hours, which can be measured by instruments such as an atmospheric electric field detector; and the fourth type is the lightning result predicted by the predicted high-abstraction feature network;
and 8-3, determining the fitting function of the predictor, wherein the predictor predicts the data input to the deviation corrector for 1 to 6 hours ahead with a fitting function, the fitting function being a four-parameter logistic regression equation:

y = D + (A − D) / (1 + (x / C)^B)

wherein x is the input data, y is the prediction result at the next moment, and the parameters A, B, C and D are obtained through the following steps:

In the first step, the values of A, D, C and B are found using the linear form:

ln((y − A) / (D − y)) = B · ln(x) − B · ln(C)

wherein the initial value of D is set to the maximum of the input y values plus 1, and the initial value of A is set to the minimum of the input y values minus 0.1.
In the second step, the partial derivatives of the first-step equation with respect to the four parameters are taken, giving the first-order Taylor series expansion of y in the parameter increments (ΔA, ΔD, ΔC, ΔB). This converts the curve regression into a multiple linear regression, whose solution is identical to the polynomial fitting method; the increments of the four parameters are obtained by iterative calculation and the parameter values are corrected step by step. In each iteration the increment values are calculated, and the new parameter values are the superposition of the original parameter values and the increments.
In the third step, a coefficient k is introduced with initial value 2; k is multiplied by the parameter increment matrix and the correlation coefficient is calculated. The cycle is repeated 10 times, halving k each time (k = k/2), and the increment matrix [ΔA, ΔD, ΔC, ΔB] giving the maximum correlation coefficient in the cycle is kept.
In the fourth step, the total number of iterations defaults to 1000, or iteration stops when the correlation coefficient no longer improves; the four obtained parameter values are returned.
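Assuming the fitting function of step 8-3 is the standard four-parameter logistic curve (the invention names only a "four-parameter fitting regression equation" with parameters A, B, C, D), its evaluation and the initial values of A and D can be sketched as follows:

```python
def four_param_logistic(x, A, B, C, D):
    """Standard four-parameter logistic curve
    y = D + (A - D) / (1 + (x / C) ** B); this exact form is an
    assumption of this sketch."""
    return D + (A - D) / (1.0 + (x / C) ** B)

def initial_A_D(ys):
    """Initial values as in step 8-3: D starts at max(y) + 1 and
    A starts at min(y) - 0.1."""
    return min(ys) - 0.1, max(ys) + 1.0

ys = [0.2, 0.8, 1.5, 2.9]          # toy observed values
A0, D0 = initial_A_D(ys)
y_mid = four_param_logistic(1.0, A0, 2.0, 1.0, D0)  # at x == C the curve is at its midpoint
```

Setting D above the observed maximum and A below the observed minimum guarantees that the linearizing logarithm of the first fitting step is defined for every data point.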
And 8-4, firstly, the deviation corrector predicts the water vapor, the vertical wind speed and the electric field intensity within 6 hours by using the predictor, then inputs the predicted result into the discriminator for judgment, then inputs the judgment result into the logic arithmetic unit for logic judgment, and outputs the final predicted result.
The method has the beneficial effects that it can replace manual lightning prediction, with the advantages of automation and accurate prediction; it fuses multiple meteorological elements and predicts lightning within 6 hours through high-level abstract features, with the advantage of high efficiency. The method is realized on the basis of various scientific and technical improvements and scientific experiments, building on an analysis of the advantages and disadvantages of current automatic lightning prediction methods; it achieves higher lightning prediction accuracy, performs well in practical application for automatic prediction within 6 hours, and is worthy of popularization and application.
Detailed Description
The invention provides a lightning automatic prediction method based on a predicted high-abstraction characteristic network, which comprises the following steps:
Step 1, determining input data as 6 types of meteorological element data and dividing a data set into a training set and a testing set;
step 2, preprocessing input data;
step 3, extracting abstract features of class 6 meteorological elements;
Step 4, fusing abstract features of 6 types of meteorological elements;
step 5, constructing a predicted high-level abstract feature network;
step 6, restoring abstract features;
step 7, training a predicted high-level abstract feature network;
step 8, calibrating the network output to obtain the lightning prediction result;
And 9, outputting a lightning prediction result within X1 hours.
Step 1 comprises the following steps:
Step 1-1, determining that input data are 6 types of meteorological element data at intervals of 1 hour, wherein the 6 types of meteorological element data are temperature T, horizontal wind speed U, vertical wind speed V, specific humidity Q, radar echo intensity R and aerosol concentration A respectively, acquiring the input data of the last 10 years to form a data set, and the used lightning labels are in one-to-one correspondence with the data set in time and space;
and step 1-2, dividing the data set by adopting a leave-out method.
The step 1-2 comprises the following steps:
Step 1-2-1, randomly extracting 1 year of data from the data set to serve as a test set, and taking the rest 9 years of data as a training set;
step 1-2-2, using a test set to test the prediction accuracy of the network after updating network parameters each time and record results, and then putting the test set into the whole data set again and returning to the step 1-2-1 again for division;
and step 1-2-3, repeating steps 1-2-1 to 1-2-2 until every yearly sample of the 6 types of meteorological element data has served as the test set exactly once.
Step 2 comprises the following steps:
step 2-1, gridding data, namely gridding the 6 types of meteorological element data and the corresponding label data according to actual needs by adopting the nearest-neighbour method;
Step 2-2, converting the meshed lightning data into a representation form of 0 and 1, wherein 0 represents no lightning and 1 represents lightning;
step 2-3, calculating the gradient values corresponding to the gridded data of the 6 types of meteorological elements:

Grad u = (∂u/∂x, ∂u/∂y, ∂u/∂z)

wherein Grad u is the gradient value of a meteorological element field u, and ∂u/∂x, ∂u/∂y and ∂u/∂z are the partial derivatives of u with respect to x, y and z respectively;
Step 2-4, normalizing the gradient values to [0, 1], wherein the gridded gradient data values of the 6 types of meteorological elements are normalized by the max-min normalization method, calculated as:

f'ij = (fij − minf) / (maxf − minf)

wherein f'ij is the max-min-normalized gradient value of the meteorological element at grid point (i, j); i is the row index of the grid point within the identification area, with minimum value 1 and maximum value equal to the total number of grid rows into which the identification area is divided; j is the column index, with minimum value 1 and maximum value equal to the total number of grid columns; fij is the original, un-normalized gradient value at grid point (i, j); maxf represents the maximum and minf the minimum gradient value before normalization.
Step 3 comprises the following steps:
step 3-1, constructing an encoder, namely adopting a convolution operation and an activation operation of double-layer series connection by a structural unit of the encoder, selecting a ReLU function by an activation function, and finally carrying out a maximum pooling operation;
And 3-2, extracting abstract features of 6 types of meteorological elements by using an encoder, wherein the encoder is formed by connecting Y1 structural units in series, the shape and the size of input data are reduced to 1/2 of the original shape and the size of the input data through one structural unit, and the abstract features of the 6 types of meteorological elements are obtained after the input data pass through Y1 structural units.
Step 4 comprises:
Fusing the abstract features of the 6 types of meteorological elements, wherein the abstract features are fused by an adaptive fusion method, expressed mathematically as:

yij = Σc wc · fij^c, c = 1, …, n

wherein yij represents the result after adaptive fusion at grid point (i, j), n represents the total number of input abstract feature classes, wc is the class weight, and fij^c is the abstract feature data of the c-th meteorological element class, the classes being temperature, horizontal wind speed, vertical wind speed, specific humidity, radar echo intensity and aerosol concentration.
Step 5 comprises the steps of:
step 5-1, the predicted high-abstraction feature network adopts a gating loop unit (Gated Recurrent Unit, GRU) network, wherein the calculation of an update gate and a reset gate rest in the GRU network is as follows:
update=σ(w1[ht-1,xt]+b2),
rest=σ(w2[ht-1,xt]+b2);
wherein σ is the sigmoid function, whose output lies between 0 and 1 and controls how much new content is written to memory and how much previously memorized content is retained; w1 and w2 are weight parameters acting on the hidden state of the previous moment; ht-1 is the hidden state at the previous moment; xt is the input at the current moment; and b2 is a bias parameter;
And 5-2, the structure of the predicted high-abstraction feature network adopts a serial stacking structure: n GRU unit networks are stacked in series in six groups, and each group of GRU networks outputs the predicted abstract features for 1 hour.
Step 6 comprises the steps of:
Step 6-1, constructing a decoder, wherein the structural unit of the decoder adopts the same double-layer serial convolution and activation operations as the encoder, with the ReLU function selected as the activation function; unlike the encoder, a deconvolution operation is performed at the end;
And 6-2, restoring the predicted high-level abstract features with the decoder, wherein the number of structural units of the decoder is the same as that of the encoder; each structural unit enlarges the shape and size of the input data to 2 times the original, so after passing through the decoder with Y1 structural units the data return to their original size.
Step 7 comprises the steps of:
Step 7-1, initializing the network parameters of the predicted high-abstraction feature network by the Kaiming initialization method, the variance of a network parameter w being calculated as:

var(w) = 2 / Ml-1

wherein Ml-1 is the number of neurons in layer l-1 of the predicted high-abstraction feature network, calculated by multiplying the square of the kernel size by the number of channels and adding the number of biases (equal to the number of channels); in this method Ml-1 = 3 × 3 × 1 × 6 + 6 = 60. The parameters are then initialized with a uniform distribution on the interval [-r, r], where r = sqrt(6 / Ml-1).
And 7-2, learning and optimizing the network parameters with the small-batch gradient descent method, calculated as:

L(wi, b; B) = (1/K) Σi∈B loss(Yi, Y)

wherein L(wi, B) is the small-batch loss, wi is a network parameter, b is a bias term, Yi is a prediction result, and Y is the real label. A linear scaling rule is adopted for the batch size K, which is calculated according to the relation:

K = N * I / E

wherein N is the number of training samples, I is the number of iterations, and E is the number of training epochs;
Step 7-3, the learning rate is adjusted by adopting the improvement of the AdaGrad algorithm in Zeiler 2012 (AdaDelta), by means of the exponentially decaying moving average of the squared gradient, calculated as follows:

The exponentially decaying moving average Gt of the squared gradient gt is calculated at each iteration:

Gt = β · Gt-1 + (1 − β) · gt ⊙ gt

wherein t is the current moment, ⊙ is the element-wise product, gτ is the gradient of the τ-th iteration, and β is the decay rate;

The exponentially decaying moving average of the squared parameter-update difference is calculated at each parameter update:

ΔXt² = β · ΔXt-1² + (1 − β) · Δθt ⊙ Δθt

The parameter update difference Δθt is calculated as:

Δθt = − ( sqrt(ΔXt-1² + ε) / sqrt(Gt + ε) ) ⊙ gt

wherein t-1 represents the previous moment; ΔXt-1² is the exponentially decaying moving average of the squared parameter-update difference, Gt is the exponentially decaying moving average of the squared gradient gt at each iteration, and ε is a constant set to maintain numerical stability;
Step 7-4, layer-by-layer normalization, namely after the activation function of each layer of the predicted high-abstraction feature network, a local response normalization method is adopted to locally normalize adjacent feature maps, expressed as:

Y'p = Yp / ( k + α · Σj (Yj)² )^β

wherein Y'p is the local normalization result of the adjacent feature maps, Y is the output feature map of the convolution layer in an M' × N' × P-dimensional output feature map space, Yp is the p-th output feature map, Yj is an output feature map different from Yp, j representing a certain row and column position under a certain data channel, and n, k, α and β are hyper-parameters, n being the size of the local normalization feature window;
And 7-5, calculating the loss with a dual loss function, wherein the first loss function adopts the Dice loss DLoss, calculated as:

DLoss = 1 − (2 · |y ∩ ŷ| + smooth) / (|y| + |ŷ| + smooth)

wherein y represents the recognition result, ŷ represents the sample label, |y ∩ ŷ| represents the number of intersecting grid points between y and ŷ, |y| and |ŷ| respectively represent the numbers of grid points in y and ŷ, and smooth is a smoothing parameter. The second loss function adopts the two-class cross-entropy loss L, with the mathematical formula:

L = −(1/N) Σi [ yi · log(pi) + (1 − yi) · log(1 − pi) ]

wherein N is the total number of training samples in a batch input to the network, yi is the class to which the i-th sample belongs, and pi is the predicted value of the i-th sample, a probability value;
step 7-6, regularizing the network, namely adopting the elastic net regularization method proposed by Zou et al. in 2005, which adds the l1 and l2 penalties simultaneously, expressed mathematically as:

θ* = argθ min Σ L(y, f(x; θ)) + λ1 · l1(θ) + λ2 · l2(θ)

wherein θ* is the result of regularization of the network parameters, argθ min is the parameter combination minimizing the loss, L(·) is the loss function, f(·) is the neural network to be learned, θ is a network parameter, l is a norm function, and λ1 and λ2 are the coefficients of the two regularization terms l1(θ) and l2(θ) respectively.
Step 8 comprises the steps of:
8-1, determining a deviation corrector structure, wherein the deviation corrector comprises a predictor, a discriminator and a logic AND arithmetic unit, and the predictor and the discriminator are respectively divided into three types of water vapor content, vertical wind speed and electric field intensity;
8-2, determining the input of the deviation corrector, wherein the input comprises 4 types: the first type is the water vapour content within 6 hours, the second type is the vertical wind speed within 6 hours, the third type is the electric field intensity within 6 hours, and the fourth type is the lightning result predicted by the predicted high-abstraction feature network;
and 8-3, determining the fitting function of the predictor, wherein the predictor predicts the data input to the deviation corrector for 1 to 6 hours ahead with a fitting function, the fitting function being a four-parameter logistic regression equation:

y = D + (A − D) / (1 + (x / C)^B)

wherein x is the input data, y is the prediction result at the next moment, and the parameters A, B, C and D are obtained through the following steps:

In the first step, the values of A, D, C and B are found using the linear form:

ln((y − A) / (D − y)) = B · ln(x) − B · ln(C)

wherein the initial value of D is set to the maximum of the input y values plus 1, and the initial value of A is set to the minimum of the input y values minus 0.1;
In the second step, the partial derivatives of the first-step equation with respect to the four parameters are taken, giving the first-order Taylor series expansion of y in the parameter increments (ΔA, ΔD, ΔC, ΔB). This converts the curve regression into a multiple linear regression, whose solution is identical to the polynomial fitting method; the increments of the four parameters are obtained by iterative calculation and the parameter values are corrected step by step. In each iteration the increment values are calculated, and the new parameter values are the superposition of the original parameter values and the increments.
In the third step, a coefficient k is introduced with an initial value of 2; k is multiplied by the increment matrix of the parameters and the correlation coefficient is calculated; then k = k/2, cycling 10 times and halving the value of k each time, and the increment matrix [ΔA, ΔD, ΔC, ΔB] that yields the largest correlation coefficient in the cycle is taken.
In the fourth step, the total number of iterations defaults to 1000, or the iteration stops when the correlation coefficient no longer improves;
the four parameter values obtained are returned;
Step 8-4, the deviation corrector first predicts the water vapor content, vertical wind speed and electric field intensity within 6 hours with the predictor, then inputs the predicted results into the discriminator for judgment, then inputs the judgment results into the logical operator for logical judgment, and outputs the final prediction result.
Examples
As shown in fig. 1, the embodiment provides a lightning automatic prediction method based on a predicted highly abstract feature network, and the specific implementation includes the following steps:
Step 1, network training, namely firstly performing gridding, gradient calculation and normalization on the downloaded multi-year 6 types of meteorological element data sets to serve as network input; then, after initializing the network, training the network parameters with the training sample set; and finally selecting, with the test sample set, the group of network parameters with the highest lightning prediction accuracy as the automatic lightning prediction network;
Step 2, network application prediction, namely inputting the real-time meteorological element data of the area where lightning prediction is to be performed into the optimal prediction network obtained in step 1, inputting the lightning result predicted by the prediction network into the deviation corrector for judgment, and finally outputting, by the deviation corrector, the lightning prediction result within six hours.
Step 1 comprises the following steps:
Step 1-1, determining the input data and dividing the data sets, namely determining the input meteorological elements related to lightning, downloading the data sets and dividing them into training and test sets;
step 1-1 includes the steps of:
Step 1-1-1, determining that the input data are 6 types of meteorological element data at 1-hour intervals, namely temperature (T), horizontal wind speed (U), vertical wind speed (V), specific humidity (Q), radar echo intensity (R) and aerosol concentration (A); the historical span of the whole data set is 10 years, and the lightning labels used correspond one-to-one with the data set in time and space;
Step 1-1-2, dividing the data set by adopting a classical leave-out method, wherein the method specifically comprises the following steps:
Step 1-1-2-1, randomly extracting 1 year of data from the 10-year data set to serve as the test set, with the remaining 9 years of data as the training set;
Step 1-1-2-2, after each update of the network parameters, using the test set to test the prediction accuracy of the network and record the results, then returning the test set to the whole data set and going back to step 1-1-2-1 for a new division;
Step 1-1-2-3, repeating the above two steps until each year of the 6 types of meteorological element data has served as the test set exactly once;
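The year-wise hold-out division above can be sketched as follows; this is a minimal Python illustration, with hypothetical function names and example years:

```python
import random

def leave_one_out_splits(years, seed=0):
    """Yield (train_years, test_year) pairs so that each year serves as
    the test set exactly once, in a random order (hold-out per step 1-1-2)."""
    order = list(years)
    random.Random(seed).shuffle(order)          # random extraction order
    for test_year in order:
        train_years = [y for y in years if y != test_year]
        yield train_years, test_year

# Example with a hypothetical 10-year span
splits = list(leave_one_out_splits(range(2010, 2020)))
```

Each of the 10 splits holds out a different year, matching step 1-1-2-3.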
step 1-2, preprocessing input data, namely firstly, gridding data, then solving gradients, carrying out normalization processing, and converting label data into a form represented by 0 and 1;
step 1-2 includes the steps of:
step 1-2-1, gridding data, namely gridding 6 types of meteorological element data and corresponding tag data according to actual needs by adopting a nearest neighbor method;
Step 1-2-2, converting the gridded lightning label data into a representation form of 0 and 1, wherein 0 represents no lightning and 1 represents lightning;
Step 1-2-3, calculating the gradient values of the gridded 6 types of meteorological element data, the gradient being calculated as follows:
Grad u = (∂u/∂x) i + (∂u/∂y) j + (∂u/∂z) k
wherein Grad u is the gradient value, and ∂u/∂x, ∂u/∂y and ∂u/∂z are the partial derivatives of the meteorological element u with respect to x, y and z respectively.
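The gradient calculation above can be approximated on the grid with finite differences; a minimal numpy sketch (not the patented implementation), returning the gradient magnitude:

```python
import numpy as np

def gradient_magnitude(u):
    """Finite-difference gradient of a gridded field u(z, y, x);
    returns |Grad u| = sqrt((du/dx)^2 + (du/dy)^2 + (du/dz)^2)."""
    du_dz, du_dy, du_dx = np.gradient(u)   # central differences, unit spacing
    return np.sqrt(du_dx**2 + du_dy**2 + du_dz**2)

# A linear field u = 2x has constant gradient magnitude 2 everywhere
z, y, x = np.meshgrid(np.arange(4), np.arange(4), np.arange(4), indexing="ij")
g = gradient_magnitude(2.0 * x)
```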
The fourth dimension of the input data is 6, i.e. the data of the 6 types of meteorological elements in the prediction area at the current moment and the previous 5 whole-hour moments are input; the third dimension is also 6, representing the 6 types of meteorological elements; the first and second dimensions represent the size of the prediction area, so the shape of the whole input data can be expressed as 6 × 6 × h × w;
Step 1-2-4, normalizing the gradient values to [0, 1]: the most-value (min-max) normalization method is adopted for the gridded gradient data of the 6 types of meteorological elements, with the calculation formula:
f'ij = (fij − minf) / (maxf − minf)
wherein f'ij denotes the normalized meteorological element gradient value at grid point ij, fij denotes the original un-normalized meteorological element gradient value at grid point ij, maxf denotes the maximum gradient value before normalization, and minf denotes the minimum gradient value before normalization.
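The min-max (most-value) normalization above is straightforward to sketch; names here are illustrative:

```python
import numpy as np

def minmax_normalize(f):
    """Most-value normalization of gridded gradient values to [0, 1]:
    f'_ij = (f_ij - minf) / (maxf - minf)."""
    maxf, minf = f.max(), f.min()
    if maxf == minf:                       # degenerate constant field
        return np.zeros_like(f, dtype=float)
    return (f - minf) / (maxf - minf)

grid = np.array([[1.0, 3.0], [5.0, 9.0]])
norm = minmax_normalize(grid)              # 1 -> 0.0, 9 -> 1.0
```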
Step 1-3, extracting abstract features of 6 types of meteorological elements, namely constructing an encoder with double-layer convolution and excitation functions and 1 pooling operation as shown in fig. 4, and extracting high abstract features by downsampling the 6 types of meteorological elements four times continuously;
the steps 1-3 comprise the following steps:
Step 1-3-1, constructing an encoder, the structure of which is shown in fig. 4; its structural unit, shown in fig. 3, adopts a double-layer series connection of ordinary convolution and activation operations, with the ReLU function as the activation function, followed finally by a max-pooling operation;
Step 1-3-2, extracting the abstract features of the 6 types of meteorological elements with the encoder. As shown in fig. 4, the encoder is formed by connecting 4 structural units in series; in practical application the number of structural units depends on the size of the lightning prediction area and the number of grid points after gridding: if the prediction area is small and the number of grid points is small, the encoder may be connected in series from only 3 structural units, and likewise, if the prediction area is large and the number of grid points is large, the number of series-connected structural units is increased. Each structural unit reduces the spatial shape of the input data to 1/2 of the original, so after the encoder with 4 structural units the size of the input data becomes one sixteenth of the original (h/16 × w/16 in the spatial dimensions); finally the highly abstract feature data of the 6 meteorological elements are input into the adaptive fusion network for fusion.
Step 1-4, fusion of abstract features of 6 types of meteorological elements, namely, adaptively fusing the abstract features of 6 types of meteorological elements in a manner of giving weight to the 6 types of meteorological elements by adopting a method (summation representation) shown in FIG. 5;
The steps 1-4 comprise the following steps:
Step 1-4-1, fusing the abstract features of the gradients of the 6 types of meteorological elements, namely fusing them with the adaptive fusion method, expressed mathematically as:
yij = Σ (n = 1 to 6) wn · xn,ij
wherein yij denotes the result after adaptive fusion, n indexes the input meteorological elements (6 in this method), wn is the class weight, and xn,ij is the meteorological element data of class n, the classes being temperature, horizontal wind speed, vertical wind speed, specific humidity, radar echo intensity and aerosol concentration; the data after the adaptive fusion module keep the spatial dimensions h/16 × w/16 output by the encoder.
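The weighted fusion above can be sketched as follows. The softmax normalization of the weights is an added assumption (to keep them positive and summing to 1), not something the text specifies:

```python
import numpy as np

def adaptive_fuse(features, logits):
    """Weighted fusion y_ij = sum_n w_n * x_n,ij over the 6 element features.
    `features` has shape (6, ...); the weights are softmax-normalized
    (assumption) so they stay positive and sum to 1 while remaining learnable."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return np.tensordot(w, features, axes=1)   # weighted sum over the 6 classes

feats = np.ones((6, 4, 4))                      # 6 abstract feature maps
fused = adaptive_fuse(feats, np.zeros(6))       # equal logits -> equal weights
```

With equal weights the fusion reduces to the mean of the six feature maps.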
Step 1-5, as shown in fig. 2, constructing the abstract feature prediction network, namely predicting the highly abstract features within 6 hours with a prediction network of a series-stacked structure of multiple GRU unit networks, using the method shown in fig. 7;
the steps 1-5 comprise the following steps:
Step 1-5-1, the prediction network structural unit employs a gated recurrent unit (Gated Recurrent Unit, GRU) network; the GRU network unit is shown in FIG. 6, where the update gate (update) and the reset gate (reset) are calculated as follows:
update = σ(w1[ht−1, xt] + b1)
reset = σ(w2[ht−1, xt] + b2)
wherein σ is the sigmoid function, whose value between 0 and 1 controls how much content is written to memory and how much previous memory is retained; w1 and w2 are weight parameters acting on the hidden state of the previous moment and the current input; ht−1 is the hidden state at the previous moment, xt is the input at the current moment, and b1 and b2 are bias parameters.
Step 1-5-2, the prediction network adopts a series-stacked structure: as shown in FIG. 7, six groups of n (the number of abstract features) GRU unit networks are stacked in series, and each group of GRU networks outputs 1 hour of predicted abstract features, i.e. the prediction results of the 6 lightning highly abstract features at the whole-hour moments from 1 to 6 hours.
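A single GRU step with the update and reset gates above can be sketched in plain numpy; shapes and parameter names are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(h_prev, x_t, params):
    """One GRU step following the update/reset-gate equations above.
    params holds weight matrices W_u, W_r, W_h (each acting on [h, x])
    and biases b_u, b_r, b_h; all shapes here are illustrative."""
    hx = np.concatenate([h_prev, x_t])
    u = sigmoid(params["W_u"] @ hx + params["b_u"])        # update gate
    r = sigmoid(params["W_r"] @ hx + params["b_r"])        # reset gate
    h_tilde = np.tanh(params["W_h"] @ np.concatenate([r * h_prev, x_t])
                      + params["b_h"])                     # candidate state
    return (1.0 - u) * h_prev + u * h_tilde                # new hidden state

rng = np.random.default_rng(0)
H, X = 8, 4
params = {k: rng.standard_normal((H, H + X)) * 0.1 for k in ("W_u", "W_r", "W_h")}
params.update({b: np.zeros(H) for b in ("b_u", "b_r", "b_h")})
h = gru_cell(np.zeros(H), rng.standard_normal(X), params)
```

Stacking six such groups in series, each emitting one hour's features, gives the 1-to-6-hour prediction described above.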
Step 1-6, restoring the predicted high-level abstract features, namely restoring the high-level abstract features output by the prediction network by using a constructed decoder;
steps 1-6 include the steps of:
Step 1-6-1, constructing a decoder, the structural unit of which, as shown in fig. 9, adopts the same double-layer series connection of convolution and activation operations as the encoder, with the ReLU function as the activation function; unlike the encoder, the final operation is a deconvolution operation;
Step 1-6-2, restoring the predicted highly abstract features, namely restoring the highly abstract features predicted by the prediction network with the decoder shown in fig. 8; the number of structural units of the decoder is consistent with that of the encoder. Each structural unit expands the shape of the input data to 2 times the original, so after the decoder with 4 structural units the size of the data is consistent with the original, the data shape being 6 × h × w;
Step 1-7, training a prediction network, namely initializing network parameters of the network, calculating a loss function of the network according to the designed loss function, updating the network parameters in a designed optimization mode, and finally selecting an optimal prediction network by using test data;
steps 1-7 include the steps of:
Step 1-7-1, initializing the network parameters: the Kaiming initialization mode is adopted as the parameter initialization mode during network training, and the variance of a network parameter w can be found by the following formula:
Var(w) = 2 / Ml−1
where Ml−1 is the number of neurons in layer l−1 of the predicted highly abstract feature network, calculated by multiplying the square of the kernel size by the number of channels and adding the number of offsets (equal to the number of channels); in this method Ml−1 = 3 × 3 × 1 × 6 + 6 = 60. The parameters are initialized with a uniform distribution on the interval [−r, r], where r = sqrt(6 / Ml−1).
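The fan-in count and uniform bound above can be sketched as follows; the fan-in accounting (including the bias count) follows the text, and the function name is illustrative. A uniform draw on [−r, r] with r = sqrt(6/M) has variance r²/3 = 2/M, matching the Kaiming variance:

```python
import numpy as np

def kaiming_uniform(shape, fan_in, rng=None):
    """Kaiming-style initialization: Var(w) = 2 / fan_in, realized as a
    uniform distribution on [-r, r] with r = sqrt(6 / fan_in)."""
    r = np.sqrt(6.0 / fan_in)
    rng = rng or np.random.default_rng(0)
    return rng.uniform(-r, r, size=shape)

# Fan-in as counted in the text: 3x3 kernel, 1 input channel, 6 channels,
# plus 6 biases: 3*3*1*6 + 6 = 60
fan_in = 3 * 3 * 1 * 6 + 6
w = kaiming_uniform((6, 1, 3, 3), fan_in)
```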
Step 1-7-2, learning and optimizing the network parameters with the mini-batch gradient descent method, whose calculation expression is:
wi ← wi − α · ∂L(wi, B)/∂wi
wherein L(wi, B) is the loss over the mini-batch B, wi is a network parameter, α is the learning rate, Yi is the prediction result and Y is the real label used in computing the loss; the batch size K is obtained by the linear scaling rule according to the following equation:
K = N · E / I
where K is the batch size, N is the number of training samples, I is the total number of iterations, and E is the number of training epochs; each epoch performs N/K iterations, so I = E · N / K.
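As a sketch of the linear scaling relation implied by these definitions: with N training samples and batch size K, each epoch performs N/K iterations, so E epochs give I = E·N/K total iterations and hence K = N·E/I. A minimal helper (names hypothetical):

```python
def batch_size(n_samples, n_iterations, n_epochs):
    """Batch size from the linear-scaling relation: each epoch makes
    n_samples / K iterations, so I = E * N / K  =>  K = N * E / I."""
    return n_samples * n_epochs // n_iterations

# Example: 9000 training samples, 1000 total iterations, 4 epochs -> K = 36
K = batch_size(n_samples=9000, n_iterations=1000, n_epochs=4)
```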
Step 1-7-3, the learning rate is adjusted with the improvement of the AdaGrad algorithm in Zeiler 2012 (AdaDelta), by means of an exponentially decaying moving average of the square of the gradient, calculated as follows:
First, an exponentially decaying moving average of the square of the gradient gt is calculated at each iteration with the formula:
Gt = β Gt−1 + (1 − β) gt ⊙ gt
wherein t is the current moment, ⊙ denotes the element-wise product, gτ is the gradient at the τ-th iteration, and β is the decay rate, taken as 0.9 in this method.
Next, an exponentially decaying weighted moving average of the square of the parameter update difference Δθ is calculated at each parameter update as:
ΔXt = β ΔXt−1 + (1 − β) Δθt ⊙ Δθt
Finally, the parameter update difference is calculated as:
Δθt = −( sqrt(ΔXt−1 + ε) / sqrt(Gt + ε) ) · gt
where t−1 denotes the previous moment, ΔXt−1 is the exponentially decaying weighted moving average of the squared update difference Δθ up to iteration t−1, Gt is the exponentially decaying moving average of the square of the per-iteration gradient gt, and ε is a very small constant added for numerical stability, generally taking values from e−7 to e−10; this method takes e−8.
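The update rule above (an AdaDelta-style step) can be sketched in numpy as follows; this is an illustration on a toy quadratic, not the patented training code:

```python
import numpy as np

def adadelta_step(theta, grad, state, beta=0.9, eps=1e-8):
    """One AdaDelta update (Zeiler 2012): exponentially decaying moving
    averages of the squared gradient (G) and of the squared update (DX)
    replace a fixed learning rate."""
    state["G"] = beta * state["G"] + (1 - beta) * grad * grad
    delta = -np.sqrt(state["DX"] + eps) / np.sqrt(state["G"] + eps) * grad
    state["DX"] = beta * state["DX"] + (1 - beta) * delta * delta
    return theta + delta, state

# Move toward the minimum of f(x) = x^2 starting from x = 3
theta = np.array([3.0])
state = {"G": np.zeros(1), "DX": np.zeros(1)}
for _ in range(500):
    theta, state = adadelta_step(theta, 2.0 * theta, state)
```

Note that because DX starts at zero, the early steps are tiny and grow gradually; that warm-up is characteristic of AdaDelta.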
Step 1-7-4, layer-by-layer normalization, namely locally normalizing the adjacent feature maps after the activation function of each layer with the local response normalization method of Krizhevsky et al., in order to balance and constrain the activity values of adjacent neurons and enhance the generalization of the network, expressed as:
Y'p = Yp / ( k + α Σ (j = max(0, p−n/2) to min(P−1, p+n/2)) (Yj)² )^β
wherein Y'p is the result of local normalization of the adjacent feature maps, Yp is the p-th output feature map of the convolutional layer, P is the total number of feature maps, and n, k, α and β are hyperparameters, n being the size of the local normalization feature window; this method adopts the values in AlexNet: n is 5, k is 2, α is 10e−4 and β is 0.75;
Step 1-7-5, calculating the loss with a double loss function. The first loss function adopts the DiceLoss proposed by Milletari et al. in 2016, with the mathematical formula:
DiceLoss = 1 − (2|y ∩ y'| + smooth) / (|y| + |y'| + smooth)
wherein y' denotes the recognition result, y denotes the sample label, |y ∩ y'| denotes the number of intersection grid points between y and y', |y| and |y'| denote the numbers of grid points in y and y' respectively, and smooth is the smoothing parameter (set to 0.1 herein).
The first total loss is calculated as the sum over the labels and forecasts from 1 to 6 hours: loss1 = Σ (t = 1 to 6) DiceLoss(yt, y't), where y denotes the true value and y' the predicted value.
The second loss function adopts the two-class cross-entropy loss function, with the mathematical formula:
loss = −(1/N) Σ (i = 1 to N) [ yi log pi + (1 − yi) log(1 − pi) ]
where N is the total number of training data in a batch input to the network, yi is the class to which the i-th sample belongs, and pi is the predicted value of the i-th sample, a probability value.
The second total loss is calculated as the sum over the labels and forecasts from 1 to 6 hours: loss2 = Σ (t = 1 to 6) CrossEntropy(yt, y't), where y denotes the true value and y' the predicted value.
Finally, the loss of the whole network is loss = loss1 + loss2.
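The double loss above can be sketched directly from the formulas; `smooth = 0.1` follows the text, while the clipping constant in the cross-entropy is an added numerical-stability assumption:

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=0.1):
    """DiceLoss = 1 - (2|y ∩ y'| + smooth) / (|y| + |y'| + smooth)."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def bce_loss(y_true, p_pred, eps=1e-7):
    """Two-class cross entropy over the N samples of a batch."""
    p = np.clip(p_pred, eps, 1.0 - eps)        # clip for numerical stability
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def total_loss(labels, preds):
    """loss = loss1 + loss2, each summed over the 1..6 hour forecasts."""
    return sum(dice_loss(y, p) for y, p in zip(labels, preds)) + \
           sum(bce_loss(y, p) for y, p in zip(labels, preds))

# Perfect binary forecasts for all 6 hours give a near-zero total loss
labels = [np.array([0.0, 1.0, 1.0, 0.0])] * 6
loss = total_loss(labels, labels)
```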
Step 1-7-6, regularizing the network with the elastic-net regularization method that adds the l1 and l2 terms simultaneously, expressed mathematically as:
θ* = argθ min { (1/N) Σ (i = 1 to N) L(yi, f(xi; θ)) + λ1 l1(θ) + λ2 l2(θ) }
wherein θ* is the result of regularization of the network parameters, argθ min finds the parameter combination that minimizes the loss, L is the loss function, N is the number of training samples, f(·) is the network, θ denotes the network parameters, l is the norm function, and λ1 and λ2 are the coefficients of the two regularization terms respectively.
Step 2 comprises the following steps:
step 2-1, reading 6 types of meteorological element data in a lightning prediction area, namely reading temperature (T), horizontal wind speed (U), vertical wind speed (V), specific humidity (Q), radar echo intensity (R) and aerosol concentration (A) data at the current moment and the whole point moment of the first 5 hours, and inputting the data into the optimal network selected in the step 1 after data preprocessing;
Step 2-2, forward propagation, namely, after the data are forward-propagated through the optimal network, obtaining a two-class 3-dimensional lightning probability map of the prediction area with the Sigmoid function;
Step 2-3, determining the lightning prediction result in the prediction area, namely determining the lightning result of each grid point in the prediction area from the 3-dimensional probability map output by the prediction network through a Max function;
Step 2-4, calibrating the network output and outputting the final lightning prediction result, namely checking and correcting the output of the prediction network by using a deviation corrector as shown in fig. 10;
the steps 2-4 comprise the following steps:
Step 2-4-1, determining the structure of the deviation corrector, which, as shown in figure 10, mainly comprises a predictor, a discriminator and a logical operator; the predictor and the discriminator are each divided into three types: water vapor content, vertical wind speed and electric field intensity.
Step 2-4-2, determining the inputs of the deviation corrector, which mainly comprise 4 types: the first is the water vapor content within 6 hours, measurable by instruments such as a microwave radiometer; the second is the vertical wind speed within 6 hours; the third is the electric field intensity within 6 hours, measurable by instruments such as an atmospheric electric field detector; and the fourth is the lightning result predicted by the network;
Step 2-4-3, determining the fitting function of the predictor, which uses the fitting function to predict the data input to the deviation corrector for 1 to 6 hours ahead; the fitting function adopted by the predictor in this method is a four-parameter fitting regression equation, whose expression is:
y = D + (A − D) / (1 + (x / C)^B)
wherein x is the input data, y is the prediction result at the next moment, and the parameters A, B, C and D are obtained through the following steps:
In the first step, initial values of A, D, C and B are obtained using the following linear form:
ln((y − A) / (D − y)) = B ln x − B ln C
wherein the initial value of D is set to the maximum of the input y values plus 1, and the initial value of A is set to the minimum of the input y values minus 0.1.
In the second step, the partial derivatives of y with respect to the four parameters in the equation of the first step are taken, giving the Taylor series expansion of y in the coefficient increments (ΔA, ΔD, ΔC, ΔB):
Δy = (∂y/∂A)ΔA + (∂y/∂D)ΔD + (∂y/∂C)ΔC + (∂y/∂B)ΔB
This converts the curve regression into a multiple linear regression, which is solved in the same way as a polynomial fit; the increments of the four parameters are obtained by iterative calculation and the parameter values are corrected step by step: in each iteration the increment values are calculated, and each new parameter value is the superposition of the original parameter value and its increment.
In the third step, a coefficient k is introduced with an initial value of 2; k is multiplied by the increment matrix of the parameters and the correlation coefficient is calculated; then k = k/2, cycling 10 times and halving the value of k each time, and the increment matrix [ΔA, ΔD, ΔC, ΔB] that yields the largest correlation coefficient in the cycle is taken.
In the fourth step, the total number of iterations defaults to 1000, or the iteration stops when the correlation coefficient no longer improves; the four parameter values obtained are then returned.
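Assuming the four-parameter equation is the standard four-parameter logistic y = D + (A − D)/(1 + (x/C)^B) — consistent with A tracking the minimum of y and D its maximum, as the initial values above suggest — the first linearization step can be sketched as follows (function names are illustrative):

```python
import numpy as np

def four_param(x, A, B, C, D):
    """Assumed four-parameter logistic: y = D + (A - D) / (1 + (x / C)^B),
    with A near the minimum of y and D near the maximum."""
    return D + (A - D) / (1.0 + (x / C) ** B)

def initial_fit(x, y):
    """First-step linearization: ln((y - A)/(D - y)) = B ln x - B ln C,
    solved for B and C by ordinary least squares, with the initial values
    A = min(y) - 0.1 and D = max(y) + 1 from the text."""
    A, D = y.min() - 0.1, y.max() + 1.0
    z = np.log((y - A) / (D - y))
    slope, intercept = np.polyfit(np.log(x), z, 1)
    B = slope
    C = np.exp(-intercept / B)          # intercept = -B ln C
    return A, B, C, D

x = np.linspace(0.5, 10.0, 40)
y = four_param(x, A=1.0, B=2.0, C=3.0, D=8.0)
A0, B0, C0, D0 = initial_fit(x, y)      # rough starting values for the iteration
```

These rough values would then be refined by the Taylor-expansion iteration of the second to fourth steps.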
Other fitting functions with better performance may also be used to obtain a better fit.
Step 2-4-4, the working mode of the deviation corrector: as shown in fig. 10, the deviation corrector first predicts the water vapor content, vertical wind speed and electric field intensity within 6 hours with the predictor, and then inputs the predicted results into the 3 classes of discriminators for judgment. The threshold of the water vapor content discriminator is set to 90%: the output is 1 if the value is greater than or equal to 90%, otherwise 0; the threshold of the vertical wind speed discriminator is set to 0.5 m/s: the output is 1 if the value is greater than 0.5 m/s, otherwise 0; the threshold of the electric field intensity discriminator is set to 3 kV/mm: the output is 1 if the value is greater than 3 kV/mm, otherwise 0. Finally, the logical operator performs an AND operation on the prediction result of the predicted highly abstract feature network and the output results of the discriminators, and outputs the final lightning prediction result for the prediction area.
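The discriminators and the final AND operation can be sketched as follows; the thresholds follow the text, and the assignment of the 90% threshold to the water-vapor discriminator is an assumption (the units suggest it):

```python
import numpy as np

def corrector(network_pred, water_vapor, wind_speed, field_strength):
    """Deviation corrector: three threshold discriminators (water-vapor
    content >= 90 %, vertical wind speed > 0.5 m/s, electric field strength
    > 3 kV/mm) ANDed with the network's lightning prediction (0/1 grid)."""
    d_wv = water_vapor >= 90.0       # water-vapor content discriminator
    d_ws = wind_speed > 0.5          # vertical wind speed discriminator
    d_ef = field_strength > 3.0      # electric field strength discriminator
    return (network_pred.astype(bool) & d_wv & d_ws & d_ef).astype(int)

pred = np.array([1, 1, 1, 0])
out = corrector(pred,
                water_vapor=np.array([95.0, 80.0, 95.0, 95.0]),
                wind_speed=np.array([1.0, 1.0, 1.0, 1.0]),
                field_strength=np.array([4.0, 4.0, 2.0, 4.0]))
# -> [1, 0, 0, 0]: lightning is kept only where all three discriminators agree
```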
The invention provides a lightning automatic prediction method based on a predicted highly abstract feature network; there are many methods and ways to realize this technical scheme, and the above is only a preferred embodiment of the invention. It should be noted that a person skilled in the art can make a number of improvements and modifications, and these improvements and modifications should also be regarded as within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented with the prior art.