Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical solution of the invention is described in detail below through specific examples. The following embodiments may be combined with one another, and descriptions of the same or similar concepts or processes may not be repeated in every embodiment.
Fig. 1 is a flow chart of an AI weather station data quality control method integrating numerical forecasting according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Acquiring multivariable meteorological observation data and numerical forecast data of a meteorological station, and carrying out standardization processing on the multivariable meteorological observation data and the numerical forecast data to form an observation time sequence data set and a numerical forecast time sequence data set;
Constructing a physical driving model based on an LSTM-AE structure, extracting physical relations and temporal evolution features among meteorological elements in the observation time sequence data set through a long short-term memory network encoder according to the physical driving model, and analyzing the observation time sequence data set and the numerical forecast time sequence data set to obtain observed physical features and forecast physical features;
Based on the LSTM decoder in the physical driving model, processing the observed physical characteristics and the forecast physical characteristics, completing characteristic reconstruction according to the physical laws and the time-space evolution characteristics of meteorological elements, and generating a reconstructed observation time sequence;
Performing time sequence division on the observation time sequence data set to obtain a training set and a verification set, and completing training and parameter optimization of an LSTM codec in the physical driving model by taking deviation of the reconstructed observation time sequence and an original observation time sequence as an evaluation index;
And carrying out abnormal state evaluation on the data to be detected based on the physical driving model, determining a quality grade mark according to the result of the abnormal state evaluation, and feeding back and optimizing the feature extraction capability and anomaly detection performance of the LSTM-AE structure according to the marking result, so as to realize quality control and continuous optimization of the meteorological station data.
In input data preprocessing, the historical observation data of a meteorological observation site are organized into a time sequence data set X with dimensions n × m, where n is the sequence length and m is the number of features. The features include meteorological elements such as air temperature, humidity and wind speed, and the values are normalized to the range 0 to 1. Meanwhile, the numerical forecast data set X' corresponding to the same times and sites is organized, maintaining the same data structure as the observation data.
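As a minimal illustration of this preprocessing step, the sketch below assumes simple per-feature min-max scaling (the embodiment does not fix the exact scaling formula); the array shapes and toy values are hypothetical.

```python
import numpy as np

def normalize_dataset(data, data_min=None, data_max=None):
    """Min-max normalize an (n, m) array of meteorological features to [0, 1].

    n is the sequence length and m the number of features (e.g. air
    temperature, humidity, wind speed). Passing in precomputed statistics
    lets the forecast set X' be scaled with the same ranges as X.
    """
    data = np.asarray(data, dtype=float)
    if data_min is None:
        data_min = data.min(axis=0)
    if data_max is None:
        data_max = data.max(axis=0)
    return (data - data_min) / (data_max - data_min + 1e-8), data_min, data_max

# Toy stand-ins: 48 hourly records of temperature, humidity, wind speed.
rng = np.random.default_rng(0)
X = rng.uniform([-5, 20, 0], [30, 100, 15], size=(48, 3))
X_norm, mn, mx = normalize_dataset(X)
X_fcst = rng.uniform([-5, 20, 0], [30, 100, 15], size=(48, 3))
X_fcst_norm, _, _ = normalize_dataset(X_fcst, mn, mx)  # same scaling as X
```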
A two-path encoder structure is designed. The first encoder adopts a 2-layer LSTM network with 128 and 64 hidden nodes respectively; it takes the observation data X as input and outputs a 64-dimensional latent representation Z1. The second encoder adopts the same network structure, takes the numerical forecast data X' as input, and outputs a 64-dimensional latent representation Z2. Z1 and Z2 are concatenated along the feature dimension to obtain the 128-dimensional fused feature representation Z.
The decoder adopts a 2-layer LSTM network with 128 and 256 hidden nodes respectively; it takes the fused feature representation Z as input and outputs reconstructed data with the same dimensions as the original observation data. During training, the mean square error is used as the loss function and the parameters are optimized with the Adam optimizer, with the learning rate set to 0.001 and the number of training epochs set to 100.
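A minimal sketch of the dual-path encoder and decoder described above, written with PyTorch (an assumption; the embodiment does not name a framework). The layer sizes, the learning rate of 0.001, the MSE loss and the 100 epochs follow the text; the batch size, sequence length and toy data are hypothetical.

```python
import torch
import torch.nn as nn

class DualPathLSTMAE(nn.Module):
    """Two 2-layer LSTM encoders (hidden sizes 128 then 64) produce Z1 and
    Z2; their concatenation Z (128-dim) feeds a 2-layer LSTM decoder
    (hidden sizes 128 then 256) that maps back to the observation space."""
    def __init__(self, n_features: int):
        super().__init__()
        self.enc_obs_1 = nn.LSTM(n_features, 128, batch_first=True)
        self.enc_obs_2 = nn.LSTM(128, 64, batch_first=True)
        self.enc_fcst_1 = nn.LSTM(n_features, 128, batch_first=True)
        self.enc_fcst_2 = nn.LSTM(128, 64, batch_first=True)
        self.dec_1 = nn.LSTM(128, 128, batch_first=True)
        self.dec_2 = nn.LSTM(128, 256, batch_first=True)
        self.out = nn.Linear(256, n_features)

    def forward(self, x_obs, x_fcst):
        z1, _ = self.enc_obs_2(self.enc_obs_1(x_obs)[0])     # Z1: [B, T, 64]
        z2, _ = self.enc_fcst_2(self.enc_fcst_1(x_fcst)[0])  # Z2: [B, T, 64]
        z = torch.cat([z1, z2], dim=-1)                      # fused Z: [B, T, 128]
        h, _ = self.dec_2(self.dec_1(z)[0])
        return self.out(h)                                   # reconstruction

model = DualPathLSTMAE(n_features=3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
x_obs, x_fcst = torch.randn(8, 24, 3), torch.randn(8, 24, 3)  # toy batch
for _ in range(100):  # 100 training epochs as stated in the text
    optimizer.zero_grad()
    loss = loss_fn(model(x_obs, x_fcst), x_obs)
    loss.backward()
    optimizer.step()
```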
In a specific implementation, taking data from 20 weather observation sites in a city from January to December 2020 as an example, the observation data comprise three elements (air temperature, relative humidity and wind speed) with a sampling interval of 1 hour. 80% of the data were used for training and 20% for testing. The numerical forecast data come from forecast products of a central meteorological office, with a forecast lead time of 24 hours. After model training and testing, the correlation coefficient between the reconstructed data and the original observation data exceeds 0.92, and the root mean square error on the normalized data is within 0.15.
In this method, the dual-path encoder structure realizes effective fusion of the observation data and the numerical forecast, and the decoder can accurately reconstruct the original data characteristics. The method avoids the subjectivity of manually set weights in traditional fusion methods and improves the objectivity and accuracy of the fusion result.
In an optional implementation manner, constructing a physical driving model based on an LSTM-AE structure, extracting physical relations and temporal evolution features among meteorological elements in the observation time sequence data set through a long short-term memory network encoder according to the physical driving model, and analyzing the observation time sequence data set and the numerical forecast time sequence data set to obtain observed physical features and forecast physical features includes:
Calculating observation data of an observation time sequence data set through a long-short-term memory network encoder to obtain an observation mean value vector and an observation variance vector, constructing an observation feature normal distribution based on the observation mean value vector and the observation variance vector, and sampling from the observation feature normal distribution to obtain hidden variables of the observation data;
Introducing a priori normal distribution constraint to network parameters of the physical driving model, and calculating physical association weights of meteorological elements under different time scales through a Bayesian neural network based on the priori normal distribution constraint;
Weighting the hidden variables of the observed data according to the physical association weights to obtain reconstructed observed feature vectors, constructing conditional probability distribution based on the reconstructed observed feature vectors, and sampling for multiple times from the conditional probability distribution to obtain observed physical features;
Inputting the numerical forecast time sequence data set into the long short-term memory network encoder to obtain a forecast mean vector and a forecast variance vector of the numerical forecast time sequence data set, constructing a forecast feature normal distribution based on the forecast mean vector and the forecast variance vector, sampling the hidden variables of the forecast data from the forecast feature normal distribution, and weighting and reconstructing the hidden variables of the forecast data based on the physical association weights to obtain forecast physical features representing the physical relationships of the meteorological elements.
The physical driving model with the LSTM-AE structure consists of two parts: an encoder that compresses high-dimensional input data into a low-dimensional latent representation, and a decoder that restores the latent representation to the original data dimensions. The model inputs are an observation time sequence data set and a numerical forecast time sequence data set, both of which comprise meteorological elements such as air pressure, temperature, humidity, wind speed and precipitation. The data shape is [N, T, F], where N is the number of samples, T is the sequence length, and F is the number of meteorological elements. In a practical application, N = 1000, T = 168 (hourly observation data for one consecutive week), and F = 5 (five meteorological elements).
The network structure of the long short-term memory encoder comprises an input layer, an LSTM layer and a fully connected layer. The input layer receives the observation time sequence data; the LSTM layer consists of 3 stacked LSTM layers, each with 128 hidden neurons, used to capture long- and short-term dependencies in the time series; and the fully connected layer maps the LSTM output into two vectors, a mean vector and a variance vector, each of dimension 64. To strengthen the physical constraints, an attention mechanism is added to the encoder so that the model focuses on the physical associations between different meteorological elements. The attention weight matrix has size [5,5], representing the pairwise attention distribution among the 5 meteorological elements.
The encoder processes the observation time series data as follows: the standardized observation data are input into the LSTM layer, which controls the information flow through a gating mechanism (comprising an input gate, a forget gate and an output gate) and captures the temporal features. Taking the observation data of a certain weather station from July 1 to July 7, 2022 as an example, the data enter the encoder after standardization, temporal features are extracted by the LSTM layer, and finally an observation mean vector μ_obs and an observation variance vector σ²_obs are output, both of which are 64-dimensional. The first 5 elements of μ_obs are [0.32, -0.15, 0.48, 0.21, -0.37], and the first 5 elements of σ²_obs are [0.02, 0.04, 0.03, 0.05, 0.02].
An observation feature normal distribution N(μ_obs, σ²_obs) is constructed from the observation mean vector μ_obs and the observation variance vector σ²_obs. The distribution represents the probability distribution of the observed data in the latent space, with each dimension obeying an independent normal distribution with mean μ_obs[i] and variance σ²_obs[i]. The hidden variable z_obs of the observed data is sampled from this distribution using the reparameterization technique, i.e. z_obs = μ_obs + ε × sqrt(σ²_obs), where ε is a random sample from the standard normal distribution N(0, 1). For the above case, if the first 5 elements of ε are [0.5, -0.2, 0.1, -0.4, 0.3], then the first 5 elements of z_obs are [0.32 + 0.5×sqrt(0.02), -0.15 + (-0.2)×sqrt(0.04), 0.48 + 0.1×sqrt(0.03), 0.21 + (-0.4)×sqrt(0.05), -0.37 + 0.3×sqrt(0.02)] ≈ [0.39, -0.19, 0.50, 0.12, -0.33].
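The worked example can be reproduced with a few lines; this is a sketch of the standard reparameterization trick, using the five example dimensions from the text.

```python
import numpy as np

mu_obs = np.array([0.32, -0.15, 0.48, 0.21, -0.37])  # first 5 dims of μ_obs
var_obs = np.array([0.02, 0.04, 0.03, 0.05, 0.02])   # first 5 dims of σ²_obs
eps = np.array([0.5, -0.2, 0.1, -0.4, 0.3])          # drawn from N(0, 1)

# Reparameterization: z_obs = μ_obs + ε * sqrt(σ²_obs)
z_obs = mu_obs + eps * np.sqrt(var_obs)
print(np.round(z_obs, 2))  # [ 0.39 -0.19  0.5   0.12 -0.33]
```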
A prior normal distribution constraint is introduced on the network parameters of the physical driving model, converting the model into a Bayesian neural network. The weights in the network are no longer fixed values but follow prior distributions. In the specific implementation, a mean and a variance are defined for each weight parameter, with the mean initialized to 0 and the variance initialized to 0.1. The prior is the normal distribution N(0, 0.1), meaning the weight parameters fluctuate around 0 with variance 0.1, which limits the range of the weights and prevents overfitting.
The physical association weights of the meteorological elements at different time scales are calculated by the Bayesian neural network, considering three time scales: short term (hourly), medium term (daily) and long term (weekly). Observation data at each time scale are input into the Bayesian network to obtain the corresponding physical association weight matrix. The short-term matrix W_short is [5,5], representing the mutual influence of the 5 meteorological elements at the hourly scale; the medium-term matrix W_medium and the long-term matrix W_long are likewise [5,5].
Taking the short-term physical association weight matrix W_short as an example, the element W_short[i, j] represents the strength of influence of the i-th meteorological element on the j-th. For example, W_short[0, 1] = 0.7 indicates that air pressure (index 0) strongly influences temperature (index 1), and W_short[1, 2] = 0.6 indicates that temperature (index 1) moderately influences humidity (index 2).
Weighting the hidden variables of the observed data by the physical association weights combines the weight matrices of the three time scales with the hidden variable z_obs to generate the reconstructed observation feature vector. Specifically, z_obs is divided into three sub-vectors representing short-term, medium-term and long-term features, each of dimension 64/3 ≈ 21. The three sub-vectors are multiplied by their corresponding physical association weight matrices and the results are concatenated to obtain the 64-dimensional reconstructed observation feature vector r_obs. For example, assuming the first 21 dimensions of z_obs represent short-term features, those features are recombined into a [5, 4] matrix (each row representing one meteorological element, with the remaining dimension carried through unweighted) and multiplied by W_short to yield the weighted short-term features.
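A sketch of this scale-wise weighting of the latent variable; the handling of dimensions beyond a multiple of five (passed through unweighted) and the random weight matrices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
z_obs = rng.normal(size=64)             # latent variable from the encoder
W = {s: rng.uniform(0, 1, size=(5, 5))  # stand-ins for W_short/W_medium/W_long
     for s in ("short", "medium", "long")}

def weight_latent(z, weights, n_elements=5):
    """Split z into three sub-vectors (short/medium/long term), reshape each
    into [n_elements, k], left-multiply by the matching association matrix,
    and concatenate the results back into a 64-dim vector r_obs."""
    out = []
    for part, scale in zip(np.array_split(z, 3), ("short", "medium", "long")):
        k = len(part) // n_elements
        usable, rest = part[:k * n_elements], part[k * n_elements:]
        weighted = weights[scale] @ usable.reshape(n_elements, k)
        out.extend([weighted.reshape(-1), rest])
    return np.concatenate(out)

r_obs = weight_latent(z_obs, W)
assert r_obs.shape == (64,)
```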
A conditional probability distribution p(x|r_obs) is constructed based on the reconstructed observation feature vector r_obs; it represents the probability distribution of the original observation data x given the reconstructed feature r_obs. In the implementation, r_obs is input into a small neural network whose output parameterizes the conditional distribution. The network comprises two fully connected layers with a hidden layer of size 128. Its outputs are the conditional mean vector μ_cond and variance vector σ²_cond, both of the same dimension as the original observation data, i.e. [T, F] = [168, 5].
The observed physical features are obtained by sampling multiple times from the conditional probability distribution p(x|r_obs). The number of samples is set to 50, each sample yielding a vector of the same dimensions as the original observation data; samples are drawn from a normal distribution with mean μ_cond and variance σ²_cond. The 50 samples are averaged to obtain the final observed physical feature o_feat of dimension [T, F] = [168, 5]. For the above case, if the conditional distribution parameters at time point t = 24 (the last hour of the first day) are μ_cond[24] = [1.0, 22.5, 65.0, 3.2, 0.0] and σ²_cond[24] = [0.01, 0.25, 1.0, 0.04, 0.0001] (corresponding to the normalized air pressure, temperature, humidity, wind speed and precipitation, respectively), then the observed physical feature o_feat[24] at that time point is [1.01, 22.6, 64.8, 3.25, 0.0], representing the physical characteristics of each meteorological element at that time.
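A sketch of this multi-sample averaging step, with hypothetical conditional-distribution parameters standing in for the output of the small fully connected network.

```python
import numpy as np

rng = np.random.default_rng(7)
T, F, n_samples = 168, 5, 50
mu_cond = rng.normal(size=(T, F))                # stand-in for μ_cond
var_cond = rng.uniform(0.01, 0.25, size=(T, F))  # stand-in for σ²_cond

# Draw 50 samples from N(μ_cond, σ²_cond) and average them to obtain the
# observed physical feature o_feat of shape [T, F] = [168, 5].
samples = mu_cond + rng.normal(size=(n_samples, T, F)) * np.sqrt(var_cond)
o_feat = samples.mean(axis=0)
assert o_feat.shape == (168, 5)
```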
The numerical forecast time sequence data set is input into the long short-term memory encoder through a process similar to that of the observation data, except that the input data source differs. The numerical forecast data also include air pressure, temperature, humidity, wind speed and precipitation, with data shape [N, T, F]. After the encoder processes the data, a forecast mean vector μ_pred and a forecast variance vector σ²_pred are obtained, both 64-dimensional.
A forecast feature normal distribution N(μ_pred, σ²_pred) is constructed from the forecast mean vector μ_pred and the forecast variance vector σ²_pred. The hidden variable z_pred of the forecast data is sampled from this distribution, again using the reparameterization technique. For example, if the first 5 elements of μ_pred are [0.35, -0.18, 0.50, 0.25, -0.40], the first 5 elements of σ²_pred are [0.03, 0.05, 0.04, 0.06, 0.03], and the first 5 elements of the random sample ε are [0.4, -0.3, 0.2, -0.5, 0.2], then the first 5 elements of z_pred are [0.35 + 0.4×sqrt(0.03), -0.18 + (-0.3)×sqrt(0.05), 0.50 + 0.2×sqrt(0.04), 0.25 + (-0.5)×sqrt(0.06), -0.40 + 0.2×sqrt(0.03)] ≈ [0.42, -0.25, 0.54, 0.13, -0.37].
The hidden variables of the forecast data are weighted and reconstructed based on the physical association weights to obtain forecast physical features representing the physical relationships of the meteorological elements. The weighting process is the same as for the observation data, except that the input hidden variable becomes z_pred. z_pred is divided into three sub-vectors representing short-term, medium-term and long-term features; the three sub-vectors are multiplied by their corresponding physical association weight matrices and the results are concatenated to obtain the reconstructed forecast feature vector r_pred. Based on r_pred, a conditional probability distribution p(x|r_pred) is constructed, which is sampled multiple times and averaged to obtain the final forecast physical feature p_feat of dimension [T, F] = [168, 5].
The observed physical features o_feat and forecast physical features p_feat obtained in this way capture the statistical characteristics of the original data, integrate the physical associations and temporal evolution characteristics among meteorological elements, and provide a reliable basis for subsequent feature reconstruction and quality control. Experiments show that, compared with traditional methods, the physical features extracted in this way significantly improve anomaly detection accuracy while maintaining data integrity, and are particularly good at identifying anomalous data that violate physical laws.
In an alternative embodiment, processing the observed physical features and the forecast physical features based on the LSTM decoder in the physical driving model, completing feature reconstruction according to the physical laws and spatio-temporal evolution characteristics of the meteorological elements, and generating a reconstructed observation time sequence includes:
Based on the LSTM decoder, performing initial reconstruction on the observed physical features and the forecast physical features to generate initial reconstruction features, calculating the reconstruction error between the initial reconstruction features and the observation time sequence, and combining the reconstruction error with the hidden state to obtain an error propagation feature;
Combining the error propagation feature and the initial reconstruction feature to calculate a compensation coefficient, and correcting the initial reconstruction feature based on the compensation coefficient to obtain a dynamic error compensation term;
Combining the dynamic error compensation term with the initial reconstruction feature to obtain a corrected reconstruction feature, calculating weather physical association degree between air pressure, temperature, humidity, wind speed and precipitation in the corrected reconstruction feature, and comparing the weather physical association degree with the physical association degree between corresponding weather elements in an observation time sequence to obtain a deviation value of the corrected reconstruction feature and the observation time sequence on a physical rule;
And adjusting forgetting weight, input weight and output weight in the LSTM decoder according to the deviation value, updating a cell state and a hidden state through the adjusted weight, and carrying out a new round of reconstruction on the observed physical characteristic and the forecast physical characteristic based on the updated cell state and the updated hidden state to generate a reconstructed observed time sequence which accords with the physical rule and the space-time evolution characteristic of the meteorological element.
The LSTM decoder receives as input the observed physical features and forecast physical features, both 256-dimensional vectors representing the key physical information extracted from the observation data and the forecast data, respectively. The decoder consists of an input gate, a forget gate, an output gate and a cell state; in the initialization stage, the forget weight W_f is set to 0.3, the input weight W_i to 0.4, and the output weight W_o to 0.3. The LSTM decoder first concatenates the observed and forecast physical features into a 512-dimensional input vector, then processes it through the gating mechanism to generate the initial reconstruction features.
After receiving the input vector, the decoder obtains intermediate state vectors through its internal parameter matrices. The cell state C_t and the hidden state h_t are both initialized as all-zero vectors of dimension 128. The forget gate controls, through W_f, the retention proportion of the previous cell state; the input gate determines, through W_i, the degree to which the current input information is accepted; and the output gate controls, through W_o, how much of the cell state is output to the hidden state. The decoding proceeds iteratively, updating the cell state and hidden state at each time step, and finally outputs the initial reconstruction features with the same dimensions as the original input sequence, i.e. 5 meteorological elements: air pressure, temperature, humidity, wind speed and precipitation.
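The gating arithmetic of one decoding time step can be sketched as follows; the weight matrices here are randomly initialized stand-ins, since the text specifies only scalar initial gate weights and the 128-dimensional states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, P):
    """One decoding step: the forget gate scales the previous cell state,
    the input gate admits new candidate content, and the output gate
    exposes part of the cell state as the new hidden state."""
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev)  # forget gate
    i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev)  # input gate
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev)  # output gate
    g = np.tanh(P["Wg"] @ x_t + P["Ug"] @ h_prev)  # candidate content
    c_t = f * c_prev + i * g                       # updated cell state
    h_t = o * np.tanh(c_t)                         # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
d_in, d_hid = 512, 128        # 512-dim concatenated input, 128-dim states
P = {}
for k in "figo":
    P[f"W{k}"] = rng.normal(0, 0.1, (d_hid, d_in))
    P[f"U{k}"] = rng.normal(0, 0.1, (d_hid, d_hid))
h = c = np.zeros(d_hid)       # both states initialized as all-zero vectors
x_t = rng.normal(size=d_in)   # concatenated observed/forecast features
h, c = lstm_step(x_t, h, c, P)
```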
Taking data from a certain meteorological site as an example, the original observation data are hourly observations from May 1 to May 5, a total of 120 time points of meteorological element data. The observed physical features are obtained after processing by the LSTM encoder, and the numerical forecast data for the same period are processed by the same encoder to obtain the forecast physical features. After the two features are input into the decoder, initial reconstruction features are generated with the same shape as the original observation data, i.e. [120, 5], representing the values of 5 meteorological elements at 120 time points.
The reconstruction error between the initial reconstruction features and the observation time sequence is calculated with the mean square error metric. For each meteorological element j at each time point t, the squared difference between the reconstructed value and the observed value is computed, and the average over all time points and elements gives the total reconstruction error. In the above case, the average reconstruction error of the initial reconstruction was 0.08, indicating an average deviation of about 8% between the reconstructed and observed values on the normalized scale.
The reconstruction error and hidden state are combined to obtain the error propagation feature. The reconstruction error is expressed as a matrix E of shape [120, 5], and the hidden state h_t of the final time step is expressed as a vector H of dimension 128. E is converted through a fully connected layer into a vector E' of the same dimension as H, and E' is then concatenated with H to form the 256-dimensional error propagation feature EP. This feature captures the error distribution of the reconstruction process and the internal state information of the LSTM decoder, providing a basis for subsequent correction.
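A sketch of how the error propagation feature EP can be assembled; the fully connected weights and the tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
E = rng.normal(scale=0.1, size=(120, 5))  # reconstruction-error matrix E
H = rng.normal(size=128)                  # final hidden state h_t as vector H

# A fully connected layer maps the flattened error matrix into the
# 128-dim space of H; concatenation yields the 256-dim feature EP.
W_fc = rng.normal(0, 0.05, size=(128, 120 * 5))
E_prime = np.tanh(W_fc @ E.reshape(-1))   # E' with the same dimension as H
EP = np.concatenate([E_prime, H])
assert EP.shape == (256,)
```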
The compensation coefficient is calculated by combining the error propagation feature EP and the initial reconstruction feature IF. First, EP is mapped through a fully connected layer into the same dimensional space as IF, giving the mapped feature EP'. Next, the correlation coefficients between each dimension of EP and of IF are calculated to form a correlation matrix CM. Based on CM, the dimension pairs with significant correlation (absolute correlation coefficient greater than 0.6) are selected, their covariances are computed and normalized to obtain preliminary compensation coefficients. Finally, the preliminary compensation coefficients are weighted according to the confidence of the error propagation feature (assessed from the stability of the hidden state) to obtain the final compensation coefficient matrix C, whose shape is the same as the initial reconstruction features, i.e. [120, 5].
In the actual case, for the temperature at 14:00 on May 3, the initial reconstruction value is 28.6 °C while the actual observation is 27.2 °C, giving a reconstruction error of 1.4 °C. The compensation coefficient for the temperature dimension at this time point is -0.42, indicating that the initial reconstruction value needs a negative adjustment.
The initial reconstruction features are corrected based on the compensation coefficients to obtain the dynamic error compensation term. The compensation coefficient matrix C is multiplied element-wise with the mapped error propagation feature EP' to obtain a basic compensation value. Weight factors are set according to the physical characteristics of the different meteorological elements: 0.8 for temperature and air pressure, 0.7 for humidity, 0.5 for wind speed and 0.6 for precipitation. The basic compensation value is multiplied by the corresponding weight factor to obtain the final dynamic error compensation term DC of shape [120, 5]. For the temperature data in the above case, the calculated dynamic error compensation term is -1.1 °C.
The dynamic error compensation term DC is added to the initial reconstruction feature IF to obtain a corrected reconstruction feature CF. For the above case, the corrected temperature value was 28.6 + (-1.1) =27.5 ℃, which is closer to the actual observed value 27.2 ℃ than the initial reconstruction value 28.6 ℃, reducing the reconstruction error from 1.4 ℃ to 0.3 ℃.
When calculating the physical association degree between meteorological elements in the corrected reconstruction features CF, the rate-of-change correlation is computed for the air pressure-temperature relationship; the physical consistency of the temperature-humidity relationship is checked against the dew point temperature formula; for the air pressure-wind speed relationship, whether the relation between pressure gradient and wind speed accords with meteorological laws is checked; and for the temperature-precipitation relationship, whether the temperature change before and after precipitation accords with common meteorological phenomena is checked. Through these physical-law tests, a physical association matrix PM_CF between each pair of meteorological elements is obtained, with shape [5, 5], representing the pairwise physical association degrees.
The same method is used to calculate the physical association matrix PM_OBS between the corresponding meteorological elements in the observation time sequence. PM_CF is compared with PM_OBS and their difference is computed to obtain the deviation matrix D on the physical laws. The smaller the deviation, the better the reconstruction result accords with actual meteorological physics. In the above case, the deviation between the corrected reconstruction features and the observed data in the temperature-humidity relationship is 0.05, indicating that the two physical associations are very close.
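A sketch of the physical-association comparison, using the rate-of-change correlation as a single stand-in measure (the embodiment additionally applies element-specific checks such as the dew point consistency test, which are omitted here); the deviation-to-factor mapping at the end is likewise a hypothetical illustration.

```python
import numpy as np

def association_matrix(X):
    """Pairwise association among the 5 meteorological elements, measured
    here as the absolute correlation of their hourly rates of change."""
    rates = np.diff(X, axis=0)           # per-element change rates
    return np.abs(np.corrcoef(rates.T))  # [5, 5] association matrix

rng = np.random.default_rng(5)
obs = rng.normal(size=(120, 5))                  # observation time sequence
cf = obs + rng.normal(scale=0.1, size=(120, 5))  # corrected reconstruction CF

PM_CF, PM_OBS = association_matrix(cf), association_matrix(obs)
D = np.abs(PM_CF - PM_OBS)    # deviation matrix D on the physical laws
mean_dev = D.mean(axis=0)     # per-element average deviation
adjustment = -0.5 * mean_dev  # hypothetical deviation-to-factor mapping
```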
The weights in the LSTM decoder are adjusted according to the deviation matrix D; weights corresponding to meteorological elements with larger deviations are adjusted over a larger range. Specifically, the average deviation of each meteorological element in the deviation matrix is calculated and converted into a weight adjustment factor. For example, an average deviation of 0.07 for the temperature element corresponds to an adjustment factor of -0.03, indicating a slight decrease in the temperature-related weights, while an average deviation of 0.15 for precipitation corresponds to an adjustment factor of -0.08, indicating a larger decrease in the precipitation-related weights.
And adding the original weight and the adjustment factor to obtain an adjusted weight. For example, for a forgetting weight portion related to temperature processing, the original value is 0.3, after adjustment, becomes 0.3+ (-0.03) =0.27, and for an input weight portion related to precipitation processing, the original value is 0.4, after adjustment, becomes 0.4+ (-0.08) =0.32. The weight adjustment method based on the physical deviation can improve the reconstruction accuracy of specific meteorological elements in a targeted manner.
The process of updating the cell state and hidden state with the adjusted weights uses the new forget weight to control how much of the previous cell state is forgotten, the new input weight to control how much of the current input is accepted, and the new output weight to control what the hidden state outputs. The update formulas remain unchanged, but the parameter values change, thereby altering the information flow and the reconstruction result.
A new round of reconstruction is performed on the observed and forecast physical features based on the updated cell state and hidden state, generating the final reconstructed observation time sequence. The new round uses the same decoder structure and procedure as the initial reconstruction, but with the adjusted weight parameters. Experimental results show that after weight adjustment and the new round of reconstruction, the reconstruction accuracy of the meteorological elements improves markedly, with the average reconstruction error falling from the initial 0.08 to 0.03. Across the individual elements, the temperature reconstruction error is reduced by 78%, humidity by 72%, air pressure by 85%, wind speed by 65%, and precipitation by 58%.
The finally generated reconstructed observation time sequence is not only close to the original observation data in value, but also is more important to keep high consistency with the real observation data in physical relevance among meteorological elements, so that the reconstruction result is ensured to accord with the physical law and the space-time evolution characteristic in the meteorology, and a reliable basis is provided for the quality control of the subsequent meteorological data.
FIG. 2 is a histogram comparing the accuracy of meteorological data reconstruction based on the LSTM decoder, showing the performance of three models (traditional LSTM reconstruction, dynamic error compensation reconstruction, and physical-law-constrained reconstruction) on five meteorological elements. Traditional LSTM reconstruction, a baseline long short-term memory network, achieves 76.2% accuracy for temperature, 72.5% for humidity, 81.3% for air pressure, 61.8% for wind speed, and a minimum of 57.4% for precipitation. Dynamic error compensation reconstruction, which introduces the dynamic error correction mechanism, improves on the baseline with 83.5% for temperature, 78.3% for humidity, 88.2% for air pressure, 70.6% for wind speed, and 68.1% for precipitation. Physical-law-constrained reconstruction, the optimization model combined with physical law constraints, achieves the best results on all indicators: 91.8% for temperature, 83.7% for humidity, a maximum of 96.5% for air pressure, 77.2% for wind speed, and 74.3% even for the most challenging precipitation prediction. Overall, the model with physical law constraints improves on the traditional LSTM model by 15-20 percentage points on average, fully verifying the significant effect of physical law constraints in improving the accuracy of meteorological element prediction.
In an alternative embodiment, combining the error propagation feature and the initial reconstruction feature, calculating a compensation coefficient, and correcting the initial reconstruction feature based on the compensation coefficient to obtain a dynamic error compensation term includes:
The method comprises the steps of mapping error propagation characteristics and initial reconstruction characteristics to characteristic spaces with the same dimension to obtain error propagation characteristics after projection and initial reconstruction characteristics after projection, dividing the initial reconstruction characteristics after projection into a plurality of local areas, calculating a mean value and a standard deviation for each local area, constructing local characteristic statistics based on the mean value and the standard deviation, and carrying out combined operation on the local characteristic statistics and the error propagation characteristics after projection to generate local compensation coefficients;
Calculating a correlation matrix between the initial reconstruction feature after projection and the error propagation feature after projection, calculating a global feature correlation degree based on the correlation matrix, and generating a global compensation coefficient according to the global feature correlation degree;
Performing feature stitching on the initial reconstruction feature after projection and the error propagation feature after projection, generating a weight coefficient according to the result of feature stitching, and performing weighted combination of the local compensation coefficient and the global compensation coefficient based on the weight coefficient to obtain a final compensation coefficient;
And performing element-level multiplication operation on the final compensation coefficient and the projected error propagation feature, wherein each dimension feature of the final compensation coefficient is used as a weight, the weight is weighted with the feature of the dimension corresponding to the projected error propagation feature, the weighted features of the dimensions are combined to obtain a compensation feature, the compensation feature is fused with the initial reconstruction feature, a corrected reconstruction feature is generated, and a dynamic error compensation term is obtained according to the difference value between the corrected reconstruction feature and the initial reconstruction feature.
The combined processing of the error propagation feature and the initial reconstruction feature first requires mapping both into a feature space of the same dimension. Assume the original dimension of the error propagation feature EP is 64 and that of the initial reconstruction feature IR is 128; both are mapped into a 96-dimensional feature space through fully connected layers. In the specific implementation, EP is linearly transformed with a parameter matrix W_EP (of size 64×96) to obtain the projected error propagation feature EP', and similarly IR is linearly transformed with a parameter matrix W_IR (of size 128×96) to obtain the projected initial reconstruction feature IR'. The elements of each parameter matrix are optimized by gradient descent so that the transformed features retain the main information of the originals.
The projected initial reconstruction feature IR' is divided into several local regions to capture the error characteristics at different locations. In practice, the 96-dimensional IR' is divided into 8 local regions, each containing 12 consecutive feature dimensions. For the i-th local region IR_i' (i = 1, 2, ..., 8), the mean μ_i and standard deviation σ_i of all feature values in the region are calculated. For example, for the 1st local region IR_1' (containing dimensions 1-12 of IR'), the mean μ_1 is the arithmetic average of the 12 feature values and the standard deviation σ_1 measures their dispersion.
Local feature statistics LS_i are constructed from the calculated mean μ_i and standard deviation σ_i; for the i-th local region, the statistic LS_i is the tuple (μ_i, σ_i). Taking actual data as an example, if in a certain calculation IR_1' has mean μ_1 = 0.32 and standard deviation σ_1 = 0.15, the corresponding local feature statistic is LS_1 = (0.32, 0.15). The local feature statistic LS_i is then combined with the corresponding region EP_i' of the projected error propagation feature EP': the mean μ_EPi of EP_i' is compared with μ_i in LS_i to compute the relative deviation, and the standard deviation σ_EPi of EP_i' is compared with σ_i to assess fluctuation consistency. Combining the two comparison results generates the local compensation coefficient LC_i. Typically, if μ_EPi is close to μ_i but σ_EPi differs significantly from σ_i, the region requires a larger compensation adjustment.
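A sketch of the local compensation coefficients; the exact combination rule of relative deviation and fluctuation consistency is not fixed by the text, so the sigmoid squashing below is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(11)
IR_p = rng.normal(size=96)  # projected initial reconstruction feature IR'
EP_p = rng.normal(size=96)  # projected error propagation feature EP'

local_coeffs = []
for i in range(8):                               # 8 regions of 12 dims each
    ir_region = IR_p[i * 12:(i + 1) * 12]
    ep_region = EP_p[i * 12:(i + 1) * 12]
    mu_i, sigma_i = ir_region.mean(), ir_region.std()
    mu_ep, sigma_ep = ep_region.mean(), ep_region.std()
    rel_dev = abs(mu_ep - mu_i) / (abs(mu_i) + 1e-8)    # mean comparison
    fluct = abs(sigma_ep - sigma_i) / (sigma_i + 1e-8)  # fluctuation consistency
    lc_i = 1.0 / (1.0 + np.exp(-(rel_dev + fluct - 1.0)))
    local_coeffs.append(np.full(12, lc_i))       # one coefficient per region

LC = np.concatenate(local_coeffs)                # 96-dim local coefficients
assert LC.shape == (96,)
```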
A correlation matrix CM between the projected initial reconstruction feature IR' and the projected error propagation feature EP' is calculated. CM has size 96×96, where the element CM[j, k] represents the degree of correlation between the j-th dimension of IR' and the k-th dimension of EP'. The correlation is obtained by computing the covariance of the two features over the training data set and normalizing, giving values in the range [-1, 1]: the closer to 1, the stronger the positive correlation; the closer to -1, the stronger the negative correlation; and the closer to 0, the weaker the correlation.
The global feature correlation degree GR is calculated based on the correlation matrix CM by statistically analyzing the elements of CM with larger absolute values (e.g. |CM[j, k]| > 0.5). Specifically, the proportion of elements satisfying the condition among all elements, and the average absolute value of those elements, are calculated. In a practical application, if 30% of the elements satisfy |CM[j, k]| > 0.5 and their average absolute value is 0.72, a moderate global correlation exists between the two features.
And generating a global compensation coefficient GC according to the global feature association degree GR. A larger global compensation factor (e.g., gc=0.8) is set when GR shows a stronger association (e.g., association element ratio >40% and average absolute value > 0.6), a medium global compensation factor (e.g., gc=0.5) is set when GR shows a medium association, and a smaller global compensation factor (e.g., gc=0.2) is set when GR shows a weaker association. The global compensation coefficient GC is a 96-dimensional vector, each dimension corresponding to one dimension in the feature space.
The projected initial reconstruction feature IR' and the projected error propagation feature EP' are feature-stitched by direct concatenation, i.e. IR' and EP' are joined end to end to form a 192-dimensional spliced feature SF. For example, if the first three dimensions of IR' are [0.2, 0.4, -0.1] and the first three dimensions of EP' are [0.3, 0.1, -0.2], the first six dimensions of SF are [0.2, 0.4, -0.1, 0.3, 0.1, -0.2].
A weight coefficient WC is generated from the spliced feature SF. WC is calculated by a small neural network comprising a hidden layer (192→32) and an output layer (32→1). The network input is SF and the output is a scalar in the range [0, 1]. This value is the weight of the local compensation coefficient in the final compensation coefficient calculation; the weight of the global compensation coefficient is 1 minus this value.
And carrying out weighted combination on the local compensation coefficient LC and the global compensation coefficient GC based on the weight coefficient WC to obtain a final compensation coefficient FC. The calculation was fc=wc×lc+ (1-WC) ×gc. Where LC is a 96-dimensional vector formed by connecting compensation coefficients LCi of 8 local areas, GC is the 96-dimensional global compensation coefficient calculated previously. In practical cases, if wc=0.6, lc=0.75, gc=0.45 in a certain dimension, fc=0.6×0.75+0.4×0.45=0.45+0.18=0.63 in that dimension.
The final compensation coefficient FC is multiplied element-wise with the projected error propagation feature EP'. Specifically, the j-th dimension EP'[j] of EP' is multiplied by the j-th coefficient FC[j] to obtain the weighted feature value WF[j] = EP'[j] × FC[j]. The weighted values of all dimensions together constitute the 96-dimensional compensation feature CF. For example, if EP'[10] = 0.42 and FC[10] = 0.63, then WF[10] = 0.42 × 0.63 = 0.2646.
The compensation feature CF is fused with the initial reconstruction feature IR to generate a corrected reconstruction feature CR. Since the dimension of CF is 96 and the dimension of IR is 128, it is necessary to map CF back to the feature space of IR first. With the fully connected layer, the parameter matrix WCF (96×128 in size) converts CF into 128-dimensional CF'. Then, CF 'is added to IR, cr=ir+cf'. The addition, rather than multiplication, is employed here to preserve the body structure of the original reconstructed feature while targeted adjustments are made to specific dimensions.
And calculating a dynamic error compensation term DC according to the difference value between the corrected reconstruction feature CR and the initial reconstruction feature IR. The calculation is dc=cr-ir=cf'. The dynamic error compensation term DC is also a 128-dimensional vector representing the amount of adjustment made to each dimension of the initial reconstruction feature.
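The chain from final compensation coefficient to dynamic error compensation term can be traced end to end in a few lines; the projection matrix and the stand-in coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(13)
IR = rng.normal(size=128)        # initial reconstruction feature
EP_p = rng.normal(size=96)       # projected error propagation feature EP'
LC = rng.uniform(0, 1, size=96)  # local compensation coefficients
GC = np.full(96, 0.45)           # global compensation coefficient
WC = 0.6                         # weight from the small gating network

FC = WC * LC + (1 - WC) * GC     # final compensation coefficient (96-dim)
CF = EP_p * FC                   # element-wise weighting -> compensation feature

W_CF = rng.normal(0, 0.05, size=(96, 128))  # hypothetical 96 -> 128 projection
CF_back = CF @ W_CF              # CF' mapped into the 128-dim space of IR
CR = IR + CF_back                # corrected reconstruction feature CR = IR + CF'
DC = CR - IR                     # dynamic error compensation term DC = CF'
assert np.allclose(DC, CF_back)
```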
In a specific application case, in a certain image reconstruction task, the 57 th-dimensional eigenvalue of IR is originally 0.835, which represents the intensity of a certain texture feature in an image. After the compensation calculation, the corresponding dynamic error compensation term DC [57] =0.127, and the corrected reconstructed feature CR [57] =0.962. The actual verification shows that the corrected characteristic value reflects the texture characteristics of the original image more accurately, and errors introduced in the characteristic extraction process are reduced, so that the quality of the reconstructed image is improved. The reconstruction error on the test dataset was reduced from 0.085 on average to 0.042, improving the accuracy by about 50%.
The introduction of the dynamic error compensation term effectively relieves the problem of information loss in the characteristic reconstruction process, and has obvious effect on processing high-frequency details and complex texture areas. By comprehensively analyzing local and global features, the method can provide a self-adaptive compensation strategy aiming at different types of errors, and the accuracy and stability of a reconstruction result are obviously improved.
FIG. 3 is a histogram comparing the accuracy of different error compensation methods according to an embodiment of the present invention, showing the performance of three prediction methods in meteorological element prediction scenarios. The traditional reconstruction method, a baseline using conventional data reconstruction techniques, performs relatively conservatively: 67.0% in the temperature prediction scenario, 55.0% for precipitation and 62.0% for wind speed. The local compensation method, which introduces a local data correction mechanism, improves markedly: 76.0% for temperature, 66.0% for precipitation and 72.0% for wind speed. The present technical scheme, an integrated optimization algorithm, shows the best performance in all test scenarios: up to 90.0% for temperature, 81.0% for precipitation and 87.0% for wind speed. Overall, the technical scheme improves performance by about 20-25 percentage points over the traditional reconstruction method and 10-15 percentage points over the local compensation method, fully demonstrating its effect in improving meteorological element prediction accuracy. In particular, in the hard-to-predict precipitation scenario the scheme still maintains high accuracy, showing its robustness and adaptability in complex prediction tasks.
In an optional implementation manner, performing time sequence division on the observation time sequence data set to obtain a training set and a verification set, and taking the deviation of the reconstructed observation time sequence from the original observation time sequence as an evaluation index includes:
Performing feature reconstruction on the observation time sequence data set to obtain a reconstructed observation time sequence, wherein the reconstructed observation time sequence and the original observation time sequence have the same time sequence distribution feature;
And calculating the time sequence deviation of the reconstructed observation time sequence and the original observation time sequence at the corresponding time, and taking the time sequence deviation as an evaluation index for evaluating the reconstruction effect.
The observation time series data set is divided in time sequence, for example, the first 800 time point data of the observation time series data set with the total length of 1000 time points are divided into training sets, and the last 200 time point data are divided into verification sets. The division method maintains the consistency and time sequence characteristics of the time sequence, and is beneficial to the learning of the model on the time sequence characteristics.
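A minimal sketch of this chronological split, which simply cuts the series at the 80% mark rather than shuffling it.

```python
import numpy as np

series = np.arange(1000)                     # stand-in for 1000 time points
split = int(len(series) * 0.8)               # first 800 points for training
train, val = series[:split], series[split:]  # last 200 points for validation
assert len(train) == 800 and len(val) == 200
```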
Feature reconstruction of the observation time sequence data set can be implemented with an autoencoder model. The autoencoder comprises two parts, an encoder and a decoder: the encoder maps the original observation time sequence to a latent space representation, and the decoder reconstructs the latent representation into a sequence with the same temporal distribution characteristics as the original. Specifically, for an observation time series sample X containing data at n time points, the encoder maps X to a latent representation Z, which the decoder then maps to a reconstructed sequence X'. In a practical application, for example a power load prediction scenario, the original observation time sequence is hourly power load data for one week, i.e. 168 data points; after feature reconstruction by the autoencoder, the reconstructed sequence should also contain 168 data points and retain the temporal distribution characteristics of the original sequence, such as periodicity and trend.
To ensure that the reconstructed observation time sequence has the same temporal distribution characteristics as the original, the mean square error can be used as the loss function when training the autoencoder, with a temporal correlation constraint term added to the loss. The constraint term can be realized by computing the difference between the autocorrelation coefficients of the original and reconstructed sequences, ensuring that the reconstructed sequence maintains the temporal dependency of the original. For example, temperature observations from a weather monitoring station show pronounced diurnal fluctuation and seasonal variation, and the reconstructed temperature sequence should exhibit the same temporal characteristics.
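A sketch of such a loss, using PyTorch as before; the maximum lag and the weighting factor lam are hypothetical hyperparameters.

```python
import torch

def autocorr(x, max_lag=24):
    """Autocorrelation coefficients of a 1-D series for lags 1..max_lag."""
    x = x - x.mean()
    denom = (x * x).sum()
    return torch.stack([(x[:-k] * x[k:]).sum() / denom
                        for k in range(1, max_lag + 1)])

def reconstruction_loss(x, x_hat, lam=0.1):
    """MSE plus a temporal-correlation constraint that penalizes differences
    between the autocorrelation profiles of original and reconstruction."""
    mse = torch.mean((x - x_hat) ** 2)
    ac_penalty = torch.mean((autocorr(x) - autocorr(x_hat)) ** 2)
    return mse + lam * ac_penalty

x = torch.sin(torch.arange(168.0) * 2 * torch.pi / 24)  # daily-cycle toy series
x_hat = x + 0.05 * torch.randn(168)
loss = reconstruction_loss(x, x_hat)
```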
After feature reconstruction is completed, the temporal deviation between the reconstructed observation time sequence and the original observation time sequence at corresponding times is calculated and used as the evaluation index. The deviation can be measured by the absolute or relative error of the values at corresponding times. Specifically, if the original observation time series is X = (x_1, x_2, ..., x_n) and the reconstructed series is X' = (x'_1, x'_2, ..., x'_n), the absolute deviation |x_i - x'_i| or the relative deviation |x_i - x'_i| / |x_i| can be computed at each time point.
To fully evaluate the reconstruction effect, various statistical indicators may be calculated, including the mean absolute deviation, root mean square deviation and maximum deviation. For example, for time series data generated by an industrial sensor, if the original value at a given time is 20.5 and the corresponding reconstructed value is 19.8, the absolute deviation is 0.7 and the relative deviation is 0.034, or 3.4%. Computing the mean absolute deviation over the whole sequence then gives a value such as 0.65, representing the average degree of deviation of the reconstructed sequence from the original.
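These deviation statistics reduce to a few array operations; the toy values reuse the 20.5 / 19.8 example from the text.

```python
import numpy as np

x = np.array([20.5, 21.0, 19.7])      # original values (toy data)
x_hat = np.array([19.8, 21.3, 19.9])  # reconstructed values

abs_dev = np.abs(x - x_hat)                # e.g. |20.5 - 19.8| = 0.7
rel_dev = abs_dev / np.abs(x)              # e.g. 0.7 / 20.5 ≈ 0.034
mad = abs_dev.mean()                       # mean absolute deviation
rmsd = np.sqrt(np.mean((x - x_hat) ** 2))  # root mean square deviation
max_dev = abs_dev.max()                    # maximum deviation
```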
In practical applications, the timing deviation evaluation index may be further refined into deviations on different time scales, such as short-term deviation (hour level), medium-term deviation (day level), and long-term deviation (month level). This is particularly important for time series with multi-time scale features. Taking sales data of a retail enterprise as an example, the sales data has fluctuation characteristics in daily life, weekend effects and seasonal changes, and the capability of the reconstruction model in capturing multi-scale time sequence characteristics can be comprehensively estimated by calculating deviations on different time scales.
An autocorrelation function comparison can be used to evaluate whether the reconstructed sequence maintains the temporal dependency structure of the original. For example, for the return time series of a financial market, the autocorrelation coefficients of the original and reconstructed sequences can be calculated and compared to check their similarity in statistical properties.
Visualization techniques are also an effective tool in the evaluation process. By drawing the original sequence and the reconstructed sequence in the same coordinate system, the fitting degree and deviation of the original sequence and the reconstructed sequence in different time periods can be intuitively observed. For example, for an air quality index sequence of an environmental monitoring station, the performance of the reconstruction model in terms of outlier capture and mutation point identification can be found through a time sequence curve comparison graph.
Based on the temporal deviation evaluation index, different feature reconstruction methods can be compared and optimized. In an intelligent traffic flow prediction system, different autoencoder structures (such as the variational autoencoder and the recurrent neural network autoencoder) can be tried, and the most suitable reconstruction method selected according to the temporal deviation index, realizing high-precision reconstruction of the traffic flow time series and providing a reliable basis for subsequent anomaly detection and prediction tasks.
In an alternative embodiment, performing an abnormal state evaluation on the data to be detected based on the physical driving model, and determining the quality grade mark according to the result of the abnormal state evaluation includes:
Inputting data to be detected into a physical driving model to generate a state feature vector, constructing a state transition diagram based on the state feature vector, and generating an initial abnormal state predicted value through matching degree analysis of the state transition diagram and physical constraint conditions;
Constructing a prior distribution based on the initial abnormal state predicted value, carrying out layered repeated sampling on the prior distribution by adopting a Markov chain Monte Carlo method, carrying out iterative updating of the node weights and edge probabilities of the state transition diagram according to the sampling results, and obtaining the parameter posterior distribution of the optimal parameter configuration of the state transition diagram through multiple rounds of iterative optimization;
Performing multiple layered sampling on the parameter posterior distribution by using the Markov chain Monte Carlo method, combining the multiple layered sampling results with the optimal parameter configuration to construct a new state transition diagram, and obtaining an abnormal state score by comparing the differences between the new state transition diagram and the original state transition diagram in node weights and edge probabilities;
Carrying out probability density estimation on node distribution in the new state transition diagram, analyzing fluctuation characteristics of node weights and edge probabilities based on the Markov chain Monte Carlo method, and calculating to obtain uncertainty measurement based on the fluctuation characteristics and confidence intervals of the physical constraint conditions;
Establishing a segmented mapping relation according to the relevance between the numerical distribution of the abnormal state scores and the topological structure of the new state transition diagram, and carrying out self-adaptive adjustment on the threshold value of the segmented mapping relation based on the uncertainty measure to determine a quality grade mark.
As shown in fig. 4, the method includes:
After the data to be detected is input into the physical driving model, features are extracted by the LSTM encoder to obtain the state feature vector. The state feature vector is a 64-dimensional vector, each dimension representing a physical correlation characteristic between meteorological elements. Taking the four meteorological elements of temperature, humidity, air pressure and wind speed as an example, the raw data input to the model has dimension [n, 4], where n is the length of the time sequence, and the state feature vector obtained after encoding is a 64-dimensional vector.
A state transition graph is constructed from the obtained state feature vectors; the graph comprises a number of nodes and connecting edges, where nodes represent meteorological states and edges represent transition probabilities between states. In a concrete implementation, the 16 most salient dimensions of the feature vector are selected and grouped into 8 representative state nodes by the K-means clustering algorithm. The initial weight of each node is the ratio of its sample count to the total number of samples. The edge weights are calculated from the state transition frequencies between adjacent time steps, forming an initial state transition matrix T of dimension [8, 8].
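A hedged sketch of this construction is given below; the synthetic feature matrix, the use of variance to pick the 16 most salient dimensions, and the row normalization of T are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

n_steps, n_states = 1000, 8
features = np.random.default_rng(1).standard_normal((n_steps, 64))  # stand-in encoder output

# Select the 16 most salient dimensions (here: largest variance) and cluster into 8 state nodes.
top16 = np.argsort(features.var(axis=0))[-16:]
labels = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(features[:, top16])

# Node weights: share of samples assigned to each state node.
node_w = np.bincount(labels, minlength=n_states) / n_steps

# Edge weights: transition frequencies between adjacent time steps (row-normalized).
T = np.zeros((n_states, n_states))
for a, b in zip(labels[:-1], labels[1:]):
    T[a, b] += 1
T /= T.sum(axis=1, keepdims=True).clip(min=1)
```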
A matching-degree analysis is then performed between the state transition diagram and the predefined physical constraint conditions. The physical constraints form a rule set of 12 constraint rules in total, including a daily temperature variation of no more than 15 ℃, a relative humidity between 0% and 100%, and a rate of change of air pressure of no more than 3 hPa/hour. The initial abnormal state predicted value is obtained by calculating the degree of conformity of each node state in the state transition diagram with the physical constraints; the predicted value is a scalar between 0 and 1, where 0 represents complete conformity with the physical rules and 1 represents complete abnormality.
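The matching-degree analysis can be sketched as a simple rule check; only three of the twelve quoted rules are shown, and equating the predicted value with the fraction of violated rules is an assumption made for illustration:

```python
import numpy as np

def initial_anomaly_value(temp_c, rh_pct, pressure_hpa):
    violations = [
        np.ptp(temp_c) > 15.0,                        # daily temperature range <= 15 degC
        np.any((rh_pct < 0.0) | (rh_pct > 100.0)),    # relative humidity within 0-100 %
        np.any(np.abs(np.diff(pressure_hpa)) > 3.0),  # pressure change <= 3 hPa per hour
    ]
    # Fraction of violated rules: 0 = fully physical, 1 = fully abnormal.
    return sum(violations) / len(violations)

print(initial_anomaly_value(np.array([2.0, 20.0]), np.array([40.0, 55.0]),
                            np.array([1013.0, 1012.5])))   # -> 0.333...
```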
A prior distribution is constructed based on the initial abnormal state predicted value. The prior adopts a Beta distribution, with parameters α and β set according to the known proportion of normal to abnormal samples in the historical data. In practical applications, α = 3 and β = 12 are empirical values that bias the distribution towards normal samples.
The prior distribution is repeatedly sampled in a hierarchical manner using the Markov chain Monte Carlo method, with 500 sample points drawn per round. For each sample point, the node weights and edge probabilities are updated according to the topology of the state transition diagram. The update rule follows Bayesian posterior probability calculation, adjusted by the degree of consistency between the current sample and historical observation data: if the state represented by a node conforms to the historical weather patterns of the same period and region, the node weight is increased; if the state change represented by a transition edge exceeds the physically reasonable range, the edge probability is reduced.
Through 10 rounds of iterative optimization, the node weights and edge probabilities of the state transition diagram gradually converge, finally yielding the parameter posterior distribution of the optimal parameter configuration. This posterior distribution reflects the likelihood of different parameter values and provides stable, reliable parameter estimates for the state transition diagram.
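The sampling loop can be sketched with a simple Metropolis-Hastings chain. The Beta(3, 12) prior, the 500 samples per round and the 10 rounds follow the figures above; the binomial likelihood built from hypothetical constraint-violation counts is an illustrative assumption standing in for the graph-update logic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
violations, checks = 4, 60           # hypothetical constraint-violation counts

def log_post(p):
    # Beta(3, 12) prior times a binomial likelihood of observing the violations.
    if not 0.0 < p < 1.0:
        return -np.inf
    return stats.beta(3, 12).logpdf(p) + stats.binom(checks, p).logpmf(violations)

samples, p = [], 0.2
for _ in range(10):                  # 10 rounds of iterative optimization
    for _ in range(500):             # 500 sample points per round
        prop = p + 0.05 * rng.standard_normal()   # random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(p):
            p = prop
        samples.append(p)
# The empirical distribution of `samples` approximates the parameter posterior.
```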
The parameter posterior distribution is then hierarchically sampled 200 times using the Markov chain Monte Carlo method, each sampling generating a set of parameter configurations based on the posterior distribution. These parameter configurations are combined with the optimal parameter configuration to construct a new state transition diagram. The new diagram also comprises 8 nodes, but its node weights and edge probabilities have been optimized and adjusted to better match the distribution characteristics of actual meteorological data.
The abnormal state score is obtained by comparing the differences between the new and original state transition diagrams in node weights and edge probabilities. The difference is calculated using relative entropy, i.e., the KL divergence between the two distributions. A larger difference indicates an obvious deviation between the original and optimized graphs and hence an anomaly in the corresponding data, while a smaller difference indicates the data conforms to physical laws. In practical applications, a KL divergence below 0.05 is generally considered normal, the range 0.05-0.15 slightly abnormal, and values above 0.15 are marked as significantly abnormal.
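A sketch of the relative-entropy comparison follows, assuming Dirichlet-distributed stand-ins for the node weights and edge probabilities of the two graphs; the epsilon smoothing that avoids division by zero is an implementation detail:

```python
import numpy as np

rng = np.random.default_rng(3)
node_w_orig, node_w_new = rng.dirichlet(np.ones(8)), rng.dirichlet(np.ones(8))
T_orig, T_new = rng.dirichlet(np.ones(8), size=8), rng.dirichlet(np.ones(8), size=8)

def kl(p, q, eps=1e-9):
    # KL divergence D(p || q) with smoothing against zero entries.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

score = kl(node_w_new, node_w_orig) + kl(T_new.ravel(), T_orig.ravel())
grade = ("normal" if score < 0.05
         else "slightly abnormal" if score < 0.15
         else "significantly abnormal")
```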
Probability density estimation is performed on the node distribution in the new state transition diagram, using kernel density estimation with a Gaussian kernel and bandwidth parameter h = 0.08. The estimation result reflects the distribution of the node weights: the peak indicates the typical weight value, and the width of the distribution reflects the degree of uncertainty.
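The kernel density estimate itself is standard; a sketch with the quoted bandwidth h = 0.08 and illustrative node weights:

```python
import numpy as np

weights = np.array([0.10, 0.12, 0.15, 0.13, 0.11, 0.14, 0.12, 0.13])  # illustrative
h = 0.08
grid = np.linspace(0.0, 0.3, 200)

# Gaussian KDE: average of Gaussian bumps centred at each node weight.
density = np.exp(-0.5 * ((grid[:, None] - weights[None, :]) / h) ** 2).sum(axis=1)
density /= len(weights) * h * np.sqrt(2 * np.pi)
# The peak location indicates the typical weight; the spread reflects uncertainty.
```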
The fluctuation characteristics of the node weights and edge probabilities are analyzed based on the Markov chain Monte Carlo method: the standard deviation across the 200 sampling results forms an uncertainty index. Analysis of the fluctuation characteristics shows that the standard deviation of node weights for normal data is usually below 0.02, whereas abnormal data exhibits higher fluctuation.
A comprehensive uncertainty measure is then computed in combination with the confidence intervals of the physical constraint conditions. The confidence intervals are derived statistically from historical data; for example, the 95% confidence interval of the daily temperature range may be [-10, +8] ℃. If the data falls within the interval, the certainty score increases; otherwise, the uncertainty measure increases. The final uncertainty measure ranges from 0 to 1, where 0 represents complete certainty and 1 represents very high uncertainty.
A segmented mapping relation is established according to the correlation between the numerical distribution of the abnormal state scores and the topological structure of the new state transition diagram. The mapping divides the abnormal state score into several intervals, each corresponding to a quality level: scores of 0-0.05 correspond to "high quality" data, 0.05-0.15 to "good" data, 0.15-0.3 to "suspicious" data, 0.3-0.5 to "error" data, and above 0.5 to "serious error" data.
The thresholds of the segmented mapping relation are adaptively adjusted based on the uncertainty measure. When the uncertainty measure is high (e.g., above 0.4), the threshold boundaries are adjusted 10% downward, making the system more inclined to mark data as a lower quality level; when the uncertainty measure is low (e.g., below 0.1), the threshold boundaries are adjusted 5% upward, allowing a more relaxed decision criterion. This adaptive adjustment mechanism dynamically optimizes the quality assessment criteria according to the degree of certainty of the data.
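The mapping and its uncertainty-driven adjustment can be sketched as follows, using the interval boundaries and the 10%/5% shifts quoted above; treating the shifts as multiplicative scalings of the boundaries is an interpretation:

```python
def quality_grade(score, uncertainty):
    bounds = [0.05, 0.15, 0.30, 0.50]
    if uncertainty > 0.4:                  # high uncertainty: stricter grading
        bounds = [b * 0.9 for b in bounds]
    elif uncertainty < 0.1:                # high certainty: more relaxed grading
        bounds = [b * 1.05 for b in bounds]
    labels = ["high quality", "good", "suspicious", "error", "serious error"]
    for bound, label in zip(bounds, labels):
        if score <= bound:
            return label
    return labels[-1]

print(quality_grade(0.42, 0.18))           # -> "error", matching the case below
```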
Through the above processing steps, the quality grade mark of the data to be detected is finally determined. The marking result contains two pieces of information: for example, "good (0.92)" indicates that the data is rated as good and that the confidence of the decision is 92%. The quality grade marking result is fed back to the LSTM-AE model for subsequent model training and optimization, forming a closed-loop optimization mechanism.
In a practical application case, a set of temperature data reported by a meteorological station showed a sharp drop from 23 ℃ to 5 ℃ within one hour. Through state transition diagram analysis, the system found that this change yielded an abnormal state score of 0.42 and an uncertainty measure of 0.18, and the data was finally marked "error (0.86)", indicating that the abnormal data was likely caused by a sensor fault or a data transmission error.
In an alternative embodiment, combining the result of the multiple hierarchical sampling with the optimal parameter configuration to construct a new state transition diagram, and obtaining the abnormal state score by comparing the difference between the new state transition diagram and the original state transition diagram in node weight and edge probability includes:
Constructing a node distribution probability matrix based on the multi-layered sampling result, carrying out convolution operation on the node distribution probability matrix and the optimal parameter configuration to obtain node weights, constructing node distribution of a state transition diagram based on the node distribution probability matrix, carrying out Fourier transform on the node distribution to obtain frequency domain characteristics, optimizing the node distribution probability matrix based on the frequency domain characteristics to obtain edge probabilities, and combining the node weights and the edge probabilities to construct a new state transition diagram;
Performing Fourier transform on the node weights of the new state transition diagram and the original state transition diagram to obtain node weight frequency domain features, calculating the log likelihood ratio of the node weight frequency domain features to obtain node weight difference values, performing Fourier transform on the edge probabilities of the new state transition diagram and the original state transition diagram to obtain edge probability frequency domain features, calculating the log likelihood ratio of the edge probability frequency domain features to obtain edge probability difference values, and performing weighted fusion on the node weight difference values and the edge probability difference values to obtain abnormal state scores.
A node distribution probability matrix is constructed based on the results of the multiple hierarchical samplings. Specifically, assume that 10 hierarchical samples are taken in a network system, each sample obtaining state data for 5 key nodes. The sampling results are arranged into a node distribution probability matrix P of size 10 × 5, where P[i][j] represents the probability distribution value of the j-th node in the i-th sampling. For example, P may take the value [[0.12, 0.23, 0.18, 0.27, 0.20], [0.15, 0.21, 0.19, 0.25, 0.20], …].
The node distribution probability matrix is convolved with the optimal parameter configuration to obtain the node weights. Let the optimal parameter configuration be the weight vector W = [0.3, 0.2, 0.15, 0.25, 0.1], representing the importance weights of the 5 key indicators. For each node j, the sampled values of that node in P are averaged and multiplied by the corresponding weight, i.e., NW[j] = (sum_i P[i][j] / 10) × W[j], where i runs from 1 to 10. For example, for node 1, its weight is calculated as (0.12 + 0.15 + …)/10 × 0.3 = 0.135.
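A sketch of this computation; the matrix P reuses the two quoted rows, with the remaining rows filled in only to make the example runnable:

```python
import numpy as np

W = np.array([0.30, 0.20, 0.15, 0.25, 0.10])           # optimal parameter configuration
P = np.tile([0.12, 0.23, 0.18, 0.27, 0.20], (10, 1))   # 10 samplings x 5 nodes (illustrative)
P[1] = [0.15, 0.21, 0.19, 0.25, 0.20]                  # second quoted row

NW = P.mean(axis=0) * W                                # NW[j] = mean_i(P[i][j]) * W[j]
```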
The node distribution of the state transition diagram is constructed based on the node distribution probability matrix, i.e., P is converted into the distribution of each node in the state space. Specifically, for each node j, its probability distribution values across all samplings are collected to form a distribution vector ND[j]. For example, the distribution vector of node 1 is [0.12, 0.15, …], representing the distribution of that node over the 10 samplings.
The node distribution is Fourier transformed to obtain frequency-domain features. A discrete Fourier transform is applied to the distribution vector ND[j] of each node, converting it into a frequency-domain feature FD[j]. The purpose of this step is to extract the periodic characteristics of the node distribution; in practice, a fast Fourier transform (FFT) algorithm may be used. For example, the distribution vector [0.12, 0.15, …] of node 1 is Fourier transformed to yield the frequency-domain features [0.135, 0.02−0.01i, 0.005+0.002i, …].
The node distribution probability matrix is optimized based on the frequency-domain features to obtain the edge probabilities. The frequency-domain features FD are used to optimize the original matrix and calculate the transition probabilities between nodes, i.e., the edge probabilities EP. Specifically, for any two nodes j and k, the correlation of their frequency-domain features is calculated and combined with the original probability distribution values to generate the edge probability EP[j][k]. For example, an edge probability of 0.35 from node 1 to node 2 indicates a 35% probability of transitioning from state 1 to state 2.
And combining the node weight and the edge probability to construct a new state transition diagram. Using the calculated node weights NW and edge probabilities EP, a new state transition graph Gnew is constructed. In this graph, each node j has a weight NW [ j ], and the edges of node j to node k have a weight EP [ j ] [ k ]. For example, node 1 in Gnew has a weight of 0.135 and the edge from node 1 to node 2 has a weight of 0.35.
The node weights of the new and original state transition diagrams are Fourier transformed to obtain node-weight frequency-domain features. Assuming the node weights in the original state transition diagram Gorig are OW, Fourier transforms are performed on NW and OW respectively to obtain the frequency-domain features FNW and FOW. For example, for node 1, the weight 0.135 in the new graph becomes [0.135, 0, 0, …] after the Fourier transform, while the weight 0.15 in the original graph becomes [0.15, 0, 0, …].
The log likelihood ratio of the node-weight frequency-domain features is calculated to obtain the node weight difference values. For each node j, the log likelihood ratio of its frequency features in the new and original graphs is computed, i.e., DW[j] = log(FNW[j] / FOW[j]) (written DW here to distinguish it from the distribution vector ND above), which quantifies the degree of variation of the node weight in the frequency domain. For example, the weight difference value of node 1 is log(0.135/0.15) = −0.105.
Likewise, the edge probabilities of the new and original state transition diagrams are Fourier transformed to obtain edge-probability frequency-domain features: Fourier transforms are performed on the edge probabilities EP in Gnew and OEP in Gorig to obtain the frequency-domain features FEP and FOEP. For example, the edge probability 0.35 from node 1 to node 2 in the new graph becomes [0.35, 0, 0, …] after the Fourier transform, and the probability 0.4 in the original graph becomes [0.4, 0, 0, …].
The log likelihood ratio of the edge-probability frequency-domain features is calculated to obtain the edge probability difference values. For each pair of nodes (j, k), the log likelihood ratio of the edge probability in the frequency domain is computed, i.e., ED[j][k] = log(FEP[j][k] / FOEP[j][k]). For example, the edge probability difference value from node 1 to node 2 is log(0.35/0.4) = −0.134.
The node weight difference values and edge probability difference values are weighted and fused to obtain the abnormal state score. Importance weights α and β (α + β = 1) are set for the node weight differences and edge probability differences, and the composite score is S = α × sum(DW[j]) + β × sum(ED[j][k]). For example, if α = 0.6, β = 0.4, the node difference sum is −0.5 and the edge difference sum is −0.7, then the anomaly score S = 0.6 × (−0.5) + 0.4 × (−0.7) = −0.58. The larger the magnitude of the score, the more significant the difference between the new and original state transition diagrams, and the higher the probability that the system is in an abnormal state. In practical applications, a threshold T may be set; when |S| > T, the system is judged to have entered an abnormal state and the corresponding alarm mechanism is triggered.
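Putting the frequency-domain steps of this embodiment together, a sketch might look as follows. Taking the magnitude of the FFT output before the logarithm (so the ratio stays real) and the epsilon smoothing are assumptions, and all numeric inputs are illustrative:

```python
import numpy as np

def spectral_log_ratio(new, old, eps=1e-9):
    # Log ratio of the FFT magnitude spectra of two sequences.
    fn = np.abs(np.fft.fft(np.asarray(new, dtype=float)))
    fo = np.abs(np.fft.fft(np.asarray(old, dtype=float)))
    return np.log((fn + eps) / (fo + eps))

nw_new, nw_old = [0.135, 0.20, 0.15, 0.25, 0.27], [0.15, 0.20, 0.15, 0.25, 0.25]
ep_new, ep_old = [0.35, 0.30, 0.35], [0.40, 0.30, 0.30]

alpha, beta = 0.6, 0.4
S = (alpha * spectral_log_ratio(nw_new, nw_old).sum()
     + beta * spectral_log_ratio(ep_new, ep_old).sum())
# |S| grows with the divergence between the graphs; compare against a
# threshold T to decide whether to trigger the abnormal-state alarm.
```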
By the method, accurate monitoring of the system state is realized, abnormal changes in the state transition process can be effectively captured, and the safety and stability of the system are improved.
FIG. 5 is a comparison graph of abnormal state scoring accuracy, showing the evaluation performance curves of three different methods at different noise levels. The conventional feature comparison method, a basic algorithm based on conventional feature extraction and comparison, starts at 90.0% and degrades steadily as the noise level increases: 84.2% at 5% noise, 79.8% at 10%, 74.4% at 15%, 68.5% at 20%, 63.2% at 25% and 58.7% at 30%. The second comparison method starts at 93.8%, with accuracies at the different noise levels of 88.1% (5%), 83.5% (10%), 77.3% (15%), 71.8% (20%), 67.4% (25%) and 62.6% (30%). The present technical scheme, a novel comprehensive evaluation algorithm, exhibits the strongest noise immunity: its initial accuracy reaches 97.6% and degrades most slowly as the noise level increases, to 95.3% (5%), 91.2% (10%), 87.7% (15%), 81.3% (20%), 75% (25%) and 75.6% (30%). Overall, the present technical scheme maintains the best performance at all noise levels and is more robust than the other two methods; in particular, it still maintains high evaluation accuracy in high-noise environments (noise level > 20%), fully demonstrating the superiority of the scheme in complex noise environments.
The method further comprises the steps of:
In the data preprocessing stage, multi-year meteorological observation data from national weather stations and the numerical forecast data at the corresponding stations are collected. Taking the temperature observations of national stations from 2018 to 2022 as an example, hour-by-hour temperature observations from 100 national stations across the country and the short-term temperature forecast data for the same times are selected. For the numerical forecast data, forecast values at the station locations are extracted from the original forecast field by bilinear interpolation. Data normalization adopts mean-standard deviation standardization: the mean is subtracted from the original data, which is then divided by the standard deviation. For the air temperature data, the calculated mean is about 15.3 ℃ and the standard deviation about 10.2 ℃; after standardization, most air temperature values fall within the range [-3, 3].
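The extraction of station values from the gridded forecast field is standard bilinear interpolation; in the sketch below the grid spacing, coordinates and field values are hypothetical:

```python
import numpy as np

def bilinear(field, lats, lons, lat, lon):
    # Interpolate between the four surrounding grid points (no bounds checking).
    i = np.searchsorted(lats, lat) - 1
    j = np.searchsorted(lons, lon) - 1
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - ty) * (1 - tx) * field[i, j] + (1 - ty) * tx * field[i, j + 1]
            + ty * (1 - tx) * field[i + 1, j] + ty * tx * field[i + 1, j + 1])

lats, lons = np.arange(15.0, 55.0, 0.25), np.arange(70.0, 140.0, 0.25)
field = np.random.default_rng(5).normal(15.0, 10.0, (len(lats), len(lons)))
print(bilinear(field, lats, lons, 39.9, 116.4))   # hypothetical station location
```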
According to the time series characteristics, the observed data are organized into sequence samples of 24 hours each; every sample comprises the observed values of the four meteorological variables (air temperature, air pressure, humidity and wind speed) and the forecast values of air temperature. After removing sequences containing missing values, about 150,000 valid sequence samples are obtained. Split chronologically, the first 80% (about 120,000) are used for model training and the remaining 20% (about 30,000) for model validation.
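A sketch of the standardization, windowing and chronological split; the synthetic array stands in for the hourly station observations, and per-variable statistics are computed rather than the single quoted temperature mean and standard deviation:

```python
import numpy as np

raw = np.random.default_rng(4).normal(15.3, 10.2, size=(24 * 365, 4))  # hours x 4 variables

mu, sigma = raw.mean(axis=0), raw.std(axis=0)   # e.g. ~15.3 degC / ~10.2 degC for air temp
norm = (raw - mu) / sigma                       # values mostly fall within [-3, 3]

windows = norm.reshape(-1, 24, 4)               # one sample per 24-hour period
windows = windows[~np.isnan(windows).any(axis=(1, 2))]  # drop windows with missing values

split = int(0.8 * len(windows))                 # chronological 80/20 split
train, val = windows[:split], windows[split:]
```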
In the model architecture design stage, a dual-encoder LSTM-AE structure is constructed. The first encoder processes the multivariate observation sequence and comprises two stacked LSTM layers with 128 hidden units each; its input is a tensor of shape [24, 4], representing the observations of the 4 meteorological variables over 24 hours. The second encoder processes the air temperature forecast sequence and comprises one LSTM layer with 64 hidden units; its input is a tensor of shape [24, 1], representing the forecast values of air temperature over 24 hours. The outputs of the two encoders are combined by a concatenation layer into a 192-dimensional fused representation. The decoder uses two stacked LSTM layers with 128 hidden units each to reconstruct the original observation sequence from the fused representation.
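A hedged PyTorch sketch of this architecture follows. The layer counts and hidden sizes match the figures above; the use of the final hidden states for fusion, the repetition of the fused vector across the decoder time steps, and the output projection are implementation assumptions:

```python
import torch
import torch.nn as nn

class DualEncoderLSTMAE(nn.Module):
    def __init__(self, seq_len=24):
        super().__init__()
        self.seq_len = seq_len
        self.obs_enc = nn.LSTM(4, 128, num_layers=2, batch_first=True)  # observations [24, 4]
        self.fc_enc = nn.LSTM(1, 64, num_layers=1, batch_first=True)    # forecast [24, 1]
        self.decoder = nn.LSTM(192, 128, num_layers=2, batch_first=True)
        self.out = nn.Linear(128, 4)                                    # reconstruct 4 variables

    def forward(self, obs, fc):
        _, (h_obs, _) = self.obs_enc(obs)                 # h_obs: [2, B, 128]
        _, (h_fc, _) = self.fc_enc(fc)                    # h_fc:  [1, B, 64]
        fused = torch.cat([h_obs[-1], h_fc[-1]], dim=1)   # [B, 192] fused representation
        dec_in = fused.unsqueeze(1).repeat(1, self.seq_len, 1)  # repeat over 24 steps
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)                          # [B, 24, 4] reconstruction

model = DualEncoderLSTMAE()
recon = model(torch.randn(8, 24, 4), torch.randn(8, 24, 1))
```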
Within each LSTM unit, the forget gate, input gate and output gate control, through different weight matrices, the retention of historical information, the influence of the current input and the screening of the output content, respectively. Taking the first LSTM layer of the first encoder as an example, the input feature dimension is 4 and the hidden state dimension is 128, so the weight matrix of the forget gate has dimension [132, 128] (input plus hidden, 4 + 128 = 132), containing about 17,000 parameters; the input gate and output gate have parameter counts of the same scale. These parameters are continuously optimized during training so that the model can effectively capture the temporal dependence of meteorological data.
In the model training and hyperparameter optimization stage, the model is trained by mini-batch gradient descent with a batch size of 64, an initial learning rate of 0.001 and the Adam optimizer. The loss function is the mean square error (MSE) between the observed and reconstructed values of the target quality control variable (air temperature). During training, when the validation loss does not decrease for 5 consecutive epochs, the learning rate is halved. The total number of training epochs is set to 100, although the model typically reaches its best performance within 50-70 epochs.
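A training-loop sketch under these settings, reusing the DualEncoderLSTMAE sketch above; the random tensors stand in for the preprocessed training and validation data, and restricting the loss to the temperature channel follows the text:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=5)
loss_fn = torch.nn.MSELoss()

obs, fc = torch.randn(1024, 24, 4), torch.randn(1024, 24, 1)       # stand-in training data
val_obs, val_fc = torch.randn(256, 24, 4), torch.randn(256, 24, 1)  # stand-in validation data
loader = DataLoader(TensorDataset(obs, fc), batch_size=64, shuffle=True)

for epoch in range(100):
    for xb, fb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb, fb)[..., 0], xb[..., 0])  # air temperature channel only
        loss.backward()
        opt.step()
    with torch.no_grad():
        val_loss = loss_fn(model(val_obs, val_fc)[..., 0], val_obs[..., 0]).item()
    sched.step(val_loss)   # halves the learning rate after 5 epochs without improvement
```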
For the key hyperparameter, the LSTM hidden unit dimension, a grid search from 32 to 256 (in steps of powers of 2) is performed. The results show that the MSE on the validation set reaches a minimum of 0.0042 when the first encoder has 128 hidden units and the second encoder 64. The total number of model parameters is about 300,000, and training takes about 2 hours on a standard GPU.
In the anomaly score calculation and threshold determination phase, the reconstruction error is calculated as the anomaly score for each sample in the validation set. Statistical analysis shows that the reconstruction error of normal air temperature data is mostly below 0.05, whereas that of abnormal data is usually above 0.1. By calculating percentiles of the validation set anomaly scores, four thresholds are obtained at the score levels exceeded by fractions 0.5, 0.05, 0.005 and 0.0005 of the samples (i.e., the 50th, 95th, 99.5th and 99.95th percentiles): approximately 0.025, 0.085, 0.135 and 0.175, respectively. These four thresholds correspond to anomaly detection criteria of increasing stringency.
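Under the reading that 0.5, 0.05, 0.005 and 0.0005 are upper-tail fractions, the thresholds can be derived directly from the validation scores; the score distribution below is a synthetic stand-in:

```python
import numpy as np

scores = np.random.default_rng(6).gamma(2.0, 0.015, size=30_000)  # stand-in anomaly scores
tail_fractions = [0.5, 0.05, 0.005, 0.0005]
thresholds = [np.percentile(scores, 100 * (1 - f)) for f in tail_fractions]
# Loosest to strictest; approximately [0.025, 0.085, 0.135, 0.175] in the text.
```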
In the model application phase, the air temperature observation of a certain station on 5 January 2023 is taken as an example. The 24-hour air temperature observation sequence is [-5.2, -5.8, -6.3, -6.5, -6.2, -6.0, -5.8, -5.1, -4.3, -3.2, -2.1, -1.0, 0.2, 0.5, 0.3, -0.6, -1.8, -2.5, -3.1, -3.8, -4.2, -4.6, -4.9, -5.1] ℃, and the corresponding forecast sequence is [-5.5, -6.0, -6.2, -6.3, -6.1, -5.9, -5.5, -4.9, -4.0, -3.0, -1.8, -0.8, 0.0, 0.4, 0.1, -0.8, -2.0, -2.8, -3.4, -4.0, -4.4, -4.8, -5.0, -5.3] ℃. The standardized observation and forecast sequences are input into the trained model, yielding the reconstructed air temperature sequence [-5.3, -5.7, -6.2, -6.4, -6.3, -6.1, -5.7, -5.0, -4.2, -3.1, -2.0, -0.9, 0.1, 0.4, 0.2, -0.7, -1.9, -2.6, -3.2, -3.9, -4.3, -4.5, -5.0, -5.2] ℃. The MSE between the original and reconstructed air temperature observations is 0.0038, below the loosest threshold of 0.025, so the observation sequence is judged to be normal.
If abnormal data are encountered, such as a sudden air temperature change caused by a sensor fault (for example, the observation at hour 13 of the sequence jumping abnormally to 10.2 ℃), the reconstructed temperature at that time still remains near 0.1 ℃, producing an obvious reconstruction error. The calculated MSE is then about 0.153, exceeding the 0.005-fraction threshold of 0.135 but remaining below the 0.0005-fraction threshold of 0.175, so an anomaly warning of correspondingly high stringency is issued.
In the continuous operation stage, observers can adjust the anomaly judgment threshold according to actual conditions and experience feedback. For example, in cold winter regions where temperature changes are relatively gentle, the threshold can be appropriately lowered to improve detection sensitivity; in regions with large diurnal temperature variation in spring and autumn, the threshold can be appropriately raised to avoid false alarms. Through this dynamic adjustment mechanism combined with manual experience, the quality control effect of the model under different meteorological conditions is continuously optimized.
Compared with traditional statistical methods, this method can simultaneously consider the interrelationships and temporal characteristics of multiple variables while fusing numerical forecast information, significantly improving the accuracy of anomaly detection. In practical application tests, the detection rate of artificially introduced abnormal data exceeds 95% and the false alarm rate is kept below 5%, meeting the meteorological department's stringent requirements for data quality control.
In a second aspect of an embodiment of the present invention, there is provided an electronic device, including:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.