
AI-based meteorological station data quality control method integrated with numerical forecast

Info

Publication number
CN120561474A
Authority
CN
China
Prior art keywords
feature
physical
data
observation
reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202511054747.XA
Other languages
Chinese (zh)
Other versions
CN120561474B (en)
Inventor
王文兰
张方健
张备
姚彬
郑玉
刘嘉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute Of Meteorological Science And Technology Innovation
Original Assignee
Nanjing Institute Of Meteorological Science And Technology Innovation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute Of Meteorological Science And Technology Innovation
Priority to CN202511054747.XA
Publication of CN120561474A
Application granted
Publication of CN120561474B
Status: Active
Anticipated expiration

Abstract

The invention provides an AI weather station data quality control method integrating numerical forecasts, in the technical field of data control. The method standardizes multivariate weather observation data and numerical forecast data, constructs an LSTM-AE physically driven model to extract observation and forecast physical features, reconstructs the observation time series through an LSTM decoder, trains the model using the deviation between the reconstructed sequence and the original sequence as an index, evaluates the abnormal state of the data to be detected, determines a quality grade mark, and feeds the result back to optimize the model. The invention achieves high-precision quality control of weather station data, improves anomaly detection performance, and ensures the reliability of meteorological data.

Description

AI weather station data quality control method integrating numerical forecasting
Technical Field
The invention relates to a data control technology, in particular to an AI weather station data quality control method integrating numerical forecasting.
Background
Meteorological observation data is an important basis for weather forecast, climate research and disaster prevention and reduction. Along with the expansion of a meteorological observation network, the observed data volume is increased sharply, and the data quality control becomes a key link in meteorological service. The traditional meteorological data quality control method mainly relies on statistics and physical rules to detect anomalies, such as extremum inspection, internal consistency inspection, time continuity inspection and the like. However, as the complexity of the observation system increases and the data dimension increases, it is difficult for conventional methods to fully exploit the complex physical associations and space-time evolution features between multivariate meteorological elements.
In recent years, with the development of artificial intelligence technology, deep learning methods have gradually been applied to the field of meteorological data quality control. In particular, recurrent neural networks can effectively capture the dynamic characteristics of time series data, providing a new technical path for meteorological data quality control. Long short-term memory (LSTM) networks, with their excellent temporal modeling capability, are an important tool for processing meteorological time-series data. Meanwhile, the autoencoder (AE) structure can extract key features from high-dimensional data, achieving dimensionality reduction and reconstruction while retaining the main information of the data.
Existing quality control methods treat the observed data and the numerical prediction data separately, and do not fully exploit the physical background knowledge contained in the numerical prediction results to guide the quality control of the observed data, so the recognition efficiency for abnormal data is low.
Traditional methods mostly rely on fixed thresholds and preset rules for anomaly detection, lack the ability to learn the complex nonlinear relations among meteorological elements, struggle to adapt to regional and seasonal differences in meteorological characteristics, and are prone to false alarms and missed detections.
Existing machine-learning-based methods share the common black-box problem: the model decision process is difficult to explain, and a continuous optimization mechanism is lacking, so model parameters and detection criteria cannot be adjusted automatically as new data accumulate, which affects the reliability and practicality of the quality control results.
Disclosure of Invention
The embodiment of the invention provides an AI weather station data quality control method integrating numerical forecasting, which can solve the problems in the prior art.
In a first aspect of the embodiment of the present invention, an AI weather station data quality control method for fusing numerical predictions is provided, including:
Acquiring multivariable meteorological observation data and numerical forecast data of a meteorological station, and carrying out standardization processing on the multivariable meteorological observation data and the numerical forecast data to form an observation time sequence data set and a numerical forecast time sequence data set;
Constructing a physical driving model based on an LSTM-AE structure, extracting physical relations and time sequence evolution features among meteorological elements in the observation time sequence data set through a long-short-time memory network encoder according to the physical driving model, analyzing the observation time sequence data set and the numerical forecasting time sequence data set, and obtaining observation physical features and forecasting physical features;
Based on the LSTM decoder in the physical driving model, processing the observed physical characteristics and the forecast physical characteristics, completing characteristic reconstruction according to the physical laws and the time-space evolution characteristics of meteorological elements, and generating a reconstructed observation time sequence;
Performing time sequence division on the observation time sequence data set to obtain a training set and a verification set, and completing training and parameter optimization of an LSTM codec in the physical driving model by taking deviation of the reconstructed observation time sequence and an original observation time sequence as an evaluation index;
and carrying out abnormal state evaluation on the data to be detected based on the physical driving model, determining a quality grade mark according to the result of the abnormal state evaluation, and feeding back and optimizing the characteristic extraction capacity and the abnormal detection performance of the LSTM-AE structure according to the mark result so as to realize the quality control and continuous optimization of the data of the meteorological station.
Constructing a physical driving model based on an LSTM-AE structure, extracting physical relations and time sequence evolution features among meteorological elements in the observation time sequence data set through a long-short-time memory network encoder according to the physical driving model, analyzing the observation time sequence data set and the numerical forecasting time sequence data set, and obtaining observation physical features and forecasting physical features comprises the following steps:
Calculating observation data of an observation time sequence data set through a long-short-term memory network encoder to obtain an observation mean value vector and an observation variance vector, constructing an observation feature normal distribution based on the observation mean value vector and the observation variance vector, and sampling from the observation feature normal distribution to obtain hidden variables of the observation data;
Introducing a priori normal distribution constraint to network parameters of the physical driving model, and calculating physical association weights of meteorological elements under different time scales through a Bayesian neural network based on the priori normal distribution constraint;
Weighting the hidden variables of the observed data according to the physical association weights to obtain reconstructed observed feature vectors, constructing conditional probability distribution based on the reconstructed observed feature vectors, and sampling for multiple times from the conditional probability distribution to obtain observed physical features;
Inputting the numerical forecasting time sequence data set into the long-short-time memory network encoder to obtain a forecasting mean vector and a forecasting variance vector of the numerical forecasting time sequence data set, constructing forecasting feature normal distribution based on the forecasting mean vector and the forecasting variance vector, sampling hidden variables of forecasting data from the forecasting feature normal distribution, and weighting and reconstructing the hidden variables of the forecasting data based on the physical association weight to obtain forecasting physical features representing the physical relationship of meteorological elements.
Based on the LSTM decoder in the physical driving model to process the observed physical characteristics and the forecast physical characteristics, the method completes characteristic reconstruction according to the physical laws and the time-space evolution characteristics of meteorological elements, and the generating of the reconstructed observation time sequence comprises the following steps:
based on the LSTM decoder, carrying out initial reconstruction on the observed physical characteristics and the forecast physical characteristics, generating initial reconstruction characteristics, calculating reconstruction errors between the initial reconstruction characteristics and an observed time sequence, and combining the reconstruction errors and hidden states to obtain error propagation characteristics;
combining the error propagation characteristic and the initial reconstruction characteristic, calculating to obtain a compensation coefficient, and correcting the initial reconstruction characteristic based on the compensation coefficient to obtain a dynamic error compensation term;
Combining the dynamic error compensation term with the initial reconstruction feature to obtain a corrected reconstruction feature, calculating weather physical association degree between air pressure, temperature, humidity, wind speed and precipitation in the corrected reconstruction feature, and comparing the weather physical association degree with the physical association degree between corresponding weather elements in an observation time sequence to obtain a deviation value of the corrected reconstruction feature and the observation time sequence on a physical rule;
And adjusting forgetting weight, input weight and output weight in the LSTM decoder according to the deviation value, updating a cell state and a hidden state through the adjusted weight, and carrying out a new round of reconstruction on the observed physical characteristic and the forecast physical characteristic based on the updated cell state and the updated hidden state to generate a reconstructed observed time sequence which accords with the physical rule and the space-time evolution characteristic of the meteorological element.
Combining the error propagation feature and the initial reconstruction feature, calculating to obtain a compensation coefficient, and correcting the initial reconstruction feature based on the compensation coefficient to obtain a dynamic error compensation term, wherein the method comprises the following steps:
The method comprises the steps of mapping error propagation characteristics and initial reconstruction characteristics to characteristic spaces with the same dimension to obtain error propagation characteristics after projection and initial reconstruction characteristics after projection, dividing the initial reconstruction characteristics after projection into a plurality of local areas, calculating a mean value and a standard deviation for each local area, constructing local characteristic statistics based on the mean value and the standard deviation, and carrying out combined operation on the local characteristic statistics and the error propagation characteristics after projection to generate local compensation coefficients;
calculating a correlation matrix between the initial reconstruction feature after projection and the error propagation feature after projection, calculating a global feature correlation degree based on the correlation matrix, and generating a global compensation coefficient according to the global feature correlation degree;
performing feature stitching on the initial reconstruction feature after projection and the error propagation feature after projection, generating a weight coefficient according to the result of feature stitching, and performing weighted combination on the local compensation coefficient and the global compensation coefficient based on the weight coefficient to obtain a final compensation coefficient;
And performing element-level multiplication operation on the final compensation coefficient and the projected error propagation feature, wherein each dimension feature of the final compensation coefficient is used as a weight, the weight is weighted with the feature of the dimension corresponding to the projected error propagation feature, the weighted features of the dimensions are combined to obtain a compensation feature, the compensation feature is fused with the initial reconstruction feature, a corrected reconstruction feature is generated, and a dynamic error compensation term is obtained according to the difference value between the corrected reconstruction feature and the initial reconstruction feature.
Performing time sequence division on the observation time sequence data set to obtain a training set and a verification set, and taking the deviation of the reconstructed observation time sequence and the original observation time sequence as an evaluation index comprises the following steps:
Performing feature reconstruction on the observation time sequence data set to obtain a reconstructed observation time sequence, wherein the reconstructed observation time sequence and the original observation time sequence have the same time sequence distribution feature;
And calculating the time sequence deviation of the reconstructed observation time sequence and the original observation time sequence at the corresponding time, and taking the time sequence deviation as an evaluation index for evaluating the reconstruction effect.
Performing abnormal state evaluation on the data to be detected based on the physical driving model, and determining the quality grade mark according to the result of the abnormal state evaluation comprises the following steps:
Inputting data to be detected into a physical driving model to generate a state feature vector, constructing a state transition diagram based on the state feature vector, and generating an initial abnormal state predicted value through matching degree analysis of the state transition diagram and physical constraint conditions;
constructing prior distribution based on the initial abnormal state predicted value, carrying out layered repeated sampling on the prior distribution by adopting a Markov chain Monte Carlo method, carrying out iterative updating on node weight and edge probability of the state transition diagram according to a sampling result, and obtaining parameter posterior distribution of the optimal parameter configuration of the state transition diagram through multiple rounds of iterative optimization;
performing multiple layered sampling on the parameter posterior distribution by using the Markov chain Monte Carlo method, combining the multiple layered sampling result with the optimal parameter configuration to construct a new state transition diagram, and obtaining an abnormal state score by comparing the difference of the new state transition diagram and the original state transition diagram in node weight and edge probability;
Carrying out probability density estimation on node distribution in the new state transition diagram, analyzing fluctuation characteristics of node weights and edge probabilities based on the Markov chain Monte Carlo method, and calculating to obtain uncertainty measurement based on the fluctuation characteristics and confidence intervals of the physical constraint conditions;
Establishing a segmented mapping relation according to the relevance between the numerical distribution of the abnormal state scores and the topological structure of the new state transition diagram, and carrying out self-adaptive adjustment on the threshold value of the segmented mapping relation based on the uncertainty measure to determine a quality grade mark.
Combining the multi-layered sampling result with the optimal parameter configuration to construct a new state transition diagram, and obtaining an abnormal state score by comparing differences between the new state transition diagram and the original state transition diagram in node weight and edge probability comprises:
Constructing a node distribution probability matrix based on the multi-layered sampling result, carrying out convolution operation on the node distribution probability matrix and the optimal parameter configuration to obtain node weights, constructing node distribution of a state transition diagram based on the node distribution probability matrix, carrying out Fourier transform on the node distribution to obtain frequency domain characteristics, optimizing the node distribution probability matrix based on the frequency domain characteristics to obtain edge probabilities, and combining the node weights and the edge probabilities to construct a new state transition diagram;
Performing Fourier transform on the node weights of the new state transition diagram and the original state transition diagram to obtain node weight frequency domain features, calculating the log likelihood ratio of the node weight frequency domain features to obtain node weight difference values, performing Fourier transform on the edge probabilities of the new state transition diagram and the original state transition diagram to obtain edge probability frequency domain features, calculating the log likelihood ratio of the edge probability frequency domain features to obtain edge probability difference values, and performing weighted fusion on the node weight difference values and the edge probability difference values to obtain abnormal state scores.
In a second aspect of an embodiment of the present invention, there is provided an electronic device, including:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The beneficial effects of the application are as follows:
According to the data quality control method of the AI meteorological station integrating numerical prediction, provided by the invention, the physical relation and time sequence evolution characteristics among meteorological elements can be effectively extracted by constructing the physical driving model based on the LSTM-AE structure, so that the model has stronger anomaly detection capability, and the accuracy of data quality control of the meteorological station is improved.
According to the method, the numerical forecast data is used as external physical driving information to be integrated into the model, so that the understanding capability of the model to the weather physical process is enhanced, the normal weather change and the abnormal data can be distinguished more accurately, the misjudgment rate is reduced, and the reliability and the scientificity of quality control are improved.
The invention realizes a feedback optimization mechanism of a quality control result, continuously optimizes the characteristic extraction capacity and the abnormality detection performance of the LSTM-AE structure by feeding the quality marking result back to the model training process, forms a self-perfected closed-loop system, ensures that the data quality control of the weather station is more intelligent and self-adaptive, and adapts to weather characteristics of different regions and seasons.
Drawings
FIG. 1 is a schematic flow chart of an AI weather station data quality control method integrating numerical forecasting in an embodiment of the invention;
FIG. 2 is an accuracy comparison histogram of meteorological data reconstruction based on an LSTM decoder in an embodiment of the invention;
FIG. 3 is an accuracy comparison histogram of various error compensation methods according to an embodiment of the present invention;
FIG. 4 is a flow chart of an abnormal state evaluation based on a physical driving model according to an embodiment of the present invention;
FIG. 5 is a graph showing the accuracy of the abnormal state score according to the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of an AI weather station data quality control method integrating numerical forecasting according to an embodiment of the present invention, as shown in fig. 1, the method includes:
Acquiring multivariable meteorological observation data and numerical forecast data of a meteorological station, and carrying out standardization processing on the multivariable meteorological observation data and the numerical forecast data to form an observation time sequence data set and a numerical forecast time sequence data set;
Constructing a physical driving model based on an LSTM-AE structure, extracting physical relations and time sequence evolution features among meteorological elements in the observation time sequence data set through a long-short-time memory network encoder according to the physical driving model, analyzing the observation time sequence data set and the numerical forecasting time sequence data set, and obtaining observation physical features and forecasting physical features;
Based on the LSTM decoder in the physical driving model, processing the observed physical characteristics and the forecast physical characteristics, completing characteristic reconstruction according to the physical laws and the time-space evolution characteristics of meteorological elements, and generating a reconstructed observation time sequence;
Performing time sequence division on the observation time sequence data set to obtain a training set and a verification set, and completing training and parameter optimization of an LSTM codec in the physical driving model by taking deviation of the reconstructed observation time sequence and an original observation time sequence as an evaluation index;
and carrying out abnormal state evaluation on the data to be detected based on the physical driving model, determining a quality grade mark according to the result of the abnormal state evaluation, and feeding back and optimizing the characteristic extraction capacity and the abnormal detection performance of the LSTM-AE structure according to the mark result so as to realize the quality control and continuous optimization of the data of the meteorological station.
Preprocessing input data, and sorting historical observation data of a meteorological observation site into a time sequence data set X, wherein the data dimension is n multiplied by m, n represents the sequence length, and m represents the feature quantity. The characteristic includes meteorological elements such as air temperature, humidity, wind speed, etc., the specific numerical value adopts the normalization processing, unifies the numerical value range to between 0-1. And meanwhile, the numerical forecast data set X' corresponding to the time and the site is arranged, and the same data structure as the observed data is maintained.
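As a minimal sketch of this preprocessing step (assuming NumPy arrays and per-feature min-max scaling, since the text only specifies a unified 0-1 range; the function and variable names are illustrative):

```python
import numpy as np

def min_max_normalize(X: np.ndarray):
    """Scale each feature column of an (n, m) series into the 0-1 range.

    X: observation series, n time steps x m meteorological elements
       (e.g. air temperature, humidity, wind speed).
    Returns the scaled series plus the per-feature min/max, so the same
    transform can be applied to the forecast data set X'.
    """
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    return (X - x_min) / span, x_min, x_max
```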
The method comprises the steps of designing a two-way encoder structure, wherein a first encoder adopts a 2-layer LSTM network, the number of hidden layer nodes is 128 and 64 respectively, inputting observation data X, and outputting 64-dimensional hidden space representation Z1. The second encoder adopts the same network structure, inputs numerical forecast data X', and outputs 64-dimensional hidden space representation Z2. And splicing the Z1 and the Z2 in the characteristic dimension to obtain the fusion characteristic representation Z with 128 dimensions.
The decoder adopts a 2-layer LSTM network structure, the number of hidden layer nodes is 128 and 256 respectively, the fusion characteristic is input to represent Z, and the reconstructed data with the same dimension as the original observed data is output. In the training process, the mean square error is used as a loss function, an Adam optimizer is used for parameter optimization, the learning rate is set to be 0.001, and the training round number is 100.
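A sketch of this two-path encoder / shared-decoder layout, assuming PyTorch; the layer widths (128/64 encoder, 128/256 decoder), the MSE loss, the Adam optimizer and the 0.001 learning rate follow the text, while the class name and the decision to keep per-time-step features are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TwoPathLSTMAE(nn.Module):
    """Dual-path LSTM autoencoder: one encoder for observations X,
    one for numerical forecasts X', and a shared LSTM decoder."""

    def __init__(self, n_features: int = 3):
        super().__init__()
        # Each encoder: 2 stacked LSTM layers with 128 and 64 hidden units.
        self.enc_obs_1 = nn.LSTM(n_features, 128, batch_first=True)
        self.enc_obs_2 = nn.LSTM(128, 64, batch_first=True)
        self.enc_fcst_1 = nn.LSTM(n_features, 128, batch_first=True)
        self.enc_fcst_2 = nn.LSTM(128, 64, batch_first=True)
        # Decoder: 2 stacked LSTM layers with 128 and 256 hidden units,
        # followed by a linear map back to the observed feature count.
        self.dec_1 = nn.LSTM(128, 128, batch_first=True)
        self.dec_2 = nn.LSTM(128, 256, batch_first=True)
        self.out = nn.Linear(256, n_features)

    def forward(self, x_obs: torch.Tensor, x_fcst: torch.Tensor) -> torch.Tensor:
        z1, _ = self.enc_obs_2(self.enc_obs_1(x_obs)[0])    # (B, T, 64) observation feature Z1
        z2, _ = self.enc_fcst_2(self.enc_fcst_1(x_fcst)[0])  # (B, T, 64) forecast feature Z2
        z = torch.cat([z1, z2], dim=-1)                      # fused 128-dim feature Z
        h, _ = self.dec_2(self.dec_1(z)[0])
        return self.out(h)                                   # reconstruction of X

model = TwoPathLSTMAE(n_features=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate per the text
loss_fn = nn.MSELoss()                               # mean square error loss per the text
```

Training would then minimize loss_fn(model(x_obs, x_fcst), x_obs) for 100 epochs, matching the setup described above.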
In a specific implementation, taking weather station data from January to December 2020 in a certain city as an example, a total of 20 observation stations are involved. The observation data comprise three elements, air temperature, relative humidity and wind speed, with a sampling interval of 1 hour. 80% of the data were used for training and 20% for testing. The numerical forecast data come from the forecast products of a central meteorological observatory, with a forecast lead time of 24 hours. Through model training and testing, the correlation coefficient between the reconstructed data and the original observed data reaches more than 0.92, and the root mean square error is controlled within 0.15 on the standardized data.
According to the method, the effective fusion of the observed data and the numerical forecast is realized through the double-path encoder structure, and the decoder can accurately reconstruct the original data characteristics. The method avoids subjectivity of manually setting weights in the traditional fusion method, and improves objectivity and accuracy of fusion results.
In an optional implementation manner, constructing a physical driving model based on an LSTM-AE structure, extracting physical relations and time sequence evolution features among meteorological elements in the observation time sequence data set through a long-short-time memory network encoder according to the physical driving model, analyzing the observation time sequence data set and the numerical forecasting time sequence data set, and obtaining observation physical features and forecasting physical features includes:
Calculating observation data of an observation time sequence data set through a long-short-term memory network encoder to obtain an observation mean value vector and an observation variance vector, constructing an observation feature normal distribution based on the observation mean value vector and the observation variance vector, and sampling from the observation feature normal distribution to obtain hidden variables of the observation data;
Introducing a priori normal distribution constraint to network parameters of the physical driving model, and calculating physical association weights of meteorological elements under different time scales through a Bayesian neural network based on the priori normal distribution constraint;
Weighting the hidden variables of the observed data according to the physical association weights to obtain reconstructed observed feature vectors, constructing conditional probability distribution based on the reconstructed observed feature vectors, and sampling for multiple times from the conditional probability distribution to obtain observed physical features;
Inputting the numerical forecasting time sequence data set into the long-short-time memory network encoder to obtain a forecasting mean vector and a forecasting variance vector of the numerical forecasting time sequence data set, constructing forecasting feature normal distribution based on the forecasting mean vector and the forecasting variance vector, sampling hidden variables of forecasting data from the forecasting feature normal distribution, and weighting and reconstructing the hidden variables of the forecasting data based on the physical association weight to obtain forecasting physical features representing the physical relationship of meteorological elements.
The physical driving model of the LSTM-AE structure consists of two parts: an encoder that compresses the high-dimensional input data into a low-dimensional latent representation, and a decoder that restores the latent representation to the original data dimension. The model inputs are the observation time series data set and the numerical forecast time series data set, both of which contain meteorological elements such as air pressure, temperature, humidity, wind speed and precipitation, with data shape [N, T, F], where N is the number of samples, T is the time series length, and F is the number of meteorological elements. In practical applications, N = 1000, T = 168 (hourly observations for one consecutive week), and F = 5 (5 meteorological elements).
The network structure of the long-short time memory network encoder comprises an input layer, an LSTM layer and a full connection layer. The input layer receives observation time sequence data, the LSTM layer consists of 3 layers of LSTM units stacked, each layer comprises 128 hidden neurons and is used for capturing long-term and short-term dependency relations in the time sequence, and the full-connection layer maps the output of the LSTM layer into two groups of vectors, namely a mean vector and a variance vector, wherein the dimension of each group of vectors is 64. To enhance physical constraints, attention mechanisms are added to the encoder to make the model more focused on the physical associations between different meteorological elements. The attention weight matrix size is [5,5], which represents the attention distribution between every two of the 5 meteorological elements.
The specific steps of the encoder in processing the observation time series data are as follows: the standardized observation data are input into the LSTM layer, which controls the information flow through a gating mechanism (input gate, forget gate and output gate) and captures the temporal features. Taking the observation data of a certain weather station from July 1 to July 7, 2022 as an example, the data enter the encoder after standardization, temporal features are extracted by the LSTM layer, and finally an observation mean vector μ_obs and an observation variance vector σ²_obs are output, both 64-dimensional vectors. The first 5 elements of the observation mean vector μ_obs are [0.32, -0.15, 0.48, 0.21, -0.37], and the first 5 elements of the observation variance vector σ²_obs are [0.02, 0.04, 0.03, 0.05, 0.02].
An observation feature normal distribution N(μ_obs, σ²_obs) is constructed based on the observation mean vector μ_obs and the observation variance vector σ²_obs. This distribution represents the probability distribution of the observed data in the latent space; each dimension i obeys an independent normal distribution with mean μ_obs[i] and variance σ²_obs[i]. The hidden variable z_obs of the observed data is sampled from the observation feature normal distribution using the re-parameterization technique, i.e. z_obs = μ_obs + ε × sqrt(σ²_obs), where ε is a random sample from the standard normal distribution N(0, 1). For the above case, if the first 5 elements of ε are [0.5, -0.2, 0.1, -0.4, 0.3], then the first 5 elements of z_obs are [0.32 + 0.5×sqrt(0.02), -0.15 + (-0.2)×sqrt(0.04), 0.48 + 0.1×sqrt(0.03), 0.21 + (-0.4)×sqrt(0.05), -0.37 + 0.3×sqrt(0.02)] ≈ [0.39, -0.19, 0.50, 0.12, -0.33].
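A small sketch of this re-parameterized sampling step (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def sample_latent(mu: np.ndarray, sigma2: np.ndarray, rng=None) -> np.ndarray:
    """Re-parameterization trick: z = mu + eps * sqrt(sigma2), eps ~ N(0, 1)."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(mu.shape)
    return mu + eps * np.sqrt(sigma2)

# Reproducing the worked example above (first five latent dimensions only):
mu_obs = np.array([0.32, -0.15, 0.48, 0.21, -0.37])
var_obs = np.array([0.02, 0.04, 0.03, 0.05, 0.02])
eps = np.array([0.5, -0.2, 0.1, -0.4, 0.3])
z_obs = mu_obs + eps * np.sqrt(var_obs)  # matches the values above up to rounding
```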
And introducing priori normal distribution constraint to network parameters of the physical driving model, and converting the model into a Bayesian neural network. The weights in the network are no longer determined values but follow a priori distributions. In specific implementation, two parameters of mean and variance are defined for each weight parameter, the mean is initialized to 0, and the variance is initialized to 0.1. The prior distribution selects normal distribution N (0,0.1) which shows that the weight parameter fluctuates around the vicinity of 0, the variance is 0.1, the weight value range is limited, and the overfitting is prevented.
The physical association weights of the meteorological elements at different time scales are calculated through the Bayesian neural network, considering three time scales: short-term (hourly), medium-term (daily) and long-term (weekly). The calculation method is to input the observation data at the different time scales into the Bayesian network separately to obtain the corresponding physical association weight matrices. The short-term physical association weight matrix W_short is of size [5, 5], representing the mutual influence of the 5 meteorological elements on the hourly scale; the medium-term physical association weight matrix W_medium is [5, 5]; and the long-term physical association weight matrix W_long is [5, 5].
Taking the short-term physical association weight matrix W_short as an example, the element W_short[i, j] represents the influence intensity of the i-th meteorological element on the j-th meteorological element. For example, W_short[0, 1] = 0.7 indicates that air pressure (index 0) has a strong influence on temperature (index 1), and W_short[1, 2] = 0.6 indicates that temperature (index 1) has a moderate influence on humidity (index 2).
The hidden variables of the observed data are weighted by the physical association weights by combining the three time-scale weight matrices with the hidden variable z_obs to generate the reconstructed observation feature vector. Specifically, z_obs is divided into three sub-vectors representing short-term, medium-term and long-term features, each of dimension 64/3 ≈ 21. The three sub-vectors are multiplied by the corresponding physical association weight matrices, and the results are concatenated to obtain the reconstructed observation feature vector r_obs of dimension 64. For example, if the first 21 dimensions of z_obs represent short-term features, these dimensions are recombined into a [5, 4] matrix (one row per meteorological element, using 20 of the 21 dimensions), and multiplying by W_short yields the weighted short-term features.
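A hedged sketch of this time-scale weighting (assuming NumPy; the handling of the one leftover dimension per 21-dimensional slice is not specified in the text, so the pass-through below is an assumption):

```python
import numpy as np

def weight_by_time_scale(z: np.ndarray, w_short, w_medium, w_long) -> np.ndarray:
    """Weight a 64-dim latent vector z with the three [5, 5] physical
    association matrices, one per time scale (short / medium / long).

    Each 21-dim slice is truncated to 20 dims and viewed as a [5, 4] block
    (one row per meteorological element), as in the worked example; the
    leftover dimensions are passed through unchanged.
    """
    parts = []
    for k, w in enumerate([w_short, w_medium, w_long]):
        block = z[21 * k: 21 * k + 20].reshape(5, 4)   # 5 elements x 4 sub-features
        parts.append((w @ block).reshape(-1))          # weighted block, back to 20 dims
        parts.append(z[21 * k + 20: 21 * (k + 1)])     # pass-through remainder dim
    parts.append(z[63:])                               # final leftover dimension
    return np.concatenate(parts)                       # reconstructed 64-dim r_obs
```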
A conditional probability distribution p(x|r_obs) is constructed based on the reconstructed observation feature vector r_obs; it represents the probability distribution of the original observation data x given the reconstructed feature r_obs. In implementation, r_obs is input into a small neural network whose output parameterizes the conditional distribution. The neural network comprises two fully connected layers with a hidden layer size of 128. The network outputs the conditional mean vector μ_cond and variance vector σ²_cond, both of the same dimension as the original observation data, i.e. [T, F] = [168, 5].
The observed physical feature is obtained by sampling multiple times from the conditional probability distribution p(x|r_obs). The number of samples is set to 50, and each sample yields a vector of the same dimension as the original observation data. Concretely, samples are drawn from a normal distribution with mean μ_cond and variance σ²_cond, and the results of the 50 samplings are averaged to obtain the final observed physical feature o_feat, of dimension [T, F] = [168, 5]. For the above case, if the conditional distribution parameters at time point t = 24 (the last hour of the first day) are μ_cond[24] = [1.0, 22.5, 65.0, 3.2, 0.0] and σ²_cond[24] = [0.01, 0.25, 1.0, 0.04, 0.0001] (corresponding to the normalized air pressure, temperature, humidity, wind speed and precipitation respectively), then the observed physical feature at that point is o_feat[24] = [1.01, 22.6, 64.8, 3.25, 0.0], representing the physical characteristics of each meteorological element at that time.
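A sketch of this repeated-sampling step (assuming NumPy; names are illustrative):

```python
import numpy as np

def observed_physical_feature(mu_cond: np.ndarray, sigma2_cond: np.ndarray,
                              n_samples: int = 50, rng=None) -> np.ndarray:
    """Draw n_samples from N(mu_cond, sigma2_cond), both of shape [T, F],
    and average them to obtain the observed physical feature o_feat."""
    rng = rng or np.random.default_rng()
    draws = rng.normal(loc=mu_cond, scale=np.sqrt(sigma2_cond),
                       size=(n_samples,) + mu_cond.shape)
    return draws.mean(axis=0)  # shape [T, F], e.g. [168, 5]
```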
The numerical forecast time series data set is input into the long-short-time memory network encoder in the same way as the observation data; only the input data source differs. The numerical forecast data likewise include air pressure, temperature, humidity, wind speed and precipitation, with data shape [N, T, F]. After the encoder processes the data, a forecast mean vector μ_pred and a forecast variance vector σ²_pred are obtained, both 64-dimensional vectors.
A forecast feature normal distribution N(μ_pred, σ²_pred) is constructed based on the forecast mean vector μ_pred and the forecast variance vector σ²_pred. The hidden variable z_pred of the forecast data is sampled from this distribution, again using the re-parameterization technique. For example, if the first 5 elements of the forecast mean vector μ_pred are [0.35, -0.18, 0.50, 0.25, -0.40], the first 5 elements of the forecast variance vector σ²_pred are [0.03, 0.05, 0.04, 0.06, 0.03], and the first 5 elements of the random sample ε are [0.4, -0.3, 0.2, -0.5, 0.2], then the first 5 elements of the hidden variable z_pred of the forecast data are [0.35 + 0.4×sqrt(0.03), -0.18 + (-0.3)×sqrt(0.05), 0.50 + 0.2×sqrt(0.04), 0.25 + (-0.5)×sqrt(0.06), -0.40 + 0.2×sqrt(0.03)] ≈ [0.42, -0.25, 0.54, 0.13, -0.37].
The hidden variables of the forecast data are weighted and reconstructed based on the physical association weights to obtain the forecast physical features representing the physical relationships of the meteorological elements. The weighting and reconstruction process is the same as for the observed data, except that the input hidden variable becomes z_pred. z_pred is divided into three sub-vectors representing short-term, medium-term and long-term features, the three sub-vectors are multiplied by the corresponding physical association weight matrices, and the results are concatenated to obtain the reconstructed forecast feature vector r_pred. A conditional probability distribution p(x|r_pred) is constructed from r_pred, and sampling from this distribution multiple times and averaging gives the final forecast physical feature p_feat, of dimension [T, F] = [168, 5].
The observed physical feature o_feat and the forecast physical feature p_feat obtained in this way capture the statistical characteristics of the original data while integrating the physical associations and temporal evolution characteristics among meteorological elements, providing a reliable basis for subsequent feature reconstruction and quality control. Experiments show that, compared with traditional methods, the physical features extracted in this way can significantly improve the accuracy of anomaly detection while maintaining data integrity, and in particular offer stronger recognition of abnormal data produced by violations of physical laws.
In an alternative embodiment, based on the LSTM decoder in the physical driving model processing the observed physical feature and the forecast physical feature, feature reconstruction is completed according to a physical rule and a space-time evolution characteristic of a meteorological element, and generating a reconstructed observation time sequence includes:
based on the LSTM decoder, carrying out initial reconstruction on the observed physical characteristics and the forecast physical characteristics, generating initial reconstruction characteristics, calculating reconstruction errors between the initial reconstruction characteristics and an observed time sequence, and combining the reconstruction errors and hidden states to obtain error propagation characteristics;
combining the error propagation characteristic and the initial reconstruction characteristic, calculating to obtain a compensation coefficient, and correcting the initial reconstruction characteristic based on the compensation coefficient to obtain a dynamic error compensation term;
Combining the dynamic error compensation term with the initial reconstruction feature to obtain a corrected reconstruction feature, calculating weather physical association degree between air pressure, temperature, humidity, wind speed and precipitation in the corrected reconstruction feature, and comparing the weather physical association degree with the physical association degree between corresponding weather elements in an observation time sequence to obtain a deviation value of the corrected reconstruction feature and the observation time sequence on a physical rule;
And adjusting forgetting weight, input weight and output weight in the LSTM decoder according to the deviation value, updating a cell state and a hidden state through the adjusted weight, and carrying out a new round of reconstruction on the observed physical characteristic and the forecast physical characteristic based on the updated cell state and the updated hidden state to generate a reconstructed observed time sequence which accords with the physical rule and the space-time evolution characteristic of the meteorological element.
The LSTM decoder receives as input observed physical features and predicted physical features, both of which are 256-dimensional vectors representing key physical information extracted from the observed data and the predicted data, respectively. The decoder structure consists of an input gate, a forgetting gate, an output gate and a cell state, wherein forgetting weight Wf is set to be 0.3, input weight Wi is set to be 0.4, and output weight Wo is set to be 0.3 in an initialization stage. The LSTM decoder first concatenates the observed and predicted physical features to form a 512-dimensional input vector, and then processes the input vector through a gating mechanism to generate an initial reconstructed feature.
After receiving the input vector, the decoder obtains an intermediate state vector by transforming an internal parameter matrix. The cell state Ct is initially set to be an all-zero vector with a dimension of 128, and the hidden state ht is also initially set to be an all-zero vector with a dimension of 128. The forgetting gate controls the retention proportion of the cell state at the previous moment through Wf, the input gate determines the receiving degree of the current input information through Wi, and the output gate controls the information quantity of the cell state output to the hidden state through Wo. And the decoding process is carried out iteratively, the cell state and the hidden state are updated in each time step, the initial reconstruction characteristic is finally output, and the dimension is the dimension of the original input time sequence, namely 5 meteorological elements including air pressure, temperature, humidity, wind speed and precipitation.
Taking data from a certain meteorological station as an example, the original observation data are hourly observations from May 1 to May 5, a total of 120 time points of meteorological element data. The observed physical features are obtained after processing by the LSTM encoder, and the numerical forecast data for the same period are processed by the same encoder to obtain the forecast physical features. After the two features are input into the decoder, an initial reconstruction feature is generated whose shape is identical to that of the original observation data, namely [120, 5], representing the values of 5 meteorological elements at 120 time points.
And when calculating the reconstruction error between the initial reconstruction characteristic and the observation time sequence, adopting a mean square error index. For each meteorological element j at each time point t, the square of the difference between the reconstructed value and the observed value is calculated, and then the average value is taken for all time points and the meteorological elements to obtain the total reconstruction error. In the above case, the average reconstruction error of the initial reconstruction process was 0.08, indicating an average deviation of 8% of the reconstruction values from the observed values.
The reconstruction error and the hidden state are combined to obtain an error propagation characteristic, the reconstruction error is expressed as a matrix E and has a shape of [120,5], and the hidden state ht of the final time step is expressed as a vector H and has a dimension of 128. E is converted into a vector E 'with the same dimension as H through the full connection layer, and then E' is spliced with H to form 256-dimensional error propagation characteristics EP. This feature captures the error distribution during reconstruction and the internal state information of the LSTM decoder, providing a basis for subsequent correction.
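A sketch of how the error propagation feature could be assembled (assuming PyTorch; flattening the [120, 5] error matrix before the fully connected projection is an assumption, since the text only states that E is mapped to the same dimension as H):

```python
import torch
import torch.nn as nn

# E ([120, 5] per-point reconstruction errors) is projected to the dimension of
# the decoder's final hidden state H (128) and the two are concatenated.
err_proj = nn.Linear(120 * 5, 128)

def error_propagation_feature(E: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
    """E: [120, 5] reconstruction errors; H: [128] final hidden state.
    Returns the 256-dimensional error propagation feature EP."""
    e_vec = err_proj(E.reshape(-1))  # E': 128-dim projection of the error matrix
    return torch.cat([e_vec, H])     # EP: 256-dim
```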
The compensation coefficient is calculated by combining the error propagation feature EP and the initial reconstruction feature IF: EP is mapped through a fully connected layer to the same dimensional space as IF, giving the mapped feature EP'; the correlation coefficients between each dimension of EP' and IF are computed to form a correlation matrix CM; based on CM, the dimension pairs with significant correlation (absolute correlation coefficient greater than 0.6) are selected; the covariance of these dimension pairs is computed and normalized to obtain a preliminary compensation coefficient; and finally the preliminary compensation coefficient is weighted according to the confidence of the error propagation feature (assessed via the stability of the hidden state) to obtain the final compensation coefficient matrix C, whose shape is the same as that of the initial reconstruction feature, namely [120, 5].
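A rough sketch of this compensation-coefficient calculation (assuming NumPy; the text leaves several details open, e.g. how the normalized covariance is spread over time points, so the per-column broadcast and the single confidence scalar below are assumptions):

```python
import numpy as np

def compensation_matrix(IF: np.ndarray, EP_proj: np.ndarray,
                        confidence: float = 1.0, threshold: float = 0.6) -> np.ndarray:
    """IF: initial reconstruction feature [120, 5]; EP_proj: mapped error
    propagation feature EP' of the same shape. Returns coefficient matrix C [120, 5]."""
    C = np.zeros_like(IF)
    for j in range(IF.shape[1]):                        # one meteorological element per column
        r = np.corrcoef(IF[:, j], EP_proj[:, j])[0, 1]  # correlation of the dimension pair
        if abs(r) > threshold:                          # keep only significant pairs
            C[:, j] = confidence * r                    # normalized covariance as coefficient
    return C
```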
In the actual case, for the temperature data at 14:00 on May 3, the initial reconstructed value is 28.6 °C while the actual observed value is 27.2 °C, giving a reconstruction error of 1.4 °C. The compensation coefficient for the temperature dimension at this time point is -0.42, indicating that the initial reconstructed value needs a negative adjustment.
The initial reconstruction feature is corrected based on the compensation coefficient to obtain the dynamic error compensation term: element-level multiplication of the compensation coefficient matrix C with the mapped error propagation feature EP' gives a basic compensation value; weight factors are set according to the physical characteristics of the different meteorological elements (temperature and air pressure 0.8, humidity 0.7, wind speed 0.5, precipitation 0.6); and the basic compensation value is multiplied by the corresponding weight factors to obtain the final dynamic error compensation term DC, of shape [120, 5]. For the temperature data in the above case, the calculated dynamic error compensation term is -1.1 °C.
The dynamic error compensation term DC is added to the initial reconstruction feature IF to obtain a corrected reconstruction feature CF. For the above case, the corrected temperature value was 28.6 + (-1.1) =27.5 ℃, which is closer to the actual observed value 27.2 ℃ than the initial reconstruction value 28.6 ℃, reducing the reconstruction error from 1.4 ℃ to 0.3 ℃.
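A sketch of the compensation and correction step (assuming NumPy; the column order air pressure, temperature, humidity, wind speed, precipitation follows the rest of the description and is an assumption for the weight vector):

```python
import numpy as np

# Weight factors from the text: air pressure/temperature 0.8, humidity 0.7,
# wind speed 0.5, precipitation 0.6 (column order: p, T, RH, wind, precip).
ELEMENT_WEIGHTS = np.array([0.8, 0.8, 0.7, 0.5, 0.6])

def dynamic_compensation(C: np.ndarray, EP_proj: np.ndarray, IF: np.ndarray):
    """All arrays have shape [120, 5]. Returns the dynamic error compensation
    term DC and the corrected reconstruction feature CF = IF + DC."""
    base = C * EP_proj              # element-level multiplication
    DC = base * ELEMENT_WEIGHTS     # per-element weight factors (broadcast over time)
    CF = IF + DC
    return DC, CF
```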
When calculating the physical association degree between meteorological elements in the corrected reconstruction feature CF, the correlation of the rates of change is computed for the air pressure-temperature relationship, the physical consistency of the temperature-humidity relationship is checked against the dew point temperature formula, the air pressure-wind speed relationship is checked for whether the relation between the pressure gradient and the wind speed conforms to meteorological laws, and the temperature-precipitation relationship is checked for whether the temperature change before and after precipitation conforms to common meteorological phenomena. Through these physical-law tests, a physical association degree matrix PM_CF between each pair of meteorological elements is obtained, of shape [5, 5], representing the pairwise physical association degree between the meteorological elements.
The same method is used to calculate the physical association matrix PM_OBS between the corresponding meteorological elements in the observation time series. PM_CF is compared with PM_OBS and the difference between the two is calculated, giving the deviation matrix D on the physical laws. The smaller the deviation value, the better the reconstruction result conforms to the actual meteorological physical laws. In the above case, the deviation between the corrected reconstruction feature and the observed data on the temperature-humidity relationship is 0.05, indicating that their physical correlations are very close.
The weights in the LSTM decoder are adjusted according to the deviation matrix D; the weights corresponding to meteorological elements with larger deviations are adjusted over a larger range. Specifically, the average deviation of each meteorological element in the deviation matrix is calculated and converted into a weight adjustment factor. For example, an average deviation of 0.07 for the temperature element corresponds to an adjustment factor of -0.03, indicating that the temperature-related weights should be slightly decreased, while an average deviation of 0.15 for precipitation corresponds to an adjustment factor of -0.08, indicating that the precipitation-related weights should be decreased significantly.
And adding the original weight and the adjustment factor to obtain an adjusted weight. For example, for a forgetting weight portion related to temperature processing, the original value is 0.3, after adjustment, becomes 0.3+ (-0.03) =0.27, and for an input weight portion related to precipitation processing, the original value is 0.4, after adjustment, becomes 0.4+ (-0.08) =0.32. The weight adjustment method based on the physical deviation can improve the reconstruction accuracy of specific meteorological elements in a targeted manner.
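A hedged sketch of this deviation-driven weight update (assuming NumPy; the mapping from average deviation to adjustment factor is not given explicitly in the text, so the linear scale below is an illustrative choice that roughly reproduces the quoted factors of -0.03 and -0.08, and the element names used as dictionary keys are hypothetical):

```python
import numpy as np

def adjust_gate_weights(D: np.ndarray, weights: dict, scale: float = -0.5) -> dict:
    """D: [5, 5] physical-law deviation matrix (one row per meteorological element).
    weights: current per-element gate weights, e.g. {"temperature": 0.3, "precipitation": 0.4}.
    Returns the adjusted weights: new weight = old weight + adjustment factor."""
    mean_dev = D.mean(axis=1)  # average deviation per element
    order = ["pressure", "temperature", "humidity", "wind_speed", "precipitation"]
    return {name: weights[name] + scale * mean_dev[i]
            for i, name in enumerate(order) if name in weights}
```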
The process of updating the cell state and the hidden state through the adjusted weight comprises the steps of controlling the forgetting degree of the cell state at the previous moment by using the new forgetting weight, controlling the receiving degree of the current input information by using the new input weight, and controlling the output content of the hidden state by using the new output weight. The updated formula remains unchanged, but the parameter values change, thereby affecting the information circulation mode and the reconstruction result.
A new round of reconstruction of the observed physical features and forecast physical features is carried out based on the updated cell state and hidden state, generating the final reconstructed observation time series. The new round of reconstruction uses the same decoder structure and procedure as the initial reconstruction, but with the adjusted weight parameters. Experimental results show that after weight adjustment and the new round of reconstruction, the reconstruction accuracy of the meteorological elements improves markedly: the average reconstruction error drops from the initial 0.08 to 0.03, with the temperature reconstruction error reduced by 78%, humidity by 72%, air pressure by 85%, wind speed by 65% and precipitation by 58%.
The finally generated reconstructed observation time sequence is not only close to the original observation data in value, but also is more important to keep high consistency with the real observation data in physical relevance among meteorological elements, so that the reconstruction result is ensured to accord with the physical law and the space-time evolution characteristic in the meteorology, and a reliable basis is provided for the quality control of the subsequent meteorological data.
FIG. 2 compares the reconstruction accuracy of meteorological data based on the LSTM decoder in a histogram, showing the performance of three models (conventional LSTM reconstruction, dynamic error compensation reconstruction, and physical-law-constrained reconstruction) on five meteorological elements. Conventional LSTM reconstruction, i.e. a basic long short-term memory network model, achieves 76.2% accuracy for temperature, 72.5% for humidity, 81.3% for air pressure, 61.8% for wind speed and, at the minimum, 57.4% for precipitation. Dynamic error compensation reconstruction, i.e. the model with the dynamic error correction mechanism introduced, improves on the conventional model: 83.5% for temperature, 78.3% for humidity, 88.2% for air pressure, 70.6% for wind speed and 68.1% for precipitation. Physical-law-constrained reconstruction, i.e. the optimized model combined with physical-law constraints, achieves the best results on all indices: 91.8% for temperature, 83.7% for humidity, a maximum of 96.5% for air pressure, 77.2% for wind speed, and 74.3% even for the most challenging precipitation prediction. Overall, the model with physical-law constraints improves on the conventional LSTM model by 15-20 percentage points on average, fully verifying the significant effect of physical-law constraints in improving the prediction accuracy of meteorological elements.
In an alternative embodiment, combining the error propagation feature and the initial reconstruction feature, calculating a compensation coefficient, and correcting the initial reconstruction feature based on the compensation coefficient to obtain a dynamic error compensation term includes:
The method comprises the steps of mapping error propagation characteristics and initial reconstruction characteristics to characteristic spaces with the same dimension to obtain error propagation characteristics after projection and initial reconstruction characteristics after projection, dividing the initial reconstruction characteristics after projection into a plurality of local areas, calculating a mean value and a standard deviation for each local area, constructing local characteristic statistics based on the mean value and the standard deviation, and carrying out combined operation on the local characteristic statistics and the error propagation characteristics after projection to generate local compensation coefficients;
calculating a correlation matrix between the initial reconstruction feature after projection and the error propagation feature after projection, calculating a global feature correlation degree based on the correlation matrix, and generating a global compensation coefficient according to the global feature correlation degree;
performing feature stitching on the initial reconstruction feature after projection and the error propagation feature after projection, generating a weight coefficient according to the result of feature stitching, and performing weighted combination on the local compensation coefficient and the global compensation coefficient based on the weight coefficient to obtain a final compensation coefficient;
And performing element-level multiplication operation on the final compensation coefficient and the projected error propagation feature, wherein each dimension feature of the final compensation coefficient is used as a weight, the weight is weighted with the feature of the dimension corresponding to the projected error propagation feature, the weighted features of the dimensions are combined to obtain a compensation feature, the compensation feature is fused with the initial reconstruction feature, a corrected reconstruction feature is generated, and a dynamic error compensation term is obtained according to the difference value between the corrected reconstruction feature and the initial reconstruction feature.
The combined processing of the error propagation feature and the initial reconstruction feature first requires mapping both to feature spaces of the same dimension. Assuming that the original dimension of the error propagation feature EP is 64 dimensions, the original dimension of the initial reconstruction feature IR is 128 dimensions, and both are mapped to 96-dimensional feature space by the fully connected layer, respectively. In a specific implementation, the error propagation characteristic EP is linearly transformed by using a parameter matrix WEP (with the size of 64×96) to obtain the error propagation characteristic EP 'after projection, and similarly, the initial reconstruction characteristic IR is linearly transformed by using a parameter matrix WIR (with the size of 128×96) to obtain the initial reconstruction characteristic IR' after projection. In the transformation process, elements of each parameter matrix are obtained through optimization by a gradient descent method, so that the transformed features can retain main information of original features.
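As a rough illustration of this projection step, the following sketch maps a 64-dimensional error propagation feature and a 128-dimensional initial reconstruction feature into a shared 96-dimensional space with linear maps; the random weight matrices stand in for parameters that would in practice be learned by gradient descent, and all names and shapes beyond those stated in the text are assumptions.

```python
# Minimal sketch of the projection to a common feature space (random weights
# as placeholders for trained parameters).
import numpy as np

rng = np.random.default_rng(0)
EP = rng.normal(size=64)           # error propagation feature (64-d)
IR = rng.normal(size=128)          # initial reconstruction feature (128-d)

W_EP = rng.normal(scale=0.1, size=(64, 96))    # learned 64x96 projection in practice
W_IR = rng.normal(scale=0.1, size=(128, 96))   # learned 128x96 projection in practice

EP_proj = EP @ W_EP                # projected error propagation feature EP'
IR_proj = IR @ W_IR                # projected initial reconstruction feature IR'
print(EP_proj.shape, IR_proj.shape)  # (96,) (96,)
```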
The projected initial reconstructed feature IR' needs to be divided into a plurality of local regions to capture the error characteristics at different locations. In practice, 96-dimensional IR' features are divided into 8 local regions, each region containing 12 consecutive feature dimensions. For the i-th local region IRi' (i=1, 2..8), the mean μi and standard deviation σi of all feature values in that region are calculated. For example, for the 1 st local region IR1 '(containing the 1 st to 12-dimensional features of IR'), the mean μ1 thereof is the arithmetic average of the 12 feature values, and the standard deviation σ1 is a measure of the degree of dispersion of the 12 feature values.
A local feature statistic LSi is constructed based on the computed mean μi and standard deviation σi. For the i-th local region, the statistic LSi is the tuple (μi, σi). Taking actual data as an example, if in a given calculation IR1' has mean μ1 = 0.32 and standard deviation σ1 = 0.15, the corresponding local feature statistic is LS1 = (0.32, 0.15). The local feature statistic LSi is then combined with the corresponding region EPi' in the projected error propagation feature EP'. The combination compares the mean μEPi of EPi' with μi in LSi to obtain a relative deviation, and compares the standard deviation σEPi of EPi' with σi in LSi to obtain a fluctuation consistency; from these two comparisons the local compensation coefficient LCi is generated. Typically, if μEPi is close to μi but σEPi differs significantly from σi, the region requires a larger compensation adjustment.
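A small sketch of this per-region comparison is given below. The way the relative deviation and fluctuation consistency are combined into a coefficient (an equally weighted, clipped sum) is an illustrative assumption; only the region count, region size, and compared statistics come from the text.

```python
# Sketch of the local compensation coefficients: split the 96-d projected
# features into 8 regions of 12 dims and compare per-region statistics.
import numpy as np

def local_compensation(ir_proj, ep_proj, n_regions=8, eps=1e-8):
    lc = np.zeros(ir_proj.shape[0])
    size = ir_proj.shape[0] // n_regions
    for i in range(n_regions):
        sl = slice(i * size, (i + 1) * size)
        mu_ir, sd_ir = ir_proj[sl].mean(), ir_proj[sl].std()
        mu_ep, sd_ep = ep_proj[sl].mean(), ep_proj[sl].std()
        mean_dev = abs(mu_ep - mu_ir) / (abs(mu_ir) + eps)    # relative deviation of means
        std_dev = abs(sd_ep - sd_ir) / (sd_ir + eps)          # fluctuation consistency
        lc[sl] = np.clip(0.5 * mean_dev + 0.5 * std_dev, 0.0, 1.0)
    return lc   # 96-d vector of local compensation coefficients LCi
```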
A correlation matrix CM between the projected initial reconstruction feature IR' and the projected error propagation feature EP' is calculated. The correlation matrix CM has size 96 × 96, where the element CM[j, k] represents the degree of correlation between the j-th dimension of IR' and the k-th dimension of EP'. The correlation is obtained by computing the covariance of the two features over the training data set and normalizing it, giving values in the range [-1, 1]. The closer the correlation is to 1, the stronger the positive correlation; the closer to -1, the stronger the negative correlation; values near 0 indicate no obvious correlation.
The global feature correlation GR is computed from the correlation matrix CM. GR is obtained by statistical analysis of the entries of CM with larger absolute values (e.g., |CM[j, k]| > 0.5): the ratio of the number of entries satisfying the condition to the total number of entries, and the average absolute value of these entries, are calculated. In practical applications, if the proportion of entries satisfying |CM[j, k]| > 0.5 is 30% and their average absolute value is 0.72, a moderate global correlation exists between the two features.
And generating a global compensation coefficient GC according to the global feature association degree GR. A larger global compensation factor (e.g., gc=0.8) is set when GR shows a stronger association (e.g., association element ratio >40% and average absolute value > 0.6), a medium global compensation factor (e.g., gc=0.5) is set when GR shows a medium association, and a smaller global compensation factor (e.g., gc=0.2) is set when GR shows a weaker association. The global compensation coefficient GC is a 96-dimensional vector, each dimension corresponding to one dimension in the feature space.
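The following sketch illustrates one way this could be computed over a batch of training samples. The correlation estimator and the intermediate branching threshold (ratio > 0.2 for the medium case) are assumptions; the strong-association thresholds and the coefficient values 0.8/0.5/0.2 follow the text.

```python
# Sketch of the global compensation coefficient GC from a 96x96 correlation
# matrix CM between IR' and EP' dimensions.
import numpy as np

def global_compensation(IR_batch, EP_batch):        # shapes: [n_samples, 96]
    ir_c = IR_batch - IR_batch.mean(0)
    ep_c = EP_batch - EP_batch.mean(0)
    cov = ir_c.T @ ep_c / len(IR_batch)
    CM = cov / (IR_batch.std(0)[:, None] * EP_batch.std(0)[None, :] + 1e-8)
    strong = np.abs(CM) > 0.5
    ratio = strong.mean()
    mean_abs = np.abs(CM[strong]).mean() if strong.any() else 0.0
    if ratio > 0.4 and mean_abs > 0.6:
        gc = 0.8          # strong association
    elif ratio > 0.2:
        gc = 0.5          # medium association
    else:
        gc = 0.2          # weak association
    return np.full(96, gc)   # 96-d global compensation coefficient GC
```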
The projected initial reconstruction feature IR' is feature-spliced with the projected error propagation feature EP'. The splicing is a direct concatenation: IR' and EP' are connected end to end to form a 192-dimensional spliced feature CF. For example, if the first three dimensions of IR' are [0.2, 0.4, -0.1] and the first three dimensions of EP' are [0.3, 0.1, -0.2], the first six dimensions of the spliced CF are [0.2, 0.4, -0.1, 0.3, 0.1, -0.2].
And generating a weight coefficient WC according to the characteristic splicing result CF. The weight coefficient WC is calculated by a simplified neural network comprising a hidden layer (192-32) and an output layer (32-1). The network input is CF, the output is a scalar value, ranging between [0,1 ]. This value represents the weight of the local compensation coefficient in the final compensation coefficient calculation, while the weight of the global compensation coefficient is 1 minus this value.
The local compensation coefficient LC and the global compensation coefficient GC are weighted and combined based on the weight coefficient WC to obtain the final compensation coefficient FC, computed as FC = WC × LC + (1 - WC) × GC. Here LC is the 96-dimensional vector formed by concatenating the compensation coefficients LCi of the 8 local regions, and GC is the 96-dimensional global compensation coefficient calculated previously. For example, if WC = 0.6 and, in a certain dimension, LC = 0.75 and GC = 0.45, then in that dimension FC = 0.6 × 0.75 + 0.4 × 0.45 = 0.45 + 0.18 = 0.63.
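A compact sketch of this gating step follows. The tiny 192-32-1 network matches the layer sizes given above, while the random weights, tanh activation, and sigmoid output squashing are illustrative assumptions standing in for trained parameters.

```python
# Sketch of the gate between local and global compensation:
# WC in [0, 1] from the spliced feature CF, then FC = WC*LC + (1-WC)*GC.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def final_compensation(ir_proj, ep_proj, lc, gc, rng=np.random.default_rng(1)):
    cf = np.concatenate([ir_proj, ep_proj])              # 192-d spliced feature CF
    W1, b1 = rng.normal(scale=0.05, size=(192, 32)), np.zeros(32)
    W2, b2 = rng.normal(scale=0.05, size=(32, 1)), np.zeros(1)
    wc = float(sigmoid(np.tanh(cf @ W1 + b1) @ W2 + b2))  # weight coefficient WC
    return wc * lc + (1.0 - wc) * gc                      # final compensation coefficient FC
```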
The final compensation coefficient FC is multiplied element-wise with the projected error propagation feature EP'. Specifically, the j-th dimension EP'[j] of EP' is multiplied by the j-th coefficient FC[j] of FC to obtain the weighted feature value WF[j] = EP'[j] × FC[j]. The weighted feature values of all dimensions together constitute the 96-dimensional compensation feature CF. For example, if EP'[10] = 0.42 and FC[10] = 0.63, then WF[10] = 0.42 × 0.63 = 0.2646.
The compensation feature CF is fused with the initial reconstruction feature IR to generate a corrected reconstruction feature CR. Since the dimension of CF is 96 and the dimension of IR is 128, it is necessary to map CF back to the feature space of IR first. With the fully connected layer, the parameter matrix WCF (96×128 in size) converts CF into 128-dimensional CF'. Then, CF 'is added to IR, cr=ir+cf'. The addition, rather than multiplication, is employed here to preserve the body structure of the original reconstructed feature while targeted adjustments are made to specific dimensions.
The dynamic error compensation term DC is calculated from the difference between the corrected reconstruction feature CR and the initial reconstruction feature IR: DC = CR - IR = CF'. The dynamic error compensation term DC is thus also a 128-dimensional vector, representing the adjustment made to each dimension of the initial reconstruction feature.
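The whole compensation step can be sketched as follows. The back-projection matrix is random here as a stand-in for the trained parameter matrix WCF; the weighting, additive fusion, and the definition DC = CR - IR follow the text.

```python
# Sketch of the compensation: weight EP' by FC, map back to 128-d, fuse, and
# read off the dynamic error compensation term.
import numpy as np

def dynamic_error_compensation(fc, ep_proj, IR, rng=np.random.default_rng(2)):
    wf = fc * ep_proj                                 # 96-d weighted compensation feature
    W_CF = rng.normal(scale=0.1, size=(96, 128))      # trained projection WCF in practice
    cf_back = wf @ W_CF                               # compensation feature mapped to 128-d (CF')
    CR = IR + cf_back                                 # corrected reconstruction feature
    DC = CR - IR                                      # dynamic error compensation term
    return CR, DC
```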
In a specific application case, in a certain image reconstruction task, the 57 th-dimensional eigenvalue of IR is originally 0.835, which represents the intensity of a certain texture feature in an image. After the compensation calculation, the corresponding dynamic error compensation term DC [57] =0.127, and the corrected reconstructed feature CR [57] =0.962. The actual verification shows that the corrected characteristic value reflects the texture characteristics of the original image more accurately, and errors introduced in the characteristic extraction process are reduced, so that the quality of the reconstructed image is improved. The reconstruction error on the test dataset was reduced from 0.085 on average to 0.042, improving the accuracy by about 50%.
The introduction of the dynamic error compensation term effectively relieves the problem of information loss in the characteristic reconstruction process, and has obvious effect on processing high-frequency details and complex texture areas. By comprehensively analyzing local and global features, the method can provide a self-adaptive compensation strategy aiming at different types of errors, and the accuracy and stability of a reconstruction result are obviously improved.
FIG. 3 is a histogram comparing the accuracy of different error compensation methods according to an embodiment of the present invention, showing the performance of three prediction methods in meteorological element prediction scenarios. The conventional reconstruction method denotes a basic model using conventional data reconstruction techniques; its performance is relatively conservative, reaching 67.0% in the temperature prediction scenario, 55.0% in the precipitation prediction scenario, and 62.0% in the wind speed prediction scenario. The local compensation method denotes an algorithm that introduces a local data correction mechanism, with clearly improved performance: temperature prediction accuracy rises to 76.0%, precipitation to 66.0%, and wind speed to 72.0%. The present technical scheme denotes the comprehensive optimization algorithm of this invention and shows the best performance in all test scenarios, with accuracy of 90.0% for temperature, 81.0% for precipitation, and 87.0% for wind speed. From the overall data, the present scheme improves performance by about 20-25 percentage points over the conventional reconstruction method and by 10-15 percentage points over the local compensation method, fully demonstrating its marked effect in improving the prediction accuracy of meteorological elements. In particular, in the precipitation scenario, which is difficult to predict, the present scheme still maintains high prediction accuracy, showing the robustness and adaptability of the method in complex meteorological element prediction tasks.
In an optional implementation manner, performing time sequence division on the observation time sequence data set to obtain a training set and a verification set, and taking the deviation of the reconstructed observation time sequence and the original observation time sequence as an evaluation index includes:
Performing feature reconstruction on the observation time sequence data set to obtain a reconstructed observation time sequence, wherein the reconstructed observation time sequence and the original observation time sequence have the same time sequence distribution feature;
And calculating the time sequence deviation of the reconstructed observation time sequence and the original observation time sequence at the corresponding time, and taking the time sequence deviation as an evaluation index for evaluating the reconstruction effect.
The observation time series data set is divided in time sequence, for example, the first 800 time point data of the observation time series data set with the total length of 1000 time points are divided into training sets, and the last 200 time point data are divided into verification sets. The division method maintains the consistency and time sequence characteristics of the time sequence, and is beneficial to the learning of the model on the time sequence characteristics.
The implementation of feature reconstruction for the observation time series dataset may be performed using a self-encoder model. The self-encoder comprises two parts, an encoder and a decoder, the encoder maps the original observed time sequence to a potential spatial representation, and the decoder reconstructs the potential spatial representation into a sequence having the same timing distribution characteristics as the original observed time sequence. Specifically, for a certain observation time series sample X, which contains data for n time points, the encoder maps X to a potential representation Z, which the decoder then maps to a reconstructed sequence X'. In practical applications, for example, in a certain power load prediction scenario, the original observed time sequence is power load data of every hour in a week, namely 168 data points, after the self-encoder performs feature reconstruction, the obtained reconstructed time sequence should also include 168 data points, and the time sequence distribution features such as periodicity, trend and the like of the original sequence are maintained.
In order to ensure that the reconstructed observation time sequence has the same time sequence distribution characteristics as the original observation time sequence, a mean square error can be used as a loss function when the self-encoder model is trained, and a time sequence correlation constraint term can be added into the loss function. The constraint term can be realized by calculating the autocorrelation coefficient difference of the original sequence and the reconstructed sequence, so that the reconstructed sequence is ensured to maintain the time sequence dependency relationship of the original sequence. For example, in practical application, temperature observation data of a weather monitoring station has obvious characteristics of daily fluctuation and seasonal variation, and a temperature sequence after characteristic reconstruction should also show the time sequence characteristics.
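A minimal sketch of such a loss is given below, assuming the constraint is expressed as the absolute difference between lag-wise autocorrelation coefficients of the original and reconstructed sequences; the lag set and the weight lam are illustrative assumptions.

```python
# Sketch of a reconstruction loss with a timing-correlation constraint:
# MSE plus an autocorrelation-difference penalty.
import numpy as np

def autocorr(x, lag):
    x = x - x.mean()
    return np.sum(x[:-lag] * x[lag:]) / (np.sum(x * x) + 1e-8)

def constrained_loss(x, x_rec, lags=(1, 24), lam=0.1):
    mse = np.mean((x - x_rec) ** 2)
    acf_penalty = sum(abs(autocorr(x, k) - autocorr(x_rec, k)) for k in lags)
    return mse + lam * acf_penalty
```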
After the feature reconstruction is completed, the timing deviation between the reconstructed observation time series and the original observation time series at corresponding times is calculated and used as the evaluation indicator. The timing deviation may be calculated as the absolute or relative error of the values at corresponding times. Specifically, if the original observation time series is X = (x1, x2, ..., xn) and the reconstructed observation time series is X' = (x'1, x'2, ..., x'n), the absolute deviation |xi - x'i| or the relative deviation |xi - x'i| / |xi| can be calculated at each time point.
In order to fully evaluate the reconstruction effect, various statistical indicators may be calculated, including the mean absolute deviation, root mean square deviation, and maximum deviation. For example, for time series data generated by an industrial sensor, if the original sequence value at a given time is 20.5 and the corresponding reconstructed value is 19.8, the absolute deviation is 0.7 and the relative deviation is 0.034, or 3.4%. By calculating the mean absolute deviation over the whole sequence, a value such as 0.65 is obtained, representing the average degree of deviation of the reconstructed sequence from the original sequence.
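These indicators are straightforward to compute; the sketch below reproduces the worked example (original value 20.5, reconstructed 19.8) and is only an illustration of the metrics named above.

```python
# Sketch of the timing-deviation indicators used to score reconstruction quality.
import numpy as np

def deviation_metrics(x, x_rec):
    abs_dev = np.abs(x - x_rec)
    rel_dev = abs_dev / (np.abs(x) + 1e-8)
    return {
        "mean_abs_deviation": abs_dev.mean(),
        "rmse": np.sqrt(np.mean((x - x_rec) ** 2)),
        "max_deviation": abs_dev.max(),
        "mean_rel_deviation": rel_dev.mean(),
    }

# Example from the text: 20.5 vs 19.8 -> absolute deviation 0.7, relative ~3.4%.
print(deviation_metrics(np.array([20.5]), np.array([19.8])))
```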
In practical applications, the timing deviation evaluation index may be further refined into deviations on different time scales, such as short-term deviation (hour level), medium-term deviation (day level), and long-term deviation (month level). This is particularly important for time series with multi-time scale features. Taking sales data of a retail enterprise as an example, the sales data has fluctuation characteristics in daily life, weekend effects and seasonal changes, and the capability of the reconstruction model in capturing multi-scale time sequence characteristics can be comprehensively estimated by calculating deviations on different time scales.
And evaluating whether the reconstructed sequence maintains the time sequence dependent structure of the original sequence by utilizing an autocorrelation function comparison. For example, for a time series of profitability of a certain financial market, the autocorrelation coefficients of the original sequence and the reconstructed sequence can be calculated, and the similarity of the two in statistical properties can be checked in comparison.
Visualization techniques are also an effective tool in the evaluation process. By drawing the original sequence and the reconstructed sequence in the same coordinate system, the fitting degree and deviation of the original sequence and the reconstructed sequence in different time periods can be intuitively observed. For example, for an air quality index sequence of an environmental monitoring station, the performance of the reconstruction model in terms of outlier capture and mutation point identification can be found through a time sequence curve comparison graph.
Based on the time sequence deviation evaluation index, different characteristic reconstruction methods can be compared and optimized. In an intelligent traffic flow prediction system, different self-encoder structures (such as a variation self-encoder, a recurrent neural network self-encoder and the like) can be tried, and the most suitable reconstruction method is selected according to a time sequence deviation index, so that high-precision reconstruction of a traffic flow time sequence is realized, and a reliable basis is provided for subsequent abnormality detection and prediction tasks.
In an alternative embodiment, performing an abnormal state evaluation on the data to be detected based on the physical driving model, and determining the quality grade mark according to the result of the abnormal state evaluation includes:
Inputting data to be detected into a physical driving model to generate a state feature vector, constructing a state transition diagram based on the state feature vector, and generating an initial abnormal state predicted value through matching degree analysis of the state transition diagram and physical constraint conditions;
constructing prior distribution based on the initial abnormal state predicted value, carrying out layered repeated sampling on the prior distribution by adopting a Markov chain Monte Carlo method, carrying out iterative updating on node weight and edge probability of the state transition diagram according to a sampling result, and obtaining parameter posterior distribution of the optimal parameter configuration of the state transition diagram through multiple rounds of iterative optimization;
performing multiple layered sampling on the parameter posterior distribution by using the Markov chain Monte Carlo method, combining the multiple layered sampling result with the optimal parameter configuration to construct a new state transition diagram, and obtaining an abnormal state score by comparing the difference of the new state transition diagram and the original state transition diagram in node weight and edge probability;
Carrying out probability density estimation on node distribution in the new state transition diagram, analyzing fluctuation characteristics of node weights and edge probabilities based on the Markov chain Monte Carlo method, and calculating to obtain uncertainty measurement based on the fluctuation characteristics and confidence intervals of the physical constraint conditions;
Establishing a segmented mapping relation according to the relevance between the numerical distribution of the abnormal state scores and the topological structure of the new state transition diagram, and carrying out self-adaptive adjustment on the threshold value of the segmented mapping relation based on the uncertainty measure to determine a quality grade mark.
As shown in fig. 4, the method includes:
After the detected data is input into the physical driving model, the LSTM encoder is used for extracting the characteristics to obtain the state characteristic vector. The state feature vector is a 64-dimensional vector, each dimension representing a physical correlation characteristic between meteorological elements. Taking four meteorological elements of temperature, humidity, air pressure and wind speed as an example, the dimension of the original data of the input model is [ n,4], wherein n is the length of a time sequence, and the state characteristic vector obtained after being processed by the encoder is a [64] dimensional vector.
A state transition graph is constructed based on the obtained state feature vectors, the graph comprising a plurality of nodes and connection edges. Nodes represent meteorological states and edges represent transition probabilities between states. In concrete implementation, 16 most obvious dimensions in the feature vector are selected, and the feature vector is divided into 8 representative state nodes through a K-means clustering algorithm. The initial weight of each node is determined by the ratio of the corresponding number of samples to the total samples. The edge weight is calculated by the state transition frequency of the adjacent time steps to form an initial state transition matrix T, and the dimension is [8,8].
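A rough sketch of this construction is shown below, assuming the encoder produces one state feature vector per time step so that transitions can be counted between adjacent steps; the variance-based selection of the 16 most salient dimensions is an assumption, while the 8 K-means state nodes, the frequency-based node weights, and the 8 × 8 transition matrix follow the text.

```python
# Sketch of the state transition graph: cluster per-time-step features into 8
# state nodes and estimate node weights and transition probabilities.
import numpy as np
from sklearn.cluster import KMeans

def build_transition_graph(features, n_states=8, n_dims=16):
    # features: [n_timesteps, 64] state feature vectors
    top = np.argsort(features.var(axis=0))[-n_dims:]              # most salient dims (assumed criterion)
    labels = KMeans(n_clusters=n_states, n_init=10,
                    random_state=0).fit_predict(features[:, top])
    node_w = np.bincount(labels, minlength=n_states) / len(labels)  # sample-ratio node weights
    T = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):                      # adjacent-step transitions
        T[a, b] += 1
    T = T / np.maximum(T.sum(axis=1, keepdims=True), 1)            # row-normalised edge probabilities
    return node_w, T
```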
And then carrying out matching degree analysis on the state transition diagram and the predefined physical constraint condition. Physical constraints include a rule set with a daily temperature variation of no more than 15 ℃, a relative humidity of between 0% and 100%, a rate of change of air pressure of no more than 3 hPa/hour, etc., containing a total of 12 constraint rules. And obtaining an initial abnormal state predicted value by calculating the coincidence degree of each node state and the physical constraint condition in the state transition diagram, wherein the predicted value is a scalar between 0 and 1, 0 represents complete coincidence with the physical rule, and 1 represents complete abnormality.
The prior distribution is constructed based on the initial abnormal state prediction value. The prior distribution adopts Beta distribution, and the parameters alpha and Beta are set based on the known proportion of normal samples to abnormal samples in the historical data. In practical applications, the parameter α=3, β=12 is an empirical value, biasing the distribution towards a normal sample.
The prior distribution is repeatedly sampled in a layering mode by adopting a Markov chain Monte Carlo method, and 500 sample points are extracted in each round. And for each sample point, updating the node weight and the edge probability according to the topological structure of the state transition diagram. The updating rule is adjusted according to the Bayesian posterior probability calculation method by combining the consistency degree of the current sample and the historical observation data. Specifically, if the state represented by a certain node accords with the weather rules of the same time period and the same region of the history, the node weight is increased, and if the state change represented by a certain transition edge exceeds a physical reasonable range, the edge probability is reduced.
And gradually converging the node weight and the edge probability of the state transition diagram through 10 rounds of iterative optimization, and finally obtaining the parameter posterior distribution of the optimal parameter configuration. The posterior distribution reflects the possibility of the values of different parameters, and provides stable and reliable parameter estimation for the state transition diagram.
And carrying out 200 times of hierarchical sampling on the posterior distribution of the parameters by using a Markov chain Monte Carlo method, and generating a group of parameter configurations based on the posterior distribution each time of sampling. Combining these parameter configurations with the optimal parameter configuration to construct a new state transition diagram. The new state transition diagram also comprises 8 nodes, but the node weights and the edge probabilities are optimized and adjusted, so that the distribution characteristics of actual meteorological data are more met.
The abnormal state score is obtained by comparing the differences in node weights and edge probabilities between the new state transition graph and the original one. The difference is calculated with relative entropy, i.e., the KL divergence between the two distributions. A larger difference between the two graphs indicates an obvious deviation between the original and the optimized graph and hence an anomaly in the corresponding data; a smaller difference indicates that the data conform to the physical rules. In practical applications, KL divergence values below 0.05 are generally considered normal, the range 0.05-0.15 is slightly abnormal, and values above 0.15 are marked as significantly abnormal.
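A short sketch of this comparison, applied to the node-weight distributions (and analogously, row by row, to the edge probabilities), with the thresholds quoted above:

```python
# Sketch of the graph-difference score based on KL divergence.
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def grade_from_kl(kl):
    if kl < 0.05:
        return "normal"
    return "slightly abnormal" if kl < 0.15 else "significantly abnormal"
```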
And (3) estimating the probability density of node distribution in the new state transition diagram, and calculating the bandwidth parameter h=0.08 by using a Gaussian kernel function by adopting a kernel density estimation method. The estimation result reflects the distribution condition of the weights of all the nodes, the peak value represents the weight value, and the distribution width reflects the uncertainty degree.
And analyzing fluctuation characteristics of node weights and edge probabilities based on a Markov chain Monte Carlo method. And calculating a standard deviation through 200 sampling results to form an uncertainty index. In the analysis of fluctuation characteristics, it is found that the standard deviation of the node weights corresponding to normal data is usually below 0.02, while abnormal data corresponds to higher fluctuation.
And calculating by combining the confidence interval of the physical constraint condition to obtain the comprehensive uncertainty measure. Confidence intervals are statistically derived based on historical data, such as 95% confidence intervals for temperature daily ranges [ -10, +8 ]. If the data falls into the interval, the certainty score is increased, otherwise, the uncertainty measure is increased. The final uncertainty metric value range is 0-1, where 0 represents a complete determination and 1 represents a very high uncertainty.
And establishing a segmentation mapping relation according to the relevance between the numerical distribution of the abnormal state scores and the topological structure of the new state transition diagram. The mapping relationship divides the abnormal state score into a plurality of intervals, each interval corresponding to a quality level. The dividing interval is as follows, the score is 0-0.05 corresponding to 'high quality' data, 0.05-0.15 corresponding to 'good' data, 0.15-0.3 corresponding to 'suspicious' data, 0.3-0.5 corresponding to 'error' data, and more than 0.5 corresponding to 'serious error' data.
The threshold value of the segment mapping relation is adaptively adjusted based on the uncertainty measure. When the uncertainty measure is higher (e.g., above 0.4), the threshold boundary adjusts 10% downward, making the system more prone to marking the data as a lower quality level, and when the uncertainty measure is lower (e.g., below 0.1), the threshold boundary adjusts 5% upward, allowing for a more relaxed decision criterion. Such an adaptive adjustment mechanism can dynamically optimize the quality assessment criteria based on the degree of certainty of the data.
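The piecewise mapping and the uncertainty-driven threshold adjustment can be sketched as follows; the interval boundaries, grade labels, and adjustment rules all follow the text, and the example reproduces the case described in the next paragraph but one (score 0.42 → "error").

```python
# Sketch of the piecewise mapping from anomaly score to quality grade with
# adaptive threshold adjustment based on the uncertainty measure.
def quality_grade(score, uncertainty):
    thresholds = [0.05, 0.15, 0.30, 0.50]
    labels = ["high quality", "good", "suspicious", "error", "serious error"]
    if uncertainty > 0.4:
        thresholds = [t * 0.9 for t in thresholds]    # 10% stricter boundaries
    elif uncertainty < 0.1:
        thresholds = [t * 1.05 for t in thresholds]   # 5% more relaxed boundaries
    for t, label in zip(thresholds, labels):
        if score <= t:
            return label
    return labels[-1]

print(quality_grade(0.42, 0.18))   # -> "error"
```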
And finally determining the quality grade mark of the data to be detected through the processing steps. The marking result contains two pieces of information, i.e. "good (0.92)" indicating that the data is rated as good and the confidence of the decision is 92%. The quality grade marking result is fed back to the LSTM-AE model and is used for subsequent model training and optimization to form a closed-loop optimization mechanism.
In the practical application case, a set of temperature data reported from a meteorological site is shown to dip from 23 ℃ to 5 ℃ within 1 hour, and the system finds that the change results in an abnormal state score of 0.42 through state transition diagram analysis, the uncertainty measure is 0.18, and finally the error (0.86) is marked, so that the abnormal data is caused by sensor faults or data transmission errors.
In an alternative embodiment, combining the result of the multiple hierarchical sampling with the optimal parameter configuration to construct a new state transition diagram, and obtaining the abnormal state score by comparing the difference between the new state transition diagram and the original state transition diagram in node weight and edge probability includes:
Constructing a node distribution probability matrix based on the multi-layered sampling result, carrying out convolution operation on the node distribution probability matrix and the optimal parameter configuration to obtain node weights, constructing node distribution of a state transition diagram based on the node distribution probability matrix, carrying out Fourier transform on the node distribution to obtain frequency domain characteristics, optimizing the node distribution probability matrix based on the frequency domain characteristics to obtain edge probabilities, and combining the node weights and the edge probabilities to construct a new state transition diagram;
Performing Fourier transform on the node weights of the new state transition diagram and the original state transition diagram to obtain node weight frequency domain features, calculating the log likelihood ratio of the node weight frequency domain features to obtain node weight difference values, performing Fourier transform on the edge probabilities of the new state transition diagram and the original state transition diagram to obtain edge probability frequency domain features, calculating the log likelihood ratio of the edge probability frequency domain features to obtain edge probability difference values, and performing weighted fusion on the node weight difference values and the edge probability difference values to obtain abnormal state scores.
A node distribution probability matrix is constructed based on the results of the multiple stratified samplings. Specifically, assume 10 stratified samples are taken in a network system, each sample obtaining state data for 5 key nodes. The sampling results are arranged into a node distribution probability matrix P of size 10 × 5, where P[i][j] represents the probability distribution value of the j-th node in the i-th sample. For example, P may take the value [[0.12, 0.23, 0.18, 0.27, 0.20], [0.15, 0.21, 0.19, 0.25, 0.20], ...].
The node distribution probability matrix is combined with the optimal parameter configuration through a convolution-style operation to obtain the node weights. Let the optimal parameter configuration be a weight vector W = [0.3, 0.2, 0.15, 0.25, 0.1], representing the importance weights of the 5 key indicators. For each node j, the sampled values of that node in P are averaged over the 10 samples and multiplied by the corresponding weight, i.e., NW[j] = (Σi P[i][j] / 10) × W[j]. For example, for node 1, its weight is calculated as (0.12 + 0.15 + ...) / 10 × 0.3 = 0.135.
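A minimal sketch of this weighting, using the example values above (only 2 of the 10 samples are shown for brevity, so the printed numbers are not the exact figures of the worked example):

```python
# Sketch of the node-weight computation NW[j] = mean_i(P[i][j]) * W[j].
import numpy as np

P = np.array([[0.12, 0.23, 0.18, 0.27, 0.20],
              [0.15, 0.21, 0.19, 0.25, 0.20]])     # 2 of the 10 stratified samples
W = np.array([0.3, 0.2, 0.15, 0.25, 0.1])          # optimal parameter configuration
NW = P.mean(axis=0) * W                            # node weights
print(NW)
```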
The node distribution of the state transition graph is constructed based on the node distribution probability matrix. The node distribution probability matrix P is converted into the distribution of the nodes in the state space: for each node j, its probability distribution values in all samples are collected, forming a distribution vector ND[j]. For example, the distribution vector of node 1 is [0.12, 0.15, ...], representing the distribution of that node over the 10 samples.
A Fourier transform is applied to the node distribution to obtain frequency domain features. The discrete Fourier transform is applied to the distribution vector ND[j] of each node, converting it into a frequency domain feature FD[j]. The purpose of this step is to extract the periodic characteristics of the node distribution; in practice a fast Fourier transform algorithm may be used. For example, the distribution vector [0.12, 0.15, ...] of node 1 is Fourier-transformed to yield the frequency domain features [0.135, 0.02 - 0.01i, 0.005 + 0.002i, ...].
And optimizing the node distribution probability matrix based on the frequency domain characteristics to obtain the edge probability. And optimizing an original node distribution probability matrix by utilizing the frequency domain feature FD, and calculating transition probability among nodes, namely edge probability EP. The specific method is that for any two nodes j and k, the correlation of the frequency domain characteristics is calculated, and the original probability distribution value is combined to generate the edge probability EP [ j ] [ k ]. For example, the edge probability of node 1 to node 2 is 0.35, indicating a probability of a transition from state 1 to state 2 of 35%.
And combining the node weight and the edge probability to construct a new state transition diagram. Using the calculated node weights NW and edge probabilities EP, a new state transition graph Gnew is constructed. In this graph, each node j has a weight NW [ j ], and the edges of node j to node k have a weight EP [ j ] [ k ]. For example, node 1 in Gnew has a weight of 0.135 and the edge from node 1 to node 2 has a weight of 0.35.
The node weights of the new state transition graph and the original state transition graph are Fourier-transformed to obtain the node-weight frequency domain features. Assuming the node weights in the original state transition graph Gorig are OW, Fourier transforms are applied to NW and OW respectively to obtain the frequency domain features FNW and FOW. For example, for node 1, the weight 0.135 in the new graph becomes [0.135, 0, 0, ...] after the Fourier transform, while the weight 0.15 in the original graph becomes [0.15, 0, 0, ...].
The log-likelihood ratio of the node-weight frequency domain features is calculated to obtain the node weight difference value. For each node j, the log-likelihood ratio of its frequency features in the new and original graphs is calculated, i.e., ND[j] = log(FNW[j] / FOW[j]). This step quantifies the degree of variation of the node weights in the frequency domain. For example, the weight difference value of node 1 is log(0.135 / 0.15) = -0.105.
The edge probabilities of the new and original state transition graphs are Fourier-transformed to obtain the edge-probability frequency domain features. Likewise, Fourier transforms are applied to the edge probabilities EP and OEP in Gnew and Gorig to obtain the frequency domain features FEP and FOEP. For example, the edge probability 0.35 from node 1 to node 2 becomes [0.35, 0, 0, ...] after the Fourier transform in the new graph, and the probability 0.4 becomes [0.4, 0, 0, ...] in the original graph.
The log-likelihood ratio of the edge-probability frequency domain features is calculated to obtain the edge probability difference value. For each pair of nodes (j, k), the log-likelihood ratio of its edge probability in the frequency domain is calculated, namely ED[j][k] = log(FEP[j][k] / FOEP[j][k]). For example, the edge probability difference value from node 1 to node 2 is log(0.35 / 0.4) = -0.134.
The node weight difference values and the edge probability difference values are weighted and fused to obtain the abnormal state score. Importance weights α and β (α + β = 1) are set for the node weight differences and the edge probability differences, and the composite score is S = α × Σj ND[j] + β × Σj,k ED[j][k]. For example, if α = 0.6, β = 0.4, the sum of node differences is -0.5, and the sum of edge differences is -0.7, then the anomaly score S = 0.6 × (-0.5) + 0.4 × (-0.7) = -0.58. The larger the score value, the more significant the difference between the new and original state transition graphs, and the higher the probability that the system is in an abnormal state. In practical applications a threshold T may be set; when S > T the system is judged to have entered an abnormal state and the corresponding alarm mechanism is triggered.
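A compact sketch of the frequency-domain comparison and fusion follows. Taking the magnitude of the FFT before forming the log ratio is an assumption made so that the ratio is real-valued; the log-ratio differences and the α/β fusion follow the text.

```python
# Sketch of the frequency-domain anomaly score from node weights and edge
# probabilities of the old and new state transition graphs.
import numpy as np

def anomaly_score(nw_new, nw_old, ep_new, ep_old, alpha=0.6, beta=0.4, eps=1e-8):
    spec = lambda x: np.abs(np.fft.fft(np.asarray(x, dtype=float).ravel())) + eps
    nd = np.log(spec(nw_new) / spec(nw_old))     # node-weight differences ND
    ed = np.log(spec(ep_new) / spec(ep_old))     # edge-probability differences ED
    return alpha * nd.sum() + beta * ed.sum()    # composite score S
```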
By the method, accurate monitoring of the system state is realized, abnormal changes in the state transition process can be effectively captured, and the safety and stability of the system are improved.
FIG. 5 is a comparison graph of abnormal state scoring accuracy, showing the evaluation performance curves of three different methods at different noise levels, according to an embodiment of the present invention. The conventional feature comparison method denotes a basic algorithm based on conventional feature extraction and comparison; its initial performance is 90.0% and degrades gradually as the noise level increases, falling to 84.2%, 79.8%, 74.4%, 68.5%, 63.2%, and 58.7% as the noise level rises to 30%. The second comparison method starts at 93.8%, with performance at the successive noise levels of 88.1% (5%), 83.5% (10%), 77.3% (15%), 71.8% (20%), 67.4% (25%), and 62.6% (30%). The present technical scheme denotes the comprehensive evaluation algorithm of this invention and exhibits the strongest noise immunity: its initial performance is as high as 97.6% and degrades most slowly as the noise level increases, to 95.3% (5%), 91.2% (10%), 87.7% (15%), 81.3% (20%), and 75.6% (30%). Overall, the present scheme maintains the best performance at all noise levels and is more robust than the other two methods; in particular it still maintains high evaluation accuracy in high-noise environments (noise level > 20%), fully demonstrating its superiority in complex noise environments.
The method further comprises the steps of:
In the data preprocessing stage, multi-year meteorological observation data from national weather stations and numerical forecast data at the corresponding stations are collected. Taking the temperature observations of national stations from 2018 to 2022 as an example, hour-by-hour temperature observations from 100 national stations across the country and the short-term temperature forecast data at the same times are selected. For the numerical forecast data, forecast values at the station locations are extracted from the original forecast field by bilinear interpolation. Data standardization subtracts the mean from the raw data and divides by the standard deviation (z-score standardization). For the air temperature data, a mean of about 15.3 °C and a standard deviation of about 10.2 °C were calculated; after standardization, the air temperature data is mostly distributed in the range [-3, 3].
According to the time series characteristics, the observation data are organized into samples of 24 hours each; every sample contains the observed values of the four meteorological variables air temperature, air pressure, humidity, and wind speed, together with the forecast values of air temperature. After removing sequences containing missing values, about 150,000 valid sequence samples were obtained. In time order, the first 80% (about 120,000) are used for model training and the remaining 20% (about 30,000) for model validation.
In the model architecture design stage, an LSTM-AE structure of the double encoder is constructed. The first encoder processes a multivariate observation sequence comprising two stacked LSTM layers, each layer comprising 128 concealment units. The input is a tensor of shape [24,4] representing observations of 4 meteorological variables over 24 hours. The second encoder processes the air temperature prediction sequence, which includes a layer of LSTM, with 64 hidden units. The input is a tensor of the shape [24,1] representing the forecast value of the air temperature within 24 hours. The outputs of the two encoders are combined into a 192-dimensional fused representation by a splice layer. The decoder uses two layers of stacked LSTM, 128 hidden units per layer, to reconstruct the fused representation into the original observation sequence.
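As a rough PyTorch sketch of this dual-encoder LSTM-AE, the layer sizes below follow the text; the decoder input strategy (repeating the 192-dimensional fused representation over the 24 time steps) and the use of the final hidden states as the encoded representations are illustrative assumptions.

```python
# Minimal sketch of the dual-encoder LSTM-AE (assumed decoder input strategy).
import torch
import torch.nn as nn

class DualEncoderLSTMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.obs_encoder = nn.LSTM(4, 128, num_layers=2, batch_first=True)   # observations [B, 24, 4]
        self.fcst_encoder = nn.LSTM(1, 64, num_layers=1, batch_first=True)   # forecasts    [B, 24, 1]
        self.decoder = nn.LSTM(192, 128, num_layers=2, batch_first=True)
        self.out = nn.Linear(128, 4)                                          # reconstruct 4 variables

    def forward(self, obs, fcst):
        _, (h_obs, _) = self.obs_encoder(obs)          # h_obs: [2, B, 128]
        _, (h_fcst, _) = self.fcst_encoder(fcst)       # h_fcst: [1, B, 64]
        fused = torch.cat([h_obs[-1], h_fcst[-1]], dim=-1)        # [B, 192] fused representation
        dec_in = fused.unsqueeze(1).repeat(1, obs.size(1), 1)     # repeated over 24 steps
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)                       # [B, 24, 4] reconstructed observations

model = DualEncoderLSTMAE()
recon = model(torch.randn(8, 24, 4), torch.randn(8, 24, 1))
```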
Inside the LSTM unit, the forget gate, input gate, and output gate control the retention of historical information, the influence of the current input, and the screening of the output content through different weight matrices. Taking the first LSTM layer of the first encoder as an example, the input feature dimension is 4 and the hidden state dimension is 128, so the weight matrix of the forget gate has dimension [132, 128] and contains about 17,000 parameters. The input gate and output gate each have the same number of parameters. These parameters are continuously optimized during training so that the model can effectively capture the temporal dependence of the meteorological data.
In the model training and super-parameter optimization stage, a small batch gradient descent method with a batch size of 64 is adopted to train the model, the initial learning rate is set to be 0.001, and an Adam optimizer is adopted. The loss function selects Mean Square Error (MSE), and calculates the mean square error between the observed value and the reconstruction value aiming at the target quality control variable (air temperature). In the training process, when the loss on the verification set is no longer reduced for 5 continuous periods, the learning rate is halved. The total training period is set to 100, but the model will typically achieve the best performance at 50-70 cycles.
For the key hyperparameter, the LSTM hidden unit dimension, a grid search from 32 to 256 (in steps of powers of 2) is performed. The results show that the MSE on the validation set reaches a minimum of 0.0042 when the first encoder has 128 hidden units and the second encoder has 64. The total number of model parameters is about 300,000, and training on a standard GPU takes about 2 hours.
In the anomaly score calculation and threshold determination phase, the reconstruction error is calculated as the anomaly score for each sample in the validation set. Statistical analysis shows that the reconstruction errors of normal air temperature data are mostly below 0.05, while those of abnormal data are usually above 0.1. By calculating percentiles of the validation-set anomaly scores, four thresholds are obtained: the 0.5 level at approximately 0.025, the 0.05 level at approximately 0.085, the 0.005 level at approximately 0.135, and the 0.0005 level at approximately 0.175. These four thresholds correspond to anomaly detection criteria of different degrees of strictness.
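The threshold computation can be sketched as below, interpreting the 0.5/0.05/0.005/0.0005 levels as upper-tail fractions of the validation-set reconstruction errors (i.e., the 50th, 95th, 99.5th, and 99.95th percentiles); that interpretation is an assumption.

```python
# Sketch of the anomaly-threshold determination from validation reconstruction errors.
import numpy as np

def anomaly_thresholds(val_errors):
    # upper-tail fractions 0.5, 0.05, 0.005, 0.0005 -> corresponding quantile levels
    levels = [0.5, 0.95, 0.995, 0.9995]
    return dict(zip(levels, np.quantile(val_errors, levels)))
```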
In the model application phase, the air temperature observation of a certain station on January 5, 2023 is taken as an example. The 24-hour air temperature observation sequence is [-5.2, -5.8, -6.3, -6.5, -6.2, -6.0, -5.8, -5.1, -4.3, -3.2, -2.1, -1.0, 0.2, 0.5, 0.3, -0.6, -1.8, -2.5, -3.1, -3.8, -4.2, -4.6, -4.9, -5.1] °C, and the corresponding forecast sequence is [-5.5, -6.0, -6.2, -6.3, -6.1, -5.9, -5.5, -4.9, -4.0, -3.0, -1.8, -0.8, 0.0, 0.4, 0.1, -0.8, -2.0, -2.8, -3.4, -4.0, -4.4, -4.8, -5.0, -5.3] °C. The standardized observation and forecast sequences are input into the trained model, which outputs the reconstructed air temperature sequence [-5.3, -5.7, -6.2, -6.4, -6.3, -6.1, -5.7, -5.0, -4.2, -3.1, -2.0, -0.9, 0.1, 0.4, 0.2, -0.7, -1.9, -2.6, -3.2, -3.9, -4.3, -4.5, -5.0, -5.2] °C. The MSE between the original and reconstructed air temperatures is 0.0038, below the loosest threshold of 0.025, so the observation sequence is judged to be normal.
If abnormal data are encountered, such as a sudden air temperature jump caused by a sensor fault, for example the observed value at hour 13 of the sequence jumping abnormally to 10.2 °C, the reconstructed temperature at that time remains around 0.1 °C, producing an obvious reconstruction error. The MSE calculated at this point is about 0.153, which exceeds the 0.005-level threshold of 0.135 but is below the 0.0005-level threshold of 0.175, so an anomaly warning of the corresponding level of strictness is issued.
In the continuous operation stage, the observer can adjust the abnormal judgment threshold value according to actual conditions and experience feedback. For example, in a cold winter region, the temperature change is relatively gentle, the threshold value can be properly lowered to improve the detection sensitivity, and in a region with large temperature daily change in spring and autumn, the threshold value can be properly raised to avoid false alarm. By the dynamic adjustment mechanism and combining with manual experience, the quality control effect of the model under different meteorological conditions is continuously optimized.
Compared with the traditional statistical method, the method can simultaneously consider the interrelationship and time sequence characteristics among multiple variables, and fuse numerical forecast information, so that the accuracy of anomaly detection is remarkably improved. In the practical application test, the detection rate of the artificially introduced abnormal data reaches more than 95%, and the false alarm rate is controlled below 5%, so that the high requirement of the meteorological department on the quality control of the data is met.
In a second aspect of an embodiment of the present invention, there is provided an electronic device, including:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.

Claims (9)

1.融合数值预报的AI气象台站数据质量控制方法,其特征在于,包括:1. An AI meteorological station data quality control method integrated with numerical forecasting, characterized by comprising:获取气象台站的多变量气象观测数据和数值预报数据,对所述多变量气象观测数据和所述数值预报数据进行标准化处理形成观测时间序列数据集和数值预报时间序列数据集;Acquiring multivariate meteorological observation data and numerical forecast data from a meteorological station, and performing standardization processing on the multivariate meteorological observation data and the numerical forecast data to form an observation time series dataset and a numerical forecast time series dataset;构建基于LSTM-AE结构的物理驱动模型,根据所述物理驱动模型通过长短时记忆网络编码器提取所述观测时间序列数据集中气象要素间的物理关系和时序演化特征,解析所述观测时间序列数据集和所述数值预报时间序列数据集,得到观测物理特征与预报物理特征;Constructing a physical driving model based on the LSTM-AE structure, extracting the physical relationships and temporal evolution characteristics between meteorological elements in the observation time series dataset through a long short-term memory network encoder based on the physical driving model, parsing the observation time series dataset and the numerical forecast time series dataset to obtain observed physical characteristics and predicted physical characteristics;基于所述物理驱动模型中的LSTM解码器处理所述观测物理特征和所述预报物理特征,按照气象要素的物理规律和时空演化特性完成特征重构,生成重构后的观测时间序列;Processing the observed physical features and the forecast physical features based on the LSTM decoder in the physical drive model, completing feature reconstruction according to the physical laws and spatiotemporal evolution characteristics of meteorological elements, and generating a reconstructed observation time series;对所述观测时间序列数据集进行时序划分得到训练集和验证集,以所述重构后的观测时间序列与原始观测时间序列的偏差作为评价指标,完成所述物理驱动模型中LSTM编解码器的训练和参数优化;The observation time series dataset is divided into a training set and a validation set, and the deviation between the reconstructed observation time series and the original observation time series is used as an evaluation indicator to complete the training and parameter optimization of the LSTM codec in the physical drive model;基于所述物理驱动模型对待检测数据进行异常状态评估,根据所述异常状态评估的结果确定质量等级标记,标记结果反馈优化所述LSTM-AE结构的特征提取能力和异常检测性能,实现气象台站数据的质量控制和持续优化。Based on the physical driving model, an abnormal state evaluation is performed on the data to be detected, and a quality grade label is determined according to the result of the abnormal state evaluation. The labeling result is fed back to optimize the feature extraction capability and anomaly detection performance of the LSTM-AE structure, thereby realizing quality control and continuous optimization of meteorological station data.2.根据权利要求1所述的方法,其特征在于,构建基于LSTM-AE结构的物理驱动模型,根据所述物理驱动模型通过长短时记忆网络编码器提取所述观测时间序列数据集中气象要素间的物理关系和时序演化特征,解析所述观测时间序列数据集和所述数值预报时间序列数据集,得到观测物理特征与预报物理特征包括:2. The method according to claim 1, characterized in that a physical driving model based on an LSTM-AE structure is constructed, and the physical relationship and temporal evolution characteristics between meteorological elements in the observed time series dataset are extracted using a long short-term memory network encoder according to the physical driving model. 
The observed time series dataset and the numerical forecast time series dataset are parsed to obtain the observed physical characteristics and the predicted physical characteristics, including:通过长短时记忆网络编码器计算观测时间序列数据集的观测数据得到观测均值向量和观测方差向量,基于所述观测均值向量和所述观测方差向量构建观测特征正态分布,从所述观测特征正态分布中采样得到观测数据的隐变量;Calculating the observation data of the observation time series data set through a long short-term memory network encoder to obtain an observation mean vector and an observation variance vector, constructing an observation feature normal distribution based on the observation mean vector and the observation variance vector, and sampling from the observation feature normal distribution to obtain a latent variable of the observation data;对所述物理驱动模型的网络参数引入先验正态分布约束,基于所述先验正态分布约束通过贝叶斯神经网络计算不同时间尺度下气象要素的物理关联权重;Introducing a priori normal distribution constraints on the network parameters of the physical driving model, and calculating the physical association weights of meteorological elements at different time scales through a Bayesian neural network based on the prior normal distribution constraints;根据所述物理关联权重对所述观测数据的隐变量进行加权重构,得到重构后的观测特征向量,基于所述重构后的观测特征向量构建条件概率分布,从所述条件概率分布中进行多次采样得到观测物理特征;Performing weighted reconstruction on the latent variables of the observation data according to the physical association weights to obtain a reconstructed observation feature vector, constructing a conditional probability distribution based on the reconstructed observation feature vector, and performing multiple sampling from the conditional probability distribution to obtain an observed physical feature;将所述数值预报时间序列数据集输入所述长短时记忆网络编码器,获得所述数值预报时间序列数据集的预报均值向量和预报方差向量,基于所述预报均值向量和预报方差向量构建预报特征正态分布,从所述预报特征正态分布中采样得到预报数据的隐变量,基于所述物理关联权重对所述预报数据的隐变量进行加权重构,得到表征气象要素物理关系的预报物理特征。The numerical forecast time series data set is input into the long short-term memory network encoder to obtain a forecast mean vector and a forecast variance vector of the numerical forecast time series data set, a forecast feature normal distribution is constructed based on the forecast mean vector and the forecast variance vector, latent variables of the forecast data are sampled from the forecast feature normal distribution, and the latent variables of the forecast data are weightedly reconstructed based on the physical association weights to obtain forecast physical characteristics that characterize the physical relationship of meteorological elements.3.根据权利要求1所述的方法,其特征在于,基于所述物理驱动模型中的LSTM解码器处理所述观测物理特征和所述预报物理特征,按照气象要素的物理规律和时空演化特性完成特征重构,生成重构后的观测时间序列包括:3. 
The method according to claim 1, wherein the LSTM decoder in the physical drive model processes the observed physical features and the predicted physical features, and reconstructs the features according to the physical laws and spatiotemporal evolution characteristics of the meteorological elements, generating a reconstructed observation time series, comprising:基于所述LSTM解码器对所述观测物理特征和所述预报物理特征进行初始重构,生成初始重构特征,计算所述初始重构特征与观测时间序列之间的重构误差,将所述重构误差和隐状态进行组合得到误差传播特征;Performing initial reconstruction of the observed physical features and the predicted physical features based on the LSTM decoder to generate initial reconstruction features, calculating a reconstruction error between the initial reconstruction features and the observed time series, and combining the reconstruction error with a hidden state to obtain an error propagation feature;将所述误差传播特征和所述初始重构特征进行组合,计算得到补偿系数,基于所述补偿系数对所述初始重构特征进行校正得到动态误差补偿项;Combining the error propagation feature and the initial reconstruction feature to calculate a compensation coefficient, and correcting the initial reconstruction feature based on the compensation coefficient to obtain a dynamic error compensation term;将所述动态误差补偿项与所述初始重构特征进行组合得到校正后的重构特征,计算所述校正后的重构特征中气压、温度、湿度、风速和降水量之间的气象物理关联度,将所述气象物理关联度与观测时间序列中对应气象要素之间的物理关联度进行对比,得到所述校正后的重构特征与所述观测时间序列在物理规律上的偏差值;combining the dynamic error compensation term with the initial reconstructed feature to obtain a corrected reconstructed feature, calculating the meteorological physical correlation between air pressure, temperature, humidity, wind speed, and precipitation in the corrected reconstructed feature, and comparing the meteorological physical correlation with the physical correlation between corresponding meteorological elements in the observation time series to obtain a deviation value between the corrected reconstructed feature and the observation time series in terms of physical laws;根据所述偏差值对所述LSTM解码器中的遗忘权重、输入权重和输出权重进行调整,通过调整后的权重更新细胞状态和隐状态,基于更新后的所述细胞状态和所述隐状态对所述观测物理特征和所述预报物理特征进行新一轮重构,生成符合气象要素物理规律和时空演化特性的重构后的观测时间序列。The forgetting weight, input weight and output weight in the LSTM decoder are adjusted according to the deviation value, the cell state and hidden state are updated by the adjusted weights, and a new round of reconstruction of the observed physical characteristics and the predicted physical characteristics is performed based on the updated cell state and the hidden state to generate a reconstructed observation time series that conforms to the physical laws and spatiotemporal evolution characteristics of meteorological elements.4.根据权利要求3所述的方法,其特征在于,将所述误差传播特征和所述初始重构特征进行组合,计算得到补偿系数,基于所述补偿系数对所述初始重构特征进行校正得到动态误差补偿项包括:4. 
4. The method according to claim 3, wherein combining the error propagation feature with the initial reconstruction features to calculate a compensation coefficient, and correcting the initial reconstruction features with the compensation coefficient to obtain a dynamic error compensation term, comprises:

mapping the error propagation feature and the initial reconstruction features into a feature space of the same dimension to obtain a projected error propagation feature and projected initial reconstruction features;

dividing the projected initial reconstruction features into a plurality of local regions, calculating a mean and a standard deviation for each local region, constructing local feature statistics from the mean and the standard deviation, and combining the local feature statistics with the projected error propagation feature to generate a local compensation coefficient;

calculating the correlation matrix between the projected initial reconstruction features and the projected error propagation feature, calculating a global feature correlation from the correlation matrix, and generating a global compensation coefficient from the global feature correlation;

concatenating the projected initial reconstruction features with the projected error propagation feature, generating weight coefficients from the concatenation result, and forming a weighted combination of the local compensation coefficient and the global compensation coefficient with the weight coefficients to obtain a final compensation coefficient;

performing element-wise multiplication of the final compensation coefficient and the projected error propagation feature, comprising: taking each dimension of the final compensation coefficient as a weight, weighting the corresponding dimension of the projected error propagation feature, combining the weighted dimensions to obtain a compensation feature, fusing the compensation feature with the initial reconstruction features to generate corrected reconstruction features, and obtaining the dynamic error compensation term from the difference between the corrected reconstruction features and the initial reconstruction features.
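A small NumPy sketch of the local/global compensation coefficient described in claim 4 follows; the sigmoid gate used as the weight coefficient, the choice of mean-plus-standard-deviation as the local statistic, and the function name dynamic_error_compensation are illustrative assumptions rather than the patented formulas.

import numpy as np

def dynamic_error_compensation(recon_feat, err_feat, n_regions=4):
    # recon_feat, err_feat: 1-D features already projected to the same dimension.
    # Local compensation coefficient: per-region mean/std statistics gate the error feature.
    local = np.empty_like(recon_feat)
    for idx in np.array_split(np.arange(recon_feat.size), n_regions):
        stat = recon_feat[idx].mean() + recon_feat[idx].std()
        local[idx] = stat * err_feat[idx]
    # Global compensation coefficient: overall correlation between the two projected features.
    corr = np.corrcoef(recon_feat, err_feat)[0, 1]
    global_coef = corr * err_feat
    # Weight coefficient derived from the concatenated features (simple sigmoid gate as a stand-in).
    gate = 1.0 / (1.0 + np.exp(-np.mean(np.concatenate([recon_feat, err_feat]))))
    final_coef = gate * local + (1.0 - gate) * global_coef
    # Element-wise multiplication with the error feature gives the compensation feature;
    # fused with the initial reconstruction it yields the corrected feature, and the
    # difference between corrected and initial features is the dynamic error compensation term.
    compensation = final_coef * err_feat
    corrected = recon_feat + compensation
    return corrected - recon_feat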
5. The method according to claim 1, wherein performing temporal partitioning of the observation time series data set to obtain a training set and a validation set, and using the deviation between the reconstructed observation time series and the original observation time series as the evaluation indicator, comprises:

performing feature reconstruction of the observation time series data set to obtain a reconstructed observation time series, the reconstructed observation time series having the same temporal distribution characteristics as the original observation time series;

calculating the temporal deviation between the reconstructed observation time series and the original observation time series at corresponding times, and using the temporal deviation as the evaluation indicator for assessing the reconstruction quality.
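The evaluation indicator of claim 5 amounts to a per-timestep comparison of the reconstructed and original series; a one-function sketch using the mean absolute deviation as the aggregation (the claim does not fix a particular norm) is:

import numpy as np

def reconstruction_deviation(original, reconstructed):
    # Per-timestep deviation between the reconstructed and original observation series,
    # averaged into a single evaluation indicator (mean absolute deviation here).
    return float(np.mean(np.abs(np.asarray(original) - np.asarray(reconstructed))))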
6. The method according to claim 1, wherein performing abnormal state assessment of the data to be detected based on the physical driving model, and determining the quality grade mark according to the result of the abnormal state assessment, comprises:

inputting the data to be detected into the physical driving model to generate state feature vectors, constructing a state transition graph from the state feature vectors, and generating an initial abnormal state prediction value by analyzing the degree of match between the state transition graph and the physical constraints;

constructing a prior distribution from the initial abnormal state prediction value, performing stratified repeated sampling of the prior distribution with a Markov chain Monte Carlo (MCMC) method, iteratively updating the node weights and edge probabilities of the state transition graph according to the sampling results, and obtaining, through multiple rounds of iterative optimization, the parameter posterior distribution of the optimal parameter configuration of the state transition graph;

performing multiple stratified samplings of the parameter posterior distribution with the MCMC method, combining the results of the multiple stratified samplings with the optimal parameter configuration to construct a new state transition graph, and obtaining an abnormal state score by comparing the differences in node weights and edge probabilities between the new state transition graph and the original state transition graph;

performing probability density estimation of the node distribution in the new state transition graph, analyzing the fluctuation characteristics of the node weights and edge probabilities with the MCMC method, and calculating an uncertainty measure from the fluctuation characteristics combined with the confidence intervals of the physical constraints;

establishing a piecewise mapping between the numerical distribution of the abnormal state score and the topological structure of the new state transition graph, and adaptively adjusting the thresholds of the piecewise mapping based on the uncertainty measure to determine the quality grade mark.
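Any MCMC sampler can approximate the parameter posterior referred to in claim 6; the sketch below uses a plain random-walk Metropolis update as a stand-in for the stratified sampling scheme, with a Gaussian prior centered on the initial abnormal-state prediction. The function name, the Gaussian prior and the user-supplied log_likelihood callable are assumptions.

import numpy as np

def sample_posterior_weights(prior_mean, prior_std, log_likelihood,
                             n_samples=2000, step=0.05, rng=None):
    # Random-walk Metropolis over the state-transition-graph node weights.
    # log_likelihood(w) must return the log-likelihood of the observed transitions under weights w.
    rng = rng or np.random.default_rng(0)
    prior_mean = np.array(prior_mean, dtype=float)

    def log_prior(w):
        return -0.5 * np.sum(((w - prior_mean) / prior_std) ** 2)

    w = prior_mean.copy()
    cur = log_prior(w) + log_likelihood(w)
    samples = []
    for _ in range(n_samples):
        prop = w + step * rng.standard_normal(w.shape)
        cand = log_prior(prop) + log_likelihood(prop)
        if np.log(rng.random()) < cand - cur:   # Metropolis acceptance test
            w, cur = prop, cand
        samples.append(w.copy())
    return np.array(samples)                    # posterior draws of the node weights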
7. The method according to claim 6, wherein combining the results of the multiple stratified samplings with the optimal parameter configuration to construct a new state transition graph, and obtaining the abnormal state score by comparing the differences in node weights and edge probabilities between the new state transition graph and the original state transition graph, comprises:

constructing a node distribution probability matrix from the results of the multiple stratified samplings, convolving the node distribution probability matrix with the optimal parameter configuration to obtain node weights, constructing the node distribution of the state transition graph from the node distribution probability matrix, applying a Fourier transform to the node distribution to obtain frequency-domain features, optimizing the node distribution probability matrix with the frequency-domain features to obtain edge probabilities, and combining the node weights with the edge probabilities to construct the new state transition graph;

applying the Fourier transform to the node weights of the new state transition graph and of the original state transition graph to obtain node-weight frequency-domain features, calculating the log-likelihood ratio of the node-weight frequency-domain features to obtain a node-weight difference value, applying the Fourier transform to the edge probabilities of the new state transition graph and of the original state transition graph to obtain edge-probability frequency-domain features, calculating the log-likelihood ratio of the edge-probability frequency-domain features to obtain an edge-probability difference value, and fusing the node-weight difference value and the edge-probability difference value with weights to obtain the abnormal state score.

8. An electronic device, comprising:

a processor; and

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 7.

9. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.
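Returning to the scoring step of claim 7: a simplified reading of the log-likelihood ratio of frequency-domain features is a log-ratio of spectral magnitudes of the node weights and edge probabilities of the two graphs, fused with a fixed weight. Both simplifications, and the parameter alpha, are assumptions of this sketch.

import numpy as np

def abnormal_state_score(weights_new, weights_orig, edges_new, edges_orig, alpha=0.5, eps=1e-9):
    # Compare the two graphs in the frequency domain and fuse the node-weight and
    # edge-probability differences into a single abnormal state score.
    def freq_log_ratio(a, b):
        fa = np.abs(np.fft.rfft(np.asarray(a, dtype=float))) + eps
        fb = np.abs(np.fft.rfft(np.asarray(b, dtype=float))) + eps
        return float(np.mean(np.abs(np.log(fa / fb))))   # log-ratio of spectral magnitudes
    node_diff = freq_log_ratio(weights_new, weights_orig)
    edge_diff = freq_log_ratio(edges_new, edges_orig)
    return alpha * node_diff + (1.0 - alpha) * edge_diff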
CN202511054747.XA | 2025-07-30 (priority) | 2025-07-30 (filing) | AI-based meteorological station data quality control method integrated with numerical forecast | Active | Granted as CN120561474B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202511054747.XA (granted as CN120561474B) | 2025-07-30 | 2025-07-30 | AI-based meteorological station data quality control method integrated with numerical forecast

Publications (2)

Publication Number | Publication Date
CN120561474A (en) | 2025-08-29
CN120561474B (en) | 2025-09-30

Family

ID=96831125

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202511054747.XA (Active; granted as CN120561474B) | AI-based meteorological station data quality control method integrated with numerical forecast | 2025-07-30 | 2025-07-30

Country Status (1)

Country | Link
CN (1) | CN120561474B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104484539A (en)* | 2015-01-04 | 2015-04-01 | 北京师范大学 | Multi-parameter space-time meteorological field simulating method and device
CN111783938A (en)* | 2020-05-25 | 2020-10-16 | 华为技术有限公司 | Prediction method and device for time series
CN118707634A (en)* | 2024-06-25 | 2024-09-27 | 山东省水文计量检定中心 | A digital intelligent hydrological tower rainfall monitoring system
CN119066541A (en)* | 2024-07-17 | 2024-12-03 | 广西电网有限责任公司电力科学研究院 | New energy station monitoring data quality evaluation method and system based on multi-source data
CN119200041A (en)* | 2024-11-28 | 2024-12-27 | 水利部交通运输部国家能源局南京水利科学研究院 | Rapid storm surge forecasting method for tidal river sections based on deep learning and AI big model
CN119961876A (en)* | 2025-04-10 | 2025-05-09 | 南昌科晨电力试验研究有限公司 | A method for assimilation and fusion of multi-source heterogeneous meteorological data
CN120372257A (en)* | 2025-06-26 | 2025-07-25 | 国网浙江省电力有限公司电力科学研究院 | Meteorological downscaling method integrating numerical weather forecast and AI weather forecast


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Rodney Rick, et al.: "Energy forecasting model based on CNN-LSTM-AE for many time series with unequal lengths", Engineering Applications of Artificial Intelligence, 2 June 2022 (2022-06-02)*
Selim Furkan Tekin, et al.: "Numerical Weather Forecasting Using Convolutional-LSTM With Attention and Context Matcher Mechanisms", IEEE Transactions on Geoscience and Remote Sensing, 3 June 2024 (2024-06-03)*
Zhuang Xiaoran, et al.: "Research on a deep-learning-based fused precipitation nowcasting method and its application in eastern China", Acta Meteorologica Sinica (气象学报), 15 April 2023 (2023-04-15)*
Shen Kailing: "Research and application of a short-term rainfall nowcasting method based on physics-guided multi-scale fusion of environmental information", China Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库), 15 April 2025 (2025-04-15)*

Also Published As

Publication number | Publication date
CN120561474B (en) | 2025-09-30

Similar Documents

Publication | Title
CN110263866B (en) | Power consumer load interval prediction method based on deep learning
CN116757534B (en) | A reliability analysis method for smart refrigerators based on neural training network
CN111723929A (en) | Numerical prediction product correction method, device and system based on neural network
CN115564126A (en) | Photovoltaic power generation prediction method, system, device and storage medium
Sundararajan et al. | Regression and generalized additive model to enhance the performance of photovoltaic power ensemble predictors
CN112288137A (en) | An LSTM short-term load forecasting method and device considering electricity price and Attention mechanism
CN112163020A (en) | A kind of multidimensional time series anomaly detection method and detection system
CN117114913A (en) | Intelligent agricultural data acquisition system based on big data
CN118643280B (en) | Method, device, equipment and storage medium for predicting service life of SPI NAND flash memory
CN114330934A (en) | Model parameter self-adaptive GRU new energy short-term power generation power prediction method
CN118297414A (en) | Mine water hazard forecasting electronic equipment and computer program product
CN119577648A (en) | A multi-factor fusion method for identifying outlier data in the Internet of Things
CN119623531B (en) | Supervised time series water level data generation method, system and storage medium
CN119312068B (en) | AI-based power distribution network fault identification method and related device
CN118798446B (en) | A method and system for predicting heat load on primary side of thermal power station
CN119761849A (en) | Method for identifying and forecasting strong convection based on deep learning
CN120561474B (en) | AI-based meteorological station data quality control method integrated with numerical forecast
CN118859360A (en) | Probabilistic prediction model and prediction system for extreme weather in specific scenarios
Kotevska et al. | Short-Term Load Forecasting using Artificial Neural Networks techniques: A case study for Republic of North Macedonia
CN120067650B (en) | A regional atmospheric downscaling generation method based on diffusion model
CN119670982B (en) | Method for improving the accuracy of photovoltaic power generation capacity prediction
CN120337157B (en) | Distribution network fault diagnosis method, device, terminal and medium based on large model
CN118628928B (en) | Satellite health monitoring method, device and medium based on artificial intelligence algorithm
CN118759603B (en) | A retractable weather station with protection function
Rakotoarimanana et al. | Global Temperature Prediction by Convolutional Neural Network (CNN)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
