CN119066541A - New energy station monitoring data quality evaluation method and system based on multi-source data - Google Patents

New energy station monitoring data quality evaluation method and system based on multi-source data

Info

Publication number
CN119066541A
Authority
CN
China
Prior art keywords
data
deviation
probability
state
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410958735.9A
Other languages
Chinese (zh)
Inventor
彭博雅
孙志媛
文立斌
孙艳
胡弘
黄馗
詹厚剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority to CN202410958735.9A
Publication of CN119066541A
Legal status: Pending

Abstract

The invention discloses a method and system for evaluating the quality of new energy station monitoring data based on multi-source data, relating to the field of enterprise production and operation quality evaluation. The method comprises: acquiring the multi-source monitoring data collected by each monitoring device in a new energy station and preprocessing it; integrating a learning network with a data deviation recognition model, constructing an observation sequence, converting the predicted hidden state sequence into state label data, and identifying the data deviations among the monitoring data sources; and calculating an anomaly score for each data deviation, assigning a data quality grade to each data deviation point, and evaluating the influence of the data deviations on the operating performance of the new energy station.

Description

New energy station monitoring data quality evaluation method and system based on multi-source data
Technical Field
The invention relates to the field of enterprise production operation quality evaluation, in particular to a new energy station monitoring data quality evaluation method and system based on multi-source data.
Background
A new energy station relies on a large amount of real-time data for operation management, including power generation, equipment status and environmental conditions. The accuracy of this data directly affects the operating efficiency and decision quality of the station: high-quality monitoring data helps to optimize the station's operating strategy and to improve energy conversion efficiency and equipment utilization. Because the power generation of a new energy station is strongly affected by environmental factors such as wind speed and illumination intensity, accurate data also helps to improve the accuracy of power generation prediction models. The quality of the monitoring data is likewise directly related to the safe operation of the station, since abnormal data may indicate equipment faults or other potential safety hazards. With the transition of the global energy structure toward clean, low-carbon sources, new energy sources such as wind and solar energy are increasingly being developed and utilized. The efficient operation of new energy stations relies on accurate, real-time monitoring data for equipment status monitoring, fault early warning, performance assessment, and operation and maintenance decisions. However, conventional data quality evaluation methods are mostly based on a single data source, such as a SCADA (Supervisory Control And Data Acquisition) system, and struggle to reflect comprehensively how complex and changeable factors, such as weather changes and equipment aging, affect data quality. In addition, a single data source is vulnerable to sensor faults, communication interruptions and similar problems, which leave the data incomplete or distorted and in turn impair the precise management and optimal operation of the new energy station.
In recent years, with the rapid development of technologies such as the Internet of Things, big data and artificial intelligence, multi-source data fusion has become an effective way to improve data quality. Multi-source data includes, but is not limited to, SCADA data, meteorological data, equipment maintenance history records and Geographic Information System (GIS) data. By analyzing these data together, data quality can be evaluated more comprehensively, abnormal conditions can be identified, equipment status can be predicted, and fine-grained management and intelligent operation and maintenance can be realized. However, the application of multi-source data fusion in new energy stations is still at an early stage and faces challenges such as difficult data integration, complex model construction and limited real-time processing capacity. In data quality evaluation in particular, how to integrate multi-source information effectively and establish a scientific and reasonable evaluation system is a key problem to be solved.
Disclosure of Invention
The present invention has been made in view of the above-described problems occurring in the prior art.
Therefore, the invention provides a new energy station monitoring data quality evaluation method and system based on multi-source data, which solve the problems of accurate identification and dynamic quality evaluation of new energy station monitoring data deviation in a multi-source data environment, and realize efficient evaluation and intelligent early warning of the influence of the data deviation on running performance.
In order to solve the technical problems, the invention provides the following technical scheme:
In a first aspect, an embodiment of the present invention provides a new energy station monitoring data quality evaluation method based on multi-source data, which includes,
Acquiring multi-source monitoring data acquired by each monitoring device in the new energy station, and preprocessing the multi-source monitoring data;
based on the preprocessed multi-source monitoring data, performing data deviation recognition model training on the preprocessed multi-source monitoring data by utilizing a Baum-Welch algorithm;
Integrating the learning network and the data deviation recognition model, constructing an observation sequence, converting the predicted hidden state sequence into state label data, and recognizing the data deviation among all monitoring data sources;
Calculating abnormal scores of the data deviations, setting data quality grades for all the data deviation points, and evaluating the influence of the data deviations on the running performance of the new energy station;
based on the data quality grade, a real-time monitoring platform is established, the change trend of the data deviation is tracked, a change rate threshold is set, and early warning is carried out on the data deviation exceeding the change rate threshold.
As a preferred scheme of the new energy station monitoring data quality evaluation method based on multi-source data, acquiring the multi-source monitoring data collected by each monitoring device in the new energy station and preprocessing it mainly comprises the following steps:
Listing all monitoring devices in the new energy station and clearly defining the data types and indicators to be acquired;
Installing the monitoring devices, ensuring that their positions and installation methods meet the applicable standards, and commissioning them so that they operate normally and acquire data accurately;
Transmitting the acquired data to a cloud server over a network and preprocessing the raw data.
As a preferred scheme of the new energy station monitoring data quality evaluation method based on multi-source data, training the data deviation recognition model on the preprocessed multi-source monitoring data using the Baum-Welch algorithm mainly comprises the following steps:
Carrying out normalization processing on the multi-source monitoring data, and setting parameters of a data deviation recognition model based on the normalization processing result;
initializing an initial state probability vector, a state transition probability matrix and an observation probability matrix, and constructing a hidden Markov model;
Applying the Baum-Welch algorithm: calculating the forward probability at the initial moment, calculating the forward probability at each subsequent moment by forward recursion, and from these obtaining the total probability of the observation sequence;
setting the backward probability of the final moment, and calculating the backward probability of each moment through backward recursion;
calculating a state occupancy probability based on the forward probability and the backward probability;
calculating a state transition probability based on the forward probability and the backward probability;
The state occupancy probability is calculated from the forward probability and the backward probability as:
$$\gamma_t(i) = \frac{\Omega_t(i)\,\beta_t(i)}{W(O \mid \lambda)}$$
The state transition probability is calculated from the forward probability and the backward probability as:
$$E_t(i,j) = \frac{\Omega_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{W(O \mid \lambda)}$$
Wherein $\gamma_t(i)$ represents the state occupancy probability of state i at time t, $\Omega_t(i)$ represents the forward probability of state i at time t, $\beta_t(i)$ represents the backward probability of state i at time t, $W(O \mid \lambda)$ represents the total probability of the observation sequence O given the model parameters λ, $E_t(i,j)$ represents the probability of being in state i at time t and transitioning to state j at time t+1, $a_{ij}$ represents the transition probability from state i to state j, $b_j(O_{t+1})$ represents the probability of generating the observation $O_{t+1}$ in state j, $O_{t+1}$ represents the observation at time t+1, and $\beta_{t+1}(j)$ represents the backward probability of state j at time t+1;
And updating the initial state probability vector, the state transition probability matrix and the observation probability matrix based on the state occupancy probability and the state transition probability.
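The re-estimation steps above can be sketched in code. The following is a minimal, unscaled single Baum-Welch iteration for a discrete-observation hidden Markov model. It is an illustration under stated assumptions, not the patented implementation: the function names, the two-state example and every numeric value are hypothetical, and `alpha` and `beta` in the code stand for the forward and backward probabilities of the text.

```python
def forward(pi, A, B, obs):
    """Forward probabilities alpha[t][i] for a discrete-observation HMM."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs, n):
    """Backward probabilities beta[t][i], with beta at the final moment set to 1."""
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(pi, A, B, obs):
    """One re-estimation of (pi, A, B); also returns the total probability W(O|lambda)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs, n)
    total = sum(alpha[-1])
    # State occupancy probability gamma_t(i).
    gamma = [[alpha[t][i] * beta[t][i] / total for i in range(n)] for t in range(T)]
    # State transition probability xi_t(i, j), called E_t(i, j) in the text.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / total
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(m)]
             for i in range(n)]
    return new_pi, new_A, new_B, total

# Hypothetical two-state model and a short discretised observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1, 0, 1, 1]
new_pi, new_A, new_B, likelihood = baum_welch_step(pi, A, B, obs)
```

In practice the step is repeated until the total probability stops improving. For long sequences the unscaled products underflow, so a scaled or log-space variant would be needed.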
As a preferred scheme of the new energy station monitoring data quality evaluation method based on multi-source data, integrating the learning network and the data deviation recognition model, constructing an observation sequence, converting the predicted hidden state sequence into state label data and identifying the data deviations among the monitoring data sources mainly comprises the following steps:
Integrating the learning network and the data deviation recognition model on the basis of the initial state probability vector, the state transition probability matrix and the observation probability matrix;
Aligning the observation data of all data sources in time and fusing them into a multidimensional time series;
Inputting the fused multidimensional time series into the trained data deviation recognition model as the observation sequence;
Initializing the path probability and the path record, generating an initialized path probability matrix and path record matrix;
Iterating over the moments, calculating the path probability of each state at each moment while recording the optimal predecessor state of that state, and updating the path probability matrix and the path record matrix, until the optimal predecessor state of every state at every moment has been obtained;
The path probability at each moment is calculated as:
$$\delta_{t+1}(j) = \max_i\left[\delta_t(i)\,a_{ij}\right] b_j(O_{t+1})$$
Wherein $\delta_{t+1}(j)$ represents the optimal path probability of reaching state j at time t+1, and $\delta_t(i)$ represents the optimal path probability of reaching state i at time t;
Calculating the optimal path probability and the optimal state at the final moment based on the optimal predecessor states;
Extracting the optimal path probability at the final moment, determining the optimal state at the final moment, and initializing the optimal hidden state sequence;
Determining the optimal state at each moment by backtracking through the path record matrix from the final moment to the initial moment;
Constructing the optimal hidden state sequence from the optimal states obtained by backtracking, and converting the predicted hidden state sequence into state label data;
Identifying the data deviations among the monitoring data sources by comparing the observed data of each monitoring data source with the state label data.
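The path-probability recursion and backtracking described above correspond to Viterbi decoding. A minimal sketch follows, assuming a discrete-observation model; the parameter values and the `STATE_LABELS` mapping from hidden states to labels are hypothetical and are not specified in the text.

```python
def viterbi(pi, A, B, obs):
    """Return the most probable hidden state sequence and its path probability."""
    n = len(pi)
    # Path probabilities at the first moment.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []  # path record: best predecessor of each state at each moment
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step)
        delta = new_delta
    # Optimal state at the final moment, then backtrack to the initial moment.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical model: state 0 = consistent sources, state 1 = deviating sources.
STATE_LABELS = ["normal", "deviation"]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1]

path, prob = viterbi(pi, A, B, obs)
labels = [STATE_LABELS[s] for s in path]
```

For this toy input the decoded labels are `["normal", "normal", "deviation"]`, which a downstream step could compare against each source's observations.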
As a preferred scheme of the new energy station monitoring data quality evaluation method based on multi-source data, the method further comprises the following main steps:
Using a logging module to collect the prediction results output by the model every minute, together with the predicted deviation, the actual deviation and the corresponding timestamp data, for comparison;
Using an automated script to classify and label the collected data according to preset deviation types and severity criteria, distinguishing true positives, false positives, true negatives and false negatives, to form a labeled feedback data set;
Setting a retraining day and triggering model retraining on that day each month;
For the hidden Markov model, re-estimating the state transition probability matrix and the observation probability matrix using the Baum-Welch algorithm;
For the isolation forest model, adjusting the number of trees and the sub-sample size so that the model adapts to new data characteristics;
Adopting stochastic gradient descent as the online learning algorithm, randomly drawing a mini-batch from the latest feedback data set each time, and updating the model parameters according to:
$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t; x_t, y_t)$$
The loss function is:
$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\log P(y_i \mid x_i; \theta)$$
Wherein $P(y \mid x; \theta)$ represents the probability that the model predicts class y under parameters θ, N represents the total number of samples, θ represents the model parameters, η represents the learning rate, $L(\theta_t; x_t, y_t)$ represents the loss function, $x_t$ and $y_t$ represent the features and label of the current sample, and $x_i$ and $y_i$ represent the i-th input feature and output label respectively;
Randomly initializing the model parameters $\theta_0$, feeding the samples of the data set in as a stream, taking one mini-batch at a time, calculating for each mini-batch the gradient of the loss function with respect to the model parameters, updating the model parameters according to the calculated gradient and the preset learning rate, and iterating until a preset number of iterations is reached;
Adopting a deep Q-network (DQN) algorithm and setting a reward function that gives a positive reward $r^{+}$ for each correct early warning and a negative reward $r^{-}$ for each incorrect early warning, the reward design following these principles:
For each correct early warning, $r^{+} = \ln(1 + V)$;
For each incorrect early warning, $r^{-} = -\ln(1 + R_L)$;
Wherein V and $R_L$ represent the response speed and the response delay of the system for correct and incorrect early warnings respectively;
Setting a change rate threshold δ; when the change rate of the monitoring data satisfies $\delta_{new} > \delta$, the system judges that an abnormal condition may exist and triggers an early warning according to the formula:
Setting an anomaly score threshold s, and triggering an early warning when the anomaly score of a monitoring point satisfies $s_{new} > s$;
Wherein $\delta_{new}$ and $s_{new}$ represent the new change rate threshold and the new anomaly score threshold respectively, α and β are the false positive rate and the false negative rate respectively, and T is the average response time;
Setting one working day of each month as the evaluation day and performing a strategy evaluation of the early warning effect and the operation and maintenance efficiency improvement over the past month, wherein the data deviation recognition accuracy ACC is:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
Wherein TP represents the number of anomalies correctly identified by the system, TN represents the number of data items correctly judged as normal, FP represents the number of data items incorrectly judged as abnormal, and FN represents the number of actual anomalies the system failed to identify;
The early warning response time RT is:
$$\mathrm{RT} = \frac{1}{N}\sum_{i=1}^{N}\left(T_{r,i} - T_{t,i}\right)$$
Wherein $T_{r,i}$ represents the time at which the system responded to the i-th abnormal event ($T_r$ denoting response time), $T_{t,i}$ represents the time at which the i-th abnormal event actually occurred ($T_t$ denoting actual occurrence time), and N represents the total number of abnormal events in the monitoring period;
The operation and maintenance efficiency improvement percentage E is:
$$E = \frac{OT - NT}{OT} \times 100\%$$
Wherein OT represents the manual operation and maintenance time required to solve a problem of the same scale before the automatic early warning system was introduced, and NT represents the actual time taken to handle the same problem with the system in place;
the CPI equation reflecting overall performance is:
wherein, RTmax represents the maximum acceptable early warning response time, alpha represents the regulating factor, and w1、w2 and w3 respectively represent the weight coefficient ratio of accuracy, response time and operation and maintenance efficiency improvement percentage.
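The online update, the reward rule and the monthly evaluation metrics described in this scheme can be sketched together. The sketch rests on several assumptions that the text does not confirm: a logistic model stands in for the unspecified classifier, the loss is the mean negative log-likelihood, RT is the mean gap between occurrence and response, and E is the relative time saving. All function names and sample values are hypothetical, and the composite CPI is not sketched because its formula is not given here.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(theta, data):
    """Mean negative log-likelihood of binary labels under a logistic model."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(theta, x)))
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(data)

def sgd(data, eta=0.5, batch_size=4, iterations=200, seed=0):
    """Mini-batch stochastic gradient descent: theta <- theta - eta * gradient."""
    rng = random.Random(seed)
    theta = [rng.uniform(-0.1, 0.1) for _ in data[0][0]]  # random theta_0
    for _ in range(iterations):
        batch = rng.sample(data, batch_size)
        grad = [0.0] * len(theta)
        for x, y in batch:
            err = sigmoid(sum(w * xi for w, xi in zip(theta, x))) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk / batch_size
        theta = [w - eta * g for w, g in zip(theta, grad)]
    return theta

def reward(correct, response_speed=0.0, response_delay=0.0):
    """ln(1 + V) for a correct warning, -ln(1 + R_L) for an incorrect one."""
    if correct:
        return math.log(1.0 + response_speed)
    return -math.log(1.0 + response_delay)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mean_response_time(responded_at, occurred_at):
    gaps = [r - o for r, o in zip(responded_at, occurred_at)]
    return sum(gaps) / len(gaps)

def efficiency_gain(ot, nt):
    """Percentage of manual operation and maintenance time saved."""
    return (ot - nt) / ot * 100.0

# Hypothetical feedback set: features (bias, deviation magnitude), label 1 = deviation.
feedback = [([1.0, d / 10], 1 if d > 5 else 0) for d in range(11) if d != 5]
theta = sgd(feedback)
```

For example, `accuracy(40, 50, 5, 5)` evaluates to 0.9 and `efficiency_gain(8.0, 6.0)` to 25.0. The fixed seed only makes the toy run repeatable.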
As a preferred scheme of the new energy station monitoring data quality evaluation method based on multi-source data, calculating the data deviation anomaly scores, setting a data quality grade for each data deviation point and evaluating the influence of the data deviations on the operating performance of the new energy station mainly comprises the following steps:
Standardizing each data deviation and forming a test data set from the standardized data deviations;
Constructing an anomaly monitoring model and setting its parameters; training the model with the test data set as training data; predicting the anomaly score of each data deviation in the test data set with the trained model, namely traversing the data deviations in the test data set and inputting each into the isolation trees of the model, descending from the root node of each tree according to its splitting conditions until a leaf node is reached, and recording the traversal path length of the data point in each tree; calculating the average path length of the data deviation over all isolation trees and taking it as the final anomaly score of the current data deviation; and normalizing the anomaly scores to a preset range;
The anomaly score is calculated as:
$$S(x) = 2^{-\frac{\bar{h}(x)}{C(n)}}$$
Wherein $S(x)$ represents the anomaly score of the data deviation x, $\bar{h}(x)$ represents the average path length of the data deviation x over all isolation trees, and $C(n)$ represents the adjustment coefficient;
Setting a data quality grade based on the anomaly score and scoring the abnormal data points;
The data quality grade is set according to the range in which the anomaly score falls:
High-quality data: anomaly score in the range 0 to a;
Good data: anomaly score in the range a to b;
Moderate data: anomaly score in the range b to c;
Low-quality data: anomaly score in the range c to d;
Abnormal data: anomaly score in the range d to e;
Assigning the corresponding quality grade to each data point based on its anomaly score;
Each abnormal data point is given a score according to the magnitude of its anomaly score:
Data points with anomaly scores in the range d to e are scored E' (highest score);
Data points with anomaly scores in the range c to d are scored D';
Data points with anomaly scores in the range b to c are scored C';
Data points with anomaly scores in the range a to b are scored B';
Data points with anomaly scores in the range 0 to a are scored A' (lowest score);
and evaluating the influence of the data deviation on the operation performance of the new energy station according to the scoring of the abnormal data points.
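The scoring and grading steps above can be illustrated as follows. The sketch assumes the standard isolation forest normalisation, in which the score is 2 raised to the negative ratio of the average path length to an adjustment coefficient C(n), so that short paths give scores near 1. The grade boundaries a to d are left open in the text, so the values in `BOUNDS` are purely hypothetical, as are the function names and sample path lengths.

```python
import math
from bisect import bisect_right

EULER_GAMMA = 0.5772156649

def c(n):
    """Adjustment coefficient: average path length of an unsuccessful
    binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(path_lengths, n):
    """Score in (0, 1] from the path lengths of one point across all trees."""
    mean_len = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_len / c(n))

# Hypothetical boundaries a, b, c, d (with e = 1.0); the text leaves them open.
BOUNDS = [0.2, 0.4, 0.6, 0.8]
GRADES = ["high-quality", "good", "moderate", "low-quality", "abnormal"]
SCORES = ["A'", "B'", "C'", "D'", "E'"]

def grade(score):
    """Map an anomaly score to its quality grade and point score."""
    idx = bisect_right(BOUNDS, score)
    return GRADES[idx], SCORES[idx]

# A deviation isolated after very few splits versus one buried deep in the trees,
# assuming a hypothetical sub-sample size of 256.
isolated_early = grade(anomaly_score([2, 3, 2], 256))
isolated_late = grade(anomaly_score([12, 11, 13], 256))
```

With these illustrative boundaries a score that lands exactly on a boundary is assigned to the higher (worse) grade; the text does not say which side boundaries belong to.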
As a preferred scheme of the new energy station monitoring data quality evaluation method based on multi-source data, establishing a real-time monitoring platform based on the data quality grade, tracking the change trend of the data deviations, setting a change rate threshold and issuing an early warning for data deviations exceeding the change rate threshold mainly comprises the following steps:
Based on the data quality level, establishing a real-time monitoring platform;
Performing a differencing operation on the data deviation indicators to obtain difference values, performing a stationarity test on the difference values, and taking the difference values that pass the test as the change rate of the data deviation;
Setting the change rate threshold of the data deviation as $\mu_0$, wherein the deviation change rate μ is given by:
$$\mu = k \cdot \sigma$$
Wherein σ represents the standard deviation of the data and k represents the safety factor;
Introducing an objective function G(k) to evaluate the early warning effect under different values of k and obtain the optimal safety factor $k^{*}$:
$$G(k) = w_4\,\mathrm{FPR}(k) + w_5\,\mathrm{FNR}(k)$$
Wherein FPR(k) represents the false alarm rate, FNR(k) represents the missed alarm rate, and $w_4$ and $w_5$ represent the weight coefficients of the false alarm rate and the missed alarm rate respectively;
The optimal safety factor $k^{*}$ is:
$$k^{*} = \arg\min_k G(k)$$
When $\mu > \mu_0$, triggering an early warning and transmitting the early warning information to operation and maintenance personnel for analysis:
collecting all data deviations with the change rate exceeding a preset threshold value;
classifying the super-threshold data deviation according to types to obtain abnormal data points of each category;
Performing root cause analysis on abnormal data points of each category to acquire the cause of data deviation;
And according to the analysis result, periodically evaluating and adjusting the early warning rule, and optimizing the real-time monitoring platform.
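The threshold-setting steps can be sketched as a small grid search. The sketch assumes that the threshold is k times the standard deviation of the change rates, that a warning fires when the magnitude of a change rate exceeds it, and that G(k) is a weighted sum of the false alarm rate and the missed alarm rate. The function names, candidate values of k, weights and sample series are all hypothetical, and the stationarity test is omitted.

```python
from statistics import pstdev

def change_rates(series):
    """First-order differences of a deviation series."""
    return [b - a for a, b in zip(series, series[1:])]

def error_rates(rates, is_anomaly, threshold):
    """False alarm rate and missed alarm rate when warning on |rate| > threshold."""
    flagged = [abs(r) > threshold for r in rates]
    fp = sum(f and not y for f, y in zip(flagged, is_anomaly))
    fn = sum(y and not f for f, y in zip(flagged, is_anomaly))
    negatives = is_anomaly.count(False)
    positives = is_anomaly.count(True)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

def best_safety_factor(rates, is_anomaly, candidates, w4=0.5, w5=0.5):
    """Pick the k that minimises G(k) = w4 * FPR(k) + w5 * FNR(k)."""
    sigma = pstdev(rates)

    def objective(k):
        fpr, fnr = error_rates(rates, is_anomaly, k * sigma)
        return w4 * fpr + w5 * fnr

    return min(candidates, key=objective), sigma

# Hypothetical deviation series with one abrupt jump, labelled by hand.
deviation = [0.0, 0.1, 0.1, 0.2, 1.2, 1.3, 1.2, 1.3]
rates = change_rates(deviation)
labels = [False, False, False, True, False, False, False]

k_best, sigma = best_safety_factor(rates, labels, [0.1, 0.5, 1.0, 2.0, 3.0, 4.0])
mu0 = k_best * sigma
warnings = [i for i, r in enumerate(rates) if abs(r) > mu0]
```

When several candidates tie on G(k), `min` keeps the first one in the list, so the ordering of `candidates` acts as a tie-breaker toward the more sensitive threshold here.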
In a second aspect, an embodiment of the present invention provides a new energy station monitoring data quality evaluation system based on multi-source data, which comprises a data acquisition and preprocessing module, a data deviation recognition model training module, an integrated learning network and deviation recognition module, a data deviation anomaly score and quality evaluation module, and a real-time monitoring and early warning response module;
The data acquisition and preprocessing module is used for acquiring multi-source monitoring data acquired by each monitoring device in the new energy station and preprocessing the multi-source monitoring data;
The data deviation recognition model training module is used for carrying out data deviation recognition model training on the preprocessed multi-source monitoring data by utilizing a Baum-Welch algorithm based on the preprocessed multi-source monitoring data;
the integrated learning network and deviation recognition module is used for integrating the learning network and the data deviation recognition model, constructing an observation sequence, converting the predicted hidden state sequence into state label data and recognizing the data deviation among all monitoring data sources;
the data deviation abnormal score and quality evaluation module is used for calculating the data deviation abnormal score, setting data quality grades for each data deviation point and evaluating the influence of the data deviation on the operation performance of the new energy station;
The real-time monitoring and early warning response module is used for establishing a real-time monitoring platform, tracking the change trend of the data deviation, setting a change rate threshold value and carrying out early warning on the data deviation exceeding the change rate threshold value.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, where the computer program when executed by the processor implements any step of the new energy station monitoring data quality evaluation method based on multi-source data according to the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any step of the new energy station monitoring data quality evaluation method based on multi-source data according to the first aspect of the present invention.
The method has the beneficial effects that the accuracy and the reliability of the monitoring data of the new energy station are improved, the predictability and the coping capacity of potential risks are greatly enhanced through a real-time monitoring and early warning mechanism, and finally the stable operation and the optimized management of the new energy station are ensured.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a new energy station monitoring data quality evaluation method based on multi-source data in embodiment 1.
Fig. 2 is a training flowchart of the data deviation recognition model in embodiment 1.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Embodiment 1, referring to fig. 1 and 2, provides a new energy station monitoring data quality evaluation method based on multi-source data, which comprises the following steps:
S1, acquiring multi-source monitoring data acquired by each monitoring device in the new energy station, and preprocessing the multi-source monitoring data.
Listing all monitoring devices in the new energy station and clearly defining the data types and indicators to be acquired;
Installing the monitoring devices, ensuring that their positions and installation methods meet the applicable standards, and commissioning them so that they operate normally and acquire data accurately;
Transmitting the acquired data to a cloud server over a network, and cleaning, integrating, feature-engineering and normalizing the raw data.
It should be noted that all monitoring devices in the new energy station, including sensors and meters, are listed, and the data types and indicators to be acquired, such as temperature, humidity, wind speed, power generation and device status, are clearly defined. The monitoring devices are then installed, ensuring that their positions and installation methods meet the applicable standards, and commissioned so that they operate normally and acquire data accurately.
S2, based on the multi-source monitoring data, training a data deviation recognition model of the preprocessed multi-source monitoring data by utilizing a Baum-Welch algorithm.
Carrying out normalization processing on the multi-source monitoring data, and setting parameters of a data deviation recognition model based on the normalization processing result;
It should be noted that all multi-source monitoring data are collected so that the data cover every parameter to be monitored. Data from different sources are sorted and their timestamps aligned so that all data share a consistent time dimension. A suitable normalization method, such as min-max normalization or Z-score normalization, is then selected and applied to each monitoring parameter, so that the values fall within the same range, such as [0, 1], or have zero mean and unit variance. Finally, the normalized data are checked to ensure that they are error-free and as expected.
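The two normalization methods named above can be illustrated briefly. This is a minimal sketch using only the standard library; the function names and the wind-speed readings are hypothetical, and a constant series is mapped to zeros as an arbitrary convention.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalize(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

wind_speed = [3.0, 5.0, 7.0, 9.0]  # hypothetical readings in m/s
scaled = min_max_normalize(wind_speed)
standardized = z_score_normalize(wind_speed)
```

Min-max scaling is sensitive to outliers because a single extreme reading sets the range, whereas Z-score scaling keeps the bulk of the data on a comparable scale; which one suits a given monitoring parameter is a design choice the text leaves open.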
In the present application, the data deviation recognition model is a hidden Markov model (HMM), a statistical model that describes an observation sequence generated by a hidden state sequence and is commonly used for modeling and analyzing time-series data.
Initializing an initial state probability vector, a state transition probability matrix and an observation probability matrix to construct a hidden Markov model;
Applying the Baum-Welch algorithm: calculating the forward probability at the initial moment, calculating the forward probability at each subsequent moment by forward recursion, and from these obtaining the total probability of the observation sequence;
setting the backward probability of the final moment, and calculating the backward probability of each moment through backward recursion;
calculating a state occupancy probability based on the forward probability and the backward probability;
calculating a state transition probability based on the forward probability and the backward probability;
The state occupancy probability is calculated from the forward probability and the backward probability as:
$$\gamma_t(i) = \frac{\Omega_t(i)\,\beta_t(i)}{W(O \mid \lambda)}$$
The state transition probability is calculated from the forward probability and the backward probability as:
$$E_t(i,j) = \frac{\Omega_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{W(O \mid \lambda)}$$
Wherein $\gamma_t(i)$ represents the state occupancy probability of state i at time t, $\Omega_t(i)$ represents the forward probability of state i at time t, $\beta_t(i)$ represents the backward probability of state i at time t, $W(O \mid \lambda)$ represents the total probability of the observation sequence O given the model parameters λ, $E_t(i,j)$ represents the probability of being in state i at time t and transitioning to state j at time t+1, $a_{ij}$ represents the transition probability from state i to state j, $b_j(O_{t+1})$ represents the probability of generating the observation $O_{t+1}$ in state j, $O_{t+1}$ represents the observation at time t+1, and $\beta_{t+1}(j)$ represents the backward probability of state j at time t+1;
And updating the initial state probability vector, the state transition probability matrix and the observation probability matrix based on the state occupancy probability and the state transition probability.
It is to be explained that the initial state probability vector, the state transition probability matrix and the observation probability matrix are all parameters of the data deviation recognition model;
wherein the initial state probability vector represents the probability that the data bias identification model was initially in each hidden state;
the state transition probability matrix represents the probability of transitioning from one hidden state to another hidden state;
The observation probability matrix represents the probability of generating a certain observation under a certain hidden state.
The forward algorithm is a dynamic programming algorithm for calculating the probability of an observation sequence under given hidden Markov model (HMM) parameters. At each step it propagates the state probabilities of the previous moment to the next moment by recursion and updates them with the observation at the current moment. The forward probability is defined as the probability of being in state i at time t and having observed the first t observations.
The backward algorithm is also a dynamic programming algorithm. It calculates the probability of generating the subsequent observations given that the model is in a certain state at a certain moment. The backward probability is defined as the probability of observing the observations from time t+1 to the final moment T, given that the model is in state i at time t.
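The forward and backward recursions, and the occupancy and transition probabilities derived from them, can be sketched as follows. This is a minimal illustration that assumes a discrete HMM whose observations have already been mapped to symbol indices; the function names and the list-of-lists layout are hypothetical and are not taken from the application.

```python
def forward_backward(pi, A, B, obs):
    """Return (alpha, beta, total_prob) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observed symbol indices
    """
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    # Forward pass: initial moment, then forward recursion.
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            incoming = sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = incoming * B[j][obs[t]]
    # Backward pass: final moment set to 1, then backward recursion.
    for i in range(n):
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    total = sum(alpha[T - 1])  # total probability of the observation sequence
    return alpha, beta, total


def occupancy_and_transition(pi, A, B, obs):
    """Return (gamma, xi): state occupancy and state transition probabilities."""
    alpha, beta, total = forward_backward(pi, A, B, obs)
    n, T = len(pi), len(obs)
    gamma = [[alpha[t][i] * beta[t][i] / total for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / total
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    return gamma, xi
```

In a Baum-Welch iteration these two quantities would then be used to re-estimate the initial state vector, the transition matrix and the observation matrix; that update step is omitted here for brevity.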
S3, integrating the learning network and the data deviation recognition model, constructing an observation sequence, converting the predicted hidden state sequence into state label data, and recognizing the data deviation among all monitoring data sources.
Based on the initial state probability vector, the state transition probability matrix, and the observation probability matrix;
the observed data of all data sources are aligned according to time and fused into a multidimensional time sequence;
inputting the fused multidimensional time sequence as an observation sequence into a trained data deviation recognition model, and constructing the observation sequence;
Initializing a path probability and a path record matrix, and generating an initialized path probability matrix and a path record matrix;
repeating the recursion, that is, calculating the path probability at each moment while recording the optimal precursor state of each state, and updating the path probability matrix and the path record matrix until the optimal precursor state of every state at every moment has been recorded;
The calculation formula of the path probability at each moment is as follows:
$$\delta_{t+1}(j)=\max_{i}\big[\delta_t(i)\,a_{ij}\big]\,b_j(O_{t+1})$$
where $\delta_{t+1}(j)$ represents the optimal path probability of reaching state j at time t+1, and $\delta_t(i)$ represents the optimal path probability of reaching state i at time t;
It should be explained that the path probability of each possible state at the initial moment is determined from the initial state probability and the probability of the first observation, and a matrix is created to store these initial probabilities, with each row representing a state and each column representing a time point. At the same time, a matrix is created to record the precursor state of each state; because this is the initial time point, every entry of the precursor state matrix may be set to 0, indicating that there is no precursor state. Starting from time point t=2, the path probability at each time point is calculated step by step. For each time point and each state, the path probabilities of transitioning from all possible precursor states to the current state are compared, the path with the maximum probability is selected as the optimal path, and this optimal path probability is stored in the path probability matrix at the position corresponding to the current time point and state.
Specifically, while the optimal path is being determined, the index of the precursor state with the highest path probability, that is, the precursor state from which the transition into the current state at the current moment is most probable, is stored in the path record matrix at the position corresponding to the current time point and state. The path record matrix is updated continuously so that the optimal precursor state is recorded for every state at every time point and the precursor state information of every step along the whole path is retained. These steps are repeated for every time point and state: at each time point the optimal path probability is calculated from the previous path probabilities and the transition probabilities, and the optimal precursor state of each state is recorded, so that the path probability matrix reflects the optimal path probability of each state at each time point and the optimal path can be traced back completely. Based on the optimal precursor states, the optimal path probability and the optimal state at the final moment are calculated: the optimal path probability is extracted at the final moment, the optimal state at the final moment is determined, and the optimal hidden state sequence is initialized;
It should be explained that the observation data of all monitoring devices are collected and each data source is checked to have accurate timestamps; interpolation is then used to align data recorded at different moments, so that every time point has complete multidimensional data.
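The alignment step can be sketched as below. This is only an illustration under stated assumptions: linear interpolation is used (the application does not say which interpolation method is applied), timestamps within each source are strictly increasing, values outside a source's time range are clamped to the nearest sample, and the names `interpolate` and `fuse_sources` are hypothetical.

```python
def interpolate(times, values, t):
    """Linearly interpolate a time-sorted series at timestamp t (clamped at the ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for k in range(1, len(times)):
        if t <= times[k]:
            t0, t1 = times[k - 1], times[k]
            w = (t - t0) / (t1 - t0)
            return values[k - 1] + w * (values[k] - values[k - 1])


def fuse_sources(sources, grid):
    """Align every source to a common time grid.

    sources : list of (times, values) pairs, one per monitoring data source
    grid    : list of target timestamps
    Returns rows of the form (t, x_1, ..., x_n), one row per grid timestamp.
    """
    return [(t,) + tuple(interpolate(ts, vs, t) for ts, vs in sources)
            for t in grid]
```

Each returned row holds one timestamp followed by one value per source, so dropping the timestamp gives the multidimensional observation vector for that time point.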
The aligned data take the form $X=\{(t_1,x_{11},x_{12},\dots,x_{1n}),(t_2,x_{21},x_{22},\dots,x_{2n}),\dots,(t_T,x_{T1},x_{T2},\dots,x_{Tn})\}$, where $t_i$ represents the timestamp and $x_{ij}$ represents the j-th dimension observation at the i-th time point;
the aligned multi-source data are fused into a multidimensional time sequence $O=\{o_1,o_2,\dots,o_T\}$, where $o_i=(x_{i1},x_{i2},\dots,x_{in})$ represents the multidimensional observation vector at the i-th time point;
The fused multidimensional time sequence O is input into the trained data deviation recognition model, which must be able to process multidimensional observation vectors. The probability of each observation vector is calculated according to the observation probability matrix B of the model, and the path probability matrix δ, which represents the optimal path probability of reaching each state at each moment, is initialized as:
$\delta_1(i)=\pi_i\,b_i(o_1)$, where $\pi_i$ represents the initial probability of state i and $b_i(o_1)$ is the probability of observing $o_1$ in state i.
A path record matrix ψ is also initialized to record the optimal precursor state of each state at each moment.
Based on the initialized path probability and the path record matrix, the path probability at each moment is calculated by recursion based on an observation sequence by utilizing a Viterbi algorithm, the optimal precursor state reaching each state is recorded, the optimal path probability and the optimal state at the final moment are calculated based on the optimal precursor state, and the optimal hidden state sequence is obtained by backtracking the recorded optimal precursor state from the optimal state at the final moment.
The Viterbi algorithm is a dynamic programming algorithm used to find the most likely hidden state sequence for a given observation sequence in a hidden Markov model (HMM), and it is widely used in fields such as speech recognition, bioinformatics and communication signal processing.
The optimal state at each moment is determined by using the path record matrix to trace back from the final moment to the initial moment;
Constructing an optimal hidden state sequence according to the optimal state at each moment obtained by backtracking, and converting the predicted hidden state sequence into state label data;
By comparing the difference between the observed data and the state label data of each monitoring data, the data deviation between each monitoring data source is accurately identified.
It should be explained that, at the final moment, all values in the path probability matrix are checked to find the largest one, which is the optimal path probability of the whole observation sequence from the initial moment to the final moment. The state corresponding to this maximum is the optimal state at the final moment; it is recorded and used to initialize the optimal hidden state sequence. An array or list is created to store the optimal hidden state sequence, with the optimal state at the final moment as its last element. Starting from this known state, the path record matrix, which stores the optimal precursor state of each state at each moment, is traced back step by step: at each time point the precursor of the current optimal state is read from the matrix and inserted at the corresponding position of the sequence. Backtracking continues until the first moment is reached, which yields the complete optimal hidden state sequence containing the optimal hidden state of every step from the initial moment to the final moment. Finally, the sequence is checked to ensure that no time point was missed during backtracking.
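The recursion and backtracking described above can be sketched as follows. The sketch assumes a discrete HMM with observations already mapped to symbol indices, whereas the application works with multidimensional observation vectors; the function name `viterbi` and the data layout are hypothetical.

```python
def viterbi(pi, A, B, obs):
    """Return (most likely hidden state path, its probability).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observed symbol indices
    """
    n, T = len(pi), len(obs)
    delta = [[0.0] * n for _ in range(T)]  # path probability matrix
    psi = [[0] * n for _ in range(T)]      # path record matrix (precursor states)
    for i in range(n):
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            # Pick the precursor state that maximizes the path probability into j.
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
    # Optimal state at the final moment, then backtrack through psi.
    last = max(range(n), key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[T - 1][last]
```

The returned state indices can then be mapped to labels such as normal or abnormal to form the state label data.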
Each hidden state can be mapped to a specific label, such as normal or abnormal. A label is assigned to the hidden state at each time point to form a complete state label data sequence. The observed data and the state label data are then compared at each time point, and the differences between them are analyzed to find whether the monitoring data sources are inconsistent or deviate from one another, thereby identifying possible data deviations.
S4, integrating the learning network and the data deviation recognition model.
A logging module is used to collect, every minute, the prediction results output by the model and to record them against the predicted deviation, the real deviation and the corresponding timestamp data for comparison;
Classifying and marking the collected data according to a preset deviation type and severity standard by using an automatic script, and distinguishing true positives, false positives, true negatives and false negatives to form a marked feedback data set;
Setting a monthly retraining day, and triggering model retraining on that day each month;
for the hidden Markov model, re-estimating a state transition probability matrix and an observation probability matrix by using a Baum-Welch algorithm;
For the isolation forest model, the number of trees and the sub-sample size are adjusted so that the model adapts to new data characteristics;
Stochastic gradient descent is adopted as the online learning algorithm: each time, a small batch of data is randomly drawn from the latest feedback data set and the model parameters are updated according to:
$$\theta_{t+1}=\theta_t-\eta\,\nabla_{\theta}L(\theta_t;x_t,y_t)$$
The loss function is:
$$L(\theta)=-\frac{1}{N}\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta)$$
where $P(y\mid x;\theta)$ represents the probability that the model predicts class y under parameter θ, N represents the total number of samples, θ represents the model parameters, η represents the learning rate, $L(\theta_t;x_t,y_t)$ represents the loss function, $x_t$ and $y_t$ represent the features and label of the current sample, and $x_i$ and $y_i$ represent the i-th input feature and output label, respectively;
The model parameter $\theta_0$ is randomly initialized and the samples in the data set are input as a stream, one small batch at a time. For each small batch, the gradient of the loss function with respect to the model parameters is calculated, and the model parameters are updated according to the calculated gradient and the preset learning rate; iteration continues until the preset number of iterations is reached;
A deep Q network algorithm is adopted and a reward function is set, giving a positive reward $r^{+}$ for each correct early warning and a negative reward $r^{-}$ for each incorrect early warning. The reward design follows these principles:
$r^{+}=\ln(1+V)$ for each correct early warning;
$r^{-}=-\ln(1+RL)$ for each incorrect early warning;
where V represents the response speed of the system for a correct early warning and RL represents the response delay of the system for an incorrect early warning;
Setting a change rate threshold δ; when the change rate of the monitoring data satisfies δnew > δ, the system judges that an abnormal condition may exist and triggers an early warning, according to the formula:
Setting an abnormal score threshold s; an early warning is triggered when the abnormal score snew of a monitoring point satisfies snew > s;
where δnew and snew represent the new change rate threshold and the new abnormal score threshold respectively, α and β are the false positive rate and the false negative rate respectively, and T is the average response time;
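Returning to the online learning step of S4, the mini-batch update loop can be sketched as follows. This is a minimal sketch under assumptions that the application does not state: the model is a binary logistic classifier, the loss is the mean negative log-likelihood, and the names `nll`, `sgd_step` and `online_train` and all default hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll(theta, batch):
    """Mean negative log-likelihood of binary labels under a logistic model."""
    total = 0.0
    for x, y in batch:
        p = sigmoid(sum(w * xi for w, xi in zip(theta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(batch)


def sgd_step(theta, batch, lr):
    """One gradient step on the mean negative log-likelihood of the batch."""
    grad = [0.0] * len(theta)
    for x, y in batch:
        err = sigmoid(sum(w * xi for w, xi in zip(theta, x))) - y
        for k, xk in enumerate(x):
            grad[k] += err * xk / len(batch)
    return [w - lr * g for w, g in zip(theta, grad)]


def online_train(data, lr=0.5, batch_size=4, iterations=200, seed=0):
    """Randomly initialize theta, then update it on random mini-batches."""
    rng = random.Random(seed)
    theta = [rng.uniform(-0.01, 0.01) for _ in data[0][0]]
    for _ in range(iterations):
        batch = rng.sample(data, batch_size)  # small batch from the feedback set
        theta = sgd_step(theta, batch, lr)
    return theta
```

Here `data` stands in for the labeled feedback data set, with each element a pair of a feature tuple and a 0/1 label; a constant first feature can serve as the bias term.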
The main parameters of the isolation forest model are determined, namely the number of trees (the number of isolation trees), the sub-sample size (the number of data samples used to train each tree), the maximum number of features, and so on. These parameters affect the performance and detection precision of the model and can be adjusted according to actual requirements and data characteristics: for example, the larger the number of trees, the more stable the model, while the sub-sample size affects the training speed and precision. The test data set is used as training data, and it is preprocessed, for example by removing missing values and normalizing, to ensure that training is effective. The data set contains various data deviation conditions so that the model can learn both normal and abnormal patterns. The set parameters and the prepared training data are input into the isolation forest model for training. The model learns the data distribution by constructing multiple isolation trees, each of which partitions the data samples randomly and recursively; abnormal points can be isolated with only a small number of partitions. The trained model is then used to predict an anomaly score for each data point, and the scores are normalized to a unified range so that a threshold can be set and abnormal points can be identified conveniently.
Common methods include selecting a fixed score value as the threshold, or setting a percentile according to the distribution of the data (for example, marking the points whose scores are in the top 5% as abnormal). All data points in the test data set whose scores are higher than the set threshold are marked as abnormal data points.
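The percentile option can be sketched as follows. This is an illustrative helper only: the name `flag_top_fraction` is hypothetical, and the choice to flag scores greater than or equal to the cut-off (so that exactly the top fraction is flagged when there are no ties) is an assumption.

```python
def flag_top_fraction(scores, fraction=0.05):
    """Mark the highest-scoring fraction of points as abnormal.

    Returns (threshold, flags), where flags[i] is True when scores[i]
    is at or above the threshold. Tied scores may flag a few extra points.
    """
    k = max(1, int(round(len(scores) * fraction)))
    threshold = sorted(scores, reverse=True)[k - 1]
    return threshold, [s >= threshold for s in scores]
```

A fixed threshold is the simpler alternative: it keeps the alarm criterion stable over time, whereas the percentile rule always flags roughly the same share of points regardless of how anomalous they are.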
A working day of each month is set as the evaluation day, on which strategy evaluation is executed based on the early warning effect and the improvement in operation and maintenance efficiency over the past month. The data deviation recognition accuracy ACC is:
$$ACC=\frac{TP+TN}{TP+TN+FP+FN}$$
where TP is the number of true positives, i.e. the number of anomalies correctly recognized by the system; TN is the number of true negatives, i.e. the number of data points correctly judged as normal by the system; FP is the number of false positives, i.e. the number of normal data points incorrectly judged as anomalies by the system; and FN is the number of false negatives, i.e. the number of actual anomalies that the system fails to recognize;
The early warning response time RT is:
$$RT=\frac{1}{N}\sum_{i=1}^{N}\left(T_{r,i}-T_{t,i}\right)$$
where $T_{r,i}$ represents the time at which the system responded to the i-th abnormal event ($T_r$ denoting the response time), $T_{t,i}$ represents the actual occurrence time of the i-th abnormal event ($T_t$ denoting the actual occurrence time of an abnormal event), and N represents the total number of abnormal events in the monitoring period;
The operation and maintenance efficiency improvement percentage E is:
$$E=\frac{OT-NT}{OT}\times 100\%$$
where OT is the old operation and maintenance time, representing the manual operation and maintenance time required to solve a problem of the same scale before the automatic early warning system was introduced, and NT is the new operation and maintenance time, representing the actual time taken to handle the same problem after system intervention;
the CPI equation reflecting overall performance is:
where $RT_{max}$ represents the maximum acceptable early warning response time, α represents the regulating factor, and $w_1$, $w_2$ and $w_3$ represent the weight coefficients of the accuracy, the response time and the operation and maintenance efficiency improvement percentage, respectively.
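The three component indicators can be computed as in the sketch below, which assumes the conventional definitions of accuracy, mean response delay and relative time saving; the combination into CPI is not sketched. All function names are hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count (TP, TN, FP, FN) from boolean anomaly predictions and ground truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all judgments (normal or abnormal) that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mean_response_time(response_times, occurrence_times):
    """Average delay between an abnormal event occurring and the system responding."""
    delays = [r - o for r, o in zip(response_times, occurrence_times)]
    return sum(delays) / len(delays)


def efficiency_gain_percent(old_time, new_time):
    """Percentage of operation and maintenance time saved relative to the old time."""
    return (old_time - new_time) / old_time * 100.0
```

For instance, `accuracy(*confusion_counts(predicted, actual))` turns a month of labeled feedback directly into the ACC value used on the evaluation day.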
S5, calculating abnormal scores of the data deviations, setting data quality grades for all the data deviation points, and evaluating the influence of the data deviations on the operation performance of the new energy station.
Carrying out standardization processing on each data deviation, and forming a test data set from the standardized data deviations;
An anomaly monitoring model is constructed and its parameters are set; the model is trained using the test data set as training data, and the trained model is used to predict the anomaly score of each data deviation in the test data set. The data deviations in the test data set are traversed and input into each isolation tree of the anomaly monitoring model; starting from the root node, each data deviation is passed downward according to the splitting conditions of the isolation tree until a leaf node is reached, and the traversal path length of each data point in each tree is recorded. The average of the path lengths of a data deviation over all isolation trees is calculated and used to obtain the final anomaly score of that data deviation, and the anomaly score is normalized to a preset range;
The anomaly score is calculated as:
$$S(x)=2^{-\frac{\bar{h}(x)}{C(n)}}$$
where S(x) represents the anomaly score of the data deviation x, $\bar{h}(x)$ represents the average path length of the data deviation x over all isolation trees, C(n) represents the adjustment coefficient, T represents the total number of trees in the isolation forest, and n represents a positive integer;
It is to be explained that every data point in the test data set is traversed so that each one is evaluated by the isolation forest model. The current data point is input into each isolation tree of the trained isolation forest model. Each isolation tree is constructed by randomly selecting features and randomly selecting split points, so that abnormal points are isolated as quickly as possible. Starting from the root node, the data point moves downward according to its feature values: at each node, comparing the feature value with the split point decides whether it enters the left child node or the right child node, until a leaf node is reached.
In each isolation tree, the traversal path length of the current data point from the root node to the leaf node is recorded. The path length represents the number of splits the isolation tree needs to isolate the data point: the shorter the path, the more easily the data point is isolated and the more likely it is an abnormal point. The path lengths of the current data point in all isolation trees are collected and averaged; the average path length reflects how difficult the data point is to isolate in the isolation forest model. The average path length is then converted into an anomaly score, with a shorter average path length corresponding to a higher anomaly score. The isolation forest model generally normalizes the scores so that they all fall within a preset range, for example 0 to 1, which unifies the scale of the anomaly scores and facilitates the subsequent threshold setting and abnormal point identification.
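The conversion from path lengths to a normalized score can be sketched as follows. The sketch assumes the standard isolation forest normalization, in which the adjustment coefficient is the average path length of an unsuccessful binary search tree lookup over n samples; whether the application uses exactly this coefficient is not stated. The names `c_factor` and `anomaly_score` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c_factor(n):
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """Map the per-tree path lengths of one data point to a score in (0, 1].

    path_lengths : path length of the point in each isolation tree
    n            : sub-sample size used to build each tree (must be > 1)
    Shorter average paths give scores closer to 1 (more anomalous).
    """
    mean_path = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_path / c_factor(n))
```

With this normalization, a point whose average path length equals the coefficient scores exactly 0.5, so scores well above 0.5 indicate points that are isolated unusually quickly.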
Setting a data quality level based on the anomaly score and scoring the anomaly data points;
The quality level may be set according to the score range. For example, the following levels may be set:
High-quality data: anomaly score in the range 0 to a;
Good data: anomaly score in the range a to b;
Moderate data: anomaly score in the range b to c;
Low-quality data: anomaly score in the range c to d;
Abnormal data: anomaly score in the range d to e;
where a=0.2, b=0.4, c=0.6, d=0.8, e=1.0;
each data point is assigned a corresponding quality level based on its anomaly score.
Each outlier data point is assigned a score, which may be set according to the magnitude of the outlier score.
For example:
data points with anomaly scores in the range d to e are scored as E' (highest score);
data points with anomaly scores in the range c to d are scored as D';
data points with anomaly scores in the range b to c are scored as C';
data points with anomaly scores in the range a to b are scored as B';
data points with anomaly scores in the range 0 to a are scored as A' (lowest score);
where A' = 1 point, B' = 2 points, C' = 3 points, D' = 4 points, E' = 5 points;
and evaluating the influence of the data deviation on the operation performance of the new energy station according to the scoring of the abnormal data points.
Summarizing scores of all abnormal data points, calculating the data point proportion and distribution condition of each quality grade, counting the number of data points of each grade and the proportion of the data points in the total data, analyzing the overall condition of data quality, and evaluating the influence of high-score (namely high abnormal score) data points on the station operation performance. High scoring data points often mean that the data bias is large and may adversely affect the operational decisions, predictions, and controls of the station.
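The grading and the per-level proportions can be sketched as follows, using the example boundaries 0.2, 0.4, 0.6, 0.8 and 1.0 and the example scores of 1 to 5 points. Treating each upper boundary as inclusive is an assumption, since the text does not say which level a boundary value belongs to; the names `grade` and `level_shares` are hypothetical.

```python
from collections import Counter

# (upper bound of anomaly score, quality level, points)
LEVELS = [
    (0.2, "high-quality", 1),
    (0.4, "good", 2),
    (0.6, "moderate", 3),
    (0.8, "low-quality", 4),
    (1.0, "abnormal", 5),
]


def grade(score):
    """Return (quality level, points) for a normalized anomaly score in [0, 1]."""
    if score < 0.0:
        raise ValueError("anomaly score must lie in [0, 1]")
    for upper, level, points in LEVELS:
        if score <= upper:
            return level, points
    raise ValueError("anomaly score must lie in [0, 1]")


def level_shares(scores):
    """Proportion of data points that fall into each quality level."""
    counts = Counter(grade(s)[0] for s in scores)
    return {level: counts[level] / len(scores) for level in counts}
```

The shares returned by `level_shares` give the distribution summary described above; a growing share of the two highest-scoring levels would be the signal to examine station performance for the affected periods.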
Specific analysis may include:
(1) Performance comparison: the key performance indexes (such as generated energy and equipment utilization rate) of time periods containing high-score data points are compared with those of normal time periods to identify the specific influence of abnormal data on performance.
(2) Operation strategy adjustment: the operation strategy and control method of the station are adjusted based on the distribution and scores of abnormal data points, for example by removing high-score data points or strengthening data correction.
(3) Prediction accuracy evaluation: the influence of historical data containing abnormal data on the prediction model for future operation is checked, and the accuracy and reliability of the prediction model are evaluated.
And according to the evaluation result, corresponding data management and improvement measures are formulated, and the influence of data deviation on the operation performance of the new energy station is reduced.
The improvement may include:
(1) Data cleaning and preprocessing: strengthen data cleaning and preprocessing, and correct or eliminate abnormal data points in time.
(2) Monitoring equipment optimization: optimize the arrangement of monitoring equipment and sensors to improve the accuracy and reliability of data acquisition.
(3) Data fusion and correction: use multi-source data fusion and correction technology to reduce the influence of deviations in a single data source on overall data quality.
S6, establishing a real-time monitoring platform based on the data quality grade, tracking the change trend of the data deviation, setting a change rate threshold, and giving early warning for data deviations exceeding the change rate threshold.
Based on the data quality level, establishing a real-time monitoring platform;
Performing a differencing operation on the data deviation indexes to obtain differential values, performing a stationarity test on the differential values, and taking the differential values that pass the stationarity test as the change rate of the data deviation;
setting the change rate threshold of the data deviation as $\mu_0$, wherein the deviation change rate μ is given by:
$\mu=k\cdot\sigma$;
where σ is the standard deviation of the data and k is the safety factor;
introducing an objective function G(k) and evaluating the early warning effect under different values of k to obtain the optimal safety factor $k^{*}$, wherein the formula is as follows:
$$G(k)=w_4\,FPR(k)+w_5\,FNR(k)$$
where FPR(k) represents the false alarm rate, FNR(k) represents the missed alarm rate, and $w_4$ and $w_5$ represent the weight coefficients of the false alarm rate and the missed alarm rate, respectively;
The optimal safety factor $k^{*}$ is given by:
$$k^{*}=\arg\min_{k}G(k)$$
when $\mu>\mu_0$, an early warning is triggered and the early warning information is transmitted to operation and maintenance personnel, and the early warning information is analyzed as follows:
collecting all data deviations with the change rate exceeding a preset threshold value;
classifying the threshold-exceeding data deviations by type to obtain the abnormal data points of each category;
Performing root cause analysis on abnormal data points of each category to acquire the cause of data deviation;
And according to the analysis result, periodically evaluating and adjusting the early warning rule, and optimizing the real-time monitoring platform.
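The selection of the safety factor in this step can be sketched as a grid search over candidate values of k. The sketch assumes that the objective is a weighted sum of the false alarm rate and the missed alarm rate, that the threshold for a given k is k times the standard deviation, and that labeled historical change rates are available; the names `fpr_fnr` and `best_safety_factor` and the default weights are hypothetical.

```python
def fpr_fnr(rates, labels, sigma, k):
    """False alarm rate and missed alarm rate when the threshold is k * sigma.

    rates  : historical change rates of the data deviation
    labels : True where the corresponding point was a real anomaly
    """
    threshold = k * sigma
    fp = sum(1 for r, y in zip(rates, labels) if r > threshold and not y)
    fn = sum(1 for r, y in zip(rates, labels) if r <= threshold and y)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def best_safety_factor(rates, labels, sigma, candidates, w_fp=0.5, w_fn=0.5):
    """Return the candidate k with the smallest weighted error (assumed objective)."""
    def objective(k):
        fpr, fnr = fpr_fnr(rates, labels, sigma, k)
        return w_fp * fpr + w_fn * fnr
    return min(candidates, key=objective)
```

Raising the weight on missed alarms pushes the search towards smaller k and a more sensitive threshold, at the cost of more false alarms.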
It should be explained that a platform capable of collecting, processing and displaying data in real time is constructed by selecting a suitable technical framework and tools; common tools include Kafka, InfluxDB and Grafana. The real-time monitoring platform needs data collection, storage, analysis and visualization capabilities so that changes and deviations in the data can be monitored at any time. Various indexes in the real-time data are collected and data deviation values are calculated; a deviation value represents the difference between the actually observed data and the expected data. A differencing operation is then performed on the collected data deviation values. Differencing calculates the difference between the data values at adjacent time points, and it can eliminate the trend and seasonal components in time-series data, making the data more stationary.
Differencing step: the difference between the deviation value at each time point and the deviation value at the previous time point is calculated, that is, difference value = deviation value at the current time point - deviation value at the previous time point.
The stationarity test ensures that the differenced data are stationary, that is, that the statistical properties of the data (such as the mean and variance) do not change over time; stationary data are more suitable for time-series analysis and prediction. Common stationarity tests include the ADF (Augmented Dickey-Fuller) test and the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test.
ADF test: the null hypothesis is that the data have a unit root (that is, the data are not stationary); if the test result is significant, the null hypothesis is rejected, indicating that the data are stationary.
KPSS test: the null hypothesis is that the data are stationary; if the test result is significant, the null hypothesis is rejected, indicating that the data are not stationary.
The stationarity test is carried out on the differenced data deviation values. If the data pass the test, they are considered stationary, and the differential values that pass the test are taken as the change rate of the data deviation, reflecting how the data deviation changes over time. The calculated change rate is stored in the real-time monitoring platform for subsequent analysis and early warning.
According to historical data and business requirements, a reasonable data deviation change rate threshold is set, and the change rate threshold can be a single fixed value or can be dynamically adjusted according to different time periods and conditions.
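The differencing and threshold check can be sketched as below. The stationarity test itself is left out, since it would normally rely on a statistics library. The sketch assumes a fixed threshold equal to k times the population standard deviation of historical change rates and compares the absolute change rate with it; both choices, and all function names, are assumptions.

```python
import statistics


def first_difference(deviations):
    """Difference value = current deviation value - previous deviation value."""
    return [b - a for a, b in zip(deviations, deviations[1:])]


def rate_threshold(history_rates, k=3.0):
    """Threshold as k times the standard deviation of historical change rates."""
    return k * statistics.pstdev(history_rates)


def exceedances(rates, threshold):
    """Indices of change rates whose magnitude exceeds the threshold."""
    return [i for i, r in enumerate(rates) if abs(r) > threshold]
```

A dynamically adjusted threshold, as mentioned above, could be obtained by recomputing `rate_threshold` over a sliding window of recent change rates instead of the full history.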
For example, a change rate threshold is set, and when the change rate of the data deviation exceeds the change rate threshold, the data deviation is considered to be abnormal in change, and attention is required;
A specific early warning rule is defined so that an early warning is triggered when the change rate of the data deviation exceeds the set threshold. Early warnings can be delivered in various forms, such as short messages and e-mail notifications. The trigger conditions are specified explicitly, for example the change rate exceeding the threshold at several consecutive time points, or an abnormally high change rate at a single time point. The early warning function is implemented in the real-time monitoring platform, which monitors the change rate of the data deviation in real time, triggers the early warning mechanism when the change rate exceeds the preset threshold, and promptly notifies the relevant personnel to take measures, so that the data deviation does not seriously affect the business.
All data deviation points whose change rate exceeds the preset threshold are collected from the real-time monitoring platform, recorded, and stored for later analysis. Classification standards are set according to the characteristics of the data deviations (such as time period, equipment type, and geographical location), and the collected over-threshold deviations are classified accordingly to obtain the abnormal data points of each category, which are sorted and recorded. Using analysis tools and methods such as fault tree analysis (FTA), fishbone (Ishikawa) diagrams, and causal analysis, the abnormal data points of each category are traced in detail according to the classification result to find the specific causes of the data deviation. The root causes of each category, which may include equipment faults, data transmission problems, and environmental factors, are summarized, and the results of the root cause analysis are recorded in a detailed analysis report. Based on these results, the effectiveness of the current early warning rules is evaluated, that is, whether the rules identify important abnormal data points promptly and accurately. Feedback on the rules is collected from the relevant personnel to understand how the rules perform, and where they fall short, in practical use. The early warning threshold is adjusted according to the analysis of the data deviations and change rates so that warnings become more accurate and timely, and the rules are updated according to newly discovered root causes, for example by formulating a more refined early warning strategy for a specific type of abnormal data point. The early warning rules and the monitoring platform are checked and evaluated regularly to keep them in their best state, and are continuously improved according to actual operating conditions and analysis results, improving performance and reliability.
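The threshold adjustment described above can be sketched as a small search over candidate safety factors. This is only an illustration under stated assumptions: the cost is taken to be a weighted sum of the false alarm rate and the missed alarm rate, the threshold is taken to be k times the standard deviation, and the function name `best_safety_factor` and all parameter names are hypothetical, not taken from the patent.

```python
def best_safety_factor(rates, labels, sigma, candidates, w_fp=0.5, w_fn=0.5):
    """Pick the safety factor k with the lowest weighted alarm-error cost.

    rates      -- historical change rates of the data deviation
    labels     -- True where the point was a confirmed anomaly (assumed to be
                  available from root cause analysis)
    sigma      -- standard deviation used to scale the threshold k * sigma
    candidates -- candidate k values to compare
    """
    negatives = labels.count(False) or 1  # avoid division by zero
    positives = labels.count(True) or 1

    def cost(k):
        flagged = [abs(r) > k * sigma for r in rates]
        false_alarms = sum(f and not y for f, y in zip(flagged, labels))
        missed = sum(y and not f for f, y in zip(flagged, labels))
        return w_fp * false_alarms / negatives + w_fn * missed / positives

    return min(candidates, key=cost)
```

Raising `w_fn` relative to `w_fp` favours smaller thresholds (fewer missed anomalies at the price of more false alarms), which is the trade-off the periodic rule review is meant to balance.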
This embodiment also provides a new energy station monitoring data quality evaluation system based on multi-source data. The system comprises a data acquisition and preprocessing module, a data deviation recognition model training module, an integrated learning network and deviation recognition module, a data deviation anomaly score and quality evaluation module, and a real-time monitoring and early warning response module. The data acquisition and preprocessing module is used to acquire the multi-source monitoring data collected by each monitoring device in the new energy station and to preprocess that data. The data deviation recognition model training module is used to train a data deviation recognition model on the preprocessed multi-source monitoring data using the Baum-Welch algorithm. The integrated learning network and deviation recognition module is used to integrate the learning network and the data deviation recognition model, construct an observation sequence, convert the predicted hidden state sequence into state label data, and recognize the data deviations between the monitoring data sources. The data deviation anomaly score and quality evaluation module is used to calculate the data deviation anomaly score, set a data quality grade for each data deviation point, and evaluate the influence of the data deviation on the operating performance of the new energy station. The real-time monitoring and early warning response module is used to establish a real-time monitoring platform, track the change trend of the data deviation, set a change rate threshold, and issue an early warning for data deviations that exceed the change rate threshold.
This embodiment also provides a computer device suitable for the new energy station monitoring data quality evaluation method based on multi-source data. The device comprises a memory and a processor, wherein the memory stores computer-executable instructions and the processor executes those instructions to implement the new energy station monitoring data quality evaluation method based on multi-source data proposed in the above embodiment.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be implemented through Wi-Fi, an operator network, NFC (near field communication), or other technologies. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; a key, trackball, or touchpad arranged on the housing of the computer device; or an external keyboard, touchpad, or mouse.
This embodiment also provides a storage medium on which a computer program is stored; when executed by a processor, the computer program implements the new energy station monitoring data quality evaluation method based on multi-source data proposed in the above embodiments. The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
In summary, the invention realizes the monitoring and understanding of the user behavior by analyzing the user file operation behavior, can accurately capture the operation habit and behavior pattern of the user, provides basic data for the subsequent encryption strategy generation, is also beneficial to the detection and prevention of the abnormal behavior by the system, predicts the future operation of the user according to the behavior habit of the user, thereby more intelligently formulating the encryption strategy, more accurately protecting the user privacy, reducing the risk of data leakage, improving the data security and confidentiality of the system, better adapting to the operation habit and requirement of the user, improving the intelligent level and user experience of the encryption algorithm, and enhancing the flexibility and adaptability of the system.
Example 2: Referring to Table 1, the second example of the invention provides experimental simulation data for the new energy station monitoring data quality evaluation method based on multi-source data, in order to further verify the advantages of the invention.
A typical wind farm was chosen as the test subject. The wind farm is equipped with an advanced sensor network comprising multi-source monitoring equipment for wind speed, temperature, humidity, equipment vibration and the like, with 200 monitoring points in total, of which 5 were randomly selected for the experiment. The experiment compares the traditional monitoring method with the method provided by the invention in terms of data preprocessing, deviation recognition, anomaly detection, and real-time monitoring.
First, raw data are collected continuously from all monitoring points and cleaned: missing values and obvious outliers are removed to ensure data integrity, and the data are standardized so that monitoring data of different types are comparable.
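The cleaning and standardization step can be illustrated with a minimal sketch. The outlier rule (a z-score limit) and the function name `clean_and_standardize` are assumptions made for illustration; the text above does not specify how "obvious" outliers are detected.

```python
from statistics import mean, pstdev


def clean_and_standardize(readings, z_limit=3.0):
    """Drop missing values and gross outliers, then z-score what remains.

    readings -- list of floats, with None marking a missing sample
    z_limit  -- assumed outlier rule: drop points further than z_limit
                standard deviations from the mean
    """
    present = [r for r in readings if r is not None]
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        return [0.0 for _ in present]
    kept = [r for r in present if abs(r - mu) / sigma <= z_limit]
    # Re-estimate the statistics without the outliers before standardizing.
    mu_k, sigma_k = mean(kept), pstdev(kept)
    if sigma_k == 0:
        return [0.0 for _ in kept]
    return [(r - mu_k) / sigma_k for r in kept]
```

After this step each source has zero mean and unit variance, which is what makes wind speed, temperature, humidity, and vibration readings comparable on one scale.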
Next, the preprocessed data are used to train a data deviation recognition model with the Baum-Welch algorithm. The model can automatically identify potential deviations between the monitoring data sources.
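As a rough illustration of the training step, the sketch below performs one Baum-Welch re-estimation pass for a hidden Markov model with discrete observations. It is a textbook formulation, not the patent's implementation: the discretization of monitoring data into symbols is assumed, the function names are hypothetical, and no numerical scaling is applied, so it is only suitable for short sequences.

```python
def forward(obs, pi, A, B):
    """Forward probabilities alpha[t][i]."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """Backward probabilities beta[t][i], with beta at the final time set to 1."""
    n = len(A)
    beta = [[1.0] * n]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM update; needs len(obs) >= 2. Returns (pi, A, B, likelihood)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    total = sum(alpha[-1])  # total probability of the observation sequence
    # State occupancy probability gamma[t][i].
    gamma = [[alpha[t][i] * beta[t][i] / total for i in range(n)]
             for t in range(T)]
    # State transition probability xi[t][i][j].
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / total
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, total
```

Repeating `baum_welch_step` until the returned likelihood stops improving gives the trained initial state vector, transition matrix, and observation matrix.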
The trained model is then combined with the integrated learning network to identify the data deviations between the monitoring data sources. Data deviations are identified by comparing the observed data with the model's predictions.
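The prediction side of this comparison relies on decoding the most likely hidden state sequence. A minimal Viterbi sketch follows, assuming a discrete-observation hidden Markov model; the function name `viterbi` and the parameter layout are illustrative choices, not the patent's.

```python
def viterbi(obs, pi, A, B):
    """Return (most likely hidden state path, probability of that path)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]  # best path prob per state
    back = []                                         # best predecessor per step
    for o in obs[1:]:
        step, ptr = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best)
            step.append(delta[best] * A[best][j] * B[j][o])
        delta = step
        back.append(ptr)
    # Backtrack from the best final state to the first time step.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)
```

The decoded path plays the role of the state label data: time steps where a source's observations disagree with the label assigned to that step are candidates for a data deviation.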
An anomaly score is then calculated for each data deviation, and data quality is divided into four grades according to the score: excellent, good, general, and poor. This helps to assess the impact of data deviation on wind farm performance.
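A sketch of the scoring and grading step is given below. It assumes the standard isolation forest normalization, in which the score is 2 raised to the negative ratio of the mean path length to an adjustment coefficient c(n). The grade cut points (0.15, 0.30, 0.45) are hypothetical values chosen only for illustration; the text does not state the actual boundaries.

```python
import math

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Isolation-forest style score in (0, 1]; requires n >= 2."""
    return 2.0 ** (-mean_path_length / c_factor(n))


def grade(score, cuts=(0.15, 0.30, 0.45)):
    """Map a score to one of four grades using assumed cut points."""
    labels = ("excellent", "good", "general", "poor")
    for cut, label in zip(cuts, labels):
        if score <= cut:
            return label
    return labels[-1]
```

Shorter mean path lengths mean a point is easier to isolate and therefore more anomalous, so the score rises toward 1 as the path length falls toward 0.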
Finally, a real-time monitoring platform is established to track the change trend of the data deviation, and a change rate threshold is set. When the change rate of the data deviation exceeds the threshold, the system automatically triggers an early warning so that operation and maintenance personnel can respond in time.
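The alerting rule can be sketched as follows, assuming the change rate is the first difference of the deviation series and the threshold is a safety factor k times the standard deviation of those differences. The function name `change_rate_alerts`, the default k, and the use of the absolute change are illustrative assumptions.

```python
from statistics import pstdev


def change_rate_alerts(deviations, k=2.0):
    """Return ([(index, change), ...], threshold) for changes above k * sigma.

    deviations -- data deviation values in time order
    k          -- assumed safety factor scaling the threshold
    """
    diffs = [b - a for a, b in zip(deviations, deviations[1:])]
    threshold = k * pstdev(diffs)
    # Compare the magnitude of the change, so sharp drops are flagged too.
    alerts = [(t + 1, d) for t, d in enumerate(diffs) if abs(d) > threshold]
    return alerts, threshold
```

Each returned index points at the sample where the jump lands, which is the point an operator would be asked to inspect.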
The specific examples are shown in Table 1:
Table 1. Experimental records
The effectiveness and superiority of the method of the invention are evident from the comparison data. As the table above shows, monitoring point 001 has a data deviation anomaly score of 0.12 and is rated "excellent", meaning that the deviation of this data source has very little influence on the operating performance of the wind farm (an impact index of only 0.03). In contrast, monitoring point 004 scores as high as 0.49 and is rated "poor", with an impact index of 0.22, significantly higher than the other monitoring points.
The proposed system also has a clear advantage in data deviation recognition accuracy. For example, the deviation recognition accuracy at monitoring point 003 is 88% with the proposed method, compared with only 78% for the conventional method. The method not only identifies data deviations accurately but also effectively evaluates their influence on operating performance, improving the operation and maintenance efficiency and the safety of the wind farm.
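For reference, the evaluation metrics named in this document (recognition accuracy, early warning response time, and efficiency improvement) can be computed as in the sketch below. The accuracy formula is the usual confusion-matrix definition; the response time is assumed to be the mean delay between an event and the system's response; the function names are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Share of points classified correctly: (TP + TN) / all points."""
    return (tp + tn) / (tp + tn + fp + fn)


def mean_response_time(event_times, response_times):
    """Average delay between each anomaly and the system's response to it."""
    delays = [r - e for e, r in zip(event_times, response_times)]
    return sum(delays) / len(delays)


def efficiency_gain_percent(manual_time, assisted_time):
    """Percentage reduction in handling time relative to manual operation."""
    return (manual_time - assisted_time) / manual_time * 100.0
```

An accuracy figure such as those quoted above would come from counting true and false positives and negatives over a labeled evaluation period.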
In summary, the innovation and the practicability of the new energy station monitoring data quality evaluation system based on the multi-source data are fully proved by comparing experimental data. The system shows performance superior to the traditional method in the aspects of data preprocessing, deviation recognition, anomaly detection, real-time monitoring and the like, and provides a more reliable and efficient monitoring solution for new energy stations.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (10)

1. A method for evaluating the quality of monitoring data of new energy stations based on multi-source data, characterized by comprising:
obtaining multi-source monitoring data collected by each monitoring device in a new energy station, and preprocessing the multi-source monitoring data;
based on the preprocessed multi-source monitoring data, training a data deviation identification model on the preprocessed multi-source monitoring data using the Baum-Welch algorithm;
integrating the learning network and the data deviation identification model, constructing an observation sequence, converting the predicted hidden state sequence into state label data, and identifying the data deviations between the monitoring data sources;
calculating data deviation anomaly scores, setting a data quality level for each data deviation point, and evaluating the impact of data deviation on the operating performance of the new energy station;
based on the data quality level, establishing a real-time monitoring platform, tracking the change trend of data deviations, setting a change rate threshold, and issuing an early warning for data deviations that exceed the change rate threshold.
2.
The method for evaluating the quality of monitoring data of a new energy station based on multi-source data according to claim 1, characterized in that obtaining the multi-source monitoring data collected by each monitoring device in the new energy station and preprocessing the multi-source monitoring data mainly comprises:
listing all monitoring equipment in the new energy station, and specifying the data types and indicators to be collected;
installing the monitoring equipment, ensuring that the equipment location and installation method meet the standards, and debugging the equipment to ensure that it operates normally and collects data accurately;
transmitting the collected data to a cloud server over the network, and preprocessing the raw data.
3. The method for evaluating the quality of monitoring data of new energy stations based on multi-source data according to claim 2, characterized in that training the data deviation identification model on the preprocessed multi-source monitoring data using the Baum-Welch algorithm mainly comprises:
normalizing the multi-source monitoring data, and setting the parameters of the data deviation identification model based on the normalization result;
initializing the initial state probability vector, the state transition probability matrix, and the observation probability matrix to construct a hidden Markov model;
applying the Baum-Welch algorithm to calculate the forward probability at the initial time, calculating the forward probability at each time by forward recursion, and calculating the total probability of the observation
sequence;
setting the backward probability at the final time, and calculating the backward probability at each time by backward recursion;
calculating the state occupancy probability based on the forward probability and the backward probability;
calculating the state transition probability based on the forward probability and the backward probability;
wherein the formula for calculating the state occupancy probability based on the forward probability and the backward probability is:
$$\gamma_t(i) = \frac{\Omega_t(i)\,B_t(i)}{W(O \mid \lambda)}$$
and the formula for calculating the state transition probability based on the forward probability and the backward probability is:
$$E_t(i,j) = \frac{\Omega_t(i)\,a_{ij}\,b_j(O_{t+1})\,B_{t+1}(j)}{W(O \mid \lambda)}$$
wherein $\gamma_t(i)$ represents the state occupancy probability of being in state $i$ at time $t$, $\Omega_t(i)$ represents the forward probability of being in state $i$ at time $t$, $B_t(i)$ represents the backward probability of being in state $i$ at time $t$, $W(O \mid \lambda)$ represents the total probability of the observation sequence $O$ given the model parameters $\lambda$, $E_t(i,j)$ represents the probability of transitioning from state $i$ to state $j$ at time $t$, $a_{ij}$ represents the probability of transitioning from state $i$ to state $j$, $b_j(O_{t+1})$ represents the probability of generating the observation $O_{t+1}$ in state $j$, $O_{t+1}$ represents the data at time $t+1$, and $B_{t+1}(j)$ represents the backward probability of being in state $j$ at time $t+1$;
updating the initial state probability vector, the state transition probability matrix, and the observation probability matrix based on the state occupancy probability and the state transition
probability.
4. The method for evaluating the quality of monitoring data of a new energy station based on multi-source data according to claim 3, characterized in that integrating the learning network and the data deviation identification model, constructing the observation sequence, converting the predicted hidden state sequence into state label data, and identifying the data deviations between the monitoring data sources mainly comprises:
integrating the learning network and the data deviation identification model;
based on the initial state probability vector, the state transition probability matrix, and the observation probability matrix;
aligning the observation data of each data source in time and fusing them into a multidimensional time series;
inputting the fused multidimensional time series as the observation sequence into the trained data deviation identification model to construct the observation sequence;
initializing the path probability and the path record matrix, generating the initialized path probability matrix and path record matrix;
repeating, for each time, the calculation of the path probability while recording the optimal predecessor state of each state, and updating the path probability matrix and the path record matrix, until the optimal predecessor state of each state is reached;
wherein the path probability at each time is calculated as:
$$\delta_{t+1}(j) = \max_i\left[\delta_t(i)\,a_{ij}\right] b_j(O_{t+1})$$
wherein $\delta_{t+1}(j)$ represents the optimal path probability for state $j$ at time $t+1$, and $\delta_t(i)$ represents
the optimal path probability for state $i$ at time $t$;
calculating the optimal path probability and the optimal state at the final time based on the optimal predecessor states;
extracting the optimal path probability at the final time, determining the optimal state at the final time, and initializing the optimal hidden state sequence;
using the path record matrix to trace backward in time, determining the optimal state at each time;
constructing the optimal hidden state sequence from the optimal states obtained by backtracking, and converting the predicted hidden state sequence into state label data;
accurately identifying the data deviations between the monitoring data sources by comparing the observation data of each monitoring data source with the state label data.
5.
The method for evaluating the quality of monitoring data of new energy stations based on multi-source data according to claim 4, characterized in that integrating the learning network and the data deviation identification model mainly comprises:
collecting, through a logging module and once per minute, the prediction results output by the model for comparison with the predicted deviation, the actual deviation, and the corresponding timestamp data;
using automated scripts to classify and label the collected data according to preset deviation type and severity criteria, distinguishing true positives, false positives, true negatives, and false negatives, to form a labeled feedback data set;
setting a retraining day, and triggering model retraining on that day of each month;
for the hidden Markov model, re-estimating the state transition probability matrix and the observation probability matrix using the Baum-Welch algorithm;
for the isolation forest model, adjusting the number of trees and the subsample size so that the model adapts to the characteristics of new data;
using stochastic gradient descent as the online learning algorithm, randomly drawing a mini-batch from the latest feedback data set each time to update the model parameters.
The formula is:
$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t; x_t, y_t)$$
The formula for the loss function is:
$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \log P(y_i \mid x_i; \theta)$$
wherein $P(y \mid x; \theta)$ represents the probability that the model predicts category $y$ under parameters $\theta$, $N$ represents the total number of samples, $\theta$ represents the model parameters, $\eta$ represents the learning rate, $L(\theta_t; x_t, y_t)$ represents the loss function, $x_t$ and $y_t$ represent the features and label of the current sample, and $x_i$ and $y_i$ represent the $i$-th input feature and output label respectively;
randomly initializing the model parameters $\theta_0$; the samples in the data set are input as a stream, one mini-batch at a time; for each mini-batch, the gradient of the loss function with respect to the model parameters is calculated, and the model parameters are updated according to the calculated gradient and the preset learning rate, iterating until the data stream reaches the predetermined number of iterations;
using a deep Q-network algorithm and setting a reward function, giving a positive reward $r^+$ for each correct warning and $r^-$ for each false warning, the reward design following these principles:
for each correct warning: $r^+ = \log_e(1+V)$;
for each false warning: $r^- = -\log_e(1+RL)$;
wherein $V$ and $RL$ represent the response speed and the response delay of the system for correct and false warnings respectively;
setting a change rate threshold $\delta$; when the change rate of the monitoring data $\delta_{new} > \delta$, the system determines that an abnormal situation may exist and triggers an early warning.
The formula is:
setting an anomaly score threshold $s$, and triggering an early warning when the anomaly score of a monitoring point $s_{new} > s$;
$\delta_{new}$ and $s_{new}$ represent the new change rate threshold and the new anomaly score threshold respectively, $\alpha$ and $\beta$ are the false positive rate and the false negative rate, and $T$ is the average response time;
setting one working day of each month as the evaluation day, on which a strategy evaluation is performed based on the warning effect and the operation and maintenance efficiency improvement of the past month, wherein the data deviation identification accuracy ACC is:
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
wherein $TP$ represents the number of anomalies correctly identified by the system, $TN$ represents the amount of data correctly judged as normal by the system, $FP$ represents the amount of normal data incorrectly judged as abnormal by the system, and $FN$ represents the amount of actual abnormal data that the system failed to identify;
the early warning response time RT is:
$$RT = \frac{1}{N}\sum_{i=1}^{N}\left(T_{r,i} - T_{t,i}\right)$$
wherein $T_{r,i}$ represents the time point at which the system responded to the $i$-th abnormal event, $T_r$ represents the response time, $T_{t,i}$ represents the time point at which the $i$-th abnormal event actually occurred, $T_t$ represents the actual occurrence time of the abnormal event, and $N$ represents the total number of abnormal events in the monitoring period;
the operation and maintenance efficiency improvement percentage E is:
$$E = \frac{OT - NT}{OT} \times 100\%$$
wherein $OT$ represents the manual operation and maintenance time required to solve problems of the same scale before the automated early warning system was introduced, and $NT$ represents the actual time spent on
solving the same problem after the system intervenes;
the CPI formula reflecting overall performance is:
wherein $RT_{max}$ represents the maximum acceptable early warning response time, $\alpha$ represents the adjustment factor, and $w_1$, $w_2$, and $w_3$ represent the weight coefficients of accuracy, response time, and operation and maintenance efficiency improvement percentage respectively.
6. The method for evaluating the quality of monitoring data of a new energy station based on multi-source data according to claim 5, characterized in that calculating the data deviation anomaly score, setting a data quality level for each data deviation point, and evaluating the impact of data deviation on the operating performance of the new energy station mainly comprises:
standardizing each data deviation, and forming a test data set from the standardized data deviations;
constructing an anomaly monitoring model, setting the parameters of the anomaly monitoring model, and using the test data set as training data to train the anomaly monitoring model.
The trained anomaly monitoring model is used to predict the anomaly score of each data deviation in the test data set: each data deviation in the test data set is traversed and input into the isolation trees of the anomaly monitoring model; starting from the root node, the tree is traversed downward according to its split conditions until a leaf node is reached, and the traversal path length of each data point in each tree is recorded; for each data deviation, the average of its path lengths over all isolation trees is calculated and used as the basis of the final anomaly score of the current data deviation, and the anomaly score is normalized to a preset range;
the anomaly score is calculated as:
$$S(x) = 2^{-\bar{h}(x)/C(n)}$$
wherein $S(x)$ represents the anomaly score of data deviation $x$, $\bar{h}(x)$ represents the average path length of data deviation $x$ over all isolation trees, $C(n)$ represents the adjustment coefficient, and $n$ represents any positive integer;
setting the data quality level based on the anomaly score.
The level can be set according to the score range:
high-quality data: anomaly score in the range 0 to a;
good data: anomaly score in the range a to b;
medium data: anomaly score in the range b to c;
low-quality data: anomaly score in the range c to d;
abnormal data: anomaly score in the range d to e;
assigning each data point the quality level corresponding to its anomaly score;
assigning each abnormal data point a rating, which can be set according to the anomaly score:
data points with an anomaly score in the range d to e are rated E' (the highest rating);
data points with an anomaly score in the range c to d are rated D';
data points with an anomaly score in the range b to c are rated C';
data points with an anomaly score in the range a to b are rated B';
data points with an anomaly score in the range 0 to a are rated A' (the lowest rating);
evaluating the impact of data deviation on the operating performance of the new energy station according to the ratings of the abnormal data points.
7.
The method for evaluating the quality of monitoring data of a new energy station based on multi-source data according to claim 6, characterized in that establishing the real-time monitoring platform based on the data quality level, tracking the change trend of data deviations, setting the change rate threshold, and issuing an early warning for data deviations that exceed the change rate threshold mainly comprises:
establishing a real-time monitoring platform based on the data quality level;
performing a differencing operation on the data deviation index to obtain difference values, performing a stationarity test on the difference values, and using the values that pass the stationarity test as the change rate of the data deviation;
setting the change rate threshold of the data deviation to $\mu_0$, the deviation change rate $\mu$ being given by:
$$\mu = k \cdot \sigma$$
wherein $\sigma$ represents the standard deviation of the data and $k$ represents the safety factor;
introducing an objective function $G(k)$ to evaluate the early warning effect under different values of $k$ and thereby obtain the optimal safety factor $k$, the formula being:
$$G(k) = w_4\,FPR(k) + w_5\,FNR(k)$$
wherein $FPR(k)$ represents the false alarm rate, $FNR(k)$ represents the missed alarm rate, and $w_4$ and $w_5$ represent the weight coefficients of the false alarm rate and the missed alarm rate respectively;
the formula for the optimal safety factor $k$ is:
$$k^{*} = \arg\min_k G(k)$$
when $\mu > \mu_0$, triggering an early warning and conveying the early warning information to the operation and maintenance personnel, who then analyze the early warning information:
collecting all data deviations whose change rate exceeds the preset threshold;
classifying the over-threshold data deviations by type to obtain the abnormal data points of each
category;
performing root cause analysis on the abnormal data points of each category to obtain the causes of the data deviation;
regularly evaluating and adjusting the early warning rules according to the analysis results, and optimizing the real-time monitoring platform.
8. A new energy station monitoring data quality evaluation system based on multi-source data, used to implement the new energy station monitoring data quality evaluation method based on multi-source data according to any one of claims 1-7, characterized in that the system comprises: a data acquisition and preprocessing module, a data deviation identification model training module, an integrated learning network and deviation identification module, a data deviation anomaly score and quality assessment module, and a real-time monitoring and early warning response module;
the data acquisition and preprocessing module is used to obtain the multi-source monitoring data collected by each monitoring device in the new energy station and to preprocess the multi-source monitoring data;
the data deviation identification model training module is used to train a data deviation identification model on the preprocessed multi-source monitoring data using the Baum-Welch algorithm;
the integrated learning network and deviation identification module is used to integrate the learning network and the data deviation identification model, construct the observation sequence, convert the predicted hidden state sequence into state label data, and identify the data deviations between the various
monitoring data sources;
the data deviation anomaly score and quality assessment module is used to calculate the data deviation anomaly score, set a data quality level for each data deviation point, and evaluate the impact of data deviation on the operating performance of the new energy station;
the real-time monitoring and early warning response module is used to establish a real-time monitoring platform, track the change trend of data deviations, set a change rate threshold, and issue an early warning for data deviations that exceed the change rate threshold.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the file encryption method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the file encryption method according to any one of claims 1 to 7.
CN202410958735.9A | Priority date 2024-07-17 | Filing date 2024-07-17 | New energy station monitoring data quality evaluation method and system based on multi-source data | Pending | CN119066541A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410958735.9A / CN119066541A (en) | 2024-07-17 | 2024-07-17 | New energy station monitoring data quality evaluation method and system based on multi-source data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410958735.9A / CN119066541A (en) | 2024-07-17 | 2024-07-17 | New energy station monitoring data quality evaluation method and system based on multi-source data

Publications (1)

Publication Number | Publication Date
CN119066541A (en) | 2024-12-03

Family

ID=93636429

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410958735.9A / CN119066541A (en), Pending | New energy station monitoring data quality evaluation method and system based on multi-source data | 2024-07-17 | 2024-07-17

Country Status (1)

Country | Link
CN (1) | CN119066541A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119474770A (en) * | 2025-01-16 | 2025-02-18 | 数据易(北京)信息技术有限公司 | An intelligent data governance system based on big models
CN119645984A (en) * | 2024-12-09 | 2025-03-18 | 湖北华中电力科技开发有限责任公司 | Multidimensional data quality detection method based on data center table
CN119671205A (en) * | 2025-02-11 | 2025-03-21 | 国网山西省电力公司信息通信分公司 | A method and system for intelligent segmentation management of power cables
CN119884102A (en) * | 2025-03-28 | 2025-04-25 | 高质标准化研究院(山东)有限公司 | Quality management method based on standard data
CN119884937A (en) * | 2025-03-24 | 2025-04-25 | 长沙矿冶研究院有限责任公司 | Pressure early warning method and system for high-pressure jet pipeline
CN119904148A (en) * | 2025-01-21 | 2025-04-29 | 中国标准化研究院 | A data quality assessment method and system in enterprise credit information fusion
CN120120281A (en) * | 2025-05-14 | 2025-06-10 | 浙江数智交院科技股份有限公司 | A tunnel fan monitoring method based on laser vibration measurement
CN120342083A (en) * | 2025-06-17 | 2025-07-18 | 山东交通学院 | A remote real-time monitoring method for port shore power system
CN120541465A (en) * | 2025-07-28 | 2025-08-26 | 联通沃音乐文化有限公司 | Report data anomaly monitoring and quality assessment system, method and electronic equipment
CN120561474A (en) * | 2025-07-30 | 2025-08-29 | 南京气象科技创新研究院 | AI-based meteorological station data quality control method integrated with numerical forecast

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119645984A (en) * | 2024-12-09 | 2025-03-18 | 湖北华中电力科技开发有限责任公司 | Multidimensional data quality detection method based on data center table
CN119474770A (en) * | 2025-01-16 | 2025-02-18 | 数据易(北京)信息技术有限公司 | An intelligent data governance system based on big models
CN119904148A (en) * | 2025-01-21 | 2025-04-29 | 中国标准化研究院 | A data quality assessment method and system in enterprise credit information fusion
CN119671205A (en) * | 2025-02-11 | 2025-03-21 | 国网山西省电力公司信息通信分公司 | A method and system for intelligent segmentation management of power cables
CN119884937B (en) * | 2025-03-24 | 2025-09-09 | 长沙矿冶研究院有限责任公司 | Pressure early warning method and system for high-pressure jet pipeline
CN119884937A (en) * | 2025-03-24 | 2025-04-25 | 长沙矿冶研究院有限责任公司 | Pressure early warning method and system for high-pressure jet pipeline
CN119884102A (en) * | 2025-03-28 | 2025-04-25 | 高质标准化研究院(山东)有限公司 | Quality management method based on standard data
CN120120281A (en) * | 2025-05-14 | 2025-06-10 | 浙江数智交院科技股份有限公司 | A tunnel fan monitoring method based on laser vibration measurement
CN120342083A (en) * | 2025-06-17 | 2025-07-18 | 山东交通学院 | A remote real-time monitoring method for port shore power system
CN120541465A (en) * | 2025-07-28 | 2025-08-26 | 联通沃音乐文化有限公司 | Report data anomaly monitoring and quality assessment system, method and electronic equipment
CN120541465B (en) * | 2025-07-28 | 2025-09-23 | 联通沃音乐文化有限公司 | Report data anomaly monitoring and quality assessment system, method and electronic equipment
CN120561474A (en) * | 2025-07-30 | 2025-08-29 | 南京气象科技创新研究院 | AI-based meteorological station data quality control method integrated with numerical forecast
CN120561474B (en) * | 2025-07-30 | 2025-09-30 | 南京气象科技创新研究院 | AI-based meteorological station data quality control method integrated with numerical forecast

Similar Documents

Publication | Publication Date | Title
CN119066541A (en) | | New energy station monitoring data quality evaluation method and system based on multi-source data
CN117930815B (en) | | Wind turbine generator remote fault diagnosis method and system based on cloud platform
CN118096131B (en) | | Operation and maintenance inspection method based on electric power scene model
CN117743909B (en) | | A method and device for analyzing heating system failure based on artificial intelligence
CN111460167A (en) | | Method and related equipment for locating sewage objects based on knowledge graph
CN110837866A (en) | | Evaluation method of defect degree of power secondary equipment based on XGBoost
CN119420639A (en) | | Communication network operation and maintenance fault location tracking method and system
CN112528519A (en) | | Method, system, readable medium and electronic device for engine quality early warning service
CN117055502A (en) | | Intelligent control system based on Internet of things and big data analysis
CN117254980B (en) | | An industrial network security risk assessment method and system based on attention mechanism
CN109753591A (en) | | Operation flow predictability monitoring method
CN119382613A (en) | | Photovoltaic operation fault diagnosis method and system
CN118070202A (en) | | Industrial data quality control system based on artificial intelligence
CN119691576B (en) | | Self-adaptive operation and maintenance root cause positioning method and system based on deep learning
CN115658546B (en) | | Heterogeneous information network-based software fault prediction method and system
CN113891342A (en) | | Base station inspection method, device, electronic device and storage medium
CN119988897B (en) | | Fault identification method based on intelligent model
CN116126807A (en) | | Log analysis method and related device
CN119396993A (en) | | A method and system for power grid operation and maintenance deployment based on knowledge graph
CN120162213A (en) | | Risk warning method, system and electronic equipment
CN119271657A (en) | | A data quality assessment method and system based on big data analysis
CN118859037B (en) | | A method for power equipment fault analysis based on multi-source data fusion
CN119916130A (en) | | Real-time analysis and precise positioning of distribution network faults and their treatment methods and systems
CN119273296A (en) | | Safety inspection method and system applied to project management
CN119005675A (en) | | Auditing method and device for construction cost of distribution network

Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
