Disclosure of Invention
The invention aims to provide a method and a system for detecting anomalies in substation monitoring data based on deep learning, which can realize rapid and accurate anomaly detection of substation monitoring data and provide a powerful guarantee for the safe and stable operation of the substation.
This aim of the invention is achieved by the following technical scheme:
In one aspect, a deep-learning-based substation monitoring data anomaly detection method comprises the following steps:
S1, preprocessing the raw data collected in the substation monitoring system and extracting features;
S2, after feature extraction is completed, analyzing the extracted features by a deep learning algorithm and identifying abnormal points or abnormal patterns in the data;
S3, performing hyperparameter optimization on the constructed deep learning algorithm model;
S4, outputting the anomaly detection result in a visual manner.
Preferably, the step S1 specifically includes:
The preprocessing of the raw data comprises data cleaning and data transformation, which are used to clean and convert the data: removing noise, handling missing values and converting data types;
The feature extraction extracts, from the preprocessed data, key features useful for anomaly detection.
Preferably, the extraction of key features useful for anomaly detection specifically includes:
S11, setting a threshold τ1, screening out the features whose variance is smaller than τ1 or whose absolute correlation coefficient with the target variable is smaller than τ1, and denoting the resulting feature set F1;
S12, training, on F1, a model M that supports feature importance assessment; M outputs an importance score importance_i for each feature, where i denotes the feature index;
S13, for each pair of features (F_i, F_j) in F1, calculating their correlation to obtain a correlation matrix C_ij, where C_ij denotes the correlation between features F_i and F_j;
S14, calculating a weighted score S_i for each feature from its importance score and the correlations between the features, where |F1| denotes the number of features in F1;
S15, setting a threshold τ2, and selecting the features whose weighted score S_i is greater than τ2 as the final feature subset F_find.
Preferably, the step S2 specifically includes the following steps:
The deep learning algorithm adopts a gated recurrent unit (GRU) network, in which the gate activations determine the activation mode and intensity of the neurons; the basic algorithm is:
z_t = σ(W_z·[h_{t-1}, x_t])
r_t = σ(W_r·[h_{t-1}, x_t])
h̃_t = tanh(W_h·[r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
where z_t and r_t denote the update gate and the reset gate respectively, W_z, W_r and W_h denote trainable parameter matrices, h̃_t denotes the candidate cell state at the current moment, h_t and h_{t-1} denote the outputs of the hidden neurons at the current moment and the previous moment respectively, and σ and tanh denote activation functions;
A smooth switching function s(x) is defined using the sigmoid function, s(x) = 1 / (1 + e^(-β(x-θ))), which increases from 0 to 1 as x increases; the TanhReLU function can then be defined as:
TanhReLU(x) = s(x)·ReLU(x) + (1 - s(x))·Tanh(x)
where β is used to control the smoothness of the switching and θ is the threshold at which the switching occurs;
substituting the definition of s(x) into the TanhReLU function yields:
TanhReLU(x) = (ReLU(x) + e^(-β(x-θ))·Tanh(x)) / (1 + e^(-β(x-θ)))
where β and θ are hyperparameters;
In the deep learning algorithm model, the Smooth ElasticNet regularization function is expressed as follows:
R(w) = λ1 Σ_{i=1}^{n} w_i^2 / (|w_i| + ε) + λ2 Σ_{i=1}^{n} w_i^2 + λ3 Σ_{i=1}^{n-1} (w_{i+1} - w_i)^2
where w is the parameter vector of the model, w_i is the i-th parameter, n is the number of parameters, λ1, λ2 and λ3 are regularization coefficients for controlling the importance of the different regularization terms, and ε is a small positive number for avoiding division-by-zero errors when w_i = 0 and making the function smoother as w_i approaches 0.
Preferably, in the step S3, the optimization of the hyperparameters in the model by using an improved particle swarm optimization algorithm includes:
The basic evolution equations of the quantum-behaved particle swarm optimization (QPSO) are as follows:
mbest_j = (1/M) Σ_{i=1}^{M} P_ij
p_ij = φ·P_ij + (1 - φ)·P_gj
X_ij(t+1) = p_ij ± α·|mbest_j - X_ij(t)|·ln(1/u)
where mbest_j is the j-th dimension of the mean best position of the particle swarm, P_ij is the best position of the i-th particle in the j-th dimension, P_gj is the global best position of the swarm in the j-th dimension, p_ij is a random position between P_ij and P_gj, M is the size of the particle population, φ and u are random numbers in [0,1], and α is the contraction-expansion coefficient.
Preferably, the step S4 includes:
In the visual interface, the detected abnormal data points or abnormal patterns are highlighted so that the abnormal data can be quickly identified.
In another aspect, a detection system based on the above deep-learning-based substation monitoring data anomaly detection method is provided, which comprises:
a data preprocessing module, used for preprocessing the raw data collected in the substation monitoring system and extracting features;
a model building and analysis module, used for analyzing the extracted features by a deep learning algorithm after feature extraction is completed, identifying abnormal points or abnormal patterns in the data, and performing hyperparameter optimization on the constructed deep learning algorithm model;
and a result output module, used for outputting the anomaly detection result in a visual manner.
Compared with the prior art, the invention has the beneficial effects that:
1. The weighted integrated feature screening method WIFS combines the advantages of multiple feature selection methods, such as the filtering method and the embedding method. The filtering method reduces the computation of subsequent processing by primarily screening out low-variance or irrelevant features, and the embedding method further screens out the features that significantly influence the prediction performance by evaluating feature importance during model training. This integrated approach allows the validity and importance of the features to be considered more comprehensively. In addition, WIFS introduces a weighting strategy that considers not only the feature importance scores given by the model but also the correlations among the features; this weighting helps avoid selecting highly correlated but redundant features, thereby improving the representativeness and effectiveness of the feature subset, and it also allows the weights to be adjusted for the specific application scenario, improving the flexibility and adaptability of the method;
2. The TanhReLU activation function helps maintain the stability of the gradient in the initial stage of training and avoids training problems caused by gradients that are too large or too small; in addition, by introducing a learnable switching mechanism, the TanhReLU function can automatically adjust its behavior according to the input value, which helps the model better fit complex data distributions;
3. The Smooth ElasticNet regularization function helps reduce overfitting and improve the stability of the model, and can flexibly control the relative importance of sparsity, smoothness and the weight change rate.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Further, it will be understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the application, and equivalents thereof fall within the scope of the application as defined by the claims.
In the present invention, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side" and "bottom" indicate orientations or positional relationships based on those shown in the drawings; they are merely relational terms used for convenience in describing the structural relationships of the components or elements of the invention, do not refer to any specific component or element of the invention, and are not to be construed as limiting the invention.
In the present invention, terms such as "fixedly attached", "connected" and "coupled" are to be construed broadly and may refer to a fixed connection, an integral connection or a removable connection, and to a direct connection or an indirect connection via an intermediary. The specific meaning of these terms in the present invention can be determined according to the circumstances by a person of ordinary skill in the relevant art, and is not to be construed as limiting the present invention.
Examples:
As shown in FIG. 1, this embodiment provides a substation monitoring data anomaly detection method based on deep learning, which comprises the following steps:
S1, preprocessing the raw data collected in the substation monitoring system and extracting features;
S2, after feature extraction is completed, analyzing the extracted features by a deep learning algorithm and identifying abnormal points or abnormal patterns in the data;
S3, performing hyperparameter optimization on the constructed deep learning algorithm model;
S4, outputting the anomaly detection result in a visual manner.
The overall structural model is shown in FIG. 2.
Step S1, preprocessing the raw data collected in the substation monitoring system and extracting features. The preprocessing comprises data cleaning (noise removal, missing value processing, etc.) and data transformation (normalization, standardization, etc.); feature extraction then extracts, from the preprocessed data, key features useful for anomaly detection by using statistical methods or dedicated feature engineering techniques, and these features should fully represent the difference between the normal and abnormal states of the data.
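As an illustration only, the following minimal Python (pandas) sketch performs this kind of preprocessing on a table of raw monitoring samples; the function name, the smoothing window and the z-score standardization are assumptions for illustration, not the invention's prescribed implementation.

    # Minimal sketch of step S1. The smoothing window, interpolation and
    # z-score standardization are illustrative choices.
    import pandas as pd

    def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
        df = raw.drop_duplicates()                         # data cleaning: drop duplicate records
        df = df.apply(pd.to_numeric, errors="coerce")      # data type conversion
        df = df.interpolate(limit_direction="both")        # missing value processing
        df = df.rolling(window=5, min_periods=1).median()  # simple noise removal
        return (df - df.mean()) / df.std()                 # standardization (z-score)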
The invention provides a weighted integrated feature screening method WIFS (Weighted Integrated Feature Selection) for screening out the feature information useful to the model; it aims to select, from the original features, the features most important to the model's predictions, so as to improve the accuracy and efficiency of the model. The specific steps are as follows:
1. Primary screening by the filtering method: a threshold τ1 is set by using the variance selection method or the correlation coefficient method, the features whose variance is smaller than τ1 or whose absolute correlation coefficient with the target variable is smaller than τ1 are screened out, and the feature set obtained in this step is denoted F1;
2. Training a basic model by the embedding method: a model supporting feature importance assessment (such as a random forest or a gradient boosting tree) is trained on F1 and denoted M; the model M outputs an importance score importance_i for each feature, where i denotes the feature index;
3. Correlation evaluation: for each pair of features (F_i, F_j) in F1, the correlation between the features (such as the Pearson correlation coefficient or mutual information) is calculated to obtain a correlation matrix C_ij, where C_ij denotes the correlation between features F_i and F_j;
4. Weighted integration: a weighted score S_i is calculated for each feature from its importance score and the correlations between the features, where |F1| denotes the number of features in F1. The score takes into account both the importance of a feature and the absolute value of its average correlation with the other features; if a feature is highly correlated with other features, its weighted score decreases owing to that correlation;
5. Final feature selection: a threshold τ2 is set, and the features whose weighted score S_i is greater than τ2 are selected as the final feature subset F_find, as illustrated by the sketch below.
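A minimal sketch of one possible WIFS implementation follows. The weighting rule used here (importance score minus the feature's mean absolute correlation with the other features of F1) is an assumed form, since the text only states that high correlation with other features lowers S_i; the thresholds and the choice of random forest as model M are likewise illustrative.

    # Illustrative WIFS sketch; the weighted-score formula is an assumption.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    def wifs(X: pd.DataFrame, y: pd.Series, tau1=0.01, tau2=0.05):
        # Step 1: filtering. Drop features with variance < tau1 or
        # |correlation with the target| < tau1; the survivors form F1.
        keep = [c for c in X.columns
                if X[c].var() >= tau1 and abs(X[c].corr(y)) >= tau1]
        F1 = X[keep]
        # Step 2: embedding. Train a model M exposing feature importances.
        M = RandomForestRegressor(n_estimators=200, random_state=0).fit(F1, y)
        importance = pd.Series(M.feature_importances_, index=F1.columns)
        # Step 3: absolute correlation matrix C_ij over the features of F1.
        C = F1.corr().abs().values
        np.fill_diagonal(C, 0.0)
        mean_corr = pd.Series(C.sum(axis=1) / max(len(keep) - 1, 1),
                              index=F1.columns)
        # Step 4: assumed weighted score S_i (importance penalized by the
        # feature's mean absolute correlation with the other features).
        S = importance - mean_corr
        # Step 5: keep features whose weighted score exceeds tau2 (F_find).
        return list(S[S > tau2].index)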
Step S2, anomaly detection based on a deep learning algorithm comprises the following steps:
The gated recurrent unit GRU (Gated Recurrent Unit), as a variant of the recurrent neural network, has unique advantages in extracting feature information from data. On the one hand, the GRU effectively controls the information flow by introducing two gating mechanisms, an update gate and a reset gate; this mechanism enables the GRU to selectively retain or forget historical information, so that long-term dependencies in sequence data can be better captured. Compared with the traditional RNN, the gating mechanism of the GRU helps alleviate the vanishing gradient problem, so that the model can learn long-distance dependencies more stably during training.
On the other hand, compared with another RNN variant, the long short-term memory network LSTM, the GRU has fewer parameters while maintaining similar performance, so it requires fewer computing resources during training and trains faster; the model structure of the GRU is also relatively simple and easy to implement and debug. Its basic algorithm is as follows:
z_t = σ(W_z·[h_{t-1}, x_t])
r_t = σ(W_r·[h_{t-1}, x_t])
h̃_t = tanh(W_h·[r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
where z_t and r_t denote the update gate and the reset gate respectively, W_z, W_r and W_h denote trainable parameter matrices, h̃_t denotes the candidate cell state at the current moment, h_t and h_{t-1} denote the outputs of the hidden neurons at the current moment and the previous moment respectively, and σ and tanh denote activation functions;
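For concreteness, the following minimal NumPy sketch computes one GRU forward step directly from the equations above; the hidden and input sizes and the random weight initialization are illustrative assumptions.

    # One GRU forward step following the equations above (NumPy sketch).
    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_prev, Wz, Wr, Wh):
        xh = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
        z_t = sigmoid(Wz @ xh)                # update gate
        r_t = sigmoid(Wr @ xh)                # reset gate
        h_cand = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
        return (1.0 - z_t) * h_prev + z_t * h_cand                  # new hidden state h_t

    # usage: hidden size 8, input size 3 (illustrative)
    rng = np.random.default_rng(0)
    Wz, Wr, Wh = (rng.normal(scale=0.1, size=(8, 11)) for _ in range(3))
    h_t = gru_step(rng.normal(size=3), np.zeros(8), Wz, Wr, Wh)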
The activation function in a neural network determines the activation mode and intensity of the neurons. The invention discloses a TanhReLU activation function, which aims to avoid neuron "death" by using the smoothing property of the Tanh function when the input value is small, while keeping the gradient stable by using the linear property of the ReLU function when the input value is large. However, directly combining these two functions may not be the optimal choice, because they differ in output range and gradient properties; therefore, a transition mechanism needs to be designed to smoothly connect the two regions:
Assume that there is a smooth switching function s(x) that gradually increases from 0 to 1 as x increases; s(x) may be defined using a sigmoid function or a similar smoothing function, for example s(x) = 1 / (1 + e^(-β(x-θ))). The TanhReLU function may then be defined as:
TanhReLU(x)=s(x)·ReLU(x)+(1-s(x))·Tanh(x)
where β is used to control the smoothness of the switching (the greater β, the steeper the switch), and θ is the threshold at which the switching occurs (i.e., s(x) = 0.5 when x = θ);
Substituting the definition of s(x) into the TanhReLU function yields:
TanhReLU(x) = (ReLU(x) + e^(-β(x-θ))·Tanh(x)) / (1 + e^(-β(x-θ)))
where β and θ are hyperparameters that are adjusted according to the specific task;
The TanhReLU function has a smooth curve and good nonlinear characteristics, which helps keep the gradient stable in the initial stage of training and avoids training problems caused by gradients that are too large or too small; in addition, by introducing a learnable switching mechanism, the TanhReLU function can automatically adjust its behavior according to the input value, which helps the model better fit complex data distributions.
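A minimal sketch of this activation, written directly from the formula above; the β and θ values are illustrative hyperparameter settings.

    # Sketch of the TanhReLU activation derived above.
    import numpy as np

    def tanh_relu(x, beta=5.0, theta=0.0):
        s = 1.0 / (1.0 + np.exp(-beta * (x - theta)))  # smooth switch s(x)
        return s * np.maximum(x, 0.0) + (1.0 - s) * np.tanh(x)

    x = np.linspace(-3.0, 3.0, 7)
    print(tanh_relu(x))  # ~Tanh(x) for x << theta, ~ReLU(x) for x >> theta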
In a deep learning model, the regularization function helps prevent overfitting, improve generalization capability and reduce model complexity. This embodiment provides a new regularization function that combines the characteristics of L1 and L2 regularization in the manner of ElasticNet regularization and improves it with an additional smoothing term, referred to here as Smooth ElasticNet regularization.
The Smooth ElasticNet regularization function is expressed as follows:
R(w) = λ1 Σ_{i=1}^{n} w_i^2 / (|w_i| + ε) + λ2 Σ_{i=1}^{n} w_i^2 + λ3 Σ_{i=1}^{n-1} (w_{i+1} - w_i)^2
where w is the parameter vector of the model, w_i is the i-th parameter, n is the number of parameters, λ1, λ2 and λ3 are regularization coefficients for controlling the importance of the different regularization terms, and ε is a small positive number that avoids division-by-zero errors when w_i = 0 and makes the function smoother as w_i approaches 0; the third term is a smoothing term that encourages the differences between adjacent parameters to be as small as possible, thereby helping to produce smoother weight changes;
The Smooth ElasticNet regularization function provided by the invention has the following characteristics:
Sparsity: through the λ1 Σ w_i^2 / (|w_i| + ε) term, Smooth ElasticNet regularization encourages the model to produce sparse weights, similar to L1 regularization;
Smoothness: the L2 regularization term and the additional smoothing term act together to make the weight values smoother, reducing overfitting and improving the stability of the model;
Flexibility: by adjusting the values of λ1, λ2 and λ3, the relative importance of sparsity, smoothness and the weight change rate can be flexibly controlled, as illustrated by the sketch below;
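A minimal sketch of this penalty follows. The smooth L1-like first term w_i^2 / (|w_i| + ε) is an assumed form consistent with the stated role of ε; the coefficient values are illustrative.

    # Sketch of the Smooth ElasticNet penalty described above.
    import numpy as np

    def smooth_elastic_net(w, lam1=1e-4, lam2=1e-4, lam3=1e-4, eps=1e-8):
        sparsity = np.sum(w**2 / (np.abs(w) + eps))  # smooth L1-like sparsity term
        ridge = np.sum(w**2)                          # L2 term
        smooth = np.sum(np.diff(w)**2)                # adjacent-parameter smoothing term
        return lam1 * sparsity + lam2 * ridge + lam3 * smooth

    print(smooth_elastic_net(np.array([0.0, 0.5, -0.2, 0.0])))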
Therefore, when the GRU model is constructed, the original activation function and regularization function are improved, so that the overall performance of the model is further improved.
Step S3, optimizing the hyperparameters in the model by using an improved particle swarm optimization algorithm, the process of which is shown in FIG. 3, comprises the following steps:
Hyperparameters are the "knobs" in a machine learning algorithm that control the learning process, such as the learning rate, regularization coefficients, number of hidden layers and number of neurons. By systematically adjusting these parameters, the optimal configuration for the current data set and task can be found, thereby significantly improving the performance of the model. The quantum-behaved particle swarm optimization (QPSO) algorithm is an optimization algorithm that simulates the behavior of bird flocks or fish schools in nature; it introduces concepts from quantum mechanics on the basis of the classical particle swarm optimization (PSO) algorithm, enhances the randomness and global search capability of the particles through quantum behavior, and can search for the global optimal solution in the whole feasible solution space. In model hyperparameter optimization, the QPSO algorithm can be applied to find the model configuration that optimizes the performance of the model on a given data set. The basic evolution equations of QPSO are as follows:
mbest_j = (1/M) Σ_{i=1}^{M} P_ij
p_ij = φ·P_ij + (1 - φ)·P_gj
X_ij(t+1) = p_ij ± α·|mbest_j - X_ij(t)|·ln(1/u)
where mbest_j is the j-th dimension of the mean best position of the particle swarm, P_ij is the best position of the i-th particle in the j-th dimension, P_gj is the global best position of the swarm in the j-th dimension, p_ij is a random position between P_ij and P_gj, M is the size of the particle population, φ and u are random numbers in [0,1], and α is the contraction-expansion coefficient.
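A minimal QPSO sketch following the update rule above; the objective function (a stand-in for validation loss as a function of two hyperparameters) and the search bounds are illustrative assumptions.

    # Minimal QPSO hyperparameter search sketch.
    import numpy as np

    def qpso(objective, bounds, n_particles=20, n_iter=50, alpha=0.75, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = bounds[:, 0], bounds[:, 1]
        X = rng.uniform(lo, hi, size=(n_particles, len(lo)))  # positions X_ij
        P = X.copy()                                          # personal bests P_ij
        f = np.array([objective(x) for x in X])
        g = P[np.argmin(f)]                                   # global best P_gj
        for _ in range(n_iter):
            mbest = P.mean(axis=0)                            # mean best position
            phi = rng.uniform(size=X.shape)
            u = rng.uniform(size=X.shape)
            p = phi * P + (1.0 - phi) * g                     # point between P_ij and P_gj
            sign = np.where(rng.uniform(size=X.shape) < 0.5, -1.0, 1.0)
            X = np.clip(p + sign * alpha * np.abs(mbest - X) * np.log(1.0 / u),
                        lo, hi)
            fx = np.array([objective(x) for x in X])
            better = fx < f
            P[better], f[better] = X[better], fx[better]
            g = P[np.argmin(f)]
        return g

    # Illustrative use: tune (learning rate, L2 coefficient) on a toy objective.
    bounds = np.array([[1e-4, 1e-1], [1e-6, 1e-2]])
    best = qpso(lambda h: (h[0] - 0.01)**2 + (h[1] - 1e-4)**2, bounds)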
Step S4, outputting a visual result, which comprises the following steps:
In the visual interface, the detected outlier data points or outlier patterns are highlighted in a special way (e.g., different colors, shapes, sizes, or markers) so that the monitoring personnel can quickly identify which data are outliers. This intuitive approach helps to quickly locate the problem.
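The following matplotlib sketch illustrates this kind of highlighting; the signal and the anomaly mask are hypothetical stand-ins for real monitoring data and real detector output.

    # Illustrative highlighting of detected anomalies on a monitoring signal.
    import matplotlib.pyplot as plt
    import numpy as np

    t = np.arange(200)
    signal = np.sin(t / 10.0) + 0.05 * np.random.default_rng(0).normal(size=200)
    anomaly = np.zeros(200, dtype=bool)
    anomaly[[50, 120, 121]] = True
    signal[anomaly] += 1.5  # injected anomalies, for illustration only

    plt.plot(t, signal, color="steelblue", label="monitoring data")
    plt.scatter(t[anomaly], signal[anomaly], color="red", marker="x", s=80,
                label="detected anomalies")
    plt.xlabel("sample index")
    plt.ylabel("measured value")
    plt.legend()
    plt.show()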
In another aspect, this embodiment also provides a detection system based on the above deep-learning-based substation monitoring data anomaly detection method, which comprises:
a data preprocessing module, used for preprocessing the raw data collected in the substation monitoring system and extracting features;
a model building and analysis module, used for analyzing the extracted features by a deep learning algorithm after feature extraction is completed, identifying abnormal points or abnormal patterns in the data, and performing hyperparameter optimization on the constructed deep learning algorithm model;
and a result output module, used for outputting the anomaly detection result in a visual manner.
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the embodiments described above, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.