Disclosure of Invention
The embodiment of the application provides a method, a device, equipment, a storage medium and a program product for detecting a back door of electric power data, which can reduce the calculation cost for detecting the electric power data, thereby realizing the real-time detection of the large-scale electric power data.
In a first aspect, an embodiment of the present application provides a method for detecting power data, including:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
In one embodiment, determining the detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster includes: determining that the power data to be detected is clean power data when any one of the first Euclidean distances is less than or equal to the distance constraint of the category cluster corresponding to that first Euclidean distance; and determining that the power data to be detected is poisoned power data when every first Euclidean distance is greater than the distance constraint of its corresponding category cluster.
In one embodiment, the reference cluster centers of the plurality of category clusters corresponding to the clean power data set are determined as follows: inputting a plurality of pieces of power data in the acquired clean power data set into the feature extraction network to obtain a data feature matrix corresponding to the plurality of pieces of power data; performing feature dimension reduction processing on the data feature matrix to obtain a low-dimensional feature matrix; clustering the low-dimensional feature matrix to obtain a plurality of category clusters; determining an initial cluster center of each category cluster based on each low-dimensional feature in the category cluster and the number of low-dimensional features in the category cluster; and updating the initial cluster center of each category cluster by minimizing an objective function to obtain the reference cluster center of each category cluster.
In one embodiment, the distance constraint for each category cluster is determined by determining a second Euclidean distance between each low dimensional feature in each category cluster and a reference cluster center for the category cluster and determining the distance constraint for each category cluster based on a plurality of second Euclidean distances corresponding to each category cluster.
In one embodiment, determining the distance constraint of each category cluster based on the plurality of second Euclidean distances corresponding to each category cluster comprises determining an average value of the plurality of second Euclidean distances corresponding to the category cluster for each category cluster, and taking the product of the average value and a preset value as the distance constraint of the category cluster.
In one embodiment, performing feature dimension reduction processing on the data features to obtain low-dimensional data features comprises performing feature dimension reduction processing on the data features by using a principal component analysis method to obtain the low-dimensional data features.
In a second aspect, the present application provides a power data back door detection apparatus, the apparatus comprising:
the feature extraction module is used for inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected;
the feature dimension reduction module is used for carrying out feature dimension reduction processing on the data features to obtain low-dimension data features;
The determining module is used for determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
The determining module is further used for determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
In a third aspect, the present application provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
In a fifth aspect, the application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
With the power data back door detection method, apparatus, device, storage medium and program product described above, a computer device inputs the power data to be detected into a feature extraction network comprising a plurality of hidden layers to obtain the data features of the power data to be detected; performs feature dimension reduction processing on the data features to obtain low-dimensional data features; determines a first Euclidean distance between the low-dimensional data features and the reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set; and determines a detection result of the power data to be detected, which is clean power data or poisoned power data, based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster. In other words, the computer device extracts deep data features of the power data to be detected, reduces their dimension to obtain low-dimensional deep data features, determines the Euclidean distance between the low-dimensional deep data features and the reference cluster center of each category cluster corresponding to the predetermined clean power data set, checks whether the distance constraint is met, and thereby determines the detection result for the power data to be detected. Compared with existing power data detection methods, this method requires no complicated processing of the input data, which reduces the computational cost of detecting power data and in turn enables real-time detection of large-scale power data.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The application scenario of the power data detection method provided by the embodiment of the application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of a power data back door detection method according to an embodiment of the present application. As shown in fig. 1, the system includes a computer device 101 (the computer device 101 is shown as a terminal device in fig. 1 as an example) and a database server 102, wherein data transmission between the computer device 101 and the database server 102 is performed via a network.
The database server 102 may be configured to store the power data to be detected, a reference cluster center corresponding to a predetermined clean power data set and a distance constraint of each class cluster.
The computer device 101 may obtain the power data to be detected from the database server 102 and input it into a feature extraction network comprising a plurality of hidden layers to obtain deep data features of the power data to be detected, and then perform feature dimension reduction on the deep data features to obtain low-dimensional deep data features. The computer device 101 may further obtain, from the database server 102, the reference cluster centers of a plurality of category clusters corresponding to a predetermined clean power data set and the distance constraint of each category cluster, determine the Euclidean distance between the low-dimensional deep data features and the reference cluster center of each category cluster, and determine, based on the plurality of Euclidean distances and the distance constraint of each category cluster, a detection result of the power data to be detected, the detection result being clean power data or poisoned power data. In this way, the computer device extracts deep data features of the power data to be detected, reduces their dimension, measures their distance to each reference cluster center, and checks whether the distance constraint is met in order to determine the detection result. Compared with existing power data detection methods, this method requires no complicated processing of the input data, which reduces the computational cost of detecting power data and in turn enables real-time detection of large-scale power data.
Alternatively, the computer device 101 may be a terminal device or a server. Among them, the terminal devices mentioned herein may include, but are not limited to, smart phones, tablet computers, notebook computers, desktop computers, smart watches, smart televisions, smart car terminals, etc. The server mentioned here may be a separate physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers.
Referring to fig. 2, fig. 2 is a flow chart of a method for detecting a back door of power data according to an embodiment of the application. The method may be performed by a computer device, such as the computer device 101 described above. As shown in fig. 2, the power data back door detection method may include, but is not limited to, the following steps:
S201, inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers.
Because the feature extraction network comprises a plurality of hidden layers, the data features obtained when the computer device inputs the power data to be detected into the feature extraction network are deep data features. Deep data features contain richer semantic information, which makes it easier to distinguish clean power data from poisoned power data. Detecting the power data to be detected based on the deep data features can therefore improve the accuracy of the detection result.
S202, performing feature dimension reduction processing on the data features to obtain low-dimensional data features.
In this way, the computer device can obtain the low-dimensional data feature by reducing the dimension of the data feature, thereby being beneficial to reducing the calculation cost of detecting the power data.
S203, determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set.
In an alternative embodiment, before step S203, the computer device may further determine a reference cluster center of each of the plurality of category clusters corresponding to the clean power data set.
S204, determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
In an alternative embodiment, prior to step S204, the computer device may further determine respective distance constraints for a plurality of category clusters corresponding to the clean power dataset.
In the embodiment of the application, the computer device inputs the power data to be detected into a feature extraction network comprising a plurality of hidden layers to obtain the data features of the power data to be detected; performs feature dimension reduction processing on the data features to obtain low-dimensional data features; determines a first Euclidean distance between the low-dimensional data features and the reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set; and determines a detection result of the power data to be detected, which is clean power data or poisoned power data, based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster. In this way, the computer device extracts deep data features of the power data to be detected, reduces their dimension, measures their distance to each reference cluster center, and checks whether the distance constraint is met in order to determine the detection result. Compared with existing power data detection methods, this method requires no complicated processing of the input data, which reduces the computational cost of detecting power data and in turn enables real-time detection of large-scale power data.
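The flow of steps S201 to S204 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `extract_features` and `detect`, the ReLU multilayer network standing in for the feature extraction network, and the representation of the dimension reduction as a stored mean, scale and projection matrix are all hypothetical and are not specified by the application.

```python
import numpy as np

def extract_features(x, weights):
    """Hypothetical stand-in for the feature extraction network: a stack of
    ReLU hidden layers; the activation of the last hidden layer is returned."""
    h = np.asarray(x, dtype=float)
    for w, b in weights:
        h = np.maximum(h @ w + b, 0.0)
    return h

def detect(x, weights, pca_params, ref_centers, constraints):
    """Return 'clean' or 'poisoned' for one piece of power data."""
    mean, std, p = pca_params                     # assumed to be fitted on the clean set
    feat = extract_features(x, weights)           # S201: deep data features
    low = ((feat - mean) / std) @ p               # S202: low-dimensional features
    dists = np.linalg.norm(ref_centers - low, axis=1)   # S203: first Euclidean distances
    # S204: clean if any distance is within its cluster's constraint
    return "clean" if np.any(dists <= constraints) else "poisoned"

# Toy usage with an identity "network" and identity projection (assumed values).
weights = [(np.eye(2), np.zeros(2))]
pca_params = (np.zeros(2), np.ones(2), np.eye(2))
ref_centers = np.array([[0.0, 0.0], [5.0, 5.0]])
constraints = np.array([1.0, 1.0])
print(detect([0.5, 0.0], weights, pca_params, ref_centers, constraints))  # clean
print(detect([2.5, 2.5], weights, pca_params, ref_centers, constraints))  # poisoned
```

At detection time only one forward pass, one projection and one distance per category cluster are needed, which is consistent with the low computational cost described above.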
Referring to fig. 3, fig. 3 is a flowchart illustrating another power data back door detection method according to an embodiment of the application. In comparison with the power data back door detection method shown in fig. 2, the method shown in fig. 3 further illustrates how the computer device determines the reference cluster center of each of the plurality of class clusters corresponding to the clean power data set, and how to determine the distance constraint of each class cluster. As shown in fig. 3, the power data back door detection method may include, but is not limited to, the following steps:
S301, acquiring a clean power data set.
When defending a power system against back door attacks, it is assumed that the defender cannot access the original training power data set but can access a clean verification power data set (also referred to as the clean power data set). The computer device may therefore acquire the clean power data set and verify the basic performance of the power system based on it.
S302, inputting a plurality of pieces of power data in the obtained clean power data set into a feature extraction network to obtain a data feature matrix corresponding to the plurality of pieces of power data, wherein the feature extraction network comprises a plurality of hidden layers.
Because the feature extraction network comprises a plurality of hidden layers, the computer equipment inputs a plurality of pieces of power data in the clean power data set into the feature extraction network, and the obtained data feature matrix of the plurality of pieces of power data comprises deep data features of each piece of power data in the plurality of pieces of power data.
S303, performing feature dimension reduction processing on the data feature matrix to obtain a low-dimension feature matrix.
In an alternative embodiment, the computer device performs feature dimension reduction processing on the data feature matrix to obtain a low-dimensional feature matrix, which may include performing feature dimension reduction processing on the data feature matrix by using a principal component analysis method to obtain the low-dimensional feature matrix.
Principal component analysis (PCA) uses the idea of dimension reduction to convert multiple indexes into a small number of comprehensive indexes (i.e., principal components), where the principal components reflect most of the information of the original variables and the information they carry does not overlap. While taking many variables into account, the method attributes complex factors to a few principal components, which simplifies the problem and yields more scientific and effective data information.
The process by which the computer device uses principal component analysis to perform feature dimension reduction processing on the data feature matrix (denoted as fl) to obtain the low-dimensional feature matrix is described below.
Step one, determining a covariance matrix of the data feature matrix fl.
Step two, performing eigenvalue decomposition on the covariance matrix to determine the principal components of fl.
Step three, retaining the m principal components with the largest eigenvalues to form a matrix P = (P1, P2, ..., Pm), onto which the original data features fl are projected.
Step four, processing the matrix P to obtain the low-dimensional feature matrix Zpca corresponding to the plurality of pieces of power data.
Optionally, when the computer device processes the matrix P to obtain the low-dimensional feature matrix Zpca corresponding to the plurality of electric power data, the following formula (1) may be adopted.
$$Z_{pca} = Z P \tag{1}$$
In formula (1), Z represents the data normalization result of the data feature matrix fl, and P is the matrix of retained principal components.
In the application, the computer device performs dimension reduction processing on the deep data feature matrix of the power data and maps the high-dimensional features into a low-dimensional space, which enables k-means clustering to distinguish different types of data more effectively and reduces the noise present in the high-dimensional data features.
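The four PCA steps above can be sketched with NumPy as follows. This is a minimal sketch, not the application's implementation: the function name `pca_reduce` is hypothetical, standardization by per-column mean and standard deviation is an assumed reading of "data normalization", and the projection is written as the product of the standardized matrix Z and the component matrix P.

```python
import numpy as np

def pca_reduce(features, m):
    """Project an (n_samples, n_dims) feature matrix onto its top-m principal
    components. Returns the low-dimensional matrix and the fitted parameters."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0                       # guard against constant columns
    z = (features - mean) / std               # Z: standardized feature matrix
    cov = np.cov(z, rowvar=False)             # step one: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # step two: eigendecomposition
    order = np.argsort(eigvals)[::-1][:m]     # step three: m largest eigenvalues
    p = eigvecs[:, order]                     # P = (P1, ..., Pm)
    return z @ p, (mean, std, p)              # step four: Z_pca = Z P

rng = np.random.default_rng(0)
fl = rng.normal(size=(50, 6))                 # synthetic stand-in for the feature matrix
z_pca, (mean, std, p) = pca_reduce(fl, 2)
print(z_pca.shape)                            # (50, 2)
```

Returning the fitted mean, scale and P makes it possible to project a new sample with the same transformation at detection time, which is one plausible way to keep the sample comparable with the reference cluster centers.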
S304, clustering the low-dimensional feature matrix to obtain a plurality of category clusters.
In an alternative embodiment, the computer device may use a K-means clustering method to cluster multiple low-dimensional features to obtain multiple class clusters.
Each class cluster contains one or more low-dimensional characteristics of clean power data, and the clean power data corresponding to the low-dimensional characteristics in each class cluster has the same or similar properties. In addition, each class cluster has a cluster center for representing the average point of the class cluster.
S305, determining an initial clustering center of each category cluster based on each low-dimensional feature in each category cluster and the number of low-dimensional features in the category cluster.
In an alternative embodiment, the computer device may employ the following equation (2) when determining the initial cluster center for each category cluster based on each low-dimensional feature in each category cluster and the number of low-dimensional features in that category cluster.
$$u_j = \frac{1}{|C_j|} \sum_{z_i \in C_j} z_i \tag{2}$$
In formula (2), Cj represents the set of data points in the jth category cluster; |Cj| represents the number of data points (low-dimensional features) belonging to the jth category cluster; zi represents a data point (low-dimensional feature) assigned to the jth category cluster (i.e., Cj), where zi comes from the previously determined low-dimensional feature matrix Zpca; and uj represents the initial cluster center of the jth category cluster.
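As an illustration of this step, the initial cluster center of each category cluster can be computed as the mean of the low-dimensional features assigned to it. The sketch below assumes that cluster assignments are available as an integer label per sample and that every cluster is non-empty; the function name `initial_centers` is hypothetical.

```python
import numpy as np

def initial_centers(z_pca, labels, num_clusters):
    """Mean of the members of each cluster: u_j = (1/|C_j|) * sum of z_i in C_j."""
    centers = np.zeros((num_clusters, z_pca.shape[1]))
    for j in range(num_clusters):
        members = z_pca[labels == j]          # low-dimensional features in cluster j
        centers[j] = members.mean(axis=0)     # assumes cluster j is non-empty
    return centers

z_demo = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
labels_demo = np.array([0, 0, 1])
print(initial_centers(z_demo, labels_demo, 2))
# [[ 1.  0.]
#  [10. 10.]]
```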
S306, updating the initial cluster center of each category cluster through the minimized objective function to obtain the reference cluster center of each category cluster.
In an alternative embodiment, the objective function may be as shown in equation (3) below.
$$E = \sum_{j=1}^{C} \sum_{z_i \in C_j} \left\| z_i - u_j \right\|^2 \tag{3}$$
In equation (3), C represents the total number of category clusters, Cj represents the set of j-th category clusters, zi represents the data points (low-dimensional features) assigned to the j-th category cluster (i.e., Cj), uj represents the initial cluster center of the j-th category cluster, and E represents the objective function.
Optionally, by minimizing formula (3) above, the computer device may determine the reference cluster center of the jth category cluster (denoted as $u_j^{*}$), and may then determine the reference cluster centers $\{u_1^{*}, u_2^{*}, \dots, u_C^{*}\}$ corresponding to all the category clusters, where C represents the total number of category clusters.
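The application does not state which procedure is used to minimize the objective function. One common choice, assumed here purely for illustration, is the Lloyd-style alternation used by k-means: reassign each point to its nearest center, then recompute each center as the mean of its members, until the centers stop moving. The names `objective` and `refine_centers` are hypothetical.

```python
import numpy as np

def objective(z, labels, centers):
    """E: sum over all points of the squared distance to their cluster center."""
    return float(((z - centers[labels]) ** 2).sum())

def refine_centers(z, centers, max_iter=100, tol=1e-9):
    """Lloyd-style refinement of the initial centers (an assumed procedure)."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        # squared distance from every point to every center, shape (n, C)
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = z[labels == j]
            if len(members):                  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

z_pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
init = np.array([[0.0, 0.0], [1.0, 1.0]])
ref_centers, ref_labels = refine_centers(z_pts, init)
print(ref_centers)                            # [[ 0.   0.5] [10.  10.5]]
print(objective(z_pts, ref_labels, ref_centers))   # 1.0
```

Each reassignment and each mean update can only keep the objective the same or lower it, so the alternation converges to a local minimum of the objective.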
S307, determining a second Euclidean distance between each low dimensional feature in each category cluster and the reference cluster center of the category cluster.
In an alternative embodiment, the computer device may employ the following equation (4) in determining the second Euclidean distance between each low dimensional feature in each category cluster and the reference cluster center of that category cluster.
$$d_i = \left\| z_i - u_j^{*} \right\|_2 \tag{4}$$
In formula (4), zi represents the ith low-dimensional feature assigned to the jth category cluster; $u_j^{*}$ represents the reference cluster center of the jth category cluster; and di represents the Euclidean distance between the ith low-dimensional feature in the jth category cluster and the reference cluster center of the jth category cluster.
S308, determining the distance constraint of each category cluster based on a plurality of second Euclidean distances corresponding to each category cluster.
In an alternative embodiment, the computer device determines a distance constraint of each category cluster based on the plurality of second Euclidean distances corresponding to each category cluster, and the method can include determining, for each category cluster, an average value of the plurality of second Euclidean distances corresponding to the category cluster, and taking the product of the average value and a preset value as the distance constraint of the category cluster.
In this embodiment, when the computer device determines, for each category cluster, an average value of the plurality of second euclidean distances corresponding to the category cluster, the following formula (5) may be adopted.
$$\bar{d}_j = \frac{1}{|C_j|} \sum_{z_i \in C_j} d_i \tag{5}$$
In formula (5), $\bar{d}_j$ represents the average value of the Euclidean distances corresponding to the jth category cluster, and di represents the Euclidean distance between the ith low-dimensional feature in the jth category cluster and the reference cluster center of the jth category cluster. For the meaning of the other symbols, reference may be made to the explanations of the formulas above, which are not repeated here.
In this embodiment, the preset value is, for example, 3, where 3 is an empirical value obtained from multiple experiments. When the preset value is 3, the computer device may determine that the distance constraint of the jth category cluster is $3\bar{d}_j$.
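Steps S307 and S308 can be sketched together: compute the Euclidean distance of every low-dimensional feature to the reference cluster center of its own category cluster, average the distances per cluster, and multiply by the preset value. The function name `distance_constraints` and the integer-label representation of cluster membership are assumptions made for illustration.

```python
import numpy as np

def distance_constraints(z, labels, ref_centers, preset=3.0):
    """Per-cluster constraint: preset * mean distance of members to the center."""
    # second Euclidean distance of each feature to its own reference center
    d = np.linalg.norm(z - ref_centers[labels], axis=1)
    # mean distance per cluster, scaled by the preset value (3 in the example)
    return np.array([preset * d[labels == j].mean()
                     for j in range(len(ref_centers))])

z_low = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
lab = np.array([0, 0, 1, 1])
centers_ref = np.array([[0.0, 0.5], [10.0, 10.5]])
print(distance_constraints(z_low, lab, centers_ref))   # [1.5 1.5]
```

A larger preset value loosens the constraint and admits more samples as clean; the value 3 is the empirical choice reported in this embodiment.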
S309, inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected.
In an alternative embodiment, for a description of step S309, reference may be made to the foregoing description of step S201, which is not repeated here.
S310, performing feature dimension reduction processing on the data features by using a principal component analysis method to obtain low-dimensional data features.
S311, determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set.
In an alternative embodiment, the computer device may calculate a first Euclidean distance between the low-dimensional data features and each of the reference cluster centers of all the category clusters corresponding to the clean power data set.
S312, determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, wherein the detection result is clean power data or poisoned power data.
In an alternative embodiment, determining the detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster may include: determining that the power data to be detected is clean power data if any one of the first Euclidean distances is less than or equal to the distance constraint of the category cluster corresponding to that first Euclidean distance; and determining that the power data to be detected is poisoned power data if every first Euclidean distance is greater than the distance constraint of its corresponding category cluster.
For example, assume that, among the plurality of first Euclidean distances, first Euclidean distance 1 is the Euclidean distance between the low-dimensional data features of the power data to be detected and the reference cluster center of the 1st category cluster corresponding to the predetermined clean power data set, that first Euclidean distance 1 is 0.18, and that the distance constraint of the 1st category cluster is 0.21. In this case, the computer device may determine that first Euclidean distance 1 (0.18) is less than the distance constraint (0.21) of the 1st category cluster, and may therefore determine that the power data to be detected is clean power data.
For another example, assume that the plurality of category clusters includes category cluster 1, category cluster 2 and category cluster 3, where the distance constraint of category cluster 1 is 0.21, the distance constraint of category cluster 2 is 0.24, and the distance constraint of category cluster 3 is 0.18. Assume further that the plurality of first Euclidean distances includes first Euclidean distance 1, first Euclidean distance 2 and first Euclidean distance 3, where first Euclidean distance 1 is the Euclidean distance between the low-dimensional data features of the power data to be detected and the reference cluster center of category cluster 1 and is 0.22; first Euclidean distance 2 is the Euclidean distance between the low-dimensional data features and the reference cluster center of category cluster 2 and is 0.26; and first Euclidean distance 3 is the Euclidean distance between the low-dimensional data features and the reference cluster center of category cluster 3 and is 0.20. In this case, the computer device may determine that first Euclidean distance 1 (0.22) is greater than the distance constraint (0.21) of category cluster 1, first Euclidean distance 2 (0.26) is greater than the distance constraint (0.24) of category cluster 2, and first Euclidean distance 3 (0.20) is greater than the distance constraint (0.18) of category cluster 3, and may therefore determine that the power data to be detected is poisoned power data.
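The decision rule and the two worked examples above can be reproduced with a few lines of Python. The function name `classify` is hypothetical; the numbers are the ones used in the examples.

```python
def classify(distances, constraints):
    """Clean if any first Euclidean distance is within the distance constraint
    of its category cluster; poisoned if every distance exceeds its constraint."""
    within = [d <= c for d, c in zip(distances, constraints)]
    return "clean" if any(within) else "poisoned"

# First example: distance 0.18 against the constraint 0.21 of category cluster 1.
print(classify([0.18], [0.21]))                              # clean

# Second example: every distance exceeds the constraint of its cluster.
print(classify([0.22, 0.26, 0.20], [0.21, 0.24, 0.18]))      # poisoned
```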
In the embodiment of the application, the computer equipment can acquire a clean power data set, determine the reference cluster centers of a plurality of class clusters corresponding to the clean power data set, then determine the Euclidean distance between all samples in each class cluster and the reference cluster center of the class cluster, determine the distance constraint of each class cluster based on the Euclidean distances, then determine the low-dimensional data characteristic of the power data to be detected in the process of detecting the power data to be detected, determine the Euclidean distance between the low-dimensional data characteristic and the reference cluster center of each class cluster, and finally determine the power data to be detected as clean power data or as poisoning power data based on the Euclidean distances and the distance constraint of each class cluster. Compared with the existing power data detection method, the method has the advantages that complicated data processing is not needed for input data, so that the calculation cost for detecting the power data can be reduced, and further real-time detection of large-scale power data is realized.
In addition, the application uses a feature extraction network comprising a plurality of hidden layers to extract features of the power data, so that deep features of the power data can be extracted (or, put differently, the deep features of the power data output by the last hidden layer of the feature extraction network are used), and these deep features are then used as the basis of back door analysis. Because the deep features are the highest-order features of the input power data, they determine the decision process of the model. They also implicitly reflect how the back door model correctly identifies clean samples and why it incorrectly predicts back door samples as the target category. Therefore, analyzing the deep features can improve the accuracy of detecting the power data to be detected, that is, poisoning power data and clean power data can be distinguished more accurately.
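As a non-limiting illustration, taking the output of the last hidden layer as the deep feature can be sketched in Python as follows. The fully connected layers, the tanh activation, the function names and all weight values are assumptions made for illustration; the application does not prescribe a particular network structure, and in practice the weights would come from the trained model.

```python
import math


def dense(vector, weights, biases):
    """One fully connected layer followed by tanh; weights has one row per output unit."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weights, biases)
    ]


def deep_features(sample, hidden_layers):
    """Pass the sample through every hidden layer and return the last hidden activation."""
    activation = sample
    for weights, biases in hidden_layers:
        activation = dense(activation, weights, biases)
    return activation


# Hypothetical toy network: 3 inputs -> 4 hidden units -> 2 hidden units.
hidden_layers = [
    (
        [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]],
        [0.0, 0.1, -0.1, 0.0],
    ),
    (
        [[0.1, 0.2, -0.3, 0.4], [-0.2, 0.0, 0.3, 0.1]],
        [0.05, -0.05],
    ),
]
features = deep_features([1.0, 0.5, -0.5], hidden_layers)
print(len(features))  # 2
```

The returned vector is what the subsequent feature dimension reduction step would operate on; the classification layer of the model is deliberately not applied.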
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or a plurality of stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a power data back door detection device for realizing the power data back door detection method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the device for detecting a back door of electric power data provided below may refer to the limitation of the method for detecting a back door of electric power data hereinabove, and will not be repeated here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a power data back door detection device according to an embodiment of the application. As shown in fig. 4, the power data back door detection device may include, but is not limited to:
A feature extraction module 401, configured to input power data to be detected into a feature extraction network to obtain a data feature of the power data to be detected;
A feature dimension reduction module 402, configured to perform feature dimension reduction processing on the data feature to obtain a low-dimensional data feature;
A determining module 403, configured to determine a first euclidean distance between the low dimensional data feature and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
The determining module 403 is further configured to determine a detection result of the power data to be detected based on the plurality of first euclidean distances and a predetermined distance constraint of each category cluster, where the detection result is clean power data or toxic power data.
In one embodiment, when determining the detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each category cluster, the determining module 403 is specifically configured to determine that the power data to be detected is clean power data if any one of the first Euclidean distances is less than or equal to the distance constraint of the category cluster corresponding to that first Euclidean distance, and to determine that the power data to be detected is toxic power data if each of the first Euclidean distances is greater than the distance constraint of the category cluster corresponding to that first Euclidean distance.
In one embodiment, the feature extraction module 401 is further configured to input a plurality of pieces of power data in the obtained clean power data set into the feature extraction network to obtain a data feature matrix corresponding to the plurality of pieces of power data. The feature dimension reduction module 402 is further configured to perform feature dimension reduction processing on the data feature matrix to obtain a low-dimensional feature matrix, and to cluster the low-dimensional feature matrix to obtain a plurality of category clusters. The determining module 403 is further configured to determine an initial cluster center of each category cluster based on each low-dimensional feature in the category cluster and the number of low-dimensional features in the category cluster, and to update the initial cluster center of each category cluster by minimizing an objective function to obtain the reference cluster center of each category cluster.
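As a non-limiting illustration, determining the initial cluster centers and updating them can be sketched in Python as follows. The initial cluster center is computed as the sum of the low-dimensional features of a category cluster divided by their number. This passage does not restate the objective function, so the sketch assumes the within-cluster sum of squared Euclidean distances and minimizes it by alternately reassigning features to the nearest center and re-averaging; that objective, the update procedure, the function names and the sample values are all assumptions made for illustration.

```python
import math


def mean_center(features):
    """Initial cluster center: sum of the features divided by their number."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def objective(clusters, centers):
    """Assumed objective: total squared distance of each feature to its cluster center."""
    return sum(
        math.dist(feature, center) ** 2
        for features, center in zip(clusters, centers)
        for feature in features
    )


def refine_centers(clusters, max_iterations=100):
    """Alternate reassignment and re-averaging until the objective stops decreasing."""
    centers = [mean_center(features) for features in clusters]
    best = objective(clusters, centers)
    for _ in range(max_iterations):
        regrouped = [[] for _ in centers]
        for feature in (f for features in clusters for f in features):
            nearest = min(range(len(centers)), key=lambda k: math.dist(feature, centers[k]))
            regrouped[nearest].append(feature)
        if any(not group for group in regrouped):
            break  # keep the previous grouping rather than create an empty cluster
        new_centers = [mean_center(group) for group in regrouped]
        value = objective(regrouped, new_centers)
        if value >= best:
            break
        clusters, centers, best = regrouped, new_centers, value
    return clusters, centers


# Hypothetical 2-D low-dimensional features; the point [1.0, 1.0] starts in the first cluster.
initial_clusters = [
    [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0]],
    [[1.2, 1.0], [1.0, 1.2]],
]
refined_clusters, reference_centers = refine_centers(initial_clusters)
print([len(group) for group in refined_clusters])  # [3, 3]
```

In this toy input the outlying point is moved to the nearer category cluster, after which a further pass changes nothing and the centers are returned as the reference cluster centers.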
In one embodiment, the determining module 403 is further configured to determine a second euclidean distance between each low dimensional feature in each category cluster and the reference cluster center of the category cluster, and determine a distance constraint for each category cluster based on the plurality of second euclidean distances corresponding to each category cluster.
In one embodiment, when determining the distance constraint of each category cluster based on the plurality of second Euclidean distances corresponding to each category cluster, the determining module 403 is specifically configured to determine, for each category cluster, an average value of the plurality of second Euclidean distances corresponding to the category cluster, and to take the product of the average value and a preset value as the distance constraint of the category cluster.
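As a non-limiting illustration, the distance constraint of one category cluster can be sketched in Python as follows. The function name distance_constraint, the sample features and the preset value of 1.5 are assumptions made for illustration; the application does not fix the preset value in this passage.

```python
import math


def distance_constraint(cluster_features, reference_center, preset_value):
    """Mean second Euclidean distance of a cluster, scaled by the preset value."""
    second_distances = [math.dist(feature, reference_center) for feature in cluster_features]
    average = sum(second_distances) / len(second_distances)
    return average * preset_value


# Hypothetical cluster of three 2-D low-dimensional features, each at distance 5 from the center.
cluster = [[3.0, 4.0], [0.0, 5.0], [-5.0, 0.0]]
constraint = distance_constraint(cluster, [0.0, 0.0], preset_value=1.5)
print(constraint)  # 7.5
```

A larger preset value widens the region treated as clean for that category cluster, while a smaller one makes detection stricter.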
In one embodiment, when performing feature dimension reduction processing on the data feature to obtain the low-dimensional data feature, the feature dimension reduction module 402 is specifically configured to perform the feature dimension reduction processing on the data feature by using a principal component analysis method to obtain the low-dimensional data feature.
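As a non-limiting illustration, feature dimension reduction by principal component analysis can be sketched in Python as follows. The sketch uses only the standard library and finds the principal directions by power iteration with deflation, which is one possible way to compute them and is not prescribed by the application; the function names, the iteration count and the sample values are assumptions made for illustration, and a numerical library would normally be used in practice.

```python
import math


def center_rows(matrix):
    """Subtract the per-column mean from every row; also return the means."""
    means = [sum(column) / len(matrix) for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix], means


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply and normalise to approach the top eigenvector."""
    vector = [1.0 / math.sqrt(len(matrix))] * len(matrix)
    for _ in range(iterations):
        product = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break
        vector = [p / norm for p in product]
    return vector


def pca_fit(features, n_components):
    """Return the column means and the first n_components principal directions."""
    centered, means = center_rows(features)
    cov = covariance(centered)
    size = len(cov)
    components = []
    for _ in range(n_components):
        direction = leading_eigenvector(cov)
        eigenvalue = sum(
            direction[i] * cov[i][j] * direction[j]
            for i in range(size) for j in range(size)
        )
        components.append(direction)
        # Deflation removes the found direction so the next pass finds the next one.
        cov = [
            [cov[i][j] - eigenvalue * direction[i] * direction[j] for j in range(size)]
            for i in range(size)
        ]
    return means, components


def pca_transform(feature, means, components):
    """Project one feature vector onto the fitted principal directions."""
    centered = [value - mean for value, mean in zip(feature, means)]
    return [sum(c * x for c, x in zip(component, centered)) for component in components]


# Hypothetical 3-D data features that vary almost only along the first axis.
data_features = [[2.0, 0.0, 1.0], [4.0, 0.1, 1.0], [6.0, -0.1, 1.0], [8.0, 0.0, 1.0]]
means, components = pca_fit(data_features, n_components=1)
low_dim = pca_transform([8.0, 0.0, 1.0], means, components)
print(len(low_dim))  # 1
```

The means and principal directions fitted on the clean power data set would be reused unchanged for the power data to be detected, so that both are expressed in the same low-dimensional space. Power iteration can fail to find the leading direction when the starting vector happens to be orthogonal to it, which is one reason a library implementation is preferable outside a sketch.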
The modules in the above power data back door detection device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in a processor of the terminal device or be independent of that processor, or may be stored, in software form, in a memory of the terminal device, so that the processor can call and execute the operations corresponding to the above modules.
In an exemplary embodiment, an embodiment of the present application provides a computer device, which may be a terminal, and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, near field communication (NFC) or other technologies. The computer program, when executed by the processor, implements a power data back door detection method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, may combine some of the components, or may have a different arrangement of components.
In one exemplary embodiment, the application provides a computer device comprising a memory and a processor, the memory having stored therein a computer program which when executed by the processor performs the steps of:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each class cluster, wherein the detection result is clean power data or toxic power data.
In one embodiment, when the processor executes the computer program to determine a detection result for the power data to be detected based on a plurality of first euclidean distances and a predetermined distance constraint of each category cluster, the method specifically includes determining that the power data to be detected is clean power data if any one of the first euclidean distances is determined to be less than or equal to a distance constraint corresponding to the category cluster corresponding to the first euclidean distance, and determining that the power data to be detected is toxic power data if each of the first euclidean distances is determined to be greater than the distance constraint corresponding to the category cluster corresponding to the first euclidean distance.
In one embodiment, the processor executes the computer program to further perform the steps of inputting a plurality of pieces of power data in the obtained clean power data set into the feature extraction network to obtain a data feature matrix corresponding to the plurality of pieces of power data, performing feature dimension reduction processing on the data feature matrix to obtain a low-dimensional feature matrix, clustering the low-dimensional feature matrix to obtain a plurality of category clusters, determining an initial cluster center of each category cluster based on each low-dimensional feature in each category cluster and the number of the low-dimensional features in the category cluster, and updating the initial cluster center of each category cluster by minimizing an objective function to obtain a reference cluster center of each category cluster.
In one embodiment, the processor executing the computer program further implements the steps of determining a second Euclidean distance between each low dimensional feature in each category cluster and the reference cluster center of the category cluster, and determining a distance constraint for each category cluster based on a plurality of second Euclidean distances corresponding to each category cluster.
In one embodiment, when the processor executes the computer program to determine the distance constraint of each category cluster based on the plurality of second Euclidean distances corresponding to each category cluster, the method specifically comprises the steps of determining, for each category cluster, an average value of the plurality of second Euclidean distances corresponding to the category cluster, and taking the product of the average value and a preset value as the distance constraint of the category cluster.
In one embodiment, when the processor executes the computer program to realize the feature dimension reduction processing on the data features to obtain the low-dimension data features, the processor specifically realizes the steps of utilizing a principal component analysis method to perform the feature dimension reduction processing on the data features to obtain the low-dimension data features.
In one exemplary embodiment, the application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each class cluster, wherein the detection result is clean power data or toxic power data.
In one embodiment, when the computer program is executed by the processor to determine a detection result for the power data to be detected based on the plurality of first euclidean distances and the predetermined distance constraint of each category cluster, the method specifically includes determining that the power data to be detected is clean power data if any one of the first euclidean distances is determined to be less than or equal to the distance constraint corresponding to the category cluster corresponding to the first euclidean distance, and determining that the power data to be detected is toxic power data if each of the first euclidean distances is determined to be greater than the distance constraint corresponding to the category cluster corresponding to the first euclidean distance.
In one embodiment, the computer program when executed by the processor further performs the steps of inputting a plurality of pieces of power data in the obtained clean power data set into the feature extraction network to obtain a data feature matrix corresponding to the plurality of pieces of power data, performing feature dimension reduction processing on the data feature matrix to obtain a low-dimensional feature matrix, clustering the low-dimensional feature matrix to obtain a plurality of category clusters, determining an initial cluster center of each category cluster based on each low-dimensional feature in each category cluster and the number of the low-dimensional features in the category cluster, and updating the initial cluster center of each category cluster by minimizing an objective function to obtain a reference cluster center of each category cluster.
In one embodiment, the computer program when executed by the processor further performs the steps of determining a second Euclidean distance between each low dimensional feature in each category cluster and a reference cluster center of the category cluster, and determining a distance constraint for each category cluster based on a plurality of second Euclidean distances corresponding to each category cluster.
In one embodiment, when the computer program is executed by the processor to determine the distance constraint of each category cluster based on the plurality of second euclidean distances corresponding to each category cluster, the method specifically includes determining, for each category cluster, an average value of the plurality of second euclidean distances corresponding to the category cluster, and taking a product of the average value and a preset value as the distance constraint of the category cluster.
In one embodiment, when the computer program is executed by the processor to perform feature dimension reduction processing on the data features to obtain low-dimensional data features, the method specifically includes the steps of performing feature dimension reduction processing on the data features by using a principal component analysis method to obtain the low-dimensional data features.
In one exemplary embodiment, the application provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
Inputting the power data to be detected into a feature extraction network to obtain the data features of the power data to be detected, wherein the feature extraction network comprises a plurality of hidden layers;
Performing feature dimension reduction processing on the data features to obtain low-dimension data features;
Determining a first Euclidean distance between the low-dimensional data characteristic and a reference cluster center of each of a plurality of category clusters corresponding to a predetermined clean power data set;
And determining a detection result of the power data to be detected based on the plurality of first Euclidean distances and the predetermined distance constraint of each class cluster, wherein the detection result is clean power data or toxic power data.
In one embodiment, when the computer program is executed by the processor to determine a detection result for the power data to be detected based on the plurality of first euclidean distances and the predetermined distance constraint of each category cluster, the method specifically includes determining that the power data to be detected is clean power data if any one of the first euclidean distances is determined to be less than or equal to the distance constraint corresponding to the category cluster corresponding to the first euclidean distance, and determining that the power data to be detected is toxic power data if each of the first euclidean distances is determined to be greater than the distance constraint corresponding to the category cluster corresponding to the first euclidean distance.
In one embodiment, the computer program when executed by the processor further performs the steps of inputting a plurality of pieces of power data in the obtained clean power data set into the feature extraction network to obtain a data feature matrix corresponding to the plurality of pieces of power data, performing feature dimension reduction processing on the data feature matrix to obtain a low-dimensional feature matrix, clustering the low-dimensional feature matrix to obtain a plurality of category clusters, determining an initial cluster center of each category cluster based on each low-dimensional feature in each category cluster and the number of the low-dimensional features in the category cluster, and updating the initial cluster center of each category cluster by minimizing an objective function to obtain a reference cluster center of each category cluster.
In one embodiment, the computer program when executed by the processor further performs the steps of determining a second Euclidean distance between each low dimensional feature in each category cluster and a reference cluster center of the category cluster, and determining a distance constraint for each category cluster based on a plurality of second Euclidean distances corresponding to each category cluster.
In one embodiment, when the computer program is executed by the processor to determine the distance constraint of each category cluster based on the plurality of second euclidean distances corresponding to each category cluster, the method specifically includes determining, for each category cluster, an average value of the plurality of second euclidean distances corresponding to the category cluster, and taking a product of the average value and a preset value as the distance constraint of the category cluster.
In one embodiment, when the computer program is executed by the processor to perform feature dimension reduction processing on the data features to obtain low-dimensional data features, the method specifically includes the steps of performing feature dimension reduction processing on the data features by using a principal component analysis method to obtain the low-dimensional data features.
It should be noted that the data involved in the present application (including but not limited to the power data to be detected, the clean power data set, and the like) are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the related data must comply with the relevant regulations.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may perform the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile memory and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computation, an artificial intelligence (AI) processor, or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of the present application.
The foregoing embodiments illustrate only a few implementations of the application, and although they are described in specific detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.