CN117725496A

Info

Publication number: CN117725496A
Application number: CN202311615940.7A
Authority: CN
Inventors: 孙赫阳; 张彬; 孙峰; 李广野; 吴林桥; 荆澜涛; 宋进良; 刘扬; 张佳鑫; 阎宇航; 邱兵兵; 佟帅辰; 姜力行; 李菁菁; 李欢; 刘齐; 佟浩松; 孙茜; 肖楠; 朱紫煜
Original assignee: Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd; Shenyang Institute of Engineering; State Grid Corp of China SGCC
Current assignee: Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd; State Grid Corp of China SGCC
Priority date: 2023-11-29
Filing date: 2023-11-29
Publication date: 2024-03-19

Abstract

An electricity theft monitoring method based on similarity measurement and a decision tree support vector machine, comprising: collecting user power usage data and preprocessing it, and obtaining the users' electricity consumption characteristic data through AMI-based user behavior analysis; based on the obtained characteristic data, generating sample data with a one-dimensional generative adversarial network (1D-WGAN) and combining the generated sample data with the obtained characteristic data to obtain the data to be analyzed; determining suspicious users from the data to be analyzed with a similarity measurement algorithm, and detecting the suspicious users with a decision tree support vector machine (DT-KSVM) to identify illegal electricity-stealing users. The solution of the present invention can monitor electricity theft by dedicated-transformer users more accurately and quickly, and protects the interests of electric power enterprises.

Description

Electricity theft monitoring method based on similarity measurement and decision tree support vector machine

Technical field

The invention belongs to the field of countering electricity theft by users, and in particular relates to a user electricity theft monitoring method and system based on similarity measurement and a decision tree support vector machine.

Background

Electricity theft refers to the illegal act in which, during the supply and use of electricity, a user covertly appropriates state-owned electric energy in order to avoid paying, or to underpay, electricity fees. Electricity theft affects power supply quality and the operational safety of the power grid, and seriously affects social stability and harmony. Power supply companies have adopted a variety of measures to prevent it, but traditional anti-theft methods do not yet meet the requirements of supervisability and intelligence. Specifically, traditional methods for countering theft by dedicated-transformer users rely on the judgment of inspectors with long accumulated work experience, which leaves room for misjudgment. Checking electricity consumption also requires a high degree of manual involvement, and investigating theft across one area often needs a large number of staff, so detection is inefficient and lacks intelligence.

Summary of the invention

To address the deficiencies of the prior art, the present invention provides an electricity theft monitoring method and system based on similarity measurement and a decision tree support vector machine, so as to achieve intelligent user electricity theft monitoring that is scientifically sound, practical, accurate and reliable in its results, and protective of the safe operation of the power grid.

To solve the above technical problems, the present invention adopts the following technical solutions.

The invention first discloses an electricity theft monitoring method comprising the following steps:

Step 1: Collect user power usage data and preprocess it, and obtain the users' electricity consumption characteristic data through AMI-based user behavior analysis;

Step 2: Based on the obtained characteristic data, generate sample data with the one-dimensional generative adversarial network 1D-WGAN, and combine the generated sample data with the obtained characteristic data to obtain the data to be analyzed;

Step 3: Determine suspicious users from the data to be analyzed with a similarity measurement algorithm, and detect the suspicious users with the decision tree support vector machine DT-KSVM to identify illegal electricity-stealing users.

The present invention further includes the following preferred solutions:

In step 1, collecting and preprocessing the user power usage data further includes:

Expressing the measurement data of m users at time n as the matrix:

$$
X = \begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{m1} & X_{m2} & \cdots & X_{mn}
\end{bmatrix}
$$

where $X_{ij}$ is the smart meter measurement of the i-th user in the j-th measurement period;

Normalizing the measured values by min-max normalization, which maps them into the range [0, 1]:

$$
x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

where $x$ is the actual measurement data, $x_{\max}$ is the maximum value of the sample data, $x_{\min}$ is the minimum value of the sample data, and $x^{*}$ is the normalized electricity consumption data.

In step 2, generating sample data with the one-dimensional generative adversarial network 1D-WGAN further includes:

Step 2.1: Use a generative adversarial network (GAN) built on one-dimensional convolutional layers to learn the feature distribution of the measurement data;

Step 2.2: Train the GAN with the Wasserstein distance as the objective, and generate sample data conforming to the feature distribution based on predefined similarity and authenticity constraints.

The generative adversarial network includes a generator G and a discriminator D. The objective function of the generator G is:

$$
\min_{G} \; \mathbb{E}_{z \sim p_Z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

The objective function of the discriminator D is:

$$
\max_{D} \; \mathbb{E}_{X \sim p_{data}(X)}\left[\log D(X)\right] + \mathbb{E}_{z \sim p_Z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

The objective function of the whole game is:

$$
\min_{G} \max_{D} V(D, G) = \mathbb{E}_{X \sim p_{data}(X)}\left[\log D(X)\right] + \mathbb{E}_{z \sim p_Z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

where $p_{data}(X)$ is the distribution of the electricity theft data used as the training set, $p_Z(z)$ is the Gaussian distribution of the noise variable z, and $G(z)$ is the generator output.

Step 3 further includes:

Step 3.1: Calculate the similarity measurement distance between each user's characteristic curve and the characteristic curve of normal electricity users, and determine suspicious users by comparing this distance with preset similarity thresholds;

Step 3.2: Construct a multi-class SVM classifier corresponding to each type of electricity theft feature, build a decision tree according to a hierarchical classification model, and, depending on the distance between a sample and the classification hyperplane, select either the SVM or the KNN classification algorithm to classify the behavioral features of the suspicious users and obtain the electricity-stealing user identification result.

The distance between a sample and the classification hyperplane is defined in terms of s, a support vector in the training samples, and the sample to be classified.
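The routing rule of step 3.2 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the names `ksvm_predict`, `decision_value`, `eps` and `k` are hypothetical, the signed decision value is assumed to come from an already trained binary SVM, and falling back to a KNN vote over the support vectors when the sample lies close to the hyperplane is an assumption borrowed from the common SVM-KNN combination.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def ksvm_predict(x, decision_value, support_vectors, sv_labels, eps=1.0, k=3):
    """Classify x with the SVM far from the hyperplane, with KNN close to it.

    decision_value: signed SVM output for x (assumed to be precomputed).
    support_vectors, sv_labels: support vectors and their +1 / -1 labels.
    eps: hypothetical threshold separating the "far" and "near" regimes.
    k: number of nearest support vectors that vote in the KNN regime.
    """
    if abs(decision_value) > eps:
        # Far from the hyperplane: trust the sign of the SVM output.
        return 1 if decision_value > 0 else -1
    # Near the hyperplane: majority vote among the k nearest support vectors.
    ranked = sorted(zip(support_vectors, sv_labels),
                    key=lambda pair: euclidean(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Using an odd `k` avoids tied votes in the two-class case. In a decision tree of such classifiers, each internal node would apply this rule to one theft category against the rest.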

The invention also discloses an electricity theft monitoring system based on similarity measurement and a decision tree support vector machine that uses the aforementioned monitoring method. The system comprises a power consumption characteristic data acquisition module, a sample data generation module and an electricity-stealing user identification module.

The power consumption characteristic data acquisition module is used to collect user power usage data and preprocess it, and to obtain the users' electricity consumption characteristic data through AMI-based user behavior analysis;

The sample data generation module is used to generate sample data with the one-dimensional generative adversarial network 1D-WGAN based on the obtained characteristic data, and to combine the generated sample data with the obtained characteristic data to obtain the data to be analyzed;

The electricity-stealing user identification module is used to determine suspicious users from the data to be analyzed with a similarity measurement algorithm, and to detect the suspicious users with the decision tree support vector machine DT-KSVM to identify illegal electricity-stealing users.

Correspondingly, this application also discloses a terminal comprising a processor and a storage medium;

The storage medium is used to store instructions;

The processor is configured to operate according to the instructions to perform the steps of the aforementioned electricity theft monitoring method based on similarity measurement and a decision tree support vector machine.

Correspondingly, this application also discloses a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the steps of the aforementioned electricity theft monitoring method based on similarity measurement and a decision tree support vector machine.

Compared with the prior art, the beneficial effects of the present invention are as follows. The invention provides an electricity theft monitoring method and system based on similarity measurement and a decision tree support vector machine. With this monitoring model, suspicious users and confirmed electricity-stealing users are found more accurately and quickly when monitoring dedicated-transformer users. This saves human resources, makes monitoring more intelligent, reliable and efficient, makes up for the shortcomings of traditional methods for detecting theft by dedicated-transformer users, strongly supports inspection work on distribution transformer capacity expansion, improves inspection accuracy, and protects the economic interests of power companies. The invention also allows the power flow values of each regional grid to be checked in time so that power dispatching proceeds safely, which is of practical significance for maintaining the safe and stable operation of the power system and supporting the digital and comprehensive development of the power grid. It is scientifically sound and highly practical.

Description of the drawings

Figure 1 is a flow chart of the method for locating electricity-stealing users in the present invention.

Figure 2 is a schematic structural diagram of the DT-SVM in the present invention.

Figure 3 is the KSVM-based electricity theft detection flow chart within the method for locating electricity-stealing users in the present invention.

Figure 4 is a schematic diagram of the daily load curve in a specific example of the present invention.

Figure 5 is a schematic diagram comparing generated samples and real samples in a specific example of the present invention.

Figure 6 is a schematic diagram comparing the missed detection rate and the false detection rate of different detection methods in a specific example of the present invention.

Figure 7 is a schematic diagram comparing the percentage of training samples and the accuracy in a specific example of the present invention.

Figure 8 is a schematic structural diagram of the electricity theft monitoring system based on similarity measurement and a decision tree support vector machine in the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments.

The embodiments described in this application are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art in the spirit of the present invention and without creative effort fall within the protection scope of the present invention.

In view of the shortcomings of the prior art, the present invention proposes an electricity theft monitoring method and system based on similarity measurement and a decision tree support vector machine. First, user behavior is analyzed on the basis of the Advanced Metering Infrastructure (AMI). Then sample data are generated with 1D-WGAN, which combines a one-dimensional generative adversarial network (1D-GAN) with the Wasserstein distance. Finally, electricity-stealing users are located with the similarity measure and the Decision Tree Combined K-Nearest Neighbor and Support Vector Machine (DT-KSVM) algorithm. Comparative experiments demonstrate the accuracy, robustness and noise resistance of the method.

The electricity theft monitoring method based on similarity measurement and a decision tree support vector machine disclosed by the present invention includes the following steps:

Step 1: Collect user power usage data and preprocess it, and obtain the users' electricity consumption characteristic data through AMI-based user behavior analysis.

The user behavior analysis is based on the Advanced Metering Infrastructure (AMI). AMI refers to the systems that measure, collect, store, analyze and use data on customers' energy usage. The collected data cover typical large, medium and small transformer users as well as 380 V/220 V low-voltage residential users, and include data items such as electric energy data, event records and other data. These data items are collected, analyzed and stored to obtain each user's electricity consumption information and consumption behavior. User electricity data have many features, and choosing different feature sets for different users leads to different analysis results, so effective features must be selected to represent consumption behavior. These features are closely related to user behavior and are significantly correlated with one another. Redundant information in the feature space degrades the analysis, so removing overlapping and redundant information before analyzing consumption behavior improves analysis performance.

Typical features that characterize a user's electricity consumption behavior include: statistical features, such as daily, annual and seasonal electricity consumption, daily maximum and minimum load, and average load rate; time series features, such as peak-period consumption and the valley power coefficient; and other related features, such as dwelling area and household size.
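As an illustration of the statistical features listed above, the following sketch derives a few of them from one day of load readings. It is only an example under stated assumptions: the function name `daily_load_features`, the kW/kWh units and the sampling interval are hypothetical, and the average load rate is assumed to be the mean load divided by the peak load (the usual load-factor definition), which the source does not spell out.

```python
def daily_load_features(load_kw, interval_hours=1.0):
    """Compute simple statistical features from one day of load readings.

    load_kw: load samples for a single day (hypothetical unit: kW).
    interval_hours: time between samples (hypothetical, default 1 hour).
    """
    if not load_kw:
        raise ValueError("load_kw must contain at least one reading")
    peak = max(load_kw)
    valley = min(load_kw)
    mean = sum(load_kw) / len(load_kw)
    return {
        # Energy is approximated as the sum of load times the sample interval.
        "daily_energy_kwh": sum(load_kw) * interval_hours,
        "max_load_kw": peak,
        "min_load_kw": valley,
        # Assumed definition: mean load over peak load.
        "average_load_rate": mean / peak if peak > 0 else 0.0,
    }
```

Annual and seasonal consumption would follow by aggregating the daily energy over the corresponding date ranges.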

In a further embodiment of the AMI-based user behavior analysis, Kirchhoff's laws and the law of conservation of energy imply that when the network loss is excessive, the likelihood of electricity theft is very high; such a point is called a suspicious point. The total network-loss consumption is measured at the transformer for comparison. The electricity theft data obtained by the AMI system involve m users at time n and are represented as a matrix. The data of the same user in different time periods j can be denoted $X_j$. For different users, the data matrix at time n is:

$$
X = \begin{bmatrix}
X_{1,1} & X_{1,2} & \cdots & X_{1,n} \\
X_{2,1} & X_{2,2} & \cdots & X_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{m,1} & X_{m,2} & \cdots & X_{m,n}
\end{bmatrix}
$$

where $X_{i,j}$ is the smart meter measurement of the i-th user in the j-th measurement period.

The raw measurements are normalized by min-max normalization, a linear transformation that maps the original data into the range [0, 1]. The conversion function is:

$$
x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

where $x$ is the actual measurement data, $x_{\max}$ is the maximum value of the sample data, $x_{\min}$ is the minimum value of the sample data, and $x^{*}$ is the normalized electricity consumption data.
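The min-max normalization described above can be sketched in a few lines. The function name `min_max_normalize` is hypothetical, and mapping a constant series to zeros (where the maximum equals the minimum and the formula is undefined) is an assumption of this sketch rather than something the source specifies.

```python
def min_max_normalize(values):
    """Map measurements linearly into [0, 1] using the sample min and max."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no shape information,
        # so every point is mapped to 0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max_normalize([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`.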

Step 2: Based on the obtained user electricity consumption characteristic data, generate sample data with the one-dimensional generative adversarial network 1D-WGAN, and combine the generated sample data with the obtained characteristic data to obtain the data to be analyzed.

Sample data are generated with 1D-WGAN, which combines a one-dimensional generative adversarial network (1D-GAN) with the Wasserstein distance. Because the measurement data form a one-dimensional time series, a GAN structure based on one-dimensional convolutional layers is designed. The deep neural network learns the basic characteristics of the measurement data from a small number of samples and, based on the Wasserstein distance, a similarity constraint and an authenticity constraint, generates high-precision measurement data that conform to the characteristics of electricity theft. Finally, the generated samples are combined with the existing samples to enlarge the sample set.

Step 2 specifically includes:

Step 2.1: Use a generative adversarial network (GAN) built on one-dimensional convolutional layers to learn the feature distribution of the measurement data.

A generative adversarial network generally contains two main modules, a generator and a discriminator. The generator uses random noise to produce samples that are as realistic as possible, while the discriminator distinguishes the generator's fake samples from real samples as accurately as possible. A small amount of existing electricity theft data is selected as the training set. Because electricity theft data are difficult to describe with an explicit mathematical model, their distribution is denoted $p_{data}(X)$. A set of random noise variables z following the Gaussian distribution $p_Z(z)$ is introduced, and the mapping between the two is established by training the GAN. In this way, new data conforming to the original data distribution can be generated by sampling from a known distribution. The generator G learns the sample distribution and generates new samples. G is a neural network whose input is the random variable z drawn from the Gaussian distribution $p_Z(z)$ and whose output is $G(z)$. The distribution of the generated data, $p_g(z)$, gradually fits the sample data distribution $p_{data}(X)$. The generator's goal is to generate data realistic enough to confuse the discriminator D, so its objective function is:

$$
\min_{G} \; \mathbb{E}_{z \sim p_Z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

where $\mathbb{E}$ denotes the expectation, z is the random variable, and $p_Z(z)$ is the distribution of z.

The discriminator D determines whether the input data are real. D is also a neural network, but its input is either real data or data produced by the generator. Its main task is to distinguish these two classes, so its output is a scalar between 0 and 1: the probability that the input is real data rather than generated data. The objective function of D is:

$$
\max_{D} \; \mathbb{E}_{X \sim p_{data}(X)}\left[\log D(X)\right] + \mathbb{E}_{z \sim p_Z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

where $\mathbb{E}$ denotes the expectation, X is a real data sample, $p_Z(z)$ is the distribution of z, and $p_{data}(X)$ is the distribution of X.

$G(z)$ is the fake data generated from z by the generator, $D(X)$ is the discriminator's output on real data X, and $D(G(z))$ is the discriminator's output on the fake data $G(z)$; both $D(X)$ and $D(G(z))$ are probabilities that the data are real. $\mathbb{E}_{X \sim p_{data}(X)}[\log D(X)]$ is the expectation of $\log D(X)$ when all X are real data, and $\mathbb{E}_{z \sim p_Z(z)}[\log(1 - D(G(z)))]$ is the expectation of $\log(1 - D(G(z)))$ when all data are generated data.
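To make the two expectation terms concrete, the sketch below evaluates them on batches of discriminator outputs. It is a numerical illustration only and assumes the standard GAN formulation built from $\log D(X)$ and $\log(1 - D(G(z)))$; the function names are hypothetical, the expectations are replaced by batch means, and the small clamp `_EPS` is added here purely to avoid taking the logarithm of zero.

```python
import math

_EPS = 1e-12  # numerical guard, not part of the formulation


def _clamp(p):
    """Keep a probability strictly inside (0, 1) before taking logs."""
    return min(max(p, _EPS), 1.0 - _EPS)


def discriminator_objective(d_real, d_fake):
    """Batch estimate of E[log D(X)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(X) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    The discriminator is trained to maximize this value.
    """
    real_term = sum(math.log(_clamp(p)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_objective(d_fake):
    """Batch estimate of E[log(1 - D(G(z)))], which the generator minimizes."""
    return sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)
```

A discriminator that scores real data near 1 and generated data near 0 obtains a larger objective value, while a generator whose samples are scored near 1 obtains a smaller one, which is the adversarial tension the text describes.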

Step 2.2: Train the GAN with the Wasserstein distance as the objective, and generate sample data conforming to the feature distribution based on predefined similarity and authenticity constraints.

The Wasserstein distance is used in place of the JS divergence. Training the GAN to minimize the Wasserstein distance alleviates the vanishing-gradient problem during training and effectively improves training stability.

The Wasserstein distance is defined as:

$$
W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]
$$

where $\Pi(p_{data}, p_g)$ is the set of joint distributions $\gamma$ whose marginal distributions are $p_{data}$ and $p_g$; the right-hand side is the minimum over $\gamma(x, y)$ of the expected distance from x to y, that is, the cost of fitting $p_g$ to $p_{data}$; and inf is the infimum of the set. Because the Wasserstein distance between arbitrary distributions is difficult to compute directly, its Kantorovich-Rubinstein dual form is used:

$$
W(p_{data}, p_g) = \frac{1}{K} \sup_{\lVert f \rVert_L \le K} \left( \mathbb{E}_{x \sim p_{data}}\left[f(x)\right] - \mathbb{E}_{x \sim p_g}\left[f(x)\right] \right)
$$

where sup is the supremum of the set and K is the Lipschitz constant. Specifically, a continuous function f is K-Lipschitz if there exists a constant $K \ge 0$ such that any two elements $x_1$ and $x_2$ in its domain satisfy $|f(x_1) - f(x_2)| \le K |x_1 - x_2|$. The condition $\lVert f \rVert_L \le K$ indicates that f is continuous and that the absolute value of its derivative is bounded above. After training, the WGAN can generate an unlimited number of samples that follow the distribution.
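For intuition about what the Wasserstein distance measures, the sketch below computes it exactly in the simplest possible setting: two equally sized sets of scalar samples, where the optimal transport plan pairs the sorted values. This is an illustrative special case and not the training procedure of the source, which estimates the distance through the dual form with a neural network; the function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(samples_a, samples_b):
    """Wasserstein-1 distance between two equal-size scalar sample sets.

    For one-dimensional empirical distributions with the same number of
    points, the optimal coupling matches the i-th smallest value of one set
    with the i-th smallest value of the other.
    """
    if len(samples_a) != len(samples_b) or not samples_a:
        raise ValueError("sample sets must be non-empty and of equal size")
    a = sorted(samples_a)
    b = sorted(samples_b)
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)
```

Shifting every sample by a constant c changes the distance by exactly |c|, which is why this metric keeps giving a useful training signal even when the two distributions do not overlap.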

The generated measurement data must be both authentic and consistent with a similarity constraint. The authenticity constraint ensures that the generated data are close to real conditions. The authenticity loss $L_r$ is defined as:

$$
L_r = W(G(z); G(X)) \tag{8}
$$

where $G(X)$ denotes the real sample data, $G(z)$ is the data generated by the generator, and $W(G(z); G(X))$ is the Wasserstein distance between the generated data and the real samples.

The generated data should be as similar as possible to the actual data, so the similarity loss $L_s$ is defined as:

$$
L_s = \lVert G(z), I \rVert_2 \tag{9}
$$

where I is the actual data, and the 2-norm measures the similarity between the two matrices.

The final optimization objective of data generation is therefore built from the authenticity loss $L_r$ and the similarity loss $L_s$.

With this optimization objective, an optimizer adjusts the random noise variable z so that the generated data move closer to the actual data. The resulting total sample set, consisting of the generated samples together with the existing samples, is used as the data to be analyzed.

Step 3: Determine suspicious users from the data to be analyzed with a similarity measurement algorithm, and detect the suspicious users with the decision tree support vector machine DT-KSVM to identify illegal electricity-stealing users.

As shown in Figure 1, electricity-stealing users are located with the similarity measure and the Decision Tree Combined K-Nearest Neighbor and Support Vector Machine (DT-KSVM). Because of the advantages of SVM over other existing classifiers, the present invention selects SVM as one of the classifiers in this scheme. SVM has been extensively tested and can provide higher efficiency than other classifiers. It copes with the overfitting problem, that is, it processes unseen data sets appropriately and produces relevant outputs. SVM uses different kernels to separate data that are not linearly separable. The locating process has two steps: first, suspicious users are determined with the similarity measure; then DT-KSVM examines the suspicious users found in this preliminary detection and outputs the illegal users.

In a further embodiment, two parameters $D_1$ and $D_2$ are set with $D_1 < D_2$; different parameters are needed for different users. Let $D_{whole}$ be the distance obtained from the similarity measure. When $D_{whole} < D_1$, the user is considered a non-stealing user. When $D_1 < D_{whole} < D_2$, a secondary detection is performed to determine whether the user is stealing electricity. When $D_{whole} > D_2$, the user is likely to be stealing electricity, but further verification is still required.
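The two-threshold rule above amounts to a three-way triage, sketched below. The function name `triage_user` and the returned labels are hypothetical. The source does not say how a distance exactly equal to a threshold is handled, so this sketch assumes that boundary values fall into the more suspicious band.

```python
def triage_user(d_whole, d1, d2):
    """Sort a user into one of three bands by similarity distance.

    d_whole: distance between the user's curve and the normal curve.
    d1, d2: per-user thresholds with d1 < d2.
    """
    if not d1 < d2:
        raise ValueError("thresholds must satisfy d1 < d2")
    if d_whole < d1:
        return "normal"          # treated as a non-stealing user
    if d_whole < d2:
        return "recheck"         # needs the secondary detection
    return "likely_theft"        # probable theft, still to be verified
```

Both the `recheck` and `likely_theft` bands feed the second detection stage; only the `normal` band is dismissed outright.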

According to a specific embodiment, step 3 further includes:

Step 3.1: Calculate the similarity measurement distance between each user's characteristic curve and the characteristic curve of normal electricity users, and determine suspicious users by comparing this distance with preset similarity thresholds.

Suspicious users are determined with the similarity measure, and a user's consumption characteristics and behavior can be described by a user characteristic curve. In the similarity measurement, the characteristic curve of the user's normal consumption is determined first, using a weighted average to obtain the curve under the normal consumption pattern. The similarity of time series has two aspects, value similarity and morphological similarity. The Euclidean distance is used to measure value similarity, and the dynamic time warping (DTW) algorithm is used to measure morphological features. To describe simply and accurately the shape of the curve in each period, such as rising, falling or stable, the slope of the straight line between adjacent points is used. A time series of length n is thus reduced to a morphological sequence of length n-1:

$$
X' = \left( \frac{x_2 - x_1}{\Delta t},\; \frac{x_3 - x_2}{\Delta t},\; \ldots,\; \frac{x_n - x_{n-1}}{\Delta t} \right)
$$

where $x_i$ is the ordinate of the characteristic curve of the user's normal consumption at time i, and $\Delta t$ is the time interval of the characteristic curve, that is, $t_{i+1} - t_i$.
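The reduction from a time series to its slope-based morphological sequence can be sketched as follows. The function name `morphological_sequence` is hypothetical, and a uniform sampling interval `dt` is assumed.

```python
def morphological_sequence(values, dt=1.0):
    """Turn a length-n series into its length n-1 sequence of slopes.

    Each output element is (x[i+1] - x[i]) / dt: positive for a rising
    segment, negative for a falling one and zero for a stable one.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [(b - a) / dt for a, b in zip(values, values[1:])]
```

For example, the series `[1.0, 3.0, 3.0, 2.0]` with `dt=1.0` becomes `[2.0, 0.0, -1.0]`: one rising, one stable and one falling segment.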

Introducing the morphological sequence overcomes the shortcoming of a Euclidean-distance measure that relies only on the value at each time point and ignores shape. However, the quality of the measurement depends on the choice of distance function, so the morphological sequences also need an accurate measurement method. The DTW algorithm warps the time axis to match points to points and measures time series accurately according to their shape, which meets this requirement. Two independent morphological sequences X′ = (x₁, x₂, …, x_{n-1}) and Y′ = (y₁, y₂, …, y_{m-1}) are established. To align the structure of the two sequences, a distance matrix is constructed in which each element is a Euclidean distance:

$$D(i,j) = \lVert x_i - y_j \rVert_2 \qquad (13)$$

However, the DTW path is not chosen arbitrarily: it must satisfy boundary, continuity, and monotonicity constraints. Among the paths that satisfy these three constraints, the one that minimizes the total distance is selected.

The cumulative distance γ is constructed by dynamic programming. The cumulative distance γ(i, j) is the sum of D(i, j) and the smallest cumulative distance among the adjacent elements from which the point can be reached:

$$\gamma(i,j) = D(i,j) + \min\{\gamma(i-1,j-1),\ \gamma(i-1,j),\ \gamma(i,j-1)\} \qquad (14)$$
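The recursion described above, in which each cell adds its local distance to the smallest cumulative distance among its three predecessors, can be sketched as follows. The sketch assumes scalar sequence elements, so the Euclidean norm of equation (13) reduces to an absolute difference; the function name is illustrative.

```python
import math


def dtw_distance(xs, ys):
    """Cumulative DTW distance between two scalar sequences.

    gamma[i][j] holds the minimal cumulative distance of any warping path
    that ends by matching xs[i-1] with ys[j-1].
    """
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0  # boundary condition: paths start at the first pair
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(xs[i - 1] - ys[j - 1])   # D(i, j) for scalars
            gamma[i][j] = local + min(
                gamma[i - 1][j - 1],  # diagonal step
                gamma[i - 1][j],      # advance in xs only
                gamma[i][j - 1],      # advance in ys only
            )
    return gamma[n][m]  # boundary condition: paths end at the last pair


# The repeated value in the first sequence is absorbed by warping.
print(dtw_distance([0, 1, 1, 2], [0, 1, 2]))
```

Restricting each cell to those three predecessors is what enforces the continuity and monotonicity constraints; fixing the start and end cells enforces the boundary constraint.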

The path with the smallest cumulative distance is the best path to that point. Assume two curve time series X = (x₁, x₂, …, x_n) and Y = (y₁, y₂, …, y_m), representing the test curve and the characteristic curve respectively, and compute their morphological sequences X′ and Y′. The following similarity measure accounts for both numerical and morphological characteristics:

$$D_{whole} = \alpha\, D_2(X, Y) + \lambda\, \mathrm{DTW}(X', Y') \qquad (15)$$

In the formula, D₂(X, Y) is the Euclidean distance between the value sequences of the test curve and the characteristic curve, DTW(X′, Y′) is the DTW distance between the morphological sequences of the test curve and the characteristic curve, and α and λ are the weight coefficients for value and shape respectively, with α + λ = 1.
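The weighted combination of the value distance and the shape distance can be sketched as below. The sketch assumes both curves have the same number of points so that the Euclidean distance is defined, and it takes the shape distance as an already computed number; the function names are illustrative.

```python
import math


def euclidean(xs, ys):
    """Euclidean distance between two equal-length value sequences."""
    if len(xs) != len(ys):
        raise ValueError("value sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)))


def whole_distance(value_dist, shape_dist, alpha=0.5):
    """Combine value and shape distances with weights alpha and 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lam = 1.0 - alpha  # enforces alpha + lambda = 1
    return alpha * value_dist + lam * shape_dist


d_value = euclidean([0.0, 0.0], [3.0, 4.0])
print(whole_distance(d_value, shape_dist=1.0, alpha=0.5))
```

Because the two distances are on different scales, the choice of α controls how strongly magnitude differences dominate shape differences.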

DT-KSVM is used to examine the suspicious users identified by the preliminary detection and to output the illegal users. DT-KSVM analyzes the given data and identifies patterns or trends in the input values relative to the output. The input is mapped into a high-dimensional feature space by a nonlinear function, and a hyperplane is then constructed in that space that effectively separates the classes. The algorithm needs only a small number of multi-class SVM classifiers, is easy to train and apply, offers advantages in training time and classification accuracy, leaves no unclassifiable regions, and does not need to traverse all classifiers during classification.

The multi-class SVM classifier is built in the following steps:

(1) Separate the first type of electricity theft features from the features of types 2, 3, …, n and of the normal samples, and construct SVM₁;

(2) Separate the i-th type of electricity theft features from the features of types i+1, i+2, …, n and of the normal samples, and construct SVM_i;

(3) Separate the n-th type of electricity theft features from the normal samples, and construct SVM_n.

Finally, the n SVM classifiers are arranged according to the structure of a binary tree. During electricity theft detection, the SVM at each level recognizes only one type of theft. The remaining sample set is passed to the SVM at the next level and shrinks step by step. The SVM at the last level separates the last theft type from the normal samples, and the leaf nodes of the decision tree (DT) are the theft types.
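The level-by-level structure above can be sketched as a cascade in which each stage either claims a sample as its own theft type or passes it on. In this sketch each stage is a plain Python predicate standing in for a trained binary SVM; the predicates, names, and the convention of returning 0 for a normal sample are assumptions made for illustration.

```python
from typing import Callable, Sequence

# A stage answers: "does this sample belong to my theft type?"
Stage = Callable[[Sequence[float]], bool]


def classify_cascade(sample: Sequence[float], stages: Sequence[Stage]) -> int:
    """Return the theft type (1..n) of the first stage that accepts the sample.

    Returns 0 (normal) when every stage rejects it, which mirrors the last
    level separating the final theft type from the normal samples.
    """
    for theft_type, is_this_type in enumerate(stages, start=1):
        if is_this_type(sample):
            return theft_type
    return 0


# Toy stand-ins for SVM_1 and SVM_2 (hypothetical rules, not trained models).
stages = [
    lambda s: sum(s) == 0,   # stage 1: all readings are zero
    lambda s: max(s) < 1.0,  # stage 2: every reading is suspiciously low
]
print(classify_cascade([0.5, 0.2], stages))
```

A sample is evaluated by at most n stages and usually fewer, which is why classification does not have to traverse every classifier.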

The structure of DT-SVM is shown in Figure 2. The decision tree is built as a hierarchical classification model, so accumulated errors affect classification accuracy. If a biased binary tree is used for classification, a decision tree with little error accumulation and high classification accuracy must be built first. To reduce the impact of error accumulation, the projection vector method is used to measure the degree of separation between classes, and the biased binary decision tree is constructed on that basis.

When a sample lies far from the hyperplane, the SVM algorithm classifies it accurately. When the sample lies close to the hyperplane, classification performance drops and misclassification occurs easily. The information provided by samples near the decision boundary is therefore used to improve classification accuracy: SVM is combined with the K-Nearest Neighbor (KNN) algorithm to form a combined SVM-KNN (KSVM) classifier. When a sample is to be classified, the distance between the sample and the separating hyperplane is calculated. If the distance is greater than a given threshold a, SVM classification is applied directly; otherwise, KNN is used.
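The switching rule between SVM and KNN can be sketched as follows. The sketch assumes a linear decision function w·x + b with known parameters, labels of +1 and -1, and a plain Euclidean KNN over a list of reference points; all names and values are illustrative rather than taken from the source.

```python
import math
from collections import Counter


def ksvm_predict(x, w, b, threshold, neighbours, k=3):
    """Classify x with the SVM sign if it is far from the hyperplane,
    otherwise by majority vote among the k nearest reference points.

    neighbours is a list of (vector, label) pairs with labels +1 or -1.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    margin = abs(score) / math.sqrt(sum(wi * wi for wi in w))
    if margin > threshold:
        return 1 if score > 0 else -1          # confident SVM decision
    nearest = sorted(neighbours, key=lambda p: math.dist(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]          # KNN decision near the boundary


refs = [([0.2, 0.0], -1), ([0.3, 0.1], -1), ([-0.5, 0.0], 1)]
print(ksvm_predict([3.0, 0.0], w=[1.0, 0.0], b=0.0, threshold=1.0, neighbours=refs))
print(ksvm_predict([0.25, 0.0], w=[1.0, 0.0], b=0.0, threshold=1.0, neighbours=refs))
```

In the second call the SVM score is positive, but the sample sits inside the threshold band, so the neighbours decide and the result can differ from the sign of the score.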

In KNN classification, the support vectors of each class serve as its representative points, and the distance between the sample to be identified and each support vector is calculated. This distance is measured in the feature space rather than in the original space. The class of the sample is determined by the distance, which is calculated as follows:

where s is a support vector in the training samples, x is the sample to be classified, k(·) denotes the kernel function, and φ(x) denotes the mapping of the vector x from the original space into the feature space. The flow chart of KSVM-based electricity theft detection according to the present invention is shown in Figure 3.
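A distance in feature space can be evaluated without computing φ explicitly, because the squared distance satisfies ‖φ(x) − φ(s)‖² = k(x, x) − 2k(x, s) + k(s, s). The sketch below uses that identity; the RBF kernel and its width parameter are assumptions chosen for illustration, and the source does not state which kernel it uses.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed width parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def feature_space_distance(x, s, kernel=rbf_kernel):
    """Distance between phi(x) and phi(s), computed only through the kernel."""
    squared = kernel(x, x) - 2.0 * kernel(x, s) + kernel(s, s)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


print(feature_space_distance([1.0, 2.0], [1.5, 2.5]))
```

With a linear kernel the same function reduces to the ordinary Euclidean distance, which is a convenient sanity check.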

This embodiment uses the electricity consumption data of an enterprise in a certain region as the data sample. The users' consumption behavior is analyzed first. Based on how electricity theft occurs in practice, the present invention divides it into six types. The first type multiplies all samples by the same randomly chosen coefficient. The second type is an "on-off" attack in which zero consumption is reported for a period of time. The third type multiplies the consumption by a random factor that changes over time. The fourth type is a combination of the second and third types. The fifth type multiplies the consumption during peak hours by the same randomly chosen coefficient. The sixth type is an "on-off" attack over random periods that are short and discontinuous, which reduces the total consumption. Compared with the second type, the sixth type is harder to detect because the selected periods are random. The features commonly used to represent consumption behavior are numbered, including the valley electricity coefficient and the peak load rate, and these common features form the feature library that characterizes users' consumption behavior.
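The six theft types can be sketched as transformations of a normal load curve, which is useful for producing labeled test data. The coefficient range, the length of the zeroed period, the peak slots, and the number of random slots below are all assumptions; the source gives no numeric values.

```python
import random


def scale_all(load, rng, low=0.1, high=0.8):
    """Type 1: multiply every reading by one random coefficient."""
    k = rng.uniform(low, high)
    return [k * v for v in load]


def on_off(load, rng, duration=4):
    """Type 2: report zero consumption for one contiguous period."""
    start = rng.randrange(len(load) - duration + 1)
    return [0.0 if start <= i < start + duration else v
            for i, v in enumerate(load)]


def scale_varying(load, rng, low=0.1, high=0.8):
    """Type 3: multiply each reading by its own random factor."""
    return [rng.uniform(low, high) * v for v in load]


def on_off_and_varying(load, rng, duration=4):
    """Type 4: combination of types 2 and 3."""
    return scale_varying(on_off(load, rng, duration), rng)


def scale_peak(load, rng, peak_slots=range(8, 12), low=0.1, high=0.8):
    """Type 5: multiply peak-hour readings by one random coefficient."""
    k = rng.uniform(low, high)
    return [k * v if i in peak_slots else v for i, v in enumerate(load)]


def random_short_off(load, rng, n_slots=3):
    """Type 6: zero a few randomly chosen, generally non-adjacent slots."""
    zeroed = set(rng.sample(range(len(load)), n_slots))
    return [0.0 if i in zeroed else v for i, v in enumerate(load)]


rng = random.Random(0)           # fixed seed for repeatable examples
normal = [2.0] * 24              # hypothetical flat hourly load
print(random_short_off(normal, rng))
```

Every transformation only lowers or preserves each reading, which matches the common property that reported consumption is never above the true consumption.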

After the number of features representing consumption behavior is determined, the data are clustered according to the selected features. The daily load characteristic curve is obtained by the weighted average method, as shown in Figure 4.
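A weighted average of daily curves can be sketched as below. How the weights are chosen is not stated in the source, so the weights here are simply an input; the function name is illustrative.

```python
def weighted_average_curve(daily_curves, weights):
    """Point-wise weighted average of equal-length daily load curves."""
    if not daily_curves or len(daily_curves) != len(weights):
        raise ValueError("need one weight per curve")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(daily_curves[0])
    if any(len(curve) != length for curve in daily_curves):
        raise ValueError("all curves must have the same length")
    return [
        sum(w * curve[t] for w, curve in zip(weights, daily_curves)) / total
        for t in range(length)
    ]


# Two short curves; the second counts three times as much as the first.
print(weighted_average_curve([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0]))
```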

Because of the imbalance between normal and abnormal users, 1D-WGAN is used to generate electricity theft data. Figure 5 visualizes the real samples and the generated samples at different training stages. Figure 5(a) shows the data produced by the generator after 192 training iterations: the generator has begun to learn the distribution of the real samples, but the distance from the real samples is still large. Figure 5(b) shows the data produced after 3840 training iterations: the gap between the generated samples and the real samples is small. Figure 5(c) shows that after 6400 training iterations the generated samples can already fool the discriminator. Comparing the generated samples with the real samples in Figure 5, the samples generated by 1D-WGAN are similar but not identical to the originals: they follow the same fluctuation pattern and differ at specific positions, which ensures the diversity of the generated theft samples. In practice, the amount of data to generate should be decided according to how much theft data is available. The generated samples are of good quality and can address classifier overfitting. In addition, the 1D-WGAN model reduces the interference of noise during adversarial learning and shows strong robustness and generalization.

After repeated testing, the thresholds are set to D₂ = 3 and D₁ = 0.7. Setting the detection thresholds within this interval ensures that the suspicious users contain as few normal users as possible and that the users judged normal contain few illegal users. The values in Table 5 are calculated by formula (15), with both weights set to α = λ = 0.5. The preliminary detection can detect type 4 and type 5 electricity theft.

Figure 6 shows the missed detection rate and the over-detection rate of different detection methods, including Euclidean distance, correlation coefficient, the combination of Euclidean distance and correlation coefficient, and the method of the present invention. Simulation shows that the method of the present invention has the lowest missed detection rate and the lowest over-detection rate among the compared methods.
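The two rates can be computed from labeled predictions as sketched below. The source does not define them, so this sketch assumes the usual readings: the missed detection rate is the share of true theft cases not flagged, and the over-detection rate is the share of normal users wrongly flagged.

```python
def detection_rates(y_true, y_pred):
    """Return (missed_detection_rate, over_detection_rate).

    Labels: 1 = theft, 0 = normal. The definitions are assumptions:
    missed = FN / (TP + FN), over-detected = FP / (FP + TN).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    missed = fn / (tp + fn) if tp + fn else 0.0
    over = fp / (fp + tn) if fp + tn else 0.0
    return missed, over


truth = [1, 1, 1, 0, 0, 0, 0, 0]
flags = [1, 1, 0, 1, 0, 0, 0, 0]
print(detection_rates(truth, flags))
```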

The accuracy of SVM, DT-SVM, and the method of the present invention is compared as the numbers of normal samples and electricity theft samples increase. Figure 7 shows the accuracy of the three methods as the sample counts grow and compares the method of the present invention with SVM and DT-SVM at the same sample count. The detection accuracy of all three methods improves as the amount of data increases, but at the same sample count the method of the present invention has the highest accuracy, which confirms its accuracy.

The first electricity theft detection method uses no generated data and combines the similarity measure with DT-KSVM; the second uses no generated data and combines the similarity measure with SVM; the third uses neither generated data nor the similarity measure; the fourth combines generated data with the support vector machine and the similarity measure with DT-KSVM; the fifth uses generated data and combines the similarity measure with SVM; and a sixth electricity theft detection method is also defined. The results show that the method of the present invention has good robustness and noise resistance.

Compared with the prior art, the beneficial effect of the present invention is that it provides a user electricity theft monitoring method and system built on a monitoring model based on similarity measure and decision tree support vector machine. When monitoring electricity theft by dedicated-transformer users, it finds suspicious users and confirmed electricity thieves more accurately and quickly, saves human resources, and makes monitoring more intelligent, reliable, and efficient. It makes up for the shortcomings of traditional methods for detecting electricity theft by dedicated-transformer users, supports inspection work on distribution transformer capacity increases, improves inspection accuracy, and protects the economic interests of power enterprises. The present invention can check the power flow values of each regional grid in a timely manner to ensure that power dispatching proceeds safely. It is of practical significance for maintaining the safe and stable operation of the power system and for supporting the digital and comprehensive development of the grid, and it is scientific, reasonable, and highly practical.

The present invention may be a system, a method, and/or a computer program product. Referring to Figure 8, the present invention also discloses an electricity theft monitoring system based on similarity measure and decision tree support vector machine, built on the aforementioned monitoring method. The system includes a power consumption characteristic data acquisition module 1, a sample data generation module 2, and an electricity-stealing user identification module 3.

The power consumption characteristic data acquisition module 1 is used to collect and preprocess user power usage data, and to obtain the user's power consumption characteristic data through AMI-based user behavior analysis;

The sample data generation module 2 is used to generate sample data from the obtained power consumption characteristic data with the one-dimensional generative adversarial network 1D-WGAN, and to combine the generated sample data with the obtained power consumption characteristic data to obtain the data to be analyzed;

The electricity-stealing user identification module 3 is used to determine suspicious users from the data to be analyzed according to the similarity measurement algorithm, and to examine the suspicious users with the decision tree support vector machine DT-KSVM to identify illegal electricity-stealing users.

Based on the spirit of the present invention, those skilled in the art can readily conceive that a computer program product can be obtained from the aforementioned electricity theft monitoring method based on similarity measure and decision tree support vector machine. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions that cause a processor to implement aspects of the present disclosure. That is, this application also includes a terminal comprising a processor and a storage medium; the storage medium is used to store instructions, and the processor is used to operate according to the instructions to perform the steps of the aforementioned electricity theft monitoring method based on similarity measure and decision tree support vector machine.

A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium as used here is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses through a fiber optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement aspects of the present disclosure.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified or replaced with equivalents, and that any modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of protection of the claims of the present invention.