

CN113779879A - A medium and long-term electrical abnormality detection method based on LSTM-seq2seq-attention model - Google Patents

A medium and long-term electrical abnormality detection method based on LSTM-seq2seq-attention model

Info

Publication number
CN113779879A
Authority
CN
China
Prior art keywords
data
lstm
seq2seq
electricity
electricity consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111039397.1A
Other languages
Chinese (zh)
Other versions
CN113779879B (en)
Inventor
丁转莲
朱一鸣
吴雨
胡炜鑫
孙登第
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN202111039397.1A
Publication of CN113779879A
Application granted
Publication of CN113779879B
Status: Active
Anticipated expiration


Abstract

Translated from Chinese

The invention discloses a medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model. The method comprises a data collection step, a data preprocessing step, a neural network model construction step, a neural network model training step, an economic data estimation step, a step of calculating the comprehensive electricity anomaly index d, and an electricity anomaly judgment step. Based on historical data, the electricity consumption behavior of different users can be characterized by combining influencing factors including GDP, climate, and holidays. The Seq2Seq-Attention neural network analyzes user data quickly and effectively, detects suspicious users, and supports electricity theft prevention. The method has the advantages of being fast and precise, with high accuracy and good robustness.

Description

Translated from Chinese
A medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model

Technical field

The invention relates to the technical field of ranging, and in particular to a medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model.

Background

With the development of modern society, electric energy has become an indispensable energy source in production and daily life and a foundation of modern economic development. As China's power industry has grown, the electricity supply has increased substantially. Nevertheless, electricity theft and leakage have persisted in everyday electricity use. They waste national resources, create numerous safety hazards, threaten residents' personal safety, and have become one of the major factors hindering social and economic development.

Electricity theft techniques now emerge in an endless stream. The means of theft are becoming increasingly specialized and high-tech, and a complete industrial chain has even formed. Preventing theft has become correspondingly harder. Current detection approaches mainly comprise manual on-site inspection, hardware equipment protected against electromagnetic interference, and real-time software monitoring systems. First, manual inspection demands substantial manpower and material resources, involves high labor intensity and workload, and is prone to omissions. Second, most anti-theft hardware on the market is expensive and inconvenient to move. Third, software detection suffers from missed detections and misjudgments. Moreover, these detection methods and devices considerably increase investment and operating costs, so their cost-effectiveness is low.

Current research focuses mainly on the performance of abnormal electricity consumption data detection. With the rapid growth in the number of power users and electrical devices, the dimensionality and volume of consumption data have also increased rapidly, which degrades the performance of existing anomaly detection algorithms. Chinese invention patent application No. 201910389132.0 discloses a method in which, when training an anomaly detection model for electricity consumption data, an LSTM network is first applied to the historical consumption data to analyze its data-correlation information and reduce its dimensionality, and the anomaly detection model is then trained. The resulting model is adapted to the temporal correlation and high dimensionality of consumption data and is applied to the input data under test to obtain a detection result. However, this method suffers from mutual influence among different types of data and from low estimation accuracy.

Summary of the invention

The technical problem to be solved by the invention is to provide a medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model that is robust and achieves high estimation accuracy.

To solve this technical problem, the invention adopts the following technical solution.

A medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model comprises the following steps:

Step 1: data collection. Electricity consumption data are collected over a preset time period.

Step 2: data preprocessing. The collected electricity consumption data are cleaned, missing values are filled in, and the data are normalized.

Step 3: neural network model construction. A multi-layer LSTM-seq2seq-attention neural network is built with LSTM networks as its neurons.

Step 4: neural network model training. The LSTM-seq2seq-attention neural network of step 3 is trained on the preprocessed electricity consumption data of step 2 to obtain the power data expected under normal electricity use.

Step 5: economic data estimation. Taking the normal-use power data of step 4 as input, principal component analysis is used to compute an estimate of the economic data under normal electricity use.

Step 6: calculation of the comprehensive electricity anomaly index d. The index d is calculated from the step 5 estimate of the economic data under normal electricity use.

Step 7: electricity anomaly judgment. A preset threshold σ is compared with the comprehensive electricity anomaly index d to judge whether electricity use is abnormal.

In step 1, the time period is the 48 months preceding the month under test.

The electricity consumption data include electricity load data, economic data (GDP), and meteorological data.

The meteorological data include rainfall, air temperature, humidity, wind speed, and air pressure.

Step 1 also includes collecting the electricity consumption data of the month under test.

In step 3, the multi-layer LSTM-seq2seq-attention neural network comprises an encoder and a decoder and incorporates an attention mechanism.

In step 4, the model parameters are optimized during training with the Adam optimization algorithm.

In step 6, the user's comprehensive electricity anomaly index d is calculated with formula (11):

d = |h - s| / h × 100%    (11)

In formula (11), h is the estimated monthly average GDP and s is the monthly average value under test, that is, the enterprise's monthly average economic data (GDP) for that month.
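Formula (11) can be illustrated with a minimal Python sketch. The function name `anomaly_index` and the check that h is positive are assumptions made for illustration; the patent only states the formula itself.

```python
def anomaly_index(h: float, s: float) -> float:
    """Comprehensive electricity anomaly index d from formula (11), in percent.

    h: estimated monthly average GDP (output of the estimation step)
    s: monthly average GDP value under test
    """
    # Assumption: the estimate h is positive, so the ratio is well defined.
    if h <= 0:
        raise ValueError("h must be positive")
    return abs(h - s) / h * 100.0


# Example with made-up numbers: an estimate of 100 and an observed value of 80
d = anomaly_index(100.0, 80.0)
print(f"d = {d:.1f}%")
```

A larger d means the observed economic figure deviates more from what the model expects under normal electricity use.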

In step 7, comparing the threshold σ with the comprehensive electricity anomaly index d determines whether the user shows no suspicion of electricity theft, is suspected of electricity theft, or is a suspicious user requiring an alarm.

The beneficial effects of the invention are as follows.

The medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model of the invention comprises a data collection step, a data preprocessing step, a neural network model construction step, a neural network model training step, an economic data estimation step, a step of calculating the comprehensive electricity anomaly index d, and an electricity anomaly judgment step. Based on historical data, the electricity consumption behavior of different users can be characterized by combining influencing factors including GDP, climate, and holidays. The LSTM-seq2seq-attention neural network analyzes user data quickly and effectively, detects suspicious users, and supports electricity theft prevention.

In recent years, electricity consumption information collection systems have gradually come into use, and power companies hold rich historical user data. Based on these historical data, the electricity consumption behavior of different users can be characterized by combining influencing factors such as economic data (GDP), meteorological data, and holidays. The LSTM-seq2seq-attention neural network analyzes user data quickly and effectively, detects suspicious users, and supports electricity theft prevention.

The invention adds an attention mechanism to a seq2seq structure that uses LSTM as its neural unit, which allocates network weights more effectively. The Adam optimization algorithm is used to optimize the model parameters and improve computational efficiency, and the encoder of the seq2seq structure uses a multi-layer LSTM to strengthen the model's robustness and estimation accuracy. Principal component analysis is also used, which removes the mutual influence among evaluation indicators, reduces the workload, and lowers the computational cost of the algorithm.

Electricity anomaly detection is performed with two models, seq2seq-attention and principal component analysis, which improves detection accuracy and robustness. Compared with the earlier method, this neural network model identifies electricity theft faster and more accurately.

The medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model of the invention has the advantages of being fast and precise, with high accuracy and good robustness.

Description of drawings

FIG. 1 is a flow chart of the medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model of the invention.

FIG. 2 is a structure diagram of the LSTM used in the method.

FIG. 3 is a structure diagram of the LSTM-seq2seq-attention neural network used in the method.

FIG. 4 shows the original electricity load data used in the method.

FIG. 5 shows the electricity load data after data preprocessing.

FIG. 6 shows the original influencing-factor data used in the method.

FIG. 7 shows the influencing-factor data after data preprocessing.

FIG. 8 shows the electricity load values estimated by the method.

FIG. 9 compares the GDP values under test with the GDP values estimated by the method.

Detailed description

Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more readily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.

As shown in FIGS. 1 to 9, the medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model of the invention comprises the following steps:

Step 1: data collection. Electricity consumption data are collected over a preset time period.

To judge whether an enterprise's electricity use is abnormal in the month under test, the electricity consumption data for the 48 months before that month must be collected. The electricity consumption data include electricity load data, economic data (GDP), and meteorological data. The meteorological data include rainfall, air temperature, humidity, wind speed, and air pressure. The electricity load data, economic data (GDP), rainfall, air temperature, humidity, wind speed, and average air pressure for the month under test are also collected.

Step 2: data preprocessing. The collected electricity consumption data are cleaned, missing values are filled in, and the data are normalized.

Collected data may contain missing values or redundant, disordered records. Either kind distorts the final estimation and analysis results, so the data must be cleaned, missing values filled in, dimensional differences removed by formatting, and the result normalized. Preprocessing begins with data cleaning, which deletes duplicate and incomplete records from the data set. Next, missing values in the historical data are filled in with the sliding average window method. Finally, Min-max normalization is applied to the completed data set.

Step 3: neural network model construction. A multi-layer LSTM-seq2seq-attention neural network is built with LSTM networks as its neurons.

When constructing the seq2seq-attention neural network, a multi-layer seq2seq network (LSTM-seq2seq-attention) is built with LSTM networks as its neurons, and an attention mechanism is added. The Mish activation function is chosen as the output-layer activation function of the whole network, which mitigates gradient explosion and makes model training more stable.

Step 4: neural network model training. The LSTM-seq2seq-attention neural network of step 3 is trained on the preprocessed electricity consumption data of step 2 to obtain the power data expected under normal electricity use.

The LSTM-seq2seq-attention neural network is trained on the normalized training set, and the normalized set under test is then fed into the trained model for estimation, yielding the power data expected under normal electricity use.

The electricity consumption data for the 48 months before the month under test serve as training data, and the electricity consumption data collected for the month under test serve as the data under test. Both are normalized to the range [0,1] with the Min-max method.

Step 5: economic data estimation. Taking the normal-use power data of step 4 as input, principal component analysis is used to compute an estimate of the economic data under normal electricity use.

Step 6: calculation of the comprehensive electricity anomaly index d. The index d is calculated from the step 5 estimate of the economic data under normal electricity use.

Taking the normal-use power data of step 4 as input, principal component analysis is used to compute an estimate of the economic data under normal electricity use.

Step 7: electricity anomaly judgment. A preset threshold σ is compared with the comprehensive electricity anomaly index d to judge whether electricity use is abnormal.

A threshold σ is set. The economic data under test are subtracted from the estimated economic data, and the difference is compared with the threshold to judge abnormal electricity consumption behavior. During detection, with the threshold in place, the comprehensive electricity anomaly index d obtained from the final GDP estimate and the value under test is used to assess the degree of the user's abnormal electricity consumption.

In step 1, the time period is the 48 months preceding the month under test.

Step 1 also includes collecting the electricity consumption data of the month under test.

In step 3, the multi-layer LSTM-seq2seq-attention neural network comprises an encoder and a decoder and incorporates an attention mechanism.

The LSTM-seq2seq-attention neural network consists mainly of an encoder and a decoder, with an attention mechanism. The encoder consists of a multi-layer LSTM (Long Short-Term Memory) network that encodes the input data and outputs the encoded state. The attention mechanism sits between the encoder and the decoder. The decoder consists of a single-layer LSTM: the output of the attention mechanism, as the context vector, is concatenated with the encoder output to form the decoder input, and the output value of each step serves as the input value of the next step.

In step 4, the model parameters are optimized during training with the Adam optimization algorithm.

In step 6, the user's comprehensive electricity anomaly index d is calculated with formula (11):

d = |h - s| / h × 100%    (11)

In formula (11), h is the estimated monthly average GDP and s is the monthly average value under test, that is, the enterprise's monthly average economic data (GDP) for that month.

In step 7, comparing the threshold σ with the comprehensive electricity anomaly index d determines whether the user shows no suspicion of electricity theft, is suspected of electricity theft, or is a suspicious user requiring an alarm.
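The three-state judgment of step 7 might be sketched as follows. The text at this point names a preset threshold σ and three states but does not give the cut-off values, so the two parameters `sigma_suspect` and `sigma_alarm`, the function name, and the example values are hypothetical and serve only to show the shape of the comparison.

```python
def classify_user(d: float, sigma_suspect: float, sigma_alarm: float) -> str:
    """Map the anomaly index d (in percent) to one of the three states.

    sigma_suspect and sigma_alarm are hypothetical cut-offs; the source
    only states that d is compared against a preset threshold.
    """
    if sigma_suspect > sigma_alarm:
        raise ValueError("sigma_suspect must not exceed sigma_alarm")
    if d < sigma_suspect:
        return "no suspicion of electricity theft"
    if d < sigma_alarm:
        return "suspected electricity theft"
    return "suspicious user, alarm required"


# Example with made-up cut-offs of 10% and 30%
for d in (5.0, 20.0, 45.0):
    print(d, "->", classify_user(d, sigma_suspect=10.0, sigma_alarm=30.0))
```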

The medium- and long-term electricity consumption anomaly detection method based on an LSTM-seq2seq-attention model of the invention mainly comprises the following seven steps.

Step 1: data collection. Electricity consumption data are collected over a preset time period.

To judge whether an enterprise's electricity use is abnormal in the month under test, electricity consumption data for the 48 months before that month must be collected at daily resolution. Each day's data comprise the daily electricity load data, daily economic data (GDP), and daily meteorological data. The daily meteorological data include the day's average rainfall, average temperature, average humidity, average wind speed, and average air pressure.

The enterprise's daily electricity load data are obtained from the power supply company serving its area; its daily economic data (GDP) are obtained from the local statistics bureau; and the daily average rainfall, temperature, humidity, wind speed, and air pressure at its location are obtained from the local meteorological bureau. The daily electricity consumption data for the month under test must also be collected, including the daily electricity load data, economic data (GDP), average rainfall, average temperature, average humidity, average wind speed, and average air pressure.

Step 2: data preprocessing. The collected electricity consumption data are cleaned, missing values are filled in, and the data are normalized.

First, data cleaning deletes duplicate and incomplete records from the data set. Missing values in the historical data are then filled in with the sliding average window method: if the i-th element of a list a is missing, the average of the window elements before it and the window elements after it is used as the imputed value.

For example, take the list a=[1,2,4,2,None,6,3,2,1], where None marks the missing value, and choose window=3.

The value at the None position is then (2+4+2+6+3+2)/6 = 19/6 ≈ 3.17. That is, the imputed value is the average of the six values formed by the three preceding and three following elements.
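The sliding average window imputation described above can be sketched in a few lines of Python. The function name `impute_missing` is illustrative. Two behaviors are assumptions, since the text does not cover them: the window is truncated at the ends of the list, and neighbors that are themselves missing are skipped.

```python
from typing import List, Optional


def impute_missing(values: List[Optional[float]], window: int = 3) -> List[Optional[float]]:
    """Fill each None with the mean of up to `window` values on either side."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        # Neighbors before and after position i (truncated at the list edges).
        neighbours = values[max(0, i - window):i] + values[i + 1:i + 1 + window]
        observed = [x for x in neighbours if x is not None]
        if observed:  # leave the gap if no observed neighbor exists
            filled[i] = sum(observed) / len(observed)
    return filled


a = [1, 2, 4, 2, None, 6, 3, 2, 1]
print(impute_missing(a, window=3)[4])  # mean of 2, 4, 2, 6, 3, 2, i.e. 19/6
```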

The input variables differ in dimension, with large differences in order of magnitude and unit of measurement. To remove the effect of these differences on model training and estimation, the input data are standardized with the Min-max method, which speeds up model convergence and improves model accuracy. The standardized range is [0,1], and the expression is given in formula (1).

$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

In formula (1), x_min is the minimum of the x sequence, x_max is the maximum of the x sequence, x is a value in the x sequence (x_1 to x_n), and y is the corresponding value in the newly generated sequence (y_1 to y_n).
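A minimal Python sketch of the Min-max normalization in formula (1) follows. The function name `min_max_normalize` is illustrative, and mapping a constant sequence to zeros is an assumption, since the formula is undefined when x_max equals x_min.

```python
from typing import List


def min_max_normalize(xs: List[float]) -> List[float]:
    """Scale a sequence to [0, 1] with y = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        # Assumption: a constant sequence carries no scale information.
        return [0.0 for _ in xs]
    return [(x - x_min) / (x_max - x_min) for x in xs]


print(min_max_normalize([2.0, 4.0, 6.0]))
```

Each input variable (load, temperature, rainfall, and so on) would be normalized separately, since their units and magnitudes differ.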

The data for the 48 months before the month under test serve as training data, and the data collected for the month under test serve as the data under test.

The input sequence format is defined as x_l = {D_l, T_l, R_l, G_l, S_l, F_l, P_l}.

Here D_l is the daily electricity load, T_l is the daily average temperature, R_l is the daily average rainfall, G_l is the enterprise's daily GDP output value, S_l is the daily average humidity, F_l is the daily average wind speed, and P_l is the daily average air pressure.
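One way to hold the daily input vector x_l in code is sketched below. The class name `DailyRecord`, the field names, and the helper `to_sequence` are illustrative choices. Only the seven quantities and their order come from the definition above.

```python
from typing import List, NamedTuple


class DailyRecord(NamedTuple):
    """One day's input x_l = {D_l, T_l, R_l, G_l, S_l, F_l, P_l}."""
    load: float         # D_l, daily electricity load
    temperature: float  # T_l, daily average temperature
    rainfall: float     # R_l, daily average rainfall
    gdp: float          # G_l, enterprise's daily GDP output value
    humidity: float     # S_l, daily average humidity
    wind_speed: float   # F_l, daily average wind speed
    pressure: float     # P_l, daily average air pressure


def to_sequence(records: List[DailyRecord]) -> List[List[float]]:
    """Turn the daily records into a list of 7-element feature vectors."""
    return [list(record) for record in records]


# Made-up values for two days, for illustration only
days = [
    DailyRecord(120.0, 18.5, 0.0, 3.2, 60.0, 2.1, 1012.0),
    DailyRecord(131.0, 19.0, 1.5, 3.4, 72.0, 3.0, 1009.0),
]
print(to_sequence(days))
```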

The electricity load data and the influencing factors are each preprocessed according to the steps above. The electricity load data before and after preprocessing are compared in FIG. 4 and FIG. 5, and the influencing factors before and after preprocessing are compared in FIG. 6 and FIG. 7. As the comparison of FIGS. 4 and 5 shows, preprocessing removes numerical outliers and improves the quality of the historical data, laying the groundwork for more accurate estimates.

Step 3: Neural network model construction. Using LSTM units as the neurons, construct a multi-layer LSTM-seq2seq-attention neural network.

Starting from the historical power load data of day 1, take the day-1 historical daily GDP $G_1$, daily power load $D_1$, daily average temperature $T_1$, daily average rainfall $R_1$, daily average humidity $S_1$, daily average wind speed $F_1$ and daily average air pressure $P_1$ as the input of the first layer of the LSTM-seq2seq-attention neural network, $x_1 = \{D_1, T_1, R_1, G_1, S_1, F_1, P_1\}$. The hidden state at the current time is determined jointly by the hidden state of the previous time and the input at the current time, that is, $h_1 = f(h_0, x_1)$. This continues up to day $l$: the day-$l$ historical daily GDP $G_l$, daily power load $D_l$, daily average temperature $T_l$, daily average rainfall $R_l$, daily average humidity $S_l$, daily average wind speed $F_l$ and daily average air pressure $P_l$ form the input of the $l$-th layer, $x_l = \{D_l, T_l, R_l, G_l, S_l, F_l, P_l\}$, and the hidden state is again determined by the previous state and the current input, that is, $h_l = f(h_{l-1}, x_l)$.

Here $h$ is the hidden state, a vector that represents the hidden-layer state of the LSTM. How strongly the current hidden state depends on the previous one (for example 100% or 50%) is determined by $f$, which is treated as a constant whose value can be chosen freely.
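The input construction and the recurrence $h_l = f(h_{l-1}, x_l)$ described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `build_inputs`, `unroll` and `toy_f`, the random data and the tanh recurrence are all hypothetical stand-ins; in the method itself $f$ is realized by LSTM cells.

```python
import numpy as np

# Feature order of x_t = {D, T, R, G, S, F, P}: load, temperature, rainfall,
# GDP, humidity, wind speed, air pressure.
FEATURES = ("D", "T", "R", "G", "S", "F", "P")

def build_inputs(daily):
    """Stack per-day series into an (l, 7) array whose row t-1 is x_t."""
    return np.column_stack([np.asarray(daily[k], dtype=float) for k in FEATURES])

def unroll(x, f, h0):
    """Apply h_t = f(h_{t-1}, x_t) for t = 1..l and return all hidden states."""
    h, states = h0, []
    for x_t in x:
        h = f(h, x_t)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
days, hidden = 5, 4
daily = {k: rng.random(days) for k in FEATURES}      # made-up data
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_x = rng.normal(scale=0.1, size=(hidden, len(FEATURES)))

def toy_f(h_prev, x_t):
    # Hypothetical stand-in recurrence, not the LSTM cell of the method.
    return np.tanh(W_h @ h_prev + W_x @ x_t)

X = build_inputs(daily)
H = unroll(X, toy_f, np.zeros(hidden))
```

Each row of `H` is the hidden state after one more day of input, so `H[-1]` plays the role of $h_l$.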

The multi-layer LSTM-seq2seq-attention neural network comprises an encoder part and a decoder part.

The encoder part collects the outputs of all hidden layers and summarizes them into a semantic vector $C$: $C = q(h_1, h_2, \dots, h_l)$. Here $q$ is a constant that controls the scale of the summarized hidden-layer output. Its value can be chosen freely so that the result has a convenient order of magnitude for computation.

Given the semantic vector $C$ and the output sequence generated so far, $y_1, y_2, \dots, y_{t-1}$, the decoder part estimates the next output $y_t$, that is, $y_t = g(y_1, y_2, \dots, y_{t-1}, C)$, where $g(\cdot)$ denotes a nonlinear activation function and $y$ denotes the output corresponding to the input $x$.

The multi-layer LSTM-seq2seq-attention neural network also introduces an attention mechanism.

In the encoder part, the hidden vectors $h_1, h_2, \dots, h_l$ are summed with weights to produce the semantic vector $c_i$ for step $i$:

$$c_i = \sum_{j=1}^{l} \alpha_{ij} h_j$$

where $\alpha_{ij}$ is the weight value.

The decoder part defines the conditional probability. The weight $\alpha_{ij}$ is determined jointly by the $(i-1)$-th output hidden state $s_{i-1}$ and each hidden state of the input:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{l} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j)$$

Here $e_{ij}$ measures how strongly the encoder hidden state $h_j$ at time $j$ influences the decoder hidden state $s_i$ at time $i$. The softmax function normalizes the influence scores $e_{ij}$ into the probabilities $\alpha_{ij}$. The larger $\alpha_{ij}$ is, the more attention the $i$-th output allocates to the $j$-th input, and the more strongly the $j$-th input affects the generation of the $i$-th output. From these weights the decoder computes its next hidden state $s_i$ (the decoder hidden state at step $i$) and the output $y_i$ at that position.
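The attention weights and the context vector for one decoder step can be sketched as below. The text does not specify the alignment function $a(\cdot,\cdot)$, so the dot-product score `dot_score` is an assumption made purely for illustration, as are the function names and the toy values.

```python
import numpy as np

def softmax(e):
    """Normalize scores into weights that sum to 1 (shifted for stability)."""
    w = np.exp(e - np.max(e))
    return w / w.sum()

def attention_context(s_prev, H, score):
    """Return (alpha_i, c_i) for one decoder step.

    s_prev: previous decoder hidden state s_{i-1}, shape (d,)
    H:      encoder hidden states h_1..h_l as rows, shape (l, d)
    score:  alignment function a(s_{i-1}, h_j) returning a scalar
    """
    e = np.array([score(s_prev, h_j) for h_j in H])   # e_ij = a(s_{i-1}, h_j)
    alpha = softmax(e)                                # alpha_ij
    c = alpha @ H                                     # c_i = sum_j alpha_ij * h_j
    return alpha, c

def dot_score(s_prev, h_j):
    # Assumed alignment function; requires equal state sizes.
    return float(s_prev @ h_j)

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
s_prev = np.array([2.0, 0.0])
alpha, c = attention_context(s_prev, H, dot_score)
```

With these toy values the first and third encoder states receive equal scores and therefore equal weights, while the second state, which is orthogonal to `s_prev`, receives the smallest weight.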

The invention also uses principal component analysis to process the data.

Step 4: Neural network model training. Using the electricity consumption data preprocessed in step 2, train the LSTM-seq2seq-attention neural network of step 3 to obtain the power data under normal electricity consumption.

The model is trained on the power load data of a company in California from July 1, 2016 to June 30, 2020, together with the data on its influencing factors.

An LSTM-based seq2seq model is selected; the seq2seq structure consists of an encoder and a decoder. The encoder can accept data flexibly, and in the decoder the output of each step is fed back as the input of the next step, which helps the model learn the temporal relationships in the data. An attention mechanism is added to optimize the weight allocation, while the data-mining capability of the LSTM is used to address abnormal electricity consumption behavior. The network structure of the LSTM-seq2seq-attention neural network is shown in Figure 2, and the relevant parameters of the network are expressed as follows.

Output $h_t$: $h_t = o_t * \tanh(c_t)$

Candidate state: $\tilde{c}_t = \tanh(W_c * h_{t-1} + W_c * x_t + b_c)$

Input gate $i_t$: $i_t = \sigma(W_i * c_{t-1} + W_i * h_{t-1} + W_i * x_t + b_i)$

Forget gate $f_t$: $f_t = \sigma(W_f * c_{t-1} + W_f * h_{t-1} + W_f * x_t + b_f)$

Cell state $c_t$: $c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$

Output gate $o_t$: $o_t = \sigma(W_o * c_{t-1} + W_o * h_{t-1} + W_o * x_t + b_o)$

The inputs of the LSTM-seq2seq-attention neural network are $c_{t-1}$, $x_t$ and $h_{t-1}$; its outputs are $c_t$ and $h_t$.

Forget gate $f_t$: selectively forgets information in the cell state from the previous step; it is implemented by a sigmoid layer. It takes $h_{t-1}$ from the previous step and $x_t$ from the current step as input and outputs, for each number in $c_{t-1}$, a value between 0 and 1, denoted $f_t$, that indicates how much information is kept: 1 means keep completely and 0 means discard completely.

Input gate $i_t$: decides what to store in the cell state, selectively recording new information into it. A sigmoid layer (the input gate layer) decides which values to update; this probability is denoted $i_t$. A tanh layer creates a vector of candidate values $\tilde{c}_t$ that will be added to the cell state. Here $W$ denotes a weight matrix.

Output gate $o_t$: a sigmoid layer (the output gate layer) decides which parts of the current cell state $c_t$ to output. The cell state is then passed through a tanh layer (which maps the values into the range -1 to 1) and multiplied by the output of the sigmoid layer to obtain the final output $h_t$.

Here $b_o$, like $b_c$, $b_i$ and $b_f$, is the bias parameter of the corresponding gate. $\sigma$ is the sigmoid function, which can be understood as a threshold.

Sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Tanh function:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
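One step of the LSTM cell described by the gate equations can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes one weight matrix per gate applied to the concatenation of $c_{t-1}$, $h_{t-1}$ and $x_t$, and a candidate state that sees only $h_{t-1}$ and $x_t$. The parameter names and the initialization in `init_params` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps names such as "W_i" and "b_i" to arrays."""
    z_gate = np.concatenate([c_prev, h_prev, x_t])   # gates see c_{t-1}, h_{t-1}, x_t
    z_cand = np.concatenate([h_prev, x_t])           # candidate sees h_{t-1}, x_t
    i_t = sigmoid(p["W_i"] @ z_gate + p["b_i"])      # input gate
    f_t = sigmoid(p["W_f"] @ z_gate + p["b_f"])      # forget gate
    o_t = sigmoid(p["W_o"] @ z_gate + p["b_o"])      # output gate
    c_tilde = np.tanh(p["W_c"] @ z_cand + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde               # cell state
    h_t = o_t * np.tanh(c_t)                         # output
    return h_t, c_t

def init_params(n_in, n_hid, seed=0):
    """Small random weights and zero biases (an arbitrary choice)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("i", "f", "o"):
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hid, 2 * n_hid + n_in))
        p[f"b_{g}"] = np.zeros(n_hid)
    p["W_c"] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
    p["b_c"] = np.zeros(n_hid)
    return p

params = init_params(n_in=7, n_hid=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.random.default_rng(1).random((3, 7)):   # three made-up days
    h, c = lstm_step(x_t, h, c, params)
```

Because $h_t$ is a sigmoid output multiplied by a tanh output, every component of `h` stays strictly between -1 and 1.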

Finally, the model uses the Mish activation function in the output layer. Because Mish is very smooth over its domain, it further helps to avoid vanishing and exploding gradients.

The Mish function is given by formula (4).

$$\mathrm{Mish}(x) = x \cdot \tanh(\ln(1 + e^{x})) \qquad (4)$$
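Formula (4) can be evaluated directly, as in the short sketch below. Computing $\ln(1 + e^{x})$ through `np.logaddexp` is an implementation choice made here to avoid overflow for large inputs; it is not something the text prescribes.

```python
import numpy as np

def mish(x):
    """Mish(x) = x * tanh(ln(1 + e^x)), formula (4)."""
    x = np.asarray(x, dtype=float)
    softplus = np.logaddexp(0.0, x)   # ln(e^0 + e^x) = ln(1 + e^x)
    return x * np.tanh(softplus)

values = mish(np.array([-5.0, 0.0, 1.0, 5.0]))
```

For large positive inputs Mish approaches the identity, and for large negative inputs it approaches zero from below.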

The objective function to be minimized is:

Figure BDA0003248519110000103

The model parameters are optimized with the Adam optimization algorithm, given by formulas (5) to (10).

$$g_t = \nabla_{\theta} J(\theta_{t-1}) \qquad (5)$$

$$m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t \qquad (6)$$

$$v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2 \qquad (7)$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \qquad (8)$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (9)$$

$$\theta_t = \theta_{t-1} - \delta \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \qquad (10)$$

Here $g_t$ is the gradient of the objective function $J$ with respect to the parameters, $\delta$ is the learning rate (step size), $\alpha_{ij}$ is the weight value, $\beta_1$ is the exponential decay rate of the first-moment estimate, $\beta_2$ is the exponential decay rate of the second-moment estimate, $m_t$ is the biased first-moment estimate, $v_t$ is the biased second-moment estimate, $\hat{m}_t$ is the bias-corrected first-moment estimate, $\hat{v}_t$ is the bias-corrected second-moment estimate, $\theta_t$ denotes the updated parameters, $\epsilon$ is a small constant that prevents division by zero, and $t$ is the time step.

Formulas (6) and (7) are the first-moment and second-moment estimates of the gradient and can be regarded as estimates of the expectations $E[g_t]$ and $E[g_t^2]$. Formulas (8) and (9) correct these estimates so that they approximate unbiased estimates of the expectations.

The Adam optimization algorithm is computationally efficient, well suited to large-scale data and parameter optimization, and has low memory requirements. Compared with classic stochastic gradient descent, it updates the network weights more effectively.
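The Adam update can be sketched in a few lines. The code follows the standard form of the algorithm; the quadratic toy objective, the hyperparameter values and the name `adam_step` are assumptions chosen only to make the example runnable, and `lr` plays the role of the learning rate $\delta$.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # biased second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective J(theta) = ||theta - target||^2 with gradient 2 * (theta - target).
target = np.array([1.0, -2.0])
theta, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 2001):
    grad = 2.0 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

A property worth noting is that the very first update moves each parameter by roughly the learning rate regardless of the gradient's magnitude, because the bias corrections make $\hat{m}_1 / \sqrt{\hat{v}_1}$ equal to the sign of the gradient.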

The structure of the LSTM-seq2seq-attention neural network is shown in Figure 3.

As shown in Figure 3, the LSTM-seq2seq-attention neural network consists of an encoder and a decoder, with an attention mechanism introduced.

The encoder consists of multiple LSTM layers; it encodes the input data and outputs the encoded state. The attention mechanism sits between the encoder and the decoder.

The decoder consists of a single LSTM layer. The output of the attention mechanism, used as the context vector, is concatenated with the encoder output to form the decoder input, and the output of each step is used as the input of the next step.

The LSTM-seq2seq-attention neural network is trained on the normalized training set. The normalized data to be detected are then fed into the trained model for estimation, yielding the power data under normal electricity consumption.
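The text does not state which normalization scheme is used, so the sketch below assumes min-max scaling purely for illustration. The point it shows is that the statistics are fitted on the training set and then reused for the data to be detected; the function names and the numbers are hypothetical.

```python
import numpy as np

def fit_minmax(train):
    """Return the per-column minimum and range of a 2-D training set."""
    train = np.asarray(train, dtype=float)
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0            # constant columns: avoid division by zero
    return lo, span

def normalize(x, lo, span):
    return (np.asarray(x, dtype=float) - lo) / span

def denormalize(z, lo, span):
    return np.asarray(z, dtype=float) * span + lo

train = np.array([[100.0, 10.0], [200.0, 30.0], [150.0, 20.0]])  # e.g. load, temperature
detect = np.array([[250.0, 15.0]])                               # month to be detected

lo, span = fit_minmax(train)
train_n = normalize(train, lo, span)
detect_n = normalize(detect, lo, span)    # reuses the training statistics
```

Values outside the training range, such as the load of 250 here, map outside the interval from 0 to 1, which is expected when the detection month differs from the history.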

Step 5: Economic data estimation. Taking the power data under normal electricity consumption from step 4 as input, compute an estimate of the economic data under normal electricity consumption by principal component analysis.

Obtain, for each day of the month to be detected, the GDP data $G_0$, daily average temperature data $T_0$, daily average rainfall data $R_0$, daily average humidity data $S_0$, daily average wind speed data $F_0$ and daily average air pressure data $P_0$ as the data to be detected. Feed these data into the trained LSTM-seq2seq-attention neural network to obtain the estimated power load data (Figure BDA0003248519110000111) for the month in which the energy storage system operates. The detection-month data are those of a company in California from July 1, 2020 to July 31, 2020. The estimated power load is shown in Figure 8.

The estimated power data for the month (Figure BDA0003248519110000112) are combined with the temperature, rainfall, humidity, wind speed and air pressure data of the same period, and principal component analysis is performed to obtain the GDP value for the same period.

Six indicators that influence GDP are determined: electricity load, temperature, rainfall, humidity, wind speed and air pressure. The indicator values are collected for $l$ months; with the six indicators of each month denoted $a_1, a_2, a_3, a_4, a_5, a_6$, this yields an $l \times 6$ matrix. The original variables are $a_1, a_2, a_3, a_4, a_5, a_6$ and their composite indicators (the new variables) are $b_1, b_2, \dots, b_l$; each new indicator is a linear combination of the original indicators $a_1, a_2, a_3, a_4, a_5, a_6$.

The observed sample matrix is

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{16} \\ a_{21} & a_{22} & \cdots & a_{26} \\ \vdots & \vdots & \ddots & \vdots \\ a_{l1} & a_{l2} & \cdots & a_{l6} \end{pmatrix}$$

After standardization, the sample matrix $A$ is expressed as formula (2). Each indicator occupies one column: for example, $a_1$ is the power load data, written in the two-dimensional array as $a_{11}, \dots, a_{l1}$ (the first column of $A$).

Figure BDA0003248519110000122

Compute the correlation coefficient matrix $R$ of the samples, R = corrcoef(x), where

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{16} \\ r_{21} & r_{22} & \cdots & r_{26} \\ \vdots & \vdots & \ddots & \vdots \\ r_{61} & r_{62} & \cdots & r_{66} \end{pmatrix}$$

and $r_{ij}$ ($i, j = 1, 2, \dots, 6$) is the correlation coefficient of the original variables $a_i$ and $a_j$, computed by formula (3).

$$r_{ij} = \frac{\sum_{k=1}^{l} (a_{ki} - \bar{a}_i)(a_{kj} - \bar{a}_j)}{\sqrt{\sum_{k=1}^{l} (a_{ki} - \bar{a}_i)^2 \sum_{k=1}^{l} (a_{kj} - \bar{a}_j)^2}} \qquad (3)$$

In formula (3), $\bar{a}_i$ and $\bar{a}_j$ are the mean values of the $i$-th and $j$-th indicators.
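The correlation coefficient matrix can be computed directly from the sample matrix, as in this minimal sketch. It assumes the usual Pearson correlation between columns; the random $12 \times 6$ matrix stands in for real monthly indicator data, and the function name is hypothetical.

```python
import numpy as np

def correlation_matrix(A):
    """Pearson correlation between the columns (indicators) of A."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)            # subtract each indicator's mean
    cross = centered.T @ centered            # sums of products of deviations
    norm = np.sqrt(np.diag(cross))           # root of the summed squared deviations
    return cross / np.outer(norm, norm)

rng = np.random.default_rng(0)
A = rng.random((12, 6))                      # 12 months, 6 indicators (made-up data)
R = correlation_matrix(A)
```

The result agrees with `np.corrcoef(A, rowvar=False)`, which treats each column as one variable.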

For the correlation coefficient matrix $R$, the Jacobi method is used to solve the characteristic equation for its six non-negative eigenvalues $\lambda_1, \dots, \lambda_6$, ordered as $\lambda_1 > \lambda_2 > \lambda_3 > \lambda_4 > \lambda_5 > \lambda_6 > 0$.

Three principal components are selected: if the ratio of the variance of the first three principal components to the total variance is close to 1, the first three factors are taken as the first, second and third principal components. The number of factors is thereby reduced from 6 to 3, which serves to screen the factors. Components are selected such that $V > 85\%$, where

$$V = \frac{\sum_{i=1}^{3} \lambda_i}{\sum_{i=1}^{6} \lambda_i}$$

The load estimation error with principal component analysis is generally smaller than the error without it. Reducing the influencing factors from 6 to 3 retains the original information while reducing the number of factors to be computed, which lowers the computational cost and improves estimation accuracy.
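The selection of principal components by cumulative variance ratio can be sketched as follows. Note the substitution: the text names the Jacobi method for the eigenvalues, whereas this sketch uses NumPy's symmetric eigensolver `np.linalg.eigh`. The synthetic data (three latent factors plus small noise) and the function name are assumptions for illustration.

```python
import numpy as np

def select_components(A, threshold=0.85):
    """Return eigenvalues (descending), cumulative ratios, k and the scores."""
    A = np.asarray(A, dtype=float)
    Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)     # standardize each indicator
    R = np.corrcoef(Z, rowvar=False)                     # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                 # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()           # cumulative variance ratio V
    k = int(np.searchsorted(ratio, threshold, side="right")) + 1
    k = min(k, len(eigvals))                             # smallest k with V > threshold
    scores = Z @ eigvecs[:, :k]                          # data projected on k components
    return eigvals, ratio, k, scores

rng = np.random.default_rng(0)
latent = rng.normal(size=(48, 3))                        # three hidden factors
mix = rng.normal(size=(3, 6))
A = latent @ mix + 0.01 * rng.normal(size=(48, 6))       # six observed indicators
eigvals, ratio, k, scores = select_components(A)
```

Because the synthetic indicators are driven by only three factors, at most three components are needed to pass the 85% threshold here.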

Principal component analysis is performed, and the three components with the highest influence, namely temperature, rainfall and wind speed, are selected for the calculation, giving the GDP value for the same period. The estimated GDP values are then compared graphically with the values to be detected. Figure 9 compares the GDP values to be detected with the estimated values for the month under detection.

Step 6: Calculation of the comprehensive electricity consumption abnormality index $d$. Based on the estimate of the economic data under normal electricity consumption from step 5, calculate the comprehensive abnormality index $d$.

Step 7: Electricity consumption abnormality judgment. Given a preset threshold $\sigma$, judge whether the electricity consumption is abnormal by comparing $\sigma$ with the comprehensive abnormality index $d$.

To detect abnormal electricity use, the user's current historical data must also be obtained. The historical economic data for the time period $s$ that begins at the current monitoring time are obtained from the statistical bureau of the selected region.

The comprehensive detection value $d$ of the user's abnormal electricity use is calculated with formula (11).

$$d = \frac{|h - s|}{h} \times 100\% \qquad (11)$$

In formula (11), $h$ is the estimated monthly average GDP and $s$ is the monthly average value to be detected, that is, the enterprise's monthly average economic data (GDP) for that month.

A threshold $\sigma$ is set, and the judgment rules are as follows.

If $d < \sigma$, the user is not suspected of electricity theft. The electricity, GDP, temperature, rainfall, humidity, wind speed and air pressure data of the monitored month are added to the historical data, overwriting the earlier data for the same period, to keep the historical data accurate.

If $\sigma < d < 2\sigma$, the user is suspected of electricity theft, or the deviation is caused by an incident such as a power equipment failure. The case is output as a suspicious event and a preliminary report is made.

If $d > 2\sigma$, the case is output as a suspicious user and an alarm is raised.
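Formula (11) and the three judgment rules can be sketched as below. The text leaves the boundary cases $d = \sigma$ and $d = 2\sigma$ unspecified; assigning both to the middle band is an assumption of this sketch, as are the function names, the label strings and the example numbers.

```python
def anomaly_index(h, s):
    """d = |h - s| / h * 100, in percent, following formula (11).

    h: estimated monthly average GDP; s: monthly average value to be detected.
    """
    if h <= 0:
        raise ValueError("the estimated value h must be positive")
    return abs(h - s) / h * 100.0

def classify(d, sigma):
    """Map the index d to one of the three outcomes described above."""
    if d < sigma:
        return "normal"              # no suspicion; data may be added to the history
    if d <= 2 * sigma:               # boundaries d == sigma, d == 2*sigma: assumed here
        return "suspicious event"    # preliminary report
    return "suspicious user"         # alarm

d = anomaly_index(h=100.0, s=88.0)   # made-up values; d is about 12 percent
label = classify(d, sigma=10.0)
```

With a threshold of 10 percent, the 12 percent deviation in this example falls in the middle band and is reported as a suspicious event rather than triggering an alarm.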

In most cases the model can serve as one method for judging whether electricity is being stolen, but it cannot serve as conclusive evidence. Its main purpose is to help power supply companies monitor electricity theft by enterprises, improving monitoring efficiency and reducing workload and inspection cost; whether theft has actually occurred requires more rigorous inspection. When a suspicious user is flagged, follow-up investigation is needed to determine whether special circumstances apply, for example special holidays or external force majeure.

The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model of the present invention builds the prediction model with a seq2seq network structure and uses LSTM units to form the encoder and decoder of that structure, which strengthens the model's ability to learn the temporal order of the data. An attention mechanism is added to the model to optimize the network weight allocation. To improve prediction accuracy and better complete the prediction task, multiple external influencing factors are taken into account, which improves the stability and accuracy of the prediction model.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and that it may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the invention. Reference signs in the claims shall not be construed as limiting the claims concerned.

In addition, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat it as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (9)

1. A medium- and long-term electricity consumption abnormality detection method based on an LSTM-seq2seq-attention model, characterized by comprising the following steps:
Step 1: data collection; according to a preset time period, collect the electricity consumption data within that time period;
Step 2: data preprocessing; perform data cleaning, missing-value filling and normalization on the collected electricity consumption data;
Step 3: neural network model construction; using LSTM units as the neurons, construct a multi-layer LSTM-seq2seq-attention neural network;
Step 4: neural network model training; using the electricity consumption data preprocessed in step 2, train the LSTM-seq2seq-attention neural network of step 3 to obtain the power data under normal electricity consumption;
Step 5: economic data estimation; taking the power data under normal electricity consumption from step 4 as input, compute an estimate of the economic data under normal electricity consumption by principal component analysis;
Step 6: calculation of the comprehensive electricity consumption abnormality index d; based on the estimate of the economic data under normal electricity consumption from step 5, calculate the comprehensive abnormality index d;
Step 7: electricity consumption abnormality judgment; given a preset threshold σ, judge whether the electricity consumption is abnormal by comparing the threshold σ with the comprehensive abnormality index d.

2. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that, in step 1, the time period is the 48 months preceding the current month under detection.

3. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that the electricity consumption data include electricity load data, economic data (GDP) and meteorological data.

4. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 3, characterized in that the meteorological data include rainfall, temperature, humidity, wind speed and air pressure.

5. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that step 1 further includes collecting the electricity consumption data of the current month under detection.

6. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that, in step 3, the multi-layer LSTM-seq2seq-attention neural network comprises an encoder and a decoder and introduces an attention mechanism.

7. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that, in step 4, the training process uses the Adam optimization algorithm to optimize the parameters of the model.

8. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that, in step 6, the comprehensive detection value d of the user's abnormal electricity use is calculated with formula (11):
d = |h - s| / h * 100% (11)
In formula (11), h is the estimated monthly average GDP and s is the monthly average value to be detected, that is, the enterprise's monthly average economic data (GDP) for that month.

9. The medium- and long-term electricity consumption abnormality detection method based on the LSTM-seq2seq-attention model according to claim 1, characterized in that, in step 7, the comparison of the threshold σ with the comprehensive abnormality index d determines whether the user is in a state of no suspicion of electricity theft, a state of suspected electricity theft, or a suspicious-user state that requires an alarm.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111039397.1A | 2021-09-06 | 2021-09-06 | Middle-long-term electricity utilization abnormality detection method based on LSTM-seq2seq-attention model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111039397.1A | 2021-09-06 | 2021-09-06 | Middle-long-term electricity utilization abnormality detection method based on LSTM-seq2seq-attention model

Publications (2)

Publication Number | Publication Date
CN113779879A | 2021-12-10
CN113779879B | 2024-06-18

Family

ID=78841082

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111039397.1A (Active, CN113779879B (en)) | Middle-long-term electricity utilization abnormality detection method based on LSTM-seq2seq-attention model | 2021-09-06 | 2021-09-06

Country Status (1)

Country | Link
CN (1) | CN113779879B (en)

Also Published As

Publication number | Publication date
CN113779879B (en) | 2024-06-18

Similar Documents

Publication  Publication Date  Title
CN111914873B (en)  Two-stage cloud server unsupervised anomaly prediction method
CN111813084B (en)  Mechanical equipment fault diagnosis method based on deep learning
CN108197648B (en)  Hydroelectric generating set fault diagnosis method and system based on LSTM deep learning model
CN113779879A (en)  A medium and long-term electrical abnormality detection method based on LSTM-seq2seq-attention model
CN111191855B (en)  Water quality abnormal event identification and early warning method based on pipe network multi-element water quality time sequence data
CN110636066B (en)  Network security threat situation assessment method based on unsupervised generative reasoning
Xie et al.  Anomaly detection for multivariate times series through the multi-scale convolutional recurrent variational autoencoder
CN108197743A (en)  A kind of prediction model flexible measurement method based on deep learning
CN115470850B (en)  A water quality abnormality event identification and early warning method based on water quality spatiotemporal data of pipe network
CN117591942B (en)  A method, system, medium and device for detecting abnormality of power load data
CN114519923B (en)  Intelligent diagnosis and early warning method and system for power plant
CN114841250A (en)  Anomaly detection and diagnosis method for industrial system production based on multi-dimensional sensor data
CN110119758A (en)  A kind of electricity consumption data abnormality detection and model training method, device
CN113988210B (en)  Method, device and storage medium for repairing distorted data of structural monitoring sensor network
CN114357670A (en)  Power distribution network power consumption data abnormity early warning method based on BLS and self-encoder
CN118690226A (en)  A tower condition monitoring method based on time series
CN117072891A (en)  Real-time intelligent leakage monitoring and positioning method for hydrogen conveying pipe network under abnormal sample-free condition
CN116680639A (en)  Deep-learning-based anomaly detection method for sensor data of deep-sea submersible
CN116819423A (en)  Method and system for detecting abnormal running state of gateway electric energy metering device
CN117452063A (en)  A semi-supervised power theft time location method
Liu et al.  Information-based Gradient enhanced Causal Learning Graph Neural Network for fault diagnosis of complex industrial processes
CN118939699A (en)  Anomaly detection method for multi-period time series based on spatiotemporal graph neural network
CN120277398A (en)  Drilling overflow early-stage identification method based on self-supervision learning
CN117113243A (en)  Photovoltaic equipment abnormality detection method
CN120408188A (en)  A production anomaly early warning method based on anomaly detection

Legal Events

Date  Code  Title  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant
