CN107169628A

Info

Publication number: CN107169628A
Application number: CN201710244420.8A
Authority: CN
Inventors: 李妍; 盛梦雨; 刘婉兵; 杜明秋; 杨秉臻; 杨晨光; 王少荣
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2017-04-14
Filing date: 2017-04-14
Publication date: 2017-09-15
Anticipated expiration: 2037-04-14
Also published as: CN107169628B

Abstract

Translated from Chinese

The invention relates to the field of distribution network planning and provides a distribution network reliability evaluation method based on big data mutual information attribute reduction. Starting from big data, the method uses the rough-set concept of mutual information to measure the correlation between basic indicators, selects from a large number of indicators of many types those that are strongly correlated with the reliability indicator and mutually independent, and uses them as inputs to a genetic-algorithm-based BP neural network model that evaluates distribution network reliability. The invention overcomes the limitations of traditional Monte Carlo simulation and analytical methods and, for electric power big data, achieves distribution network reliability evaluation based on big data mutual information attribute reduction.

Description

A Distribution Network Reliability Evaluation Method Based on Big Data Mutual Information Attribute Reduction

Technical Field

The invention relates to the field of distribution network planning, and in particular to a distribution network reliability evaluation method based on big data mutual information attribute reduction.

Background

With the development of technologies such as the Internet and databases and the automation of production environments, fields such as finance, electric power, and meteorology generate massive, highly varied, and rapidly growing data, known as big data. Big data has now penetrated every field, has become an important factor of production, and, because of its great value, is becoming a new engine of industrial transformation. The value of big data can be realized only by mining and analyzing it, extracting its main information, and using that information appropriately. Distribution network reliability is a technical indicator that is strongly related to many factors, including air temperature, wind speed, electricity sales, and line loss rate. Traditionally, reliability is evaluated with multiple indicators obtained through modeling or sampling simulation, such as load point indicators, outage duration indicators, and outage economic indicators. However, analytical methods have serious limitations when dealing with complex power systems, and Monte Carlo sampling is very time-consuming because of state redundancy. Big data technology offers a new approach to distribution network reliability evaluation.

Summary of the Invention

The purpose of the invention is to overcome the above deficiencies of the prior art by proposing a distribution network reliability evaluation method based on big data mutual information attribute reduction. Starting from big data, the method uses the rough-set concept of mutual information to measure the correlation between basic indicators, selects from a large number of indicators of many types those that are strongly correlated with the reliability indicator and mutually independent, and uses these indicators as inputs to a genetic-algorithm-based BP neural network model that evaluates distribution network reliability. The invention overcomes the limitations of traditional Monte Carlo simulation and analytical methods and, for electric power big data, achieves distribution network reliability evaluation based on big data mutual information attribute reduction.

The purpose of the invention is achieved through the following technical measures.

A distribution network reliability evaluation method based on big data mutual information attribute reduction. The method preprocesses the indicators related to distribution network reliability, including discretizing the continuous indicators; computes the mutual information between indicators based on the concept of information entropy; normalizes it to obtain the entropy correlation coefficient between indicators; uses that coefficient to judge both the correlation of each indicator with the reliability indicator and the correlation among the indicators, and reduces the indicator set accordingly; and then uses a BP neural network to fit the nonlinear relationship between the reliability indicator and the reduced indicators, which are strongly correlated with it and mutually independent, with the search capability of a genetic algorithm compensating for the weaknesses of the neural network. The method comprises the following steps:

Step 1: Collect a large amount of data related to distribution network reliability from academic, meteorological, or statistical websites;

Step 2: From the collected data, extract the values of the indicators related to distribution network reliability, that is, compile a decision table that represents the correspondence between the reliability indicator and the related indicators. The table contains 1 decision attribute that expresses the resulting level of distribution network reliability (the reliability indicator) and several condition attributes that represent factors related to reliability;

Step 3: Preprocess the data in the decision table. From all the values of each attribute, determine whether the attribute is continuous or discrete. For each continuous attribute, compute the number of intervals it should be divided into using mathematical statistics, and discretize it by equal-width binning;

Step 4: Compute the probability that each attribute takes each discrete value, then compute the information entropy of each attribute and the conditional entropy of the condition attributes with respect to the decision attribute, and from these obtain the mutual information between each condition attribute and the decision attribute and between each pair of condition attributes;

Step 5: Normalize the mutual information between the condition attributes and the decision attribute computed in step 4, combining it with the information entropy to obtain the entropy correlation coefficient between each condition attribute and the decision attribute, and use it to judge their correlation: the smaller the coefficient, the weaker the correlation. Set a suitable threshold for the correlation and remove the condition attributes that are weakly correlated with the decision attribute;

Step 6: As in step 5, compute the pairwise entropy correlation coefficients of the condition attributes remaining after step 5, identify the redundant condition attributes that are strongly correlated with other condition attributes but more weakly correlated with the decision attribute, and delete them. The result is a set of condition attributes that are strongly correlated with the reliability indicator and mutually independent, which achieves the attribute reduction;

Step 7: Build a three-layer BP neural network and train it on the reduced attribute set, with the condition attributes obtained in step 6 as inputs and the decision attribute data as output. Find the connection weights between the nodes of adjacent layers and the thresholds of the hidden and output layers that minimize the fitting error, which gives the best BP neural network model. To improve training accuracy, a genetic algorithm can be used to find the optimal initial weights and thresholds.

In the above technical solution, step 2 comprises the following steps:

Step 2.1: From the large amount of collected data related to the reliability of a city's distribution network, build an m×n distribution network reliability evaluation decision table, where n is the total number of decision and condition attributes, each decision attribute value together with its corresponding condition attribute values forms one record of attribute data, and m is the total number of records (the number of samples);

Step 2.2: Take the indicator in the decision table that directly expresses or determines distribution network reliability, such as the power supply reliability rate, as the decision attribute, and take the other reliability-related indicators, such as month, air temperature, and comprehensive voltage qualification rate, as condition attributes.

In the above technical solution, step 3 comprises the following steps:

Step 3.1: From the values of all attributes in the decision table, determine whether each attribute is continuous or discrete. For example, attributes such as year and month take only a few fixed integer values and are discrete, whereas attributes such as total electricity consumption of society, load rate, and comprehensive voltage qualification rate can take any value within an interval and are continuous;

Step 3.2: Taking into account the data distribution of each factor and the relevant objective circumstances, compute the number of intervals into which each continuous attribute should be divided using formula (1);

$$k = 1.87 \times (m-1)^{2/5} \tag{1}$$

where m is the number of samples of attribute data and k is the number of intervals into which the range of a continuous attribute is divided;

Step 3.3: Using the number of intervals from step 3.2, compute the interval width of each continuous attribute, divide its range into k intervals by equal-width binning, assign a discrete integer value to each interval, and compute the discretized value of every sample, which completes the discretization of the continuous data.
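Steps 3.2 and 3.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `partition_count` and `discretize` are hypothetical, rounding k to the nearest integer is an assumption, and so is clamping the maximum value into the last interval so that the labels stay within 1..k.

```python
def partition_count(m: int) -> int:
    """Number of equal-width intervals from formula (1): k = 1.87 * (m - 1)^(2/5).

    Rounding to the nearest integer is an assumption.
    """
    return round(1.87 * (m - 1) ** 0.4)


def discretize(values, k):
    """Map each value of a continuous attribute to an integer label in 1..k."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread: put everything in interval 1.
        return [1] * len(values)
    width = (hi - lo) / k  # interval width for equal-width binning
    # Round down, shift labels to start at 1, and clamp the maximum into
    # interval k (the clamping rule is an assumption).
    return [min(int((x - lo) / width) + 1, k) for x in values]
```

For a decision table with m = 108 samples, `partition_count(108)` gives 12 intervals.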

In the above technical solution, step 4 comprises the following steps:

Step 4.1: Count the number of samples in which each attribute takes each discrete integer value, and compute the probability that the attribute takes each discrete value using formula (2);

where k is the number of discretization intervals of attribute x, X_i is the i-th value of attribute x, c(X_i) is the number of samples in which attribute x takes the value X_i, U is the full sample set (the universe of discourse), c(U) is the total number of samples, and p(X_i) is the probability that attribute x takes the value X_i;

Step 4.2: Using formulas (3) and (4), compute the information entropy of each attribute, the conditional entropy of each condition attribute with respect to the decision attribute, and the conditional entropy of each condition attribute with respect to every other condition attribute. Here the information entropy measures the amount of information an attribute provides and also indicates how ordered the attribute sequence is, while the conditional entropy indicates how much information remains in one attribute when another attribute is fully known;

where H(x) is the information entropy of attribute x;

where p(Y_j|X_i) is the probability that Y_j occurs given that X_i has occurred, and H(y|x) is the conditional entropy of attribute y given x;

Step 4.3: Using the results of step 4.2, compute with formula (5) the mutual information between each condition attribute and the decision attribute and between each pair of condition attributes, which expresses the amount of information these attributes share,

$$I(x,y) = H(y) - H(y \mid x) \tag{5}$$

where H(y) is the information entropy of attribute y and I(x,y) is the mutual information of attributes x and y, which can be regarded as the amount of information shared by x and y.
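The calculation in steps 4.1 to 4.3 can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the base-2 logarithm is an assumption (the text does not state the base), and the entropy and conditional entropy follow the standard Shannon definitions, with probabilities estimated as sample counts divided by the total number of samples.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """H(x) = -sum_i p(X_i) * log2 p(X_i), with p(X_i) = c(X_i) / c(U)."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(ys, xs):
    """H(y|x): entropy of y within each group x = X_i, weighted by p(X_i)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def mutual_information(xs, ys):
    """I(x, y) = H(y) - H(y|x), as in formula (5)."""
    return entropy(ys) - conditional_entropy(ys, xs)
```

Two discretized attributes that always take the same value share all their information, so the mutual information equals H(y). Two attributes whose values vary independently in the sample give a mutual information of zero.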

In the above technical solution, step 5 comprises the following steps:

Step 5.1: To remove the effect of scale, normalize the mutual information between each condition attribute and the decision attribute computed in step 4.3 using formula (6) to obtain the entropy correlation coefficient, and use it to judge the correlation between the condition attribute and the decision attribute. The smaller the entropy correlation coefficient, the weaker the correlation and the less the condition attribute contributes to distribution network reliability evaluation;

where ρ_xy is the entropy correlation coefficient of attributes x and y and expresses the degree of correlation between x and y;

Step 5.2: Based on the entropy correlation coefficients computed in step 5.1, set a threshold e1. When the entropy correlation coefficient between a condition attribute and the decision attribute is below the threshold, the condition attribute is considered to have little influence on distribution network reliability and is removed from the decision table.

In the above technical solution, step 6 comprises the following steps:

Step 6.1: As in step 5, compute the entropy correlation coefficients between the condition attributes that remain after the removal in step 5.2;

Step 6.2: Based on the entropy correlation coefficients computed in step 6.1, set a threshold e2. When the entropy correlation coefficient of two condition attributes exceeds this threshold, the two attributes are considered strongly correlated and able to represent each other, that is, their influence on distribution network reliability is roughly the same. In that case, compare the entropy correlation coefficients of the two condition attributes with the decision attribute and delete the one that is more weakly correlated with the decision attribute. This reduces the redundancy of the attribute set and yields a set of condition attributes that are strongly correlated with the reliability indicator and mutually independent.

In the above technical solution, step 7 comprises the following steps:

Step 7.1: Build a three-layer BP neural network and train it on the reduced attribute data, with the condition attributes obtained in step 6.2 as inputs and the decision attribute as the final output. If the reduced decision table contains p condition attributes, the input layer has p nodes and the output layer has 1 node. Randomly select b of the m records as test samples and use the remaining records as training samples for the neural network. Each sample contains the condition attribute values and the decision attribute value, and the sample data are normalized;

Step 7.2: Randomly generate h sets of initial connection weights between the nodes of adjacent layers and thresholds of the hidden and output layers of the BP neural network, and rewrite them in binary-coded form to form the initial solution space. Using the neural network, compute the fitness of each solution in the solution space. Select the c solutions with the highest fitness as parents, and apply crossover and mutation to the parents to obtain the offspring solution space. Judge convergence from the fitness of the offspring: if the search has converged, stop and output the optimal initial weights and thresholds; otherwise, continue the selection, crossover, and mutation operations;

Step 7.3: Decode the initial weights and thresholds obtained in step 7.2 and train the BP neural network on the normalized samples to obtain the error between the estimated and true values of the decision attribute. Check whether the error satisfies the convergence condition. If it does not, adjust the weights and thresholds and continue training the network; if it does, stop the loop and output the weights and thresholds that minimize the error.
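Steps 7.2 and 7.3 can be sketched with the standard library alone. Everything below that the text does not specify is an assumption made for illustration: 8 bits per parameter decoded into [-1, 1], fitness defined as the reciprocal of the mean squared error, single-point crossover, a fixed number of generations in place of a convergence test, parents carried over unchanged into the next generation, a sigmoid hidden layer with a linear output node, and plain per-sample gradient descent for the BP stage. All function names are hypothetical.

```python
import math
import random

BITS = 8            # bits per encoded parameter (assumed)
LO, HI = -1.0, 1.0  # range of decoded weights and thresholds (assumed)


def n_params(p, q):
    """p*q input-hidden weights, q hidden thresholds, q hidden-output weights, 1 output threshold."""
    return p * q + 2 * q + 1


def decode(bits, count):
    """Convert a binary-coded solution into `count` real parameters in [LO, HI]."""
    out = []
    for i in range(count):
        chunk = bits[i * BITS:(i + 1) * BITS]
        value = int("".join(str(b) for b in chunk), 2)
        out.append(LO + (HI - LO) * value / (2 ** BITS - 1))
    return out


def forward(params, x, p, q):
    """Three-layer network; returns the output and the hidden activations."""
    w1, b1 = params[:p * q], params[p * q:p * q + q]
    w2, b2 = params[p * q + q:p * q + 2 * q], params[-1]
    hidden = [1 / (1 + math.exp(-(sum(w1[j * p + i] * x[i] for i in range(p)) - b1[j])))
              for j in range(q)]
    return sum(w2[j] * hidden[j] for j in range(q)) - b2, hidden


def mse(params, X, Y, p, q):
    return sum((forward(params, x, p, q)[0] - y) ** 2 for x, y in zip(X, Y)) / len(X)


def ga_initial_params(X, Y, p, q, h=20, c=10, generations=30, pm=0.02, rng=None):
    """Search for good initial weights and thresholds with a simple genetic algorithm."""
    rng = rng or random.Random(0)
    n = n_params(p, q)
    fitness = lambda bits: 1.0 / (1e-9 + mse(decode(bits, n), X, Y, p, q))
    pop = [[rng.randint(0, 1) for _ in range(n * BITS)] for _ in range(h)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:c]   # selection
        children = []
        while len(children) < h - c:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n * BITS)                    # single-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ 1 if rng.random() < pm else bit for bit in child])  # mutation
        pop = parents + children
    return decode(max(pop, key=fitness), n)


def bp_train(params, X, Y, p, q, lr=0.1, epochs=200):
    """Refine the parameters by backpropagation on the squared error."""
    params = list(params)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            out, hidden = forward(params, x, p, q)
            err = out - y
            for j in range(q):
                delta = err * params[p * q + q + j] * hidden[j] * (1 - hidden[j])
                for i in range(p):
                    params[j * p + i] -= lr * delta * x[i]
                params[p * q + j] += lr * delta              # thresholds enter with a minus sign
                params[p * q + q + j] -= lr * err * hidden[j]
            params[-1] += lr * err
    return params
```

A typical call would be `bp_train(ga_initial_params(X, Y, p, q), X, Y, p, q)` on normalized training samples, followed by `mse` on the held-out test samples.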

Compared with the prior art, the invention has the following beneficial effects:

The invention proposes a distribution network reliability evaluation method based on mutual information and an improved BP neural network. For the large volume of varied data related to distribution network reliability that arises in a big data setting, it computes entropy correlation coefficients from mutual information, which is built on information entropy, followed by normalization; selects the indicators that are strongly related to distribution network reliability; models these indicators with a BP neural network; and uses the search capability of a genetic algorithm to compensate for the fact that the initial weights and thresholds of the neural network cannot otherwise be determined. This achieves a comprehensive, accurate, and fast evaluation of distribution network reliability.

Description of the Drawings

Figure 1 is a flow chart of distribution network reliability evaluation based on big data mutual information attribute reduction;

Figure 2 is a flow chart of the reduction of distribution network reliability-related indicators based on mutual information.

Detailed Description

To make the technical means, inventive features, and purpose of the invention easy to understand, the invention is described further below.

Referring to Figures 1 and 2, an embodiment of the invention provides a distribution network reliability evaluation method based on big data mutual information attribute reduction, carried out in the following steps:

Step 1: Obtain a large amount of power distribution and consumption data for a city from within the electric power company, and obtain data on all aspects related to the reliability of that city's distribution network from meteorological, statistical, and similar websites;

Step 2: From the data collected in step 1, compile a 108×15 distribution network reliability evaluation decision table containing 108 records. It has 1 decision attribute, the power supply reliability rate (Y, %), and 14 condition attributes: year (X1), month (X2), total electricity consumption of society (X3, 10^4 kWh), electricity sales (X4, 10^4 kWh), line loss rate at 220 kV and below (X5, %), load rate (X6, %), maximum load (X7, 10^4 kW), comprehensive voltage qualification rate (X8, %), monthly precipitation (X9, mm), monthly mean air temperature (X10, °C), monthly sunshine hours (X11, h), monthly mean wind speed (X12, m/s), monthly number of strong-wind days (X13, days), and monthly number of rainy days (X14, days);

Step 3: From the values of all attributes in the decision table, determine whether each attribute is continuous or discrete. For example, attributes such as year and month take only a few fixed integer values and are discrete, whereas the values of attributes such as total electricity consumption of society, load rate, and comprehensive voltage qualification rate come from a continuous interval and are continuous. To support the later correlation analysis, the continuous data must be discretized as follows:

Taking into account the data distribution of each factor and the relevant objective circumstances, compute the number of intervals into which each continuous attribute should be divided using formula (1);

$$k = 1.87 \times (m-1)^{2/5} \tag{1}$$

where m is the total number of samples and k is the number of intervals for a continuous attribute.

According to formula (1), the number of intervals is $k = 1.87 \times (108-1)^{2/5} \approx 12.12$, so all attributes are divided into 12 classes; the results are shown in Table 1;

Divide the values of a continuous attribute x into k intervals by equal-width binning. Use formula (2) to compute the interval width l_x used to discretize the attribute, and assign a discrete integer value to each interval, so that after discretization the continuous data take only the discrete integer values 1, 2, ..., k. Then use formula (3) to compute the discretized value corresponding to each original value of the attribute, which completes the discretization. The discretization results are shown in Table 1.

where max([x]) and min([x]) are the maximum and minimum of all values of attribute x, and k is the chosen number of discretization intervals.

where x_i is the i-th value of attribute x before discretization, X_i is the corresponding i-th value of attribute x after discretization, and [x] denotes rounding down, that is, the largest integer not exceeding x.

Table 1. Discretization results

Step 4: Using the discretization results of step 3, count the number of samples in which each attribute takes each discrete integer value, and compute the probability that the attribute takes each discrete value using formula (4);

where k is the number of discretization intervals of attribute x, X_i is the i-th value of attribute x, c(X_i) is the number of samples in which attribute x takes the value X_i, U is the full sample set (the universe of discourse), c(U) is the total number of samples, and p(X_i) is the probability that attribute x takes the value X_i.

Using the probability distributions obtained above, compute with formulas (5) and (6) the information entropy of each attribute, the conditional entropy of each condition attribute with respect to the decision attribute, and the conditional entropy of each condition attribute with respect to every other condition attribute. Here the information entropy measures the amount of information an attribute provides and also indicates how ordered the attribute sequence is, while the conditional entropy indicates how much information remains in one attribute when another attribute is fully known;

where H(x) is the information entropy of attribute x.

where p(Y_j|X_i) is the probability that Y_j occurs given that X_i has occurred, and H(y|x) is the conditional entropy of attribute y given x.
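For reference, formulas (4) to (6) can be written in the following standard form, which is consistent with the symbol definitions above and with the mutual information identity in formula (7). The base-2 logarithm is an assumption, since the base is not stated here.

```latex
p(X_i) = \frac{c(X_i)}{c(U)}, \quad i = 1, 2, \ldots, k \tag{4}

H(x) = -\sum_{i=1}^{k} p(X_i) \log_2 p(X_i) \tag{5}

H(y \mid x) = -\sum_{i} p(X_i) \sum_{j} p(Y_j \mid X_i) \log_2 p(Y_j \mid X_i) \tag{6}
```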

Using the above results, compute with formula (7) the mutual information between each condition attribute and the decision attribute and between each pair of condition attributes, which measures the amount of information these attributes share.

$$I(x,y) = H(y) - H(y \mid x) \tag{7}$$

Step 5: To remove the effect of scale, normalize the mutual information between each condition attribute and the decision attribute computed in step 4 using formula (8) to obtain the entropy correlation coefficient, and use it to judge the correlation between the condition attribute and the decision attribute. The smaller the entropy correlation coefficient, the weaker the correlation and the less the condition attribute contributes to distribution network reliability evaluation. The entropy correlation coefficients between each condition attribute x_i (i = 1, 2, ..., 14) and the decision attribute y are shown in Table 2;

where ρ_xy is the entropy correlation coefficient of attributes x and y and expresses the degree of correlation between x and y.

Table 2. Entropy correlation coefficients between the condition attributes and the decision attribute

Condition attribute              X1      X2      X3      X4      X5      X6      X7
Entropy correlation coefficient  0.2770  0.1488  0.1859  0.2027  0.1513  0.1578  0.1636

Condition attribute              X8      X9      X10     X11     X12     X13     X14
Entropy correlation coefficient  0.2874  0.1353  0.1112  0.1645  0.1569  0.0947  0.1652

Set a threshold e1 based on the computed entropy correlation coefficients. When the entropy correlation coefficient between a condition attribute and the decision attribute is below the threshold, the condition attribute is considered to have little influence on distribution network reliability and is removed from the decision table. Table 2 shows that the largest entropy correlation coefficient between the condition attributes and the decision attribute does not exceed 0.3. Here e1 is set to 0.15, and the condition attributes whose coefficient does not exceed e1 are removed: month X2, monthly precipitation X9, monthly mean air temperature X10, and monthly number of strong-wind days X13.
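The filtering in this step can be reproduced directly from the Table 2 values. The sketch below is illustrative; the names `RHO_Y` and `drop_weak` are hypothetical, and the comparison uses "does not exceed e1" as in the worked example.

```python
# Entropy correlation coefficients with the decision attribute Y, from Table 2.
RHO_Y = {
    "X1": 0.2770, "X2": 0.1488, "X3": 0.1859, "X4": 0.2027, "X5": 0.1513,
    "X6": 0.1578, "X7": 0.1636, "X8": 0.2874, "X9": 0.1353, "X10": 0.1112,
    "X11": 0.1645, "X12": 0.1569, "X13": 0.0947, "X14": 0.1652,
}


def drop_weak(rho, e1):
    """Return the attributes whose coefficient does not exceed the threshold e1."""
    return [name for name, r in rho.items() if r <= e1]


removed = drop_weak(RHO_Y, 0.15)
print(removed)  # ['X2', 'X9', 'X10', 'X13']
```

Ten condition attributes remain after this step. X5, at 0.1513, lies just above the threshold and is kept.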

Step 6: As in step 5, compute the pairwise entropy correlation coefficients of the condition attributes remaining after step 5 and build the correlation matrix. The results are shown in Table 3;

Table 3. Entropy correlation coefficients between the main condition attributes

根据相关矩阵中熵相关系数的取值情况设置一个临界值e2，当两个条件属性的熵相关系数超过这一临界值时，认为这两个属性的相关性很强，可以互相表示，即两个属性对于配电网可靠性的影响是大致相同的，此时要比较这两个条件属性与决策属性之间的熵相关系数，删除掉与决策属性相关性较弱的条件属性，得到与可靠性指标强相关且相互独立的条件属性集，达到属性约简的目的；Set a critical value e2 according to the value of the entropy correlation coefficient in the correlation matrix. When the entropy correlation coefficient of two conditional attributes exceeds this critical value, it is considered that the correlation between the two attributes is very strong and can be expressed mutually, that is, two The impact of each attribute on the reliability of the distribution network is roughly the same. At this time, it is necessary to compare the entropy correlation coefficient between the two condition attributes and the decision attribute, and delete the condition attribute that is weakly correlated with the decision attribute, and obtain the reliability and reliability A set of conditional attributes that are strongly related to each other and independent of each other to achieve the purpose of attribute reduction;

Table 3 shows that the entropy correlation coefficients between X1 and X8, X3 and X4, and X3 and X7 all exceed 0.5. Here the threshold e2 is set to 0.5. The entropy correlation coefficients of these five condition attributes with the decision attribute rank as X8 > X1 > X4 > X3 > X7, so the relatively redundant condition attributes year X1 and total electricity consumption of society X3 are deleted.
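
One possible reading of the redundancy rule in step 6 is sketched below. All numeric values are hypothetical; only the threshold e2 = 0.5, the three strongly correlated pairs, and the ordering X8 > X1 > X4 > X3 > X7 are taken from the text. The sketch also assumes that a pair is skipped once one of its members has already been removed, which is what leaves X7 in the reduced set.

```python
E2 = 0.5  # threshold chosen in the text

# Hypothetical pairwise entropy correlation coefficients (not the Table 3 values).
mutual = {("X1", "X8"): 0.62, ("X3", "X4"): 0.58, ("X3", "X7"): 0.55, ("X4", "X7"): 0.31}

# Hypothetical relevance to the decision attribute; only the ordering
# X8 > X1 > X4 > X3 > X7 comes from the text.
relevance = {"X8": 0.28, "X1": 0.26, "X4": 0.24, "X3": 0.21, "X7": 0.18}


def prune_redundant(mutual, relevance, e2):
    """Drop the less relevant attribute of each still-intact redundant pair."""
    removed = set()
    for (a, b), ecc in mutual.items():
        if ecc <= e2:
            continue  # the pair is not considered redundant
        if a in removed or b in removed:
            continue  # assumption: an earlier removal already resolved this pair
        removed.add(a if relevance[a] < relevance[b] else b)
    return removed


removed = prune_redundant(mutual, relevance, E2)
print(sorted(removed))
```

Under these assumptions the sketch removes X1 and X3, matching the attributes deleted in the worked example.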

Step 7: A three-layer BP neural network is constructed and trained on the reduced attribute data. The condition attributes strongly correlated with the reliability index obtained in step 6 serve as inputs, and the decision attribute data serve as the final output. If the reduced decision table contains p condition attributes, the input layer has p nodes and the output layer has 1 node. This example uses 108 groups of sample data in total: 8 groups are selected at random as test samples and the remaining 100 groups are used as training samples. Each sample contains condition attribute values and a decision attribute value, and the sample data are normalized.
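
The sample preparation can be sketched as follows. The text does not state which normalization is used, so min-max scaling to [0, 1] is an assumption here. The data are synthetic, and the column count (5 condition attributes plus 1 decision attribute) is arbitrary; the helper names are illustrative.

```python
import random


def split_samples(samples, n_test, seed=0):
    """Randomly hold out n_test samples; the rest are used for training."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def min_max_normalize(rows):
    """Scale every column to [0, 1] (assumed scheme; constant columns map to 0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


# Synthetic stand-in for the 108 groups of sample data.
rng = random.Random(1)
samples = [[rng.uniform(0, 100) for _ in range(6)] for _ in range(108)]

normalized = min_max_normalize(samples)
train, test = split_samples(normalized, n_test=8)
print(len(train), len(test))
```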

A computer randomly generates h sets of initial connection weights for the nodes of each layer of the BP neural network, together with the thresholds of the hidden and output layers. These are rewritten in binary-coded form to make up the initial solution space, and the fitness of each solution in the solution space is computed with the neural network. The c solutions with the highest fitness are selected as parent solutions, and crossover and mutation operations on the parents produce the offspring solution space. Convergence is judged from the fitness of the offspring solutions: if the search has converged, the optimization stops and the optimal initial weights and thresholds are output; otherwise, selection, crossover, and mutation continue.
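
A minimal sketch of such a binary-coded genetic search is given below. The population size h and parent count c follow the text. The bit width, value range, mutation rate, single-point crossover, retention of parents, and the fixed generation count used in place of a convergence test are all assumptions. In the described method the fitness would be derived from the network error; a toy fitness stands in for it here.

```python
import random


def decode(bits, low=-1.0, high=1.0, n_bits=8):
    """Turn a bit string into real-valued parameters, n_bits per parameter."""
    values = []
    for i in range(0, len(bits), n_bits):
        integer = int("".join(map(str, bits[i:i + n_bits])), 2)
        values.append(low + (high - low) * integer / (2 ** n_bits - 1))
    return values


def genetic_search(fitness, n_params, h=30, c=10, generations=60,
                   mutation_rate=0.02, n_bits=8, seed=0):
    """Binary-coded GA: keep the c fittest, then cross over and mutate them."""
    rng = random.Random(seed)
    length = n_params * n_bits
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(h)]
    for _ in range(generations):
        population.sort(key=lambda b: fitness(decode(b, n_bits=n_bits)), reverse=True)
        parents = population[:c]
        children = []
        while len(children) < h - c:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children  # parents are carried over unchanged
    best = max(population, key=lambda b: fitness(decode(b, n_bits=n_bits)))
    return decode(best, n_bits=n_bits)


# Toy fitness (hypothetical): higher is better, maximal at `target`.
target = [0.5, -0.25, 0.1]


def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best = genetic_search(toy_fitness, n_params=3)
print(best)
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.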

The initial weights and thresholds obtained in the previous step are decoded and loaded into the neural network. The BP neural network is trained on the 100 normalized training samples to obtain the error between the estimated and true values of the decision attribute, and this error is checked against the convergence condition. If the condition is not met, the weights and thresholds are adjusted and training continues; if it is met, the loop stops and the weights and thresholds that minimize the error are output, giving the best BP network model.
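
The training loop can be illustrated with the sketch below, a plain gradient-descent implementation of a p-n-1 network. The hidden layer size, learning rate, tolerance, sigmoid hidden units, and linear output are assumptions, and random initial weights stand in for the values that the genetic search would supply. The bias terms play the role of the thresholds mentioned in the text, and the data are synthetic.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Return the hidden activations and the scalar network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def train_bp(samples, targets, n_hidden=4, lr=0.1, tol=1e-3, max_epochs=2000, seed=0):
    """Train by backpropagation; stop once the mean squared error drops below tol."""
    rng = random.Random(seed)
    p = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, t in zip(samples, targets):
            hidden, y = forward(x, w1, b1, w2, b2)
            err = y - t
            squared_error += err * err
            for j in range(n_hidden):
                # Error propagated back to hidden node j (uses w2[j] before its update).
                grad_hidden = err * w2[j] * hidden[j] * (1.0 - hidden[j])
                w2[j] -= lr * err * hidden[j]
                for i in range(p):
                    w1[j][i] -= lr * grad_hidden * x[i]
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
        history.append(squared_error / len(samples))
        if history[-1] < tol:
            break  # convergence condition met
    return (w1, b1, w2, b2), history


# Synthetic, already normalized data with p = 2 inputs (illustrative only).
rng = random.Random(1)
samples = [[rng.random(), rng.random()] for _ in range(20)]
targets = [0.2 + 0.5 * x[0] - 0.3 * x[1] for x in samples]

model, history = train_bp(samples, targets)
print(history[0], history[-1])
```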

The trained BP neural network model is used to evaluate the reliability of the 8 groups of test samples. Table 4 compares the evaluation results with the true values: the evaluated values are very close to the actual values, with a maximum absolute error of 0.004, which shows that the evaluation method performs well.

Table 4. Prediction results

No.   Actual value   Predicted value   Absolute error
1     99.989         99.990            0.001
2     99.973         99.973            0.000
3     99.974         99.975            0.001
4     99.989         99.985            0.004
5     99.994         99.992            0.002
6     99.980         99.981            0.001
7     99.988         99.987            0.001
8     99.987         99.987            0.000
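
The absolute errors in Table 4 can be re-derived from the actual and predicted values with a short check. The values are copied from the table; rounding to three decimals is an assumption that matches the precision shown there.

```python
actual = [99.989, 99.973, 99.974, 99.989, 99.994, 99.980, 99.988, 99.987]
predicted = [99.990, 99.973, 99.975, 99.985, 99.992, 99.981, 99.987, 99.987]

# Round to the three decimals shown in the table to suppress float noise.
abs_errors = [round(abs(a - p), 3) for a, p in zip(actual, predicted)]
max_error = max(abs_errors)
print(abs_errors)
print(max_error)
```

The largest value is 0.004, in row 4, which agrees with the maximum absolute error stated in the text.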

Content not described in detail in this specification belongs to the prior art known to those skilled in the art.