CN107169628B - A distribution network reliability assessment method based on big data mutual information attribute reduction - Google Patents

A distribution network reliability assessment method based on big data mutual information attribute reduction

Publication number
CN107169628B
Authority
CN
China
Prior art keywords
attribute
decision
condition
entropy
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710244420.8A
Other languages
Chinese (zh)
Other versions
CN107169628A (en
Inventor
李妍
盛梦雨
刘婉兵
杜明秋
杨秉臻
杨晨光
王少荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201710244420.8A
Publication of CN107169628A
Application granted
Publication of CN107169628B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the field of distribution network planning and provides a distribution network reliability assessment method based on big data mutual information attribute reduction. Starting from big data, the method uses the concept of mutual information from rough set theory to measure the correlation between basic indicators, and screens out, from a massive set of indicators of many kinds, those that are strongly correlated with the reliability index and mutually independent. These indicators are then used as inputs to a BP neural network model based on a genetic algorithm to assess distribution network reliability. The invention overcomes the limitations of traditional Monte Carlo simulation and analytical methods and, for electric power big data, realizes distribution network reliability assessment based on big data mutual information attribute reduction.

Description

Power distribution network reliability assessment method based on big data mutual information attribute reduction
Technical Field
The invention relates to the field of power distribution network planning, in particular to a power distribution network reliability assessment method based on big data mutual information attribute reduction.
Background
With the development of technologies such as the internet and databases and the automation of production environments, fields such as finance, electric power and meteorology generate massive, varied and rapidly growing data, known as big data. Big data has now penetrated many fields, has become an important factor of production and, because of its great utilization value, is becoming a new engine of industrial change. Its value is realized by mining and analyzing the data, extracting the main information and applying it reasonably. The reliability of a power distribution network is a technical index that is strongly related to many factors, including data such as air temperature, wind speed, electricity sales and line loss rate. Traditionally, reliability is evaluated through modeling (analytical methods) or sampling simulation, using indexes such as load point indexes, outage time indexes and outage economic indexes. However, analytical methods are severely limited when dealing with complex power systems, and Monte Carlo sampling is time-consuming because of state redundancy. Big data technology offers a new approach to evaluating power distribution network reliability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a power distribution network reliability assessment method based on big data mutual information attribute reduction. The method overcomes the limitations of traditional Monte Carlo simulation and analytical methods and, for electric power big data, realizes power distribution network reliability assessment based on big data mutual information attribute reduction.
The object of the invention is achieved by the following technical measures.
In this power distribution network reliability assessment method based on big data mutual information attribute reduction, the indexes related to power distribution network reliability are first preprocessed, including discretization of continuous indexes. Mutual information values between indexes are then calculated based on the concept of information entropy and made dimensionless to obtain entropy correlation coefficients between indexes. From these, the correlation between each index and the reliability index, and between the indexes themselves, is judged, and the indexes are reduced. A BP neural network is then used to fit the nonlinear relation between the reliability index and the reduced indexes, which are strongly correlated with the reliability index and mutually independent, and the optimizing ability of a genetic algorithm is used to compensate for the shortcomings of the neural network method. The method specifically comprises the following steps:
step 1: collecting a large amount of data related to the reliability of the power distribution network from academic, meteorological or statistical websites;
step 2: sorting out index values related to the reliability of the power distribution network from a plurality of data, namely sorting out a decision table for representing the corresponding relation between the reliability indexes and the related indexes, wherein the decision table comprises 1 decision attribute (namely, the reliability index) for representing the final reliability of the power distribution network and a plurality of condition attributes for representing factors related to the reliability;
step 3: preprocessing the data in the decision table: judging from all values of each attribute whether the attribute is continuous or discrete, calculating the number of intervals into which each continuous attribute is to be divided using knowledge from mathematical statistics, and discretizing the continuous attributes by the equidistant discretization method;
step 4: calculating the probability that each attribute takes each specific discrete value, then obtaining the information entropy of each attribute and the conditional entropy of the condition attributes with respect to the decision attribute, and from these the mutual information between each condition attribute and the decision attribute and between every two condition attributes;
step 5: normalizing the mutual information between the condition attributes and the decision attribute calculated in step 4 and, in combination with the information entropy, obtaining the entropy correlation coefficient between each condition attribute and the decision attribute, from which their correlation is judged: the smaller the entropy correlation coefficient, the weaker the correlation; a suitable critical value is set to measure the correlation between attributes, and condition attributes weakly correlated with the decision attribute are removed;
step 6: in the same way as in step 5, calculating the entropy correlation coefficients between every two condition attributes remaining after step 5, and screening out and deleting redundant condition attributes, i.e. those that are strongly correlated with another remaining condition attribute and more weakly correlated with the decision attribute, so as to obtain a set of condition attributes that are strongly correlated with the reliability index and mutually independent, thereby achieving attribute reduction;
step 7: constructing a three-layer BP neural network and training it on the reduced attribute set, taking the condition attributes obtained in step 6 that are strongly correlated with the reliability index as input and the decision attribute data as output, and solving for the connection weights between the nodes of each layer and the thresholds of the hidden layer and the output layer that minimize the fitting error, so as to obtain an optimal BP neural network model; to improve training precision, the optimal initial weights and thresholds can be obtained with a genetic algorithm.
In the above technical solution, the step 2 includes the following steps:
step 2.1: establishing an m multiplied by n distribution network reliability evaluation decision table according to a large amount of collected data related to the reliability of a certain city distribution network, wherein n represents the total number of decision attributes and condition attributes, the corresponding decision attributes and condition attributes form a group of attribute data, and m represents the total number (namely sample number) of the attribute data;
step 2.2: taking an index in the decision table that directly represents or determines the reliability of the power distribution network, such as the power supply reliability rate, as the decision attribute, and taking the other indexes related to reliability, such as month, air temperature and comprehensive voltage qualification rate, as condition attributes.
In the above technical solution, the step 3 includes the following steps:
step 3.1: according to the values of all attributes in the decision table, whether the attribute data is continuous or discrete is judged, such as: the attributes such as year, month and the like are only fixed integers and are discrete data, and the attributes such as the power consumption, the load rate, the comprehensive voltage qualification rate and the like of the whole society can obtain all numerical values in an interval and are continuous data;
step 3.2: calculating the number of intervals into which each continuous attribute is to be divided according to formula (1), taking into account the data distribution characteristics of all factors and related objective factors;
$$k = 1.87 \times (m-1)^{2/5} \qquad (1)$$
wherein m is the number of samples of the attribute data, and k is the number of intervals into which the value range of the continuous attribute is divided;
step 3.3: calculating the interval length of each continuous attribute from the number of intervals obtained in step 3.2, dividing the value range of the continuous attribute into k intervals by the equidistant discretization method, and assigning a discrete integer value to each interval to obtain the discretization result of the continuous attribute, completing the discretization of the continuous data.
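As an illustration only, steps 3.2 and 3.3 can be sketched in Python as follows. The function names `partition_count` and `equal_width_discretize` are hypothetical and do not appear in the original disclosure. Truncating k to an integer and placing the maximum value in the last interval are assumptions of this sketch, since the specification does not state how these boundary cases are handled.

```python
def partition_count(m: int) -> int:
    """Number of intervals from formula (1): k = 1.87 * (m - 1)^(2/5).

    Truncation to an integer is an assumption of this sketch.
    """
    return int(1.87 * (m - 1) ** 0.4)


def equal_width_discretize(values, k):
    """Map continuous values to integer labels 1..k using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate attribute with a single value: everything falls in interval 1.
        return [1] * len(values)
    width = (hi - lo) / k  # interval length
    labels = []
    for v in values:
        idx = int((v - lo) / width) + 1
        # Assumption: the maximum value is assigned to the last interval k.
        labels.append(min(idx, k))
    return labels


k = partition_count(108)  # 108 samples, as in the embodiment
labels = equal_width_discretize([0.0, 2.5, 5.0, 10.0], 4)
```

With 108 samples the sketch gives 12 intervals, which agrees with the value used in the embodiment.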
In the above technical solution, the step 4 includes the following steps:
step 4.1: counting the number of samples of each discrete integer value taken by each attribute, and calculating the probability of taking a specific discrete value by the attribute according to a formula (2);
$$p(X_i) = \frac{c(X_i)}{c(U)}, \quad i = 1, 2, \ldots, k \qquad (2)$$
in the formula, k is the number of discretized intervals of attribute X; $X_i$ is the i-th value of attribute X; $c(X_i)$ is the number of samples in which attribute X takes the value $X_i$; U is the total sample set, i.e. the universe of discourse; $c(U)$ is the total number of samples; and $p(X_i)$ is the probability that attribute X takes the value $X_i$;
step 4.2: according to the formulas (3) and (4), the respective information entropy of each attribute, the conditional entropy of a conditional attribute to a decision attribute and the conditional entropy of a certain conditional attribute to another conditional attribute are obtained, wherein the information entropy is used for measuring the information quantity provided by the attributes and also representing the ordering degree of the attribute sequence, and the conditional entropy represents the information quantity of another attribute under the premise that the certain attribute is completely known;
$$H(x) = -\sum_{i=1}^{k} p(X_i) \log p(X_i) \qquad (3)$$
where $H(x)$ represents the information entropy of attribute x;
$$H(y \mid x) = -\sum_{i} p(X_i) \sum_{j} p(Y_j \mid X_i) \log p(Y_j \mid X_i) \qquad (4)$$
in the formula, $p(Y_j \mid X_i)$ is the probability that $Y_j$ occurs given that $X_i$ has occurred, and $H(y \mid x)$ is the conditional entropy of attribute y with respect to x, i.e. the conditional entropy of y given x;
step 4.3: using the calculation result of step 4.2, obtaining the mutual information between each condition attribute and decision attribute and between each two condition attributes according to formula (5) to represent the size of the shared information quantity between the attributes,
$$I(x, y) = H(y) - H(y \mid x) \qquad (5)$$
in the formula, $H(y)$ is the information entropy of attribute y, and $I(x, y)$ is the mutual information of attributes x and y, which can be regarded as the amount of information shared by y and x.
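The calculation in step 4 can be sketched in Python as follows, for illustration only. The function names are hypothetical, and the base-2 logarithm is an assumption, since the specification does not state the logarithm base. Probabilities are estimated as sample counts divided by the total number of samples, and the mutual information is obtained as H(y) - H(y|x) per formula (5).

```python
import math
from collections import Counter


def entropy(xs):
    """Information entropy of a discretized attribute (log base 2 is assumed)."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(ys, xs):
    """H(y|x) = -sum_i p(X_i) * sum_j p(Y_j|X_i) * log p(Y_j|X_i)."""
    n = len(xs)
    x_counts = Counter(xs)
    joint_counts = Counter(zip(xs, ys))
    h = 0.0
    for (x, _y), c in joint_counts.items():
        p_x = x_counts[x] / n          # p(X_i)
        p_y_given_x = c / x_counts[x]  # p(Y_j | X_i)
        h -= p_x * p_y_given_x * math.log2(p_y_given_x)
    return h


def mutual_information(xs, ys):
    """I(x, y) = H(y) - H(y|x), as in formula (5)."""
    return entropy(ys) - conditional_entropy(ys, xs)


# Identical attributes share all of their information; independent ones share none.
same = mutual_information([1, 1, 2, 2], [1, 1, 2, 2])
independent = mutual_information([1, 2, 1, 2], [1, 1, 2, 2])
```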
In the above technical solution, the step 5 includes the following steps:
step 5.1: to eliminate the influence of dimension, normalizing the mutual information between each condition attribute and the decision attribute calculated in step 4.3 using formula (6) to obtain the entropy correlation coefficient, from which the correlation between the condition attribute and the decision attribute is judged: the smaller the entropy correlation coefficient, the weaker the correlation and the smaller the role of the condition attribute in the reliability evaluation of the power distribution network;
[Formula (6): equation image not recovered; it defines the entropy correlation coefficient by normalizing the mutual information I(x, y) in combination with the information entropy.]
in the formula, $\rho_{xy}$ is the entropy correlation coefficient of attributes x and y and represents the degree of correlation between x and y;
step 5.2: setting a critical value e1 according to the entropy correlation coefficients calculated in step 5.1; when the entropy correlation coefficient between a condition attribute and the decision attribute is smaller than the critical value, the condition attribute is considered to have little influence on the reliability of the power distribution network and is removed from the decision table.
In the above technical solution, the step 6 includes the following steps:
step 6.1: similar to the method in step 5, the entropy correlation coefficient between the condition attributes remaining after the elimination in step 5.2 is calculated;
step 6.2: setting a critical value e2 according to the entropy correlation coefficients calculated in step 6.1; when the entropy correlation coefficient of two condition attributes exceeds the critical value, the two attributes are considered strongly correlated and able to represent each other, that is, they have approximately the same influence on the reliability of the power distribution network. In that case, the entropy correlation coefficients of the two condition attributes with the decision attribute are compared and the condition attribute more weakly correlated with the decision attribute is deleted, reducing the redundancy of the attribute set and yielding a set of condition attributes that are strongly correlated with the reliability index and mutually independent.
In the above technical solution, the step 7 includes the following steps:
step 7.1: constructing a three-layer BP neural network to train the reduced attribute data, taking the condition attribute which is obtained in the step 6.2 and is strongly related to the reliability index as input, and taking the decision attribute as final output; assuming that the reduced decision table has p conditional attributes, the number of nodes of the input layer and the output layer is p and 1 respectively; randomly selecting b test samples from the m groups of attribute data, taking the rest samples as training samples of the neural network, wherein the samples comprise condition attributes and decision attribute values, and carrying out normalization processing on the data in the samples;
step 7.2: randomly generating by computer h groups of initial connection weights between the nodes of each layer of the BP neural network and thresholds of the hidden layer and the output layer, rewriting them in binary-coded form to form an initial solution space, and calculating the fitness of each solution in the solution space in combination with the neural network; selecting the first c solutions with the greatest fitness as parent solutions, performing crossover and mutation operations on the parents to obtain a child solution space, and judging from the fitness of the child solutions whether convergence has been reached; if so, stopping the optimization and outputting the optimal initial weights and thresholds, otherwise continuing the selection, crossover and mutation operations;
step 7.3: decoding the initial weight and the threshold value calculated in the step 7.2, training the normalized sample by using a BP neural network to obtain the error of the estimated value and the true value of the decision attribute, judging whether the error meets the convergence condition, if not, adjusting the weight and the threshold value, and continuing to train the network; if so, the loop is stopped and the weight and threshold that minimizes the error are output.
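Steps 7.1 to 7.3 can be illustrated with the following simplified Python sketch. It is not the implementation of the invention: it uses real-valued chromosomes instead of the binary coding described in step 7.2, a fixed number of generations instead of a convergence test, full-batch gradient descent for the BP stage, and synthetic toy data. All names (`ga_init`, `bp_train`, `unpack` and so on) and all hyperparameter values are hypothetical.

```python
import math
import random


def unpack(params, p, q):
    """Split a flat vector into hidden weights/thresholds and output weights/threshold."""
    w1 = [params[j * p:(j + 1) * p] for j in range(q)]
    b1 = params[q * p:q * p + q]
    w2 = params[q * p + q:q * p + 2 * q]
    return w1, b1, w2, params[-1]


def forward(params, x, p, q):
    """Three-layer network: p inputs, q sigmoid hidden nodes, 1 linear output."""
    w1, b1, w2, b2 = unpack(params, p, q)
    hidden = [1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])))
              for j in range(q)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2, hidden


def mse(params, data, p, q):
    return sum((forward(params, x, p, q)[0] - y) ** 2 for x, y in data) / len(data)


def ga_init(data, p, q, h=20, c=6, generations=30, rng=None):
    """Genetic search for initial weights: keep the c fittest, refill by crossover and mutation."""
    rng = rng or random.Random(0)
    n = q * p + 2 * q + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(h)]
    for _ in range(generations):
        pop.sort(key=lambda s: mse(s, data, p, q))  # lower error means higher fitness
        parents = pop[:c]
        children = []
        while len(children) < h - c:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)               # single-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.1 else g for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda s: mse(s, data, p, q))


def bp_train(params, data, p, q, lr=0.1, epochs=200):
    """Full-batch gradient descent on the squared error (back-propagation)."""
    params = list(params)
    for _ in range(epochs):
        grad = [0.0] * len(params)
        _w1, _b1, w2, _b2 = unpack(params, p, q)
        for x, y in data:
            out, hidden = forward(params, x, p, q)
            err = out - y
            for j in range(q):
                dh = err * w2[j] * hidden[j] * (1 - hidden[j])
                for i in range(p):
                    grad[j * p + i] += dh * x[i]
                grad[q * p + j] += dh
                grad[q * p + q + j] += err * hidden[j]
            grad[-1] += err
        params = [w - lr * g / len(data) for w, g in zip(params, grad)]
    return params


# Synthetic toy data (not from the patent): two normalized inputs, one output.
data_rng = random.Random(1)
data = []
for _ in range(30):
    x = [data_rng.random(), data_rng.random()]
    data.append((x, 0.5 * x[0] + 0.2 * x[1]))

init = ga_init(data, p=2, q=3)
trained = bp_train(init, data, p=2, q=3)
```

The genetic stage only supplies the starting point; the BP stage then refines the weights and thresholds from that point, which mirrors the division of work between steps 7.2 and 7.3.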
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a power distribution network reliability assessment method based on mutual information and an improved BP neural network. Aimed at the large volume of varied data related to power distribution network reliability that arises in the big data context, the method obtains entropy correlation coefficients from the concept of mutual information, built on information entropy, together with a dimensionless operation, and screens out the indexes strongly correlated with power distribution network reliability. It then models these indexes with a BP neural network and uses the optimizing ability of a genetic algorithm to compensate for the fact that the initial weights and thresholds of the neural network cannot otherwise be determined, thereby achieving comprehensive, accurate and rapid assessment of power distribution network reliability.
Drawings
Fig. 1 is a flow chart of power distribution network reliability assessment based on big data mutual information attribute reduction;
Fig. 2 is a flow chart of the reduction of power distribution network reliability related indexes based on mutual information.
Detailed Description
The present invention will be further described below in order to make the technical means, the creation features and the objects of the present invention easy to understand.
Referring to fig. 1 and 2, an embodiment of the present invention provides a power distribution network reliability assessment method based on big data mutual information attribute reduction, which is performed sequentially according to the following steps:
step 1: acquiring a large amount of power distribution and utilization data of a certain city from the interior of a power enterprise, and acquiring data of various aspects related to the reliability of a power distribution network of the city from websites of weather, statistics and the like;
step 2: compiling a 108 × 15 power distribution network reliability evaluation decision table from the large amount of data collected in step 1. The table contains 1 decision attribute, the power supply reliability rate (Y, %), and 14 condition attributes: year (X1), month (X2), total social electricity consumption (X3, kWh), electricity sales (X4, kWh), line loss rate at 220 kV and below (X5, %), load rate (X6, %), maximum load (X7, kW), comprehensive voltage qualification rate (X8, %), monthly precipitation (X9, mm), monthly mean air temperature (X10, °C), monthly sunshine hours (X11, h), monthly mean wind speed (X12, m/s), number of windy days in the month (X13, days) and number of rainy days in the month (X14, days); there are 108 groups of attribute data in total;
step 3: judging from the values of all attributes in the decision table whether each attribute is continuous or discrete. For example, attributes such as year and month take only fixed integers and are discrete data, while attributes such as total social electricity consumption, load rate and comprehensive voltage qualification rate take values from a continuous interval and are continuous data. To facilitate the subsequent correlation analysis, the continuous data must be discretized, as follows:
calculating the number of partitions into which the continuous attribute is to be divided according to the data distribution characteristics of all factors and related objective factors and a formula (1);
$$k = 1.87 \times (m-1)^{2/5} \qquad (1)$$
in the formula, m is the total number of samples, and k is the number of intervals into which each continuous attribute is divided.
With m = 108, formula (1) gives $k = 1.87 \times (108-1)^{2/5} \approx 12.12$, so it is chosen to divide the attributes into 12 classes; see Table 1 for the results;
dividing the value range of each continuous attribute x into k intervals by the equidistant discretization method, calculating the interval length $l_x$ used in discretization by formula (2), and assigning a discrete integer value to each interval, so that after discretization the continuous data take only the discrete integer values 1, 2, ..., k. The discretization result corresponding to each original value of the attribute is then calculated by formula (3), completing the discretization; the results are shown in Table 1.
$$l_x = \frac{\max([x]) - \min([x])}{k} \qquad (2)$$
In the formula, max([x]) and min([x]) are respectively the maximum and minimum of all values of attribute x, and k is the chosen number of discretization intervals.
[Formula (3): equation image not recovered; it maps each original value $x_i$ to its discretized value $X_i$.]
In the formula, $x_i$ is the i-th value of attribute x before discretization, $X_i$ is the corresponding i-th value of attribute x after discretization, and [x] denotes rounding x down to an integer.
TABLE 1 discretization results
[Table 1: image not recovered.]
Step 4: using the discretization results of step 3, counting the number of samples in which each attribute takes each discrete integer value, and calculating the probability that the attribute takes a specific discrete value according to formula (4);
$$p(X_i) = \frac{c(X_i)}{c(U)}, \quad i = 1, 2, \ldots, k \qquad (4)$$
in the formula, k is the number of discretized intervals of attribute X; $X_i$ is the i-th value of attribute X; $c(X_i)$ is the number of samples in which attribute X takes the value $X_i$; U is the total sample set, i.e. the universe of discourse; $c(U)$ is the total number of samples; and $p(X_i)$ is the probability that attribute X takes the value $X_i$.
Respectively calculating the information entropy of each attribute, the conditional entropy of the conditional attribute to the decision attribute and the conditional entropy of one conditional attribute to another conditional attribute according to formulas (5) and (6) by using the probability distribution obtained above, wherein the information entropy is used for measuring the information quantity provided by the attributes and also representing the ordering degree of the attribute sequence, and the conditional entropy represents the information quantity of another attribute on the premise that one attribute is completely known;
$$H(x) = -\sum_{i=1}^{k} p(X_i) \log p(X_i) \qquad (5)$$
in the formula, $H(x)$ represents the information entropy of attribute x.
$$H(y \mid x) = -\sum_{i} p(X_i) \sum_{j} p(Y_j \mid X_i) \log p(Y_j \mid X_i) \qquad (6)$$
In the formula, $p(Y_j \mid X_i)$ is the probability that $Y_j$ occurs given that $X_i$ has occurred, and $H(y \mid x)$ is the conditional entropy of attribute y with respect to x, i.e. the conditional entropy of y given x.
Using the results above, the mutual information between each condition attribute and the decision attribute, and between every two condition attributes, is obtained according to formula (7) to measure the amount of information shared between attributes.
$$I(x, y) = H(y) - H(y \mid x) \qquad (7)$$
In the formula, $H(y)$ is the information entropy of attribute y, and $I(x, y)$ is the mutual information of attributes x and y, which can be regarded as the amount of information shared by y and x.
Step 5: to eliminate the influence of dimension, normalizing the mutual information between each condition attribute and the decision attribute calculated in step 4 using formula (8) to obtain the entropy correlation coefficient, from which the correlation between the condition attribute and the decision attribute is judged: the smaller the entropy correlation coefficient, the weaker the correlation and the smaller the role of the condition attribute in the reliability evaluation of the power distribution network. The entropy correlation coefficients between each condition attribute $x_i$ (i = 1, 2, ..., 14) and the decision attribute y are shown in Table 2;
[Formula (8): equation image not recovered; it defines the entropy correlation coefficient by normalizing the mutual information I(x, y).]
in the formula, $\rho_{xy}$ is the entropy correlation coefficient of attributes x and y and represents the degree of correlation between x and y.
TABLE 2 entropy correlation coefficient between conditional and decision attributes
| Condition attribute | Entropy correlation coefficient |
|---|---|
| X1 | 0.2770 |
| X2 | 0.1488 |
| X3 | 0.1859 |
| X4 | 0.2027 |
| X5 | 0.1513 |
| X6 | 0.1578 |
| X7 | 0.1636 |
| X8 | 0.2874 |
| X9 | 0.1353 |
| X10 | 0.1112 |
| X11 | 0.1645 |
| X12 | 0.1569 |
| X13 | 0.0947 |
| X14 | 0.1652 |
A critical value e1 is set according to the calculated entropy correlation coefficients; when the entropy correlation coefficient between a condition attribute and the decision attribute is smaller than the critical value, the condition attribute is considered to have little influence on the reliability of the power distribution network and is removed from the decision table. As shown in Table 2, the largest entropy correlation coefficient between a condition attribute and the decision attribute does not exceed 0.3. Here e1 is chosen as 0.15, and the condition attributes whose entropy correlation coefficient does not exceed e1 are removed, namely month X2, monthly precipitation X9, monthly mean air temperature X10 and number of windy days in the month X13.
Step 6: similar to the method in step 5, entropy correlation coefficients among the condition attributes remaining after being removed in step 5 are calculated, a correlation matrix is established, and the calculation result is shown in table 3;
TABLE 3 entropy correlation coefficient between main conditional attributes
[Table 3: image not recovered.]
A critical value e2 is set according to the entropy correlation coefficients in the correlation matrix. When the entropy correlation coefficient of two condition attributes exceeds the critical value, the two attributes are considered strongly correlated and able to represent each other, that is, they have approximately the same influence on the reliability of the power distribution network. The entropy correlation coefficients of the two condition attributes with the decision attribute are then compared, and the condition attribute more weakly correlated with the decision attribute is deleted, yielding a set of condition attributes that are strongly correlated with the reliability index and mutually independent, which achieves attribute reduction;
As can be seen from Table 3, the entropy correlation coefficients between X1 and X8, between X3 and X4, and between X3 and X7 all exceed 0.5, where the critical value e2 is chosen as 0.5. The entropy correlation coefficients of these five condition attributes with the decision attribute rank as X8 > X1 > X4 > X3 > X7, so the relatively redundant condition attributes year X1 and total social electricity consumption X3 are eliminated.
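The reduction in steps 5 and 6 can be replayed with the following Python sketch, given for illustration only. The relevance values are those of Table 2 and the strongly correlated pairs are those named in the text; the individual values of Table 3 are not available here, so only the pairs reported as exceeding e2 are used. The function name `reduce_attributes` and the rule of processing pairs in the listed order are assumptions of this sketch.

```python
# Entropy correlation coefficients with the decision attribute (Table 2).
relevance = {
    "X1": 0.2770, "X2": 0.1488, "X3": 0.1859, "X4": 0.2027, "X5": 0.1513,
    "X6": 0.1578, "X7": 0.1636, "X8": 0.2874, "X9": 0.1353, "X10": 0.1112,
    "X11": 0.1645, "X12": 0.1569, "X13": 0.0947, "X14": 0.1652,
}

# Pairs of condition attributes reported as exceeding e2 = 0.5.
redundant_pairs = [("X1", "X8"), ("X3", "X4"), ("X3", "X7")]


def reduce_attributes(relevance, redundant_pairs, e1):
    """Drop weakly relevant attributes, then the weaker member of each redundant pair."""
    # Step 5: keep only attributes whose coefficient exceeds e1.
    kept = [name for name, rho in relevance.items() if rho > e1]
    # Step 6: for each strongly correlated pair still present, delete the
    # attribute that is less correlated with the decision attribute.
    for a, b in redundant_pairs:
        if a in kept and b in kept:
            weaker = a if relevance[a] < relevance[b] else b
            kept.remove(weaker)
    return kept


kept = reduce_attributes(relevance, redundant_pairs, e1=0.15)
```

Under these assumptions the sketch removes X2, X9, X10 and X13 in step 5 and X1 and X3 in step 6, leaving eight condition attributes as network inputs.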
Step 7: constructing a three-layer BP neural network and training it on the reduced attribute data, taking the condition attributes obtained in step 6 that are strongly correlated with the reliability index as input and the decision attribute data as the final output. Assuming that the reduced decision table has p condition attributes, the numbers of nodes in the input layer and the output layer are p and 1 respectively. In this example there are 108 groups of sample data in total; 8 groups are randomly selected as test samples and the remaining 100 groups are used as training samples. The samples contain the condition attribute and decision attribute values, and the data in the samples are normalized;
randomly generating by computer h groups of initial connection weights between the nodes of each layer of the BP neural network and thresholds of the hidden layer and the output layer, rewriting them in binary-coded form to form an initial solution space, and calculating the fitness of each solution in the solution space in combination with the neural network; selecting the first c solutions with the greatest fitness as parent solutions, performing crossover and mutation operations on the parents to obtain a child solution space, and judging from the fitness of the child solutions whether convergence has been reached; if so, stopping the optimization and outputting the optimal initial weights and thresholds, otherwise continuing the selection, crossover and mutation operations;
decoding the initial weight and the threshold value calculated in the last step and inputting the initial weight and the threshold value into a neural network, training 100 training samples subjected to normalization processing by using a BP neural network to obtain errors of a decision attribute estimated value and a true value, judging whether the errors meet a convergence condition or not, if not, adjusting the weight and the threshold value, and continuing training the network; if so, stopping circulation, and outputting the weight and the threshold value which enable the error to be minimum to obtain an optimal BP network model;
the reliability of 8 groups of test samples is evaluated by using the trained BP neural network model, the comparison between the evaluation result and the true value is shown in Table 4, and as can be seen from Table 4, the evaluation value is quite close to the actual value, the maximum absolute error is 0.004, and therefore, the evaluation effect of the evaluation method is good.
Table 4. Prediction results

Serial number | True value | Predicted value | Absolute error
1 | 99.989 | 99.990 | 0.001
2 | 99.973 | 99.973 | 0.000
3 | 99.974 | 99.975 | 0.001
4 | 99.989 | 99.985 | 0.004
5 | 99.994 | 99.992 | 0.002
6 | 99.980 | 99.981 | 0.001
7 | 99.988 | 99.987 | 0.001
8 | 99.987 | 99.987 | 0.000
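The absolute errors in Table 4 can be checked directly from the true and predicted values. The short Python sketch below does this; the variable names are illustrative and the numbers are transcribed from the table.

```python
# (true value, predicted value) pairs transcribed from Table 4.
rows = [
    (99.989, 99.990),
    (99.973, 99.973),
    (99.974, 99.975),
    (99.989, 99.985),
    (99.994, 99.992),
    (99.980, 99.981),
    (99.988, 99.987),
    (99.987, 99.987),
]

# Absolute error per test sample, rounded to the three decimals used in the table.
abs_errors = [round(abs(true - pred), 3) for true, pred in rows]

# Largest absolute error over the 8 test samples.
max_abs_error = max(abs_errors)

print(abs_errors)
print(max_abs_error)
```

The largest value is obtained for sample 4, consistent with the maximum absolute error of 0.004 stated in the text.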
Details not described in the present specification belong to the prior art known to those skilled in the art.

Claims (1)

1. A power distribution network reliability assessment method based on big data mutual information attribute reduction is characterized by comprising the following steps:
(1) collecting a large amount of data related to the reliability of the power distribution network from academic, meteorological or statistical websites;
(2) compiling, from the collected data, a decision table representing the correspondence between the reliability index and the related indexes, wherein the decision table comprises 1 decision attribute representing the final reliability of the power distribution network, namely the reliability index, and a plurality of condition attributes representing factors related to the reliability; the specific mode is as follows:
step one, establishing an m × n distribution network reliability evaluation decision table from the large amount of collected data related to the reliability of the distribution network of a given city, wherein n is the total number of decision attributes and condition attributes, each set of corresponding decision attribute and condition attribute values forms one group of attribute data, and m is the total number of groups of attribute data, namely the number of samples;
step two, taking an index which directly expresses or determines the reliability of the power distribution network in the decision table as the decision attribute, and taking the other indexes related to the reliability as condition attributes;
(3) preprocessing the data in the decision table: determining from all values of each attribute whether the attribute is continuous or discrete, calculating the number of intervals into which each continuous attribute is to be divided, and discretizing the continuous attributes by the equal-width (equidistant) discretization method; the specific mode is as follows:
step one, judging whether each attribute is continuous or discrete according to the values of all attributes in the decision table;
step two, calculating the number of intervals into which each continuous attribute is to be divided, taking into account the data distribution characteristics of all factors and relevant objective factors, according to the following formula;
$k = 1.87 \times (m-1)^{2/5}$
wherein m is the number of samples of the attribute data, and k is the number of partitions of the continuous attribute value range;
calculating the interval length of the continuous attribute according to the calculated partition number, assigning a discrete integer value to each interval, and calculating the discretization result of the continuous attribute to complete the discretization of the continuous data;
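By way of illustration only, the discretization of step (3) might be sketched in Python as follows. The function names are hypothetical, and two details are assumptions not stated in the claim: k is rounded to the nearest integer, and the intervals are numbered 1 to k with the maximum value placed in the last interval.

```python
def partition_count(m: int) -> int:
    """Number of intervals k = 1.87 * (m - 1)^(2/5).

    Rounding to the nearest integer is an assumption; the claim does not
    say how a fractional k is handled.
    """
    return max(1, round(1.87 * (m - 1) ** 0.4))


def equal_width_discretize(values, k):
    """Map each continuous value to an integer interval label in 1..k."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute falls into a single interval.
        return [1] * len(values)
    width = (hi - lo) / k  # interval length
    # The maximum value would land in interval k + 1, so it is clamped to k.
    return [min(int((v - lo) / width) + 1, k) for v in values]


# Example with the 108 samples of the calculation example.
k = partition_count(108)
labels = equal_width_discretize([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)
print(k, labels)
```

For m = 108 the formula gives a value slightly above 12, so this sketch uses 12 intervals.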
(4) calculating the probability that each attribute takes each specific discrete value, then calculating the information entropy of each attribute and the conditional entropy of the decision attribute with respect to each condition attribute, and further calculating the mutual information between each condition attribute and the decision attribute and the mutual information between one condition attribute and another condition attribute; the specific mode is as follows:
step one, counting the number of samples of each discrete integer value taken by each attribute, calculating the probability of the attribute taking a specific discrete value according to the following formula,
$p(X_i) = \dfrac{c(X_i)}{c(U)}, \quad i = 1, 2, \ldots, k$
in the formula, k represents the number of discretized partitions of the attribute X, X_i represents the i-th value of the attribute X, c(X_i) represents the number of samples in which the attribute X takes the value X_i, U represents the total sample set, namely the universe of discourse, c(U) represents the total number of samples, and p(X_i) represents the probability that the attribute X takes the value X_i;
step two, solving the information entropy of each attribute, the conditional entropy of the decision attribute with respect to each condition attribute, and the conditional entropy of one condition attribute with respect to another condition attribute according to the following formulas;
$H(x) = -\sum_{i=1}^{k} p(X_i) \log p(X_i)$
where H(x) represents the information entropy of attribute x;
$H(y \mid x) = -\sum_{i} p(X_i) \sum_{j} p(Y_j \mid X_i) \log p(Y_j \mid X_i)$
in the formula, p(Y_j|X_i) represents the probability that Y_j occurs given that X_i has occurred, and H(y|x) represents the conditional entropy of attribute y given attribute x;
step three, using the calculation results of step two, obtaining the mutual information between each condition attribute and the decision attribute and the mutual information between one condition attribute and another condition attribute according to the following formula,
I(x,y)=H(y)-H(y|x)
in the formula, H(y) represents the information entropy of the attribute y, and I(x, y) represents the mutual information of the attributes x and y, namely the amount of information shared by the attributes x and y;
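As an illustrative sketch only, the quantities of step (4) can be computed from discretized attribute columns as shown below. The function names are hypothetical, and the base-2 logarithm is an assumption, since the logarithm base is not stated in the text. Probabilities are estimated as sample counts divided by the total number of samples.

```python
import math
from collections import Counter


def entropy(xs):
    """H(x) = -sum_i p(X_i) * log2 p(X_i), with p(X_i) = c(X_i) / c(U)."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(ys, xs):
    """H(y|x) = sum_i p(X_i) * H(y | x = X_i)."""
    n = len(xs)
    h = 0.0
    for x_val, c_x in Counter(xs).items():
        # Values of y restricted to the samples where x takes the value x_val.
        ys_given_x = [y for x, y in zip(xs, ys) if x == x_val]
        h += (c_x / n) * entropy(ys_given_x)
    return h


def mutual_information(xs, ys):
    """I(x, y) = H(y) - H(y|x)."""
    return entropy(ys) - conditional_entropy(ys, xs)


# Hypothetical discretized columns: y is fully determined by x_dep and
# statistically independent of x_ind.
y = [1, 1, 2, 2]
x_dep = [1, 1, 2, 2]
x_ind = [1, 2, 1, 2]
print(mutual_information(x_dep, y), mutual_information(x_ind, y))
```

In this toy example the dependent attribute shares all of the information in y, while the independent attribute shares none.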
(5) normalizing the mutual information between each condition attribute and the decision attribute calculated in step (4), in combination with the information entropy, to obtain the entropy correlation coefficient between the condition attribute and the decision attribute; setting a critical value e1 according to the calculated entropy correlation coefficients; and removing a condition attribute from the decision table when its entropy correlation coefficient with the decision attribute is smaller than the critical value; the specific mode is as follows:
step one, normalizing the calculated mutual information of the condition attribute and the decision attribute by using the following formula to obtain the entropy correlation coefficient,
[Formula image FDA0002968625480000023: definition of the entropy correlation coefficient ρ_xy]
in the formula, ρ_xy is the entropy correlation coefficient of the attributes x and y and represents the degree of correlation between x and y;
step two, setting a critical value e1 according to the calculated entropy correlation coefficients, and removing a condition attribute from the decision table when its entropy correlation coefficient with the decision attribute is smaller than the critical value;
(6) calculating the entropy correlation coefficients among the condition attributes remaining after the removal in step (5), and setting a critical value e2 according to the values of these entropy correlation coefficients; when the entropy correlation coefficient between two condition attributes exceeds the critical value, judging that the two condition attributes have the same influence on the reliability of the power distribution network, comparing the entropy correlation coefficient of each of the two condition attributes with the decision attribute, and deleting the condition attribute whose entropy correlation coefficient with the decision attribute is smaller, thereby obtaining a reduced condition attribute set;
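For illustration only, the two-threshold reduction of steps (5) and (6) might be sketched as below. The sketch takes already-computed entropy correlation coefficients as input, so it does not assume any particular normalization formula. The function name, attribute names, coefficient values, and thresholds are all hypothetical, and the order in which redundant pairs are examined is an assumption, since it is not specified in the claim.

```python
def reduce_attributes(rho_decision, rho_pair, e1, e2):
    """Return the reduced list of condition attributes.

    rho_decision: {attribute: coefficient with the decision attribute}
    rho_pair:     {frozenset({a, b}): coefficient between condition
                   attributes a and b}
    """
    # Step (5): drop attributes weakly correlated with the decision attribute.
    kept = [a for a, r in rho_decision.items() if r >= e1]

    # Step (6): for each strongly correlated pair, drop the attribute that is
    # less correlated with the decision attribute.
    removed = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a in removed or b in removed:
                continue
            if rho_pair.get(frozenset((a, b)), 0.0) > e2:
                removed.add(a if rho_decision[a] < rho_decision[b] else b)
    return [a for a in kept if a not in removed]


# Hypothetical coefficients for four condition attributes.
rho_decision = {"a": 0.8, "b": 0.6, "c": 0.1, "d": 0.5}
rho_pair = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "d")): 0.2,
    frozenset(("b", "d")): 0.3,
}
reduced = reduce_attributes(rho_decision, rho_pair, e1=0.3, e2=0.7)
print(reduced)
```

Here attribute c is removed for weak relevance, and b is removed because it is redundant with a and less correlated with the decision attribute.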
(7) constructing a three-layer BP neural network and training it on the reduced condition attribute set, with the condition attributes obtained in step (6) as input and the decision attribute data as output, and solving for the connection weights between the nodes of each layer of the network and the thresholds of the hidden layer and the output layer that minimize the fitting error, to obtain an optimal BP neural network model; the specific mode is as follows:
constructing a three-layer BP neural network and training it on the reduced attribute data, with the obtained condition attributes as input and the decision attribute as final output; if the reduced decision table has p condition attributes, the input layer has p nodes and the output layer has 1 node; randomly selecting b groups from the m groups of attribute data as test samples and using the remaining groups as training samples of the neural network, each sample comprising condition attribute values and a decision attribute value, and normalizing the data in the samples;
randomly generating, by computer, h groups of initial connection weights between the nodes of each layer of the BP neural network and thresholds of the hidden layer and the output layer, rewriting the initial connection weights and thresholds in binary-coded form to constitute an initial solution space, and calculating the fitness of each solution in the solution space using the neural network; selecting the first c solutions with the highest fitness as parent solutions, and applying crossover and mutation operations to the parent solutions to obtain a child solution space; judging convergence from the fitness of the child solutions: if converged, stopping the optimization and outputting the optimal initial weights and thresholds; otherwise, continuing the selection, crossover, and mutation operations;
decoding the initial weights and thresholds calculated in the previous step, and training the BP neural network on the normalized samples to obtain the error between the estimated value and the true value of the decision attribute; judging whether the error meets the convergence condition: if not, adjusting the weights and thresholds and continuing to train the network; if so, stopping the loop and outputting the weights and thresholds that minimize the error.
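The genetic-algorithm initialization followed by BP training described in step (7) can be sketched as below. This is a simplified illustration, not the claimed implementation: it uses real-valued chromosomes instead of binary coding, a fixed number of generations instead of a convergence test, and plain batch gradient descent. All function names, hyperparameters (q hidden nodes, learning rate, mutation rate), and the toy data are assumptions.

```python
import math
import random


def unpack(w, p, q):
    """Split a flat chromosome into (W1, b1, W2, b2) for a p-q-1 network."""
    W1 = [w[j * p:(j + 1) * p] for j in range(q)]
    return W1, w[q * p:q * p + q], w[q * p + q:q * p + 2 * q], w[-1]


def forward(w, x, p, q):
    """Return (output, hidden activations); sigmoid hidden layer, linear output."""
    W1, b1, W2, b2 = unpack(w, p, q)
    hidden = [1 / (1 + math.exp(-(sum(a * b for a, b in zip(row, x)) + t)))
              for row, t in zip(W1, b1)]
    return sum(a * b for a, b in zip(W2, hidden)) + b2, hidden


def mse(w, X, y, p, q):
    return sum((forward(w, x, p, q)[0] - t) ** 2 for x, t in zip(X, y)) / len(X)


def ga_init(X, y, p, q, h=30, c=10, generations=40, rng=random):
    """Evolve h chromosomes; lower training error means higher fitness."""
    n = q * p + 2 * q + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(h)]
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, X, y, p, q))
        parents = pop[:c]                       # selection of the c fittest
        children = []
        while len(children) < h - c:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: perturb each gene with probability 0.05.
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.05 else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: mse(w, X, y, p, q))


def bp_train(w, X, y, p, q, lr=0.1, epochs=500, tol=1e-6):
    """Batch gradient descent; returns the lowest-error weights seen."""
    w = list(w)
    best_w, best_err = list(w), mse(w, X, y, p, q)
    for _ in range(epochs):
        if best_err < tol:
            break
        grad = [0.0] * len(w)
        _, _, W2, _ = unpack(w, p, q)
        for x, t in zip(X, y):
            out, hidden = forward(w, x, p, q)
            d_out = 2 * (out - t) / len(X)
            for j in range(q):
                d_h = d_out * W2[j] * hidden[j] * (1 - hidden[j])
                for i in range(p):
                    grad[j * p + i] += d_h * x[i]       # hidden weights
                grad[q * p + j] += d_h                  # hidden thresholds
                grad[q * p + q + j] += d_out * hidden[j]  # output weights
            grad[-1] += d_out                           # output threshold
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        err = mse(w, X, y, p, q)
        if err < best_err:
            best_w, best_err = list(w), err
    return best_w, best_err


# Toy, already-normalized data (hypothetical): one condition attribute.
random.seed(0)
X = [[i / 10] for i in range(11)]
y = [0.2 + 0.6 * x[0] for x in X]
w0 = ga_init(X, y, p=1, q=3)
w_best, err = bp_train(w0, X, y, p=1, q=3)
print(mse(w0, X, y, 1, 3), err)
```

Keeping the parents in each generation and returning the lowest-error weights from training ensure that neither stage ends with a worse solution than it started with.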
CN201710244420.8A | Priority date: 2017-04-14 | Filing date: 2017-04-14 | A distribution network reliability assessment method based on big data mutual information attribute reduction | Active | CN107169628B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710244420.8A, CN107169628B (en) | 2017-04-14 | 2017-04-14 | A distribution network reliability assessment method based on big data mutual information attribute reduction

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710244420.8A, CN107169628B (en) | 2017-04-14 | 2017-04-14 | A distribution network reliability assessment method based on big data mutual information attribute reduction

Publications (2)

Publication Number | Publication Date
CN107169628A (en) | 2017-09-15
CN107169628B (en) | 2021-05-07

Family

ID=59849026

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710244420.8A (Active, CN107169628B (en)) | A distribution network reliability assessment method based on big data mutual information attribute reduction | 2017-04-14 | 2017-04-14

Country Status (1)

Country | Link
CN (1) | CN107169628B (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
