Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings in conjunction with examples.
Referring to fig. 1, a wind-light-gas-storage combined dynamic economic dispatch optimization method based on Q learning is characterized in that: because the output of the wind turbine generator, the output of the solar photovoltaic power generation system, the power load, and the natural gas supply quantity are uncertain and random, these random variables are described by a multi-objective chance-constrained programming model, and the wind power and photovoltaic output are smoothed by a pumped storage unit; finally, the multi-objective chance-constrained programming model is solved by an improved Q learning algorithm to obtain an optimal solution of the wind-light-gas-storage combined dynamic economic dispatch.
Step 1 in fig. 2 describes the process and method of constructing the random variable data matrices and the associated mathematical models. Historical and real-time data sets covering one year for the wind turbine generators and photovoltaic power generation systems at a given location are acquired from an Energy Management System (EMS). The acquired wind speed of the wind turbine generator connected to system node i in time period t of a day, and the acquired illumination intensity of the photovoltaic power generation system connected to system node i in time period t of a day, are processed, calculated, and analyzed to construct the wind power and photovoltaic power data matrices of node i in the t-th time period:
wherein: $P_{WT}(i,t)$ denotes the wind power injected at node i in time period t, $i = 1, 2, \ldots, n_{node}$, and $n_{node}$ is the total number of nodes. According to document [1], the active power output $P_{WT}(v)$ of the wind turbine generator, the Weibull distribution of the wind speed $v$, the probability density $f_{WT}(P_{WT})$ of the wind power output, and the cumulative distribution function $F_{WT}(P_{WT})$ of the wind turbine output are expressed as follows:
in the formula: $v$ is the wind speed, and the remaining wind-speed symbols are, respectively, the cut-in wind speed, the cut-out wind speed, and the rated wind speed; $\mu_0$, $\mu_1$, $\mu_2$, $\mu_3$ are the shape distribution parameters related to the wind speed-power curve; the remaining symbol is the rated power of the wind turbine.
Since the wind speed varies and cannot be assigned a specific magnitude at any given time, the distribution of the wind speed $v$ can be represented by a Weibull distribution:
wherein $\beta$ and $\kappa$ are, respectively, the shape parameter and the scale parameter of the probability density function of $v$.
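For reference, the standard two-parameter Weibull density and distribution function with shape parameter $\beta$ and scale parameter $\kappa$ are sketched below. The function names $f_V$ and $F_V$ are assumptions introduced here for illustration, since the original expression is not reproduced in this text.

```latex
f_V(v) = \frac{\beta}{\kappa}\left(\frac{v}{\kappa}\right)^{\beta-1}
         \exp\!\left[-\left(\frac{v}{\kappa}\right)^{\beta}\right],
\qquad
F_V(v) = 1 - \exp\!\left[-\left(\frac{v}{\kappa}\right)^{\beta}\right],
\qquad v \ge 0 .
```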
Combining the relationship between the wind power output and the wind speed with the Weibull distribution function of the wind speed, the probability density of the wind power output is obtained as follows:
therefore, the cumulative distribution function of the wind turbine output is obtained as follows:
wherein: $P_{PV}(i,t)$ denotes the photovoltaic power injected at node i in time period t. According to document [1], the active power output $P_{PV}(t)$ of the photovoltaic power generation system, the probability distribution $f_G(G)$ of the solar radiation intensity $G$, the probability density $f_{PV}(Q_{PV})$ of the active power output by the photovoltaic power generation system, and the distribution function $F_{PV}(Q_{PV})$ of that active power are expressed as follows:
the active power output $P_{PV}(t)$ of the photovoltaic power generation system is:
in the formula: $P_{SOC}$ is the maximum output power of the solar photovoltaic panel under standard operating conditions; $L(t)$ is the illumination intensity at time t; $\xi$ is the power temperature coefficient; $T_c(t)$ is the working temperature of the solar photovoltaic panel at time t; $T_{ref}(t)$ is the reference temperature, which has a value of 25 °C; $L_{SOC}$ is the solar illumination intensity under standard operating conditions, which is 1 kW/m².
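The expression itself is not reproduced in this text. As a minimal sketch, the widely used temperature-corrected model that is consistent with the symbols listed above is assumed here: output scales with $L(t)/L_{SOC}$ and is corrected by $1 + \xi\,(T_c(t) - T_{ref})$. The function name and the sample numbers are hypothetical.

```python
def pv_active_power(p_soc, irradiance, cell_temp, xi, l_soc=1.0, t_ref=25.0):
    """Active power of a PV panel at one time step.

    Assumed model (not confirmed by the source text):
        P_PV = P_SOC * (L / L_SOC) * (1 + xi * (T_c - T_ref))
    p_soc      maximum output under standard operating conditions
    irradiance illumination intensity L(t), same unit as l_soc (kW/m^2)
    cell_temp  working temperature T_c(t) in deg C
    xi         power temperature coefficient (typically negative)
    """
    return p_soc * (irradiance / l_soc) * (1.0 + xi * (cell_temp - t_ref))


if __name__ == "__main__":
    # Illustrative values only: 100 kW panel, 0.8 kW/m^2, 45 deg C, xi = -0.004 / deg C
    print(pv_active_power(100.0, 0.8, 45.0, -0.004))
```

With a negative $\xi$, a panel warmer than the reference temperature yields less than the irradiance-scaled rating, which is the behavior the temperature coefficient is meant to capture.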
The photovoltaic power generation system equation can be expressed as:
$Q_{PV} = G H \sigma$
in the formula: $G$ is the solar radiation intensity, which is highly random, so the probability distribution of $G$ can be represented by a beta distribution:
in the formula: $G_{max}$, $\chi_{BETA}$, $\rho_{BETA}$ denote, respectively, the maximum deviation value, the average deviation value, and the standard deviation value of the solar irradiation intensity.
From the above equations, the probability density of $Q_{PV}$ can be derived as:
therefore, the distribution function of the active power output by the photovoltaic power generation system is obtained by integrating the probability density function, as follows:
wherein: $T$ is the total number of time periods into which each day is divided, $t = 1, 2, \ldots, T$; $N = 365$ is the total number of days in the year, each corresponding to a particular date. The number of data samples is 365 and the number of data sets is 2.
Historical and real-time data sets covering one year for the power load and natural gas supply at a given location are acquired from the Energy Management System (EMS). The acquired power load and natural gas supply quantity are processed, calculated, and analyzed to construct the load data matrices of the power load and the natural gas supply quantity in time period t:
wherein: the load element characterizes the power load demand in the t-th time period. According to document [2], the probability density function of the system power load is expressed as follows:
in the formula: the variable is the power load, and the two parameters are, respectively, the expected value and the standard deviation of the power load.
wherein: the supply element characterizes the natural gas supply quantity in the t-th time period. According to document [2], the probability density function of the natural gas supply of the system is expressed as follows:
in the formula: the variable is the natural gas supply quantity, and the two parameters are, respectively, the expected value and the standard deviation of the natural gas supply.
wherein: $T$ is the total number of time periods into which each day is divided, $t = 1, 2, \ldots, T$; $N = 365$ is the total number of days in the year, each corresponding to a particular date. The number of data samples is 365 and the number of data sets is 2.
Step 2 in fig. 2 describes the process and method of constructing the objective function of the power grid wind-light-gas-storage combined dynamic economic dispatch model. The main objective of the combined dispatch is to smooth the wind power and photovoltaic output through the pumped storage unit, so that as much wind power and photovoltaic as possible is fed into the grid and dispatched together with the thermal power units and the natural gas pipe network, thereby meeting the system power load and natural gas supply requirements. Because wind power generation and solar power generation consume no fuel, the pumped storage unit is used to minimize the fluctuation of the wind and photovoltaic output while the operating cost of the system is reduced to a minimum. Since both the objective function and the constraint conditions contain random variables, a multi-objective chance-constrained programming model is adopted.
The multi-objective chance-constrained programming model, whose objectives are the minimum variance of the wind-light-storage combined output power of the power grid and the minimum operating cost of the combined system, is expressed as follows:
in the formula: the left-hand term is the minimum value of the objective function $f_j$ at confidence level $\alpha_j$, where $f_1$ denotes the minimum variance of the wind-light-storage combined output power and $f_2$ denotes the minimum operating cost of the combined system; $P_r\{\cdot\}$ denotes the probability that the event in $\{\cdot\}$ holds; $T$ is the number of time periods in the study horizon; $\Omega_{WT}$, $\Omega_{PV}$ are, respectively, the sets of nodes connected to wind turbines and to photovoltaic systems; the combined-output terms are, respectively, the wind-storage and light-storage combined outputs of node i in the t-th time period; the mean terms are, respectively, the averages of the wind-storage and light-storage combined outputs of node i over the $T$ time periods of one day; $P_P(i,t)$ is the pumping and generating power of the pumped storage unit at node i in the t-th time period; $N_{TP}$, $N_{AP}$ are, respectively, the total number of thermal power units and the total number of natural gas source nodes; $\omega_{1k}$, $\omega_{2k}$, $\omega_{3k}$ are the consumption characteristic curve coefficients of the k-th thermal power unit; $P_{k,TP}(t)$ is the output of the k-th thermal power unit in the t-th time period; the cost term is the cost coefficient of natural gas supplied by natural gas source node l in the t-th time period; and the flow term is the flow rate of natural gas supplied by natural gas source node l in the t-th time period.
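The two objectives can be illustrated with a small sketch. The objective expressions are not reproduced in this text, so the forms below are assumptions: the fluctuation term is taken as the sum of squared deviations of a node's combined output from its daily mean, the thermal fuel cost is taken as the common quadratic consumption curve $\omega_{1k}P^2 + \omega_{2k}P + \omega_{3k}$, and the gas cost as cost coefficient times flow. All function names and numbers are hypothetical.

```python
def output_fluctuation(combined_output):
    """Sum of squared deviations of one node's combined output from its daily mean."""
    mean = sum(combined_output) / len(combined_output)
    return sum((p - mean) ** 2 for p in combined_output)


def operating_cost(thermal_output, coeffs, gas_flow, gas_cost):
    """Assumed cost: quadratic thermal fuel cost plus linear gas supply cost.

    thermal_output[k][t] output of thermal unit k in period t
    coeffs[k]            (w1, w2, w3) consumption curve coefficients of unit k
    gas_flow[l][t]       gas flow supplied by source node l in period t
    gas_cost[l][t]       cost coefficient of source node l in period t
    """
    fuel = sum(w1 * p * p + w2 * p + w3
               for (w1, w2, w3), series in zip(coeffs, thermal_output)
               for p in series)
    gas = sum(c * f
              for costs, flows in zip(gas_cost, gas_flow)
              for c, f in zip(costs, flows))
    return fuel + gas


if __name__ == "__main__":
    # Illustrative data: one node, one thermal unit, one gas source, two periods
    print(output_fluctuation([8.0, 12.0]))
    print(operating_cost([[10.0, 20.0]], [(0.01, 2.0, 5.0)], [[3.0, 4.0]], [[1.5, 1.5]]))
```

A perfectly flat combined output gives a fluctuation of zero, which is the situation the pumped storage unit is meant to approach.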
Step 3 in fig. 2 describes the process and method of constructing the constraint conditions of the power grid wind-light-gas-storage combined dynamic economic dispatch model. The constraint conditions comprise: the system power balance constraint, the thermal power unit output constraint, the thermal power unit ramp-rate constraint, the line power constraint, the natural gas supply quantity constraint at the gas source points of the natural gas pipeline network, the reservoir capacity variation and reservoir capacity constraints caused by pumped storage, the pumping/generating power constraint of the pumped storage unit, the system spinning reserve constraint, and the like.
Each constraint is specifically expressed as follows:
1) System power balance constraint:
Because the active power output of the wind turbine generators, the active power output of the photovoltaic power generation systems, the power load, and the natural gas supply quantity are random variables, the above equation is written in chance-constraint form, namely:
in the formula: $\beta_1$ denotes the confidence level at which the chance constraint holds.
2) Output constraint of the thermal power unit:
in the formula: the bounds are, respectively, the minimum value and the maximum value of the output of the k-th thermal power unit.
3) Ramp-rate constraint of the thermal power unit:
$d_k \Delta t \le P_{k,TP}(t+1) - P_{k,TP}(t) \le u_k \Delta t$
in the formula: $d_k$, $u_k$ are, respectively, the descending rate and the ascending rate of the output of the k-th thermal power unit; $\Delta t$ is the duration of one time period.
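A minimal sketch of checking the ramp-rate inequality over a schedule is shown below. It applies $d_k$ exactly as the inequality is written, so the descending limit is passed as a negative number; that sign convention, the function name, and the sample values are assumptions.

```python
def ramp_feasible(outputs, d_k, u_k, dt):
    """Return True if d_k*dt <= P(t+1) - P(t) <= u_k*dt holds for every step.

    outputs  output of one thermal unit over consecutive time periods
    d_k      descending rate, used as the lower bound (assumed negative)
    u_k      ascending rate, used as the upper bound
    dt       duration of one time period
    """
    return all(d_k * dt <= nxt - cur <= u_k * dt
               for cur, nxt in zip(outputs, outputs[1:]))


if __name__ == "__main__":
    # Illustrative limits of 30 MW per period in either direction
    print(ramp_feasible([100.0, 120.0, 110.0], d_k=-30.0, u_k=30.0, dt=1.0))
    print(ramp_feasible([100.0, 140.0], d_k=-30.0, u_k=30.0, dt=1.0))
```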
4) Line power constraint:
in the formula: the bounds are, respectively, the upper and lower power limits of the line.
5) Natural gas supply quantity constraint at the gas source points of the natural gas pipeline network:
in the formula: the bounds are, respectively, the upper limit and the lower limit of the flow of natural gas supplied by natural gas source node l in the t-th time period.
6) Reservoir capacity variation and reservoir capacity constraints caused by pumped storage:
in the formula: the storage terms are the storage capacities of the upper and lower reservoirs in time period t; $\eta_P$, $\eta_D$ are, respectively, the pumping efficiency and the power generation efficiency of the pumped storage unit; the lower bounds are, respectively, the minimum storage capacities of the upper and lower reservoirs; the upper bounds are, respectively, the maximum storage capacities of the upper and lower reservoirs.
7) Pumping/generating power constraint of the pumped storage unit:
or:
$P_P(i,t) = 0$
in the formula: the generating bounds are, respectively, the minimum and maximum generating power of the pumped storage unit; the pumping bounds are, respectively, the minimum and maximum pumping power of the pumped storage unit.
The pumping/generation balance constraint within one period is:
$Q_P = \eta_P \eta_D Q_D$
in the formula: $Q_P$, $Q_D$ are, respectively, the total amount of power generation and the total amount of water pumping of the pumped storage unit in one period.
8) System spinning reserve constraint:
Considering the extreme condition in which wind power and photovoltaic cannot be connected to the grid normally, the thermal power units provide the spinning reserve of the system, namely:
in the formula: $S_U(t)$, $S_D(t)$ are the positive and negative spinning reserve requirements of the system in time period t; $\beta_2$, $\beta_3$ are the confidence levels that the positive and negative spinning reserve constraints must satisfy, respectively.
Step 4 in fig. 2 describes the process and method of handling chance constraints that contain random variables. A stochastic simulation technique is used to convert the uncertain factors in the chance constraints into computable deterministic ones. According to document [3], the wblrnd random generator in MATLAB is first used to generate wind speed, illumination intensity, power load demand, and natural gas supply demand sample values; the wind power output is then calculated from the wind turbine output power expression, and the photovoltaic output from the photovoltaic power generation system output power expression. Next, stochastic simulation is used to verify whether the chance constraints, which here are the system power balance constraint and the spinning reserve constraints, are satisfied, as follows:
1) draw n mutually independent random variables $f_1, f_2, \ldots, f_n$ from the probability distribution $f(\cdot)$;
2) calculate the values of $F_1, F_2, \ldots, F_n$ according to the known formulas;
3) let $n_A$ be the number of the n mutually independent samples that satisfy the constraint condition. By the law of large numbers, the frequency $n_A/n$ is used as the estimate of the probability that the chance constraint holds, and the chance constraint is taken to hold if and only if $n_A/n \ge \beta_j$, $j = 1, 2, 3$.
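The three steps above can be sketched as a frequency estimate over sampled scenarios. This is an illustration in Python rather than the MATLAB routine named in the text; the scenario generator, the placeholder wind power curve, and all numeric values are hypothetical and are not taken from the source.

```python
import random


def estimate_chance_constraint(sample, constraint, beta, n=10000, seed=0):
    """Return (holds, frequency), where frequency = n_A / n and holds = frequency >= beta."""
    rng = random.Random(seed)
    # n_A counts the independent samples that satisfy the constraint
    n_a = sum(1 for _ in range(n) if constraint(sample(rng)))
    frequency = n_a / n
    return frequency >= beta, frequency


def sample_scenario(rng):
    """Hypothetical scenario: Weibull wind speed and normally distributed load."""
    wind_speed = rng.weibullvariate(8.0, 2.0)  # scale 8 m/s, shape 2 (illustrative)
    load = rng.gauss(100.0, 5.0)               # mean 100 MW, std 5 MW (illustrative)
    return wind_speed, load


def power_balance_ok(scenario, thermal_output=105.0):
    """Hypothetical check: thermal output plus a crude wind output covers the load."""
    wind_speed, load = scenario
    wind_power = min(2.0 * wind_speed, 30.0)   # placeholder curve, not the source's
    return thermal_output + wind_power >= load


if __name__ == "__main__":
    holds, freq = estimate_chance_constraint(sample_scenario, power_balance_ok, beta=0.95)
    print(holds, freq)
```

Fixing the seed keeps the estimate reproducible between runs; a larger n tightens the estimate at the cost of more simulation time.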
Step 5 in fig. 2 describes the solving process and method of the power grid wind-light-gas-storage combined dynamic economic dispatch model. A Q learning algorithm, one of the reinforcement learning algorithms, is adopted, and its Q-value table updating formula is improved using an adaptive differential evolution algorithm, so that the optimal solution of the power grid wind-light-gas-storage combined dynamic economic dispatch model can be obtained.
The method comprises the following specific steps:
1) Determine the input state spaces $S_1$, $S_2$: the wind power predicted value $P_{WT}(i,t)$ and the photovoltaic predicted value $P_{PV}(i,t)$ in each time period are taken as state input 1, and the power load predicted value and the natural gas supply quantity predicted value in each time period are taken as state input 2:
The wind power predicted value is discretized into intervals, where the length $\Delta P_{WT}$ of each interval can be expressed as:
in the formula: the capacity term is the installed capacity of wind power. After discretization, the wind power predicted value comprises M intervals;
The photovoltaic predicted value is discretized into intervals, where the length $\Delta P_{PV}$ of each interval can be expressed as:
in the formula: the capacity term is the maximum output of the photovoltaic panel. After discretization, the photovoltaic predicted value comprises N intervals;
The power load predicted value is discretized into intervals, where the length of each interval can be expressed as:
in the formula: the demand term is the maximum power load demand of the system. After discretization, the power load predicted value comprises K intervals;
The natural gas supply quantity predicted value is discretized into intervals, where the length of each interval can be expressed as:
in the formula: the demand term is the maximum natural gas supply demand of the system. After discretization, the natural gas supply quantity predicted value comprises R intervals;
Finally, the state input space $S_1$ contains $M \times N \times T$ states in total and the state input space $S_2$ contains $K \times R \times T$ states in total. The state input spaces $S_1$, $S_2$ are respectively expressed as:
$S_1 = \{s_{11}, s_{12}, \ldots, s_{1(M \times N \times T)}\}$, $S_2 = \{s_{21}, s_{22}, \ldots, s_{2(K \times R \times T)}\}$
The state of the system is uniquely determined by the wind power predicted value, the photovoltaic predicted value, the power load predicted value, and the natural gas supply predicted value of the system in the time period to which it belongs.
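The discretization and state lookup can be sketched as follows. The interval-length expressions are not reproduced in this text, so equal-width intervals (maximum value divided by the number of intervals) are assumed; the flattening order of the state index and the function names are likewise hypothetical.

```python
def interval_index(value, max_value, n_intervals):
    """Map a predicted value in [0, max_value] to an interval index 0..n_intervals-1.

    Assumes equal-width intervals of length max_value / n_intervals.
    """
    width = max_value / n_intervals
    idx = int(value // width)
    # Clamp so that value == max_value falls into the last interval
    return min(max(idx, 0), n_intervals - 1)


def state_index(first_idx, second_idx, period, n_second, n_periods):
    """Flatten (first interval, second interval, time period) into one state index."""
    return (first_idx * n_second + second_idx) * n_periods + period


if __name__ == "__main__":
    M, N, T = 4, 3, 24  # illustrative interval counts and periods per day
    wind_idx = interval_index(37.0, 100.0, M)
    pv_idx = interval_index(12.0, 30.0, N)
    print(state_index(wind_idx, pv_idx, 5, N, T), "of", M * N * T)
```

The same two helpers serve state input 2 by passing the load and gas supply predictions with K and R in place of M and N.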
2) Determine the action strategy sets $A_1$, $A_2$: the pumping/generating power of the pumped storage unit at node i in the t-th time period is taken as action strategy 1, and the output of the k-th thermal power unit in the t-th time period together with the flow of natural gas supplied by natural gas source node l in the t-th time period is taken as action strategy 2; each action strategy is discretized into a series of fixed values.
The pumping power and the generating power of the pumped storage unit are each discretized, from 0 to the respective maximum, into $a$ fixed values. The pumping/generating power corresponding to each fixed value is as follows:
wherein: $y = D, P$ indexes the pumping and generating power of the pumped storage unit; additionally considering the case in which the pumping/generating power of the pumped storage unit is 0, there are $2a + 1$ fixed values in total.
The output of thermal power unit k is discretized, from 0 to its maximum, into $b$ fixed values. The output corresponding to each fixed value is as follows:
additionally considering the case in which the output is 0, there are $b + 1$ fixed values in total.
The flow of natural gas supplied by natural gas source node l is discretized, from 0 to its maximum, into $c$ fixed values. The flow corresponding to each fixed value is as follows:
additionally considering the case in which the supply flow is 0, there are $c + 1$ fixed values in total.
Finally, action strategy 1 contains $2a + 1$ combinations, each corresponding to one action strategy; action strategy 2 contains $(b + 1) \times (c + 1)$ combinations, each likewise corresponding to one action strategy. The action strategy sets $A_1$, $A_2$ are respectively expressed as:
$A_1 = \{a_{11}, a_{12}, \ldots, a_{1(2a+1)}\}$, $A_2 = \{a_{21}, a_{22}, \ldots, a_{2[(b+1) \times (c+1)]}\}$
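The sizes of the two action sets can be checked with a short sketch. The expressions for the fixed values are not reproduced in this text, so equally spaced levels are assumed; representing pumping as negative power and generation as positive power is also an assumption made only for illustration, as are the function names and numbers.

```python
from itertools import product


def levels(max_value, n):
    """n equally spaced nonzero levels: max/n, 2*max/n, ..., max (assumed spacing)."""
    return [max_value * j / n for j in range(1, n + 1)]


def pumped_storage_actions(p_gen_max, p_pump_max, a):
    """Action strategy 1: a pumping levels, a generating levels, plus idle (0)."""
    pumping = [-p for p in reversed(levels(p_pump_max, a))]  # negative = pumping (assumed)
    return pumping + [0.0] + levels(p_gen_max, a)


def thermal_gas_actions(p_tp_max, f_gas_max, b, c):
    """Action strategy 2: every (thermal output, gas flow) pair, each including 0."""
    thermal = [0.0] + levels(p_tp_max, b)
    gas = [0.0] + levels(f_gas_max, c)
    return list(product(thermal, gas))


if __name__ == "__main__":
    a1 = pumped_storage_actions(p_gen_max=50.0, p_pump_max=40.0, a=5)
    a2 = thermal_gas_actions(p_tp_max=200.0, f_gas_max=30.0, b=4, c=3)
    print(len(a1), len(a2))  # 2a+1 and (b+1)*(c+1)
```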
3) Initialize the Q-value table: in the pre-learning initialization stage, all elements of the Q-value table are initialized to 0; in online learning, the Q-value table is initialized to the Q-value table retained from pre-learning;
4) Determine the current state and select the corresponding action strategy: determine current state 1 from the wind power and photovoltaic predicted values of the next time period, and current state 2 from the power load and natural gas supply predicted values of the system for the next time period; then randomly select an action strategy corresponding to state 1 and determine the pumping and generating power of the pumped storage unit according to it, and randomly select an action strategy corresponding to state 2 and determine the thermal power unit output and the gas source node supply flow according to it;
5) Observe the state at the next time: after the next time arrives, acquire the actual wind power and photovoltaic power values and obtain the pumping and generating power of the pumped storage unit according to the predetermined action strategy; acquire the actual demand values of the power load and natural gas supply and obtain the thermal power unit output and the gas source node supply flow values according to the predetermined action strategy;
6) Calculate the reward value: the reward value corresponds to the multi-objective function and is calculated according to the following formula:
7) Improve the Q-value table updating formula using an adaptive differential evolution algorithm: based on the memory function of the Q-value table, the construction process and method of the adaptive mutation operator in the adaptive differential evolution algorithm are used to improve the forgetting factor $\gamma_Q$ in the Q learning algorithm. $\gamma_Q$ in the improved Q learning algorithm is therefore defined as an adaptive forgetting factor, which can be designed as follows:
in the formula: $\gamma_0$ denotes the initial forgetting factor; $k_{max}$ is the maximum number of iterations; $k$ denotes the current iteration number.
At the beginning of the algorithm the adaptive forgetting factor is $2\gamma_0$, a relatively large value that maintains action diversity in the initial stage and avoids falling into a local optimum. As the algorithm proceeds, the forgetting factor gradually decreases until, in the later stage, it approaches $\gamma_0$, so that optimal actions are preserved, the optimal action strategy is protected from being destroyed, and the probability of selecting the globally optimal action strategy set increases. In addition, a randomized learning rate $\alpha_Q = 0.5 \times [1 + \mathrm{rand}(0,1)]$ can be designed, so that the average learning rate is kept at 0.75; taking into account the random variation of all possible actions, this helps maintain action diversity during selection.
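The expression for $\gamma_Q$ is not reproduced in this text. A minimal sketch follows, assuming the form commonly used for the adaptive mutation operator in adaptive differential evolution, $\gamma_Q = \gamma_0 \cdot 2^{\lambda}$ with $\lambda = \exp\!\left(1 - \frac{k_{max}}{k_{max} + 1 - k}\right)$, which reproduces the stated behavior: $2\gamma_0$ at $k = 1$ and a value approaching $\gamma_0$ as $k$ nears $k_{max}$. The function names are hypothetical.

```python
import math
import random


def adaptive_forgetting_factor(gamma0, k, k_max):
    """Assumed adaptive form: gamma0 * 2**lam, lam = exp(1 - k_max / (k_max + 1 - k)).

    Valid for 1 <= k <= k_max; equals 2*gamma0 at k = 1 and tends to gamma0 at k = k_max.
    """
    lam = math.exp(1.0 - k_max / (k_max + 1.0 - k))
    return gamma0 * 2.0 ** lam


def random_learning_rate(rng=random):
    """Learning rate alpha_Q = 0.5 * [1 + rand(0, 1)], uniform on [0.5, 1) with mean 0.75."""
    return 0.5 * (1.0 + rng.random())


if __name__ == "__main__":
    for k in (1, 50, 100):
        print(k, adaptive_forgetting_factor(0.4, k, 100))
    print(random_learning_rate(random.Random(0)))
```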
In summary, the Q-value table updating formula of the improved Q-learning algorithm is as follows:
or:
in the formula: $s_{jk}$ denotes the state in the k-th iteration, $j = 1, 2$; $a_{jk}$ denotes the control action taken in the k-th iteration, $j = 1, 2$; $Q_k(s_{jk}, a_{jk})$ is the k-th estimate of the optimal action-value function $Q^*$, representing the expected accumulated reward obtained after passing through state $s_{jk}$ and selecting action $a_{jk}$;
8) Check whether the learning process has converged: the criterion is that the Q-value table converges to an optimal value, or that a given number of learning steps or a given time is reached. If it has not converged, set $k = k + 1$ and return to step 4).
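The improved updating formula is not reproduced in this text. For orientation only, the sketch below applies the textbook tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, into which the adaptive $\gamma_Q$ and randomized $\alpha_Q$ would be substituted. It is not the patent's formula, and the function names are hypothetical.

```python
def make_q_table(n_states, n_actions):
    """Q-value table with every entry initialized to 0 (pre-learning stage)."""
    return [[0.0] * n_actions for _ in range(n_states)]


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one textbook Q-learning update in place and return the absolute change."""
    old = q[state][action]
    target = reward + gamma * max(q[next_state])
    q[state][action] = old + alpha * (target - old)
    return abs(q[state][action] - old)


if __name__ == "__main__":
    q = make_q_table(n_states=2, n_actions=2)
    q[1][1] = 10.0  # pretend a value was learned earlier
    change = q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
    print(q[0][1], change)
```

The returned change can serve the convergence check in step 8): learning stops once the largest change over an iteration falls below a chosen tolerance, or once the step or time limit is reached.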
It is to be understood that the references cited herein are as follows:
Document [1]: Guojianwei, Xiapanghui. Research on a wind-solar energy-storage economic dispatching scheme [J]. Scientific Wind, 2019(30): 213+215;
Document [2]: Stochastic optimal power flow of an electric-gas interconnected integrated energy system based on chance-constrained programming [J]. Electric Power Automation Equipment, 2018, 38(09): 121+128;
Document [3]: Li Xing, Sun Chunshun, Chenhao, Li Yi, Kudzuvian. Research on the optimized operation of combined water, wind and electricity based on stochastic programming [J]. Electric Technology, 2013(04): 29-32.
The above description is only an embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.