where P^WT(i, t) denotes the wind power injected at node i in time period t, i = 1, 2, …, n_node, and n_node is the total number of nodes. According to document [1], the active power output of the wind turbine generator P^WT(v), the Weibull distribution of the wind speed v, the probability density of the wind power output f^WT(P^WT) and the cumulative distribution function of the wind turbine output F^WT(P^WT) are expressed as follows:

in the formula: v and the three accompanying wind-speed symbols are, respectively, the wind speed, cut-in wind speed, cut-out wind speed and rated wind speed; μ₀, μ₁, μ₂ and μ₃ are the shape distribution parameters related to the wind speed-power curve; and the remaining symbol is the rated power of the wind turbine.

Because the wind speed varies and cannot be assigned a specific magnitude at any given time, the distribution of the wind speed v can be represented by a Weibull distribution:

where β and κ are the shape parameter and the scale parameter, respectively, and the expression gives the probability density function of v.
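For orientation, the standard two-parameter Weibull density is written out below with β as the shape parameter and κ as the scale parameter, as named in the text. The label f(v) is an assumed name for the density, and the expression in the original filing may use different notation.

```latex
f(v) = \frac{\beta}{\kappa}\left(\frac{v}{\kappa}\right)^{\beta-1}
       \exp\!\left[-\left(\frac{v}{\kappa}\right)^{\beta}\right], \qquad v \ge 0
```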

Combining the relationship between the wind power output and the wind speed with the Weibull distribution of the wind speed, the probability density of the wind power output is obtained as follows:

Accordingly, the cumulative distribution function of the wind turbine output is obtained as follows:

where P^PV(i, t) denotes the photovoltaic power injected at node i in time period t. According to document [1], the active power output of the photovoltaic power generation system P^PV(t), the probability distribution of the solar radiation intensity f_G(G), the probability density of the active power output of the photovoltaic power generation system f^PV(Q^PV) and the distribution function of that active power output F^PV(Q^PV) are expressed as follows:

The active power output of the photovoltaic power generation system, P^PV(t), is:

in the formula: P^SOC is the maximum output power of the solar photovoltaic panel under standard operating conditions; L(t) is the illumination intensity at time t; ξ is the power temperature coefficient; T^c(t) is the operating temperature of the solar photovoltaic panel at time t; T^ref(t) is the reference temperature, with a value of 25 °C; and L^SOC is the solar illumination intensity under standard operating conditions, with a value of 1 kW/m².
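The output expression itself is not reproduced in this text. As a minimal sketch, the function below assumes a commonly used irradiance- and temperature-corrected model, P^PV = P^SOC · (L / L^SOC) · [1 + ξ (T^c - T^ref)], built only from the symbols defined above; the model form, the function name `pv_output` and the argument names are assumptions rather than the filing's own formula.

```python
def pv_output(p_soc, irradiance, cell_temp, xi, l_soc=1.0, t_ref=25.0):
    """Photovoltaic active power under an ASSUMED model:

        P = P_SOC * (L / L_SOC) * (1 + xi * (T_c - T_ref))

    p_soc      : maximum panel output under standard operating conditions
    irradiance : illumination intensity L(t), in kW/m^2
    cell_temp  : panel operating temperature T^c(t), in deg C
    xi         : power temperature coefficient (typically negative)
    l_soc      : standard illumination intensity (1 kW/m^2 in the text)
    t_ref      : reference temperature (25 deg C in the text)
    """
    return p_soc * (irradiance / l_soc) * (1.0 + xi * (cell_temp - t_ref))
```

At standard conditions (L = L^SOC and T^c = T^ref) the function returns P^SOC, which is the one property of the model that follows directly from the definitions in the text.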

The photovoltaic power generation system equation can be expressed as:

Q^PV = G·H·σ

in the formula: H is the area of the photovoltaic panel and σ is the conversion efficiency of the photovoltaic panel; G is the solar irradiation intensity, which is highly random, so the probability distribution of G can be represented by a beta distribution:

in the formula: G_max, χ_BETA and ρ_BETA denote, respectively, the maximum deviation value, the average deviation value and the standard deviation value of the solar irradiation intensity.

From the above equations, the probability density of Q^PV can be derived as:

Integrating this probability density function gives the distribution function of the active power output of the photovoltaic power generation system:

where T is the total number of time periods into which each day is divided, t = 1, 2, …, T; N = 365 is the total number of days in the year, each corresponding to a particular date. The number of data samples is 365 and the number of data sets is 2.

Historical and real-time data sets of the power load and natural gas supply of a given location over one year are obtained from the energy management system (EMS). The acquired power load and natural gas supply data are processed, calculated and analyzed to construct a load data matrix of the power load and the natural gas supply in time period t:

where the power load entry characterizes the power load demand during the t-th time period. According to document [2], the probability density function of the system power load is expressed as follows:

in the formula, the symbols denote the power load and, respectively, the expected value and standard deviation of the power load.

where the natural gas entry characterizes the natural gas supply during the t-th time period. According to document [2], the probability density function of the natural gas supply of the system is expressed as follows:

in the formula, the symbols denote the natural gas supply and, respectively, the expected value and standard deviation of the natural gas supply.

Step 2 in Fig. 2 describes the process and method for constructing the objective function of the power grid wind-light-gas-storage combined dynamic economic dispatching model. The main objective of this dispatching is to smooth the wind power and photovoltaic output by means of the pumped storage unit, so that as much wind power and photovoltaic power as possible is fed into the grid and dispatched together with the thermal power units and the natural gas pipe network, thereby meeting the system's power load and natural gas supply requirements. Because wind and solar power generation consume no fuel, the pumped storage unit is operated to minimize the fluctuation of the wind and light output while reducing the system operation cost to a minimum. Since both the objective function and the constraints contain random variables, a multi-objective chance-constrained programming (opportunity constraint planning) model is adopted.

The multi-objective chance-constrained programming model, whose objectives are the minimum variance of the wind-light-storage combined output power of the power grid and the minimum operation cost of the combined system, is expressed as follows:

in the formula: the first symbol is the minimum value of the objective function f_j at confidence level α_j, where f₁ denotes the minimum variance of the wind-light-storage combined output power and f₂ denotes the minimum operation cost of the combined system; P_r{·} denotes the probability that the event in {·} holds; T is the number of time periods in the study horizon; Ω_WT and Ω_PV are the sets of nodes to which wind turbines and photovoltaic systems are connected, respectively;

the combined-output symbols are, respectively, the wind-storage and light-storage combined outputs of node i in the t-th time period;

the averaged symbols are, respectively, the mean values of the wind-storage and light-storage combined outputs of node i over the T time periods of one day; P_P(i, t) is the pumping/generating power of the pumped storage unit at node i in the t-th time period; N_TP and N_AP are the total number of thermal power units and the total number of natural gas source nodes, respectively; ω_1k, ω_2k and ω_3k are the consumption characteristic curve coefficients of the k-th thermal power unit; P_k,TP(t) is the output of the k-th thermal power unit in the t-th time period;

the cost symbol is the cost coefficient for supplying natural gas at gas source node l in the t-th time period;

and the flow symbol is the natural gas supply flow of gas source node l in the t-th time period.
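The two objective expressions are not reproduced in this text. The sketch below illustrates how they could be evaluated for one deterministic sample, assuming that f₁ is the per-period variance of a combined output series and that f₂ sums a quadratic consumption curve ω_1k·P² + ω_2k·P + ω_3k over thermal units plus cost coefficient times flow over gas source nodes. Both forms, and the names `output_variance` and `operating_cost`, are assumptions for illustration.

```python
import numpy as np

def output_variance(combined_output):
    """Variance of a combined output series over T periods (assumed form of f1).

    Uses the population variance: mean squared deviation from the daily mean.
    """
    p = np.asarray(combined_output, dtype=float)
    return float(np.mean((p - p.mean()) ** 2))

def operating_cost(p_tp, omega, gas_cost, gas_flow):
    """Operation cost over the horizon (assumed form of f2).

    p_tp     : array (N_TP, T), thermal unit outputs P_k,TP(t)
    omega    : array (N_TP, 3), coefficients (w1k, w2k, w3k) of an ASSUMED
               quadratic consumption curve w1k*P^2 + w2k*P + w3k
    gas_cost : array (N_AP, T), gas cost coefficients per source node
    gas_flow : array (N_AP, T), gas supply flows per source node
    """
    p_tp = np.asarray(p_tp, dtype=float)
    omega = np.asarray(omega, dtype=float)
    # Column slices keep shape (N_TP, 1) so they broadcast across the T periods.
    fuel = omega[:, [0]] * p_tp ** 2 + omega[:, [1]] * p_tp + omega[:, [2]]
    gas = np.asarray(gas_cost, dtype=float) * np.asarray(gas_flow, dtype=float)
    return float(fuel.sum() + gas.sum())
```

In the chance-constrained setting these values would be evaluated per random sample and compared against the confidence levels α_j, rather than minimized directly.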

Step 3 in Fig. 2 describes the process and method for constructing the constraint conditions of the power grid wind-light-gas-storage combined dynamic economic dispatching model. The constraint conditions comprise: the system power balance constraint, the thermal power unit output constraint, the thermal power unit ramping (climbing) constraint, the line power constraint, the natural gas supply constraint of the gas source points of the natural gas pipe network, the reservoir capacity variation and reservoir capacity constraints caused by pumped storage, the pumping/generating power constraint of the pumped storage unit, the system spinning reserve (rotation standby) constraint, and the like.

Each constraint is specifically expressed as follows:

1) System power balance constraint:

Because the active power output of the wind turbine generator, the active power output of the photovoltaic power generation system, the power load and the natural gas supply are random variables, the above equation is written in chance-constraint form, namely:

where β₁ is the confidence level at which the chance constraint must hold.

2) Output constraint of the thermal power unit:

in the formula, the two bounds are, respectively, the minimum and maximum output of the k-th thermal power unit.

3) Ramping constraint of the thermal power unit:

d_k Δt ≤ P_k,TP(t+1) - P_k,TP(t) ≤ u_k Δt

in the formula: d_k and u_k are, respectively, the ramp-down rate and the ramp-up rate of the output of the k-th thermal power unit; Δt is the duration of one time period.
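A minimal sketch of checking the ramping constraint for one unit's output series follows. It applies the inequality exactly as written, so d_k is treated as a signed lower bound (zero or negative for a downward ramp); that sign convention and the function name `ramp_ok` are assumptions.

```python
def ramp_ok(p_tp, d_k, u_k, dt):
    """Return True if d_k*dt <= P(t+1) - P(t) <= u_k*dt for every consecutive pair.

    p_tp : sequence of outputs P_k,TP(t) for one thermal unit
    d_k  : ramp-down rate, taken here as a signed lower bound (assumption)
    u_k  : ramp-up rate
    dt   : duration of one time period
    """
    return all(d_k * dt <= nxt - cur <= u_k * dt
               for cur, nxt in zip(p_tp, p_tp[1:]))
```

For example, `ramp_ok([100, 120, 110], d_k=-20, u_k=30, dt=1)` accepts a +20 step followed by a -10 step, while a single +40 step would violate the upper bound.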

4) Line power constraint:

in the formula, the two bounds are, respectively, the upper and lower power limits of the line.

5) Natural gas supply constraint of the gas source points of the natural gas pipe network:

in the formula, the two bounds are, respectively, the upper and lower limits of the natural gas supply flow of gas source node l in the t-th time period.

6) Reservoir capacity variation and reservoir capacity constraints caused by pumped storage:

in the formula: the first pair of symbols denotes the storage capacities of the upper and lower reservoirs in time period t; η_P and η_D are, respectively, the pumping efficiency and the power generation efficiency of the pumped storage unit;

the next pair denotes the minimum storage capacities of the upper and lower reservoirs, respectively;

and the last pair denotes the maximum storage capacities of the upper and lower reservoirs, respectively.

7) Pumping/generating power constraint of the pumped storage unit:

or:

P_P(i, t) = 0

in the formula: the first pair of bounds denotes the minimum and maximum generating power of the pumped storage unit, respectively;

the second pair denotes the minimum and maximum pumping power of the pumped storage unit, respectively.

The pumping/generation balance constraint over one cycle is:

Q_P = η_P η_D Q_D

in the formula: Q_P and Q_D are, respectively, the total amount of power generation and the total amount of water pumping of the pumped storage unit over one cycle.

8) System spinning reserve constraint:

Considering the extreme case in which wind power and photovoltaic power cannot be connected to the grid normally, the thermal power units provide the spinning reserve of the system, namely:

in the formula: S^U(t) and S^D(t) are the positive and negative spinning reserve requirements of the system in period t; β₂ and β₃ are the confidence levels that the positive and negative spinning reserve constraints must satisfy, respectively.

Step 4 in Fig. 2 describes the process and method for handling chance constraints that contain random variables. A stochastic simulation technique is used to convert the uncertain factors in the chance constraints into computable deterministic ones. According to document [3], the wblrnd random generator in MATLAB is first used to generate wind speed, illumination intensity, power load demand and natural gas supply demand sample values; the wind power output is then calculated from the wind turbine output power expression, and the photovoltaic output from the photovoltaic power generation system output power expression. The chance constraints are applied to the system power balance constraint and the spinning reserve constraint, and stochastic simulation is used to check whether each chance constraint is satisfied, as follows:

1) draw n mutually independent random variables f₁, f₂, …, f_n from the probability distribution f(·);

2) calculate the values F₁, F₂, …, F_n from the known formulas;

3) let n_A be the number of the n independent samples that satisfy the constraint condition. By the law of large numbers, the frequency n_A/n can be used to estimate the probability that the constraint holds, so the chance constraint is considered satisfied if and only if n_A/n ≥ β_j, j = 1, 2, 3.
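The three steps above can be sketched as a frequency test. The example uses NumPy's Weibull sampler in place of MATLAB's wblrnd, and checks a purely hypothetical constraint (wind speed at or above a 3 m/s threshold) so that the result can be compared with the closed-form probability. The function names, the shape and scale values and the threshold are all assumptions for illustration.

```python
import numpy as np

def chance_constraint_holds(sample_fn, constraint_fn, beta, n=10_000, seed=0):
    """Estimate P{constraint} by the frequency n_A / n and compare it with beta.

    sample_fn(rng, n)  -> array of n independent samples (step 1)
    constraint_fn(x)   -> boolean array, True where the constraint holds (step 2)
    Returns (n_A / n >= beta, n_A / n)                     (step 3)
    """
    rng = np.random.default_rng(seed)
    samples = sample_fn(rng, n)
    n_a = int(np.count_nonzero(constraint_fn(samples)))
    freq = n_a / n
    return freq >= beta, freq

def sample_wind(rng, n, shape=2.0, scale=8.0):
    """Weibull wind speeds; shape and scale values are illustrative only."""
    # Generator.weibull draws from the unit-scale Weibull, so multiply by the scale.
    return scale * rng.weibull(shape, size=n)

# Hypothetical constraint: wind speed at or above 3 m/s, required with confidence 0.8.
ok, freq = chance_constraint_holds(sample_wind, lambda v: v >= 3.0, beta=0.8)
```

For these illustrative parameters the exact probability is exp(-(3/8)²) ≈ 0.869, so the estimated frequency should sit close to that value and the test at β = 0.8 passes, whereas a requirement of β = 0.95 would fail.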

Step 5 in Fig. 2 describes the solving process and method of the power grid wind-light-gas-storage combined dynamic economic dispatching model. The Q-learning algorithm from reinforcement learning is adopted, and its Q-value table update formula is improved with an adaptive differential evolution algorithm, so that the optimal solution of the dispatching model can be obtained.

The method comprises the following specific steps:

1) Determine the input state spaces S₁ and S₂: the wind power predicted value P^WT(i, t) and the photovoltaic predicted value P^PV(i, t) in each time period are taken as state input 1, and the power load predicted value and the natural gas supply predicted value in each time period are taken as state input 2:

The wind power predicted value is discretized into intervals, and the length of each interval, ΔP^WT, can be expressed as:

where the capacity term in the formula is the installed capacity of wind power. After discretization, the wind power predicted value falls into one of M intervals in total: (0, ΔP^WT), (ΔP^WT, 2ΔP^WT), and so on.

The photovoltaic predicted value is discretized into intervals, and the length of each interval, ΔP^PV, can be expressed as:

where the maximum term in the formula is the maximum output of the photovoltaic panel. After discretization, the photovoltaic predicted value falls into one of N intervals in total: (0, ΔP^PV), (ΔP^PV, 2ΔP^PV), and so on.

The power load predicted value is discretized into intervals, and the length of each interval can be expressed as:

where the maximum term in the formula is the maximum power load demand of the system. After discretization, the power load predicted value falls into one of K intervals in total.

The natural gas supply predicted value is discretized into intervals, and the length of each interval can be expressed as:

where the maximum term in the formula is the maximum natural gas supply demand of the system. After discretization, the natural gas supply predicted value falls into one of R intervals in total.

Finally, the state input space S₁ contains M × N × T states in total and the state input space S₂ contains K × R × T states in total; S₁ and S₂ are expressed as:

S₁ = {s₁₁, s₁₂, …, s_1(M×N×T)}, S₂ = {s₂₁, s₂₂, …, s_2(K×R×T)}

The state of the system is uniquely determined by the time period and by the intervals to which the wind power, photovoltaic, power load and natural gas supply predicted values of that period belong.
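A minimal sketch of this state encoding is given below. It assumes equal-width intervals obtained by dividing the maximum value by the number of intervals, and a row-major flattening of (first interval, second interval, period) into a single 0-based index. The interval-length formulas are not reproduced in this text, so the width rule and the names `interval_index` and `state_index` are assumptions.

```python
def interval_index(value, max_value, n_intervals):
    """0-based index of the equal-width interval containing value.

    The width is ASSUMED to be max_value / n_intervals; values at or beyond
    max_value are clamped into the last interval.
    """
    width = max_value / n_intervals
    idx = int(value // width)
    return min(max(idx, 0), n_intervals - 1)

def state_index(first_idx, second_idx, t, n_second, n_periods):
    """Flatten (first interval, second interval, period t) into one state index.

    For S1 use (wind interval, PV interval, t) with n_second = N;
    for S2 use (load interval, gas interval, t) with n_second = R.
    The result lies in [0, n_first * n_second * n_periods).
    """
    return (first_idx * n_second + second_idx) * n_periods + t
```

With M = 4, N = 5 and T = 24, the largest index is `state_index(3, 4, 23, 5, 24) == 479`, matching the M × N × T = 480 states of S₁.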

2) Determine the action strategy sets A₁ and A₂: the pumping/generating power of the pumped storage unit at node i in the t-th time period is taken as action strategy 1; the output of the k-th thermal power unit in the t-th time period and the natural gas supply flow of gas source node l in the t-th time period are taken as action strategy 2; each is discretized into a series of fixed values.

The pumping power and the generating power of the pumped storage unit are each discretized, from 0 to the respective maximum value, into a fixed values (a being the number of levels). The pumping/generating power corresponding to each fixed value is:

where y = D, P corresponds to the pumping power and the generating power of the pumped storage unit, respectively. Including the case in which the pumping/generating power is 0, there are 2a + 1 fixed values in total.

The output of thermal power unit k is discretized, from 0 to its maximum output, into b fixed values. The output corresponding to each fixed value is:

Including the case in which the output is 0, there are b + 1 fixed values in total.

The natural gas supply flow of gas source node l is discretized, from 0 to its maximum value, into c fixed values. The flow corresponding to each fixed value is:

Including the case in which the supply flow is 0, there are c + 1 fixed values in total.

Finally, action strategy 1 contains 2a + 1 combinations in total, each corresponding to one action strategy; action strategy 2 contains (b + 1) × (c + 1) combinations, each likewise corresponding to one action strategy. The action strategy sets A₁ and A₂ are expressed as:

A₁ = {a₁₁, a₁₂, …, a_1(2a+1)}, A₂ = {a₂₁, a₂₂, …, a_2[(b+1)×(c+1)]}
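The counts 2a + 1 and (b + 1) × (c + 1) can be reproduced with a short sketch. It assumes evenly spaced levels m · max / count for m = 1, …, count, and represents pumping as negative and generation as positive power. The spacing rule, the sign convention and the function names are assumptions, since the per-level formulas are not reproduced in this text.

```python
from itertools import product

def levels(max_value, count):
    """count evenly spaced non-zero levels up to max_value (ASSUMED spacing)."""
    return [max_value * m / count for m in range(1, count + 1)]

def action_set_1(p_pump_max, p_gen_max, a):
    """Pumped storage actions: a pumping levels, idle, a generating levels.

    Pumping is encoded as negative power and generation as positive power,
    which is a convention chosen for this sketch.
    """
    pumping = [-p for p in reversed(levels(p_pump_max, a))]
    return pumping + [0.0] + levels(p_gen_max, a)

def action_set_2(p_tp_max, q_gas_max, b, c):
    """All (thermal output, gas flow) pairs, each axis including the zero level."""
    thermal = [0.0] + levels(p_tp_max, b)
    gas = [0.0] + levels(q_gas_max, c)
    return list(product(thermal, gas))
```

For instance, `action_set_1(50, 60, 2)` yields five actions and `action_set_2(100, 10, 4, 2)` yields fifteen pairs, in line with the counts stated above.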

3) Initialize the Q-value table: in the pre-learning initialization stage all elements of the Q-value table are set to 0; in online learning the Q-value table is initialized to the table retained from pre-learning;

4) Determine the current state and select the corresponding action strategy: current state 1 is determined from the wind power and photovoltaic predicted values of the next time period, and current state 2 from the system power load and natural gas supply predicted values of the next time period; an action strategy corresponding to state 1 is then selected at random and the pumping/generating power of the pumped storage unit is determined from it; likewise, an action strategy corresponding to state 2 is selected at random and the thermal power unit output and the gas source node supply flow are determined from it;

5) Observe the state at the next time step: after the next time step arrives, the actual wind power and photovoltaic power values are acquired and the pumping/generating power of the pumped storage unit is obtained from the preselected action strategy; the actual power load and natural gas supply demand values are acquired and the thermal power unit output and the gas source node supply flow are obtained from the preselected action strategy;

6) Calculate the reward value: the reward corresponds to the multi-objective function and is calculated according to the following formula:

7) Improve the Q-value table update formula using the adaptive differential evolution algorithm: building on the memory function of the Q-value table, the construction of the adaptive mutation operator in the adaptive differential evolution algorithm is used to improve the forgetting factor γ_Q in the Q-learning algorithm. In the improved Q-learning algorithm γ_Q is therefore defined as an adaptive forgetting factor, which can be designed as:

γ_Q = γ₀ × 2^θ

in the formula: γ₀ is the initial forgetting factor; k_max is the maximum number of iterations; k is the current iteration number.

At the start of the algorithm the adaptive forgetting factor is 2γ₀, a relatively large value that preserves action diversity in the early stage and helps avoid local optima. As the algorithm proceeds, the forgetting factor gradually decreases until, in the later stage, it approaches γ₀; this preserves good actions, prevents the optimal action strategy from being destroyed and increases the probability of selecting the globally optimal action strategy set. In addition, a randomized learning rate α_Q = 0.5 × [1 + rand(0, 1)] can be used, which keeps the mean learning rate at 0.75 and helps maintain action diversity during selection by accounting for random variation across all possible actions.

In summary, the Q-value table update formula of the improved Q-learning algorithm is:

or:

in the formula: s_jk is the state in the k-th iteration, j = 1, 2; a_jk is the control action taken in the k-th iteration, j = 1, 2; Q^k(s_jk, a_jk) is the estimate in the k-th iteration of the optimal action-value function Q^*, representing the expected cumulative reward obtained after passing through state s_jk and selecting action a_jk;
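Neither the definition of θ nor the update formulas are reproduced in this text, so the sketch below rests on two stated assumptions. First, θ = exp(1 - k_max / (k_max + 1 - k)), the exponent commonly used for the adaptive mutation operator in adaptive differential evolution, which gives 2γ₀ at k = 1 and approaches γ₀ at k = k_max as the description requires. Second, the standard tabular Q-learning update is used in place of the filing's own formula. All function names are hypothetical.

```python
import math
import random

def adaptive_forgetting_factor(gamma0, k, k_max):
    """gamma_Q = gamma0 * 2**theta with ASSUMED theta = exp(1 - k_max/(k_max + 1 - k)).

    k runs from 1 to k_max; the factor starts at 2*gamma0 and decays toward gamma0.
    """
    theta = math.exp(1.0 - k_max / (k_max + 1.0 - k))
    return gamma0 * 2.0 ** theta

def random_learning_rate(rng=random):
    """alpha_Q = 0.5 * [1 + rand(0, 1)], uniform on [0.5, 1) with mean 0.75."""
    return 0.5 * (1.0 + rng.random())

def q_update(q, s, a, reward, s_next, gamma_q, alpha_q):
    """Standard tabular Q-learning step (swapped in as an assumption).

    q is a dict mapping state -> dict mapping action -> value.
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    """
    best_next = max(q[s_next].values())
    q[s][a] += alpha_q * (reward + gamma_q * best_next - q[s][a])
    return q[s][a]
```

Under this choice of θ, γ₀ would need to be at most 0.5 if γ_Q is to stay within the usual range [0, 1] during the early iterations.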

8) Check whether the learning process has converged: the criterion is that the Q-value table has converged to an optimal value, or that a given number of learning steps or a given time has been reached. If it has not converged, set k = k + 1 and return to step 4).

It is to be understood that the references cited herein are as follows:

Document [1]: Guojianwei, Xiapanghui. Research on a wind-solar-storage economic dispatching scheme [J]. Scientific Wind, 2019(30): 213+215.

Document [2]: Stochastic optimal power flow of an electricity-gas interconnected integrated energy system based on chance-constrained programming [J]. Electric Power Automation Equipment, 2018, 38(09): 121+128.

Document [3]: Li xing, Sun Chunshun, Chenhao, Li Yi, Kudzuvian. Research on the optimized operation of combined water, wind and electricity based on stochastic programming [J]. Electric Technology, 2013(04): 29-32.

The above description is only an embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. A wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning, characterized in that: in view of the uncertainty and randomness of the wind turbine generator output, the solar photovoltaic power generation system output, the power load and the natural gas supply, a multi-objective chance-constrained programming model is used to describe the uncertainty and randomness of these random variables, while a pumped storage unit is used to smooth the wind power and photovoltaic power; finally, the multi-objective chance-constrained programming model is solved with an improved Q-learning algorithm to obtain the optimal solution of the wind-light-gas-storage combined dynamic economic dispatching.

2. The wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning according to claim 1, characterized in that the method comprises the following specific steps:

S1, acquiring historical and real-time data sets of the wind turbine generator, the photovoltaic power generation system, the power load and the natural gas supply of a given location over a period of time, and constructing a random variable data matrix;

S2, constructing the objective function of the power grid wind-light-gas-storage combined dynamic economic dispatching model according to the random variable data matrix obtained in step S1;

S3, further constructing the power grid wind-light-gas-storage combined dynamic economic dispatching model by taking the system power balance constraint, the thermal power unit output constraint, the thermal power unit ramping constraint, the line power constraint, the natural gas supply constraint of the gas source points of the natural gas pipe network, the reservoir capacity variation and reservoir capacity constraints caused by pumped storage, the pumping/generating power constraint of the pumped storage unit and the system spinning reserve constraint as constraint conditions;

S4, generating wind speed, illumination intensity, power load demand and natural gas supply demand sample values by using the wblrnd random generator in MATLAB, obtaining the wind power output from the wind turbine generator output power expression and the photovoltaic output from the photovoltaic power generation system output power expression; then applying chance constraints to the system power balance constraint and the spinning reserve constraint;

S5, adopting the Q-learning algorithm and improving its Q-value table update formula with an adaptive differential evolution algorithm, and solving for the optimal solution of the power grid wind-light-gas-storage combined dynamic economic dispatching model; the specific steps are as follows:

S501, determining the input state spaces S₁ and S₂: the wind power predicted value P^WT(i, t) and the photovoltaic predicted value P^PV(i, t) in each time period are taken as state input 1, and the power load predicted value and the natural gas supply predicted value in each time period are taken as state input 2:

where the capacity term in the formula is the installed capacity of wind power; after discretization the wind power predicted value falls into (0, ΔP^WT), (ΔP^WT, 2ΔP^WT), …, for a total of M intervals;

where the maximum term in the formula is the maximum output of the photovoltaic panel; after discretization the photovoltaic predicted value falls into (0, ΔP^PV), (ΔP^PV, 2ΔP^PV), …, for a total of N intervals;

the length of each power load interval can be expressed as:

where the maximum term in the formula is the maximum power load demand of the system; after discretization the power load predicted value falls into K intervals in total;

the length of each natural gas supply interval can be expressed as:

where the maximum term in the formula is the maximum natural gas supply demand of the system; after discretization the natural gas supply predicted value falls into R intervals in total;

the state of the system is uniquely determined according to the intervals to which the wind power predicted value, the photovoltaic predicted value, the power load predicted value and the natural gas supply predicted value of the current time period belong;

S502, determining the action strategy sets A₁ and A₂: the pumping/generating power of the pumped storage unit at node i in the t-th time period is taken as action strategy 1; the output of the k-th thermal power unit in the t-th time period and the natural gas supply flow of gas source node l in the t-th time period are taken as action strategy 2; each is discretized into a series of fixed values;

the pumping power and the generating power of the pumped storage unit are each discretized into a fixed values; the pumping/generating power corresponding to each fixed value is:

where the first bound is the maximum pumping power of the pumped storage unit,

and the second bound is the maximum generating power of the pumped storage unit;

where y = D, P corresponds to the pumping power and the generating power of the pumped storage unit, respectively; including the case in which the pumping/generating power is 0, there are 2a + 1 fixed values in total;

the output of the k-th thermal power unit is discretized into b fixed values, the output corresponding to each fixed value being:

giving b + 1 fixed values in total;

where the bound is the maximum output of the k-th thermal power unit;

the natural gas supply flow is discretized into c fixed values; the flow corresponding to each fixed value is:

giving c + 1 fixed values in total, where c = 1, 2, 3, …;

where the flow term is the natural gas supply flow of gas source node l in the t-th time period;

finally, action strategy 1 contains 2a + 1 combinations in total, each combination corresponding to one action strategy; action strategy 2 contains (b + 1) × (c + 1) combinations, each combination likewise corresponding to one action strategy; the action strategy sets A₁ and A₂ are respectively expressed as:

A₁ = {a₁₁, a₁₂, …, a_1(2a+1)}, A₂ = {a₂₁, a₂₂, …, a_2[(b+1)×(c+1)]};

S503, initializing the Q-value table: in the pre-learning initialization stage all elements of the Q-value table are set to 0, and in online learning the Q-value table is initialized to the table retained from pre-learning;

S504, determining current state 1 according to the wind power and photovoltaic predicted values of the next time period, and determining current state 2 according to the system power load and natural gas supply predicted values of the next time period; then randomly selecting an action strategy corresponding to state 1 and determining the pumping/generating power of the pumped storage unit from it; finally randomly selecting an action strategy corresponding to state 2 and determining the thermal power unit output and the gas source node supply flow from it;

S505, after the next time period arrives, acquiring the actual wind power and photovoltaic power values and obtaining the pumping/generating power of the pumped storage unit from the preselected action strategy; acquiring the actual power load and natural gas supply demand values and obtaining the thermal power unit output and the gas source node supply flow from the preselected action strategy;

S506, calculating the reward value: the reward corresponds to the multi-objective function and is calculated according to the following formula:

where the combined-output terms are, respectively, the wind-storage and light-storage combined outputs in the k-th iteration;

the averaged terms are, respectively, the mean values of the wind-storage and light-storage combined outputs in the k-th iteration; ω_1,k, ω_2,k and ω_3,k are the consumption characteristic curve coefficients of the thermal power unit corresponding to the k-th iteration; P_TP,k is the output of the thermal power unit corresponding to the k-th iteration;

the cost term is the cost coefficient for supplying natural gas in the k-th iteration;

the flow term is the natural gas supply flow in the k-th iteration;

S507, improving the forgetting factor γ_Q in the Q-learning algorithm by using the construction of the adaptive mutation operator in the adaptive differential evolution algorithm, based on the memory function of the Q-value table; γ_Q in the improved Q-learning algorithm is thus defined as an adaptive forgetting factor, as follows:

γ_Q = γ₀ × 2^θ; in the formula: γ₀ is the initial forgetting factor; k_max is the maximum number of iterations; k is the current iteration number;

or:

in the formula: s_jk is the state in the k-th iteration, j = 1, 2; a_jk is the control action taken in the k-th iteration, j = 1, 2; Q^k(s_jk, a_jk) is the estimate in the k-th iteration of the optimal action-value function Q^*, representing the expected cumulative reward obtained after passing through state s_jk and selecting action a_jk.

3. The wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning according to claim 1, characterized in that: the step S1 specifically includes:

acquiring historical and real-time data sets of the wind turbine generator and the photovoltaic power generation system of a given location over a period of time, the wind turbine generator and the photovoltaic power generation system being connected to system node i; and constructing the wind power and photovoltaic power data matrix of node i in the t-th time period from the wind speed at the wind turbine generator of node i at time t of a day and the illumination intensity at the photovoltaic power generation system connected to node i at time t of that day:

where P^WT(i, t) denotes the wind power injected at node i in time period t, i = 1, 2, …, n_node, and n_node is the total number of nodes; the active power output of the wind turbine generator P^WT(v), the Weibull distribution of the wind speed v, the probability density of the wind power output f^WT(P^WT) and the cumulative distribution function of the wind turbine output F^WT(P^WT) are expressed as follows:

in the formula: v and the three accompanying wind-speed symbols are, respectively, the wind speed, cut-in wind speed, cut-out wind speed and rated wind speed; the remaining symbol is the rated power of the wind turbine; the distribution of the wind speed v is represented by a Weibull distribution:

where β and κ are the shape parameter and the scale parameter, respectively, and the expression gives the probability density function of v;

combining the relationship between the wind power output and the wind speed with the Weibull distribution of the wind speed, the probability density of the wind power output is obtained as follows:

wherein: P^PV(i, t) represents the photovoltaic power connected to node i in time period t; the active power P^PV(t) output by the photovoltaic power generation system, the probability distribution f_G(G) of the solar radiation intensity G, the probability density f^PV(Q^PV) of the active power output by the photovoltaic power generation system, and the distribution function F^PV(Q^PV) of the active power output by the photovoltaic power generation system are expressed as follows:

the active power P^PV(t) output by the photovoltaic power generation system is:

in the formula: P^SOC is the maximum output power of the solar photovoltaic panel under the standard operating condition; L(t) is the illumination intensity at time t; ξ is the power temperature coefficient; T^c(t) is the working temperature of the solar photovoltaic panel at time t; T^ref(t) is the reference temperature, which has a value of 25 ℃; L^SOC is the solar illumination intensity under the standard operating condition, which is 1 kW/m²; T is the total number of divided time periods per day, t = 1, 2, …, T; N = 365 is the total number of days of the year, each sample corresponding to a particular date; the number of data samples N is 365, and the number of data sets n is 2;
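For illustration, the relationship between the quantities defined above can be sketched under an assumed model. The expression for P^PV(t) is not reproduced in this text, so the sketch uses a commonly used form, P^PV(t) = P^SOC · L(t)/L^SOC · [1 + ξ(T^c(t) − T^ref)], which is an assumption and may differ from the claimed expression; the function and parameter names are hypothetical.

```python
def pv_active_power(p_soc, irradiance, temp_cell, xi, l_soc=1.0, temp_ref=25.0):
    """Assumed PV output model (not confirmed by the claim text).

    p_soc      -- maximum output under the standard operating condition (P^SOC)
    irradiance -- illumination intensity L(t), in kW/m^2
    temp_cell  -- panel working temperature T^c(t), in deg C
    xi         -- power temperature coefficient, per deg C
    l_soc      -- illumination intensity under the standard condition (1 kW/m^2)
    temp_ref   -- reference temperature (25 deg C)
    """
    irradiance_ratio = irradiance / l_soc
    temperature_factor = 1.0 + xi * (temp_cell - temp_ref)
    return p_soc * irradiance_ratio * temperature_factor
```

With hypothetical values P^SOC = 100 kW, L(t) = 0.5 kW/m², T^c(t) = 45 ℃ and ξ = −0.004 per ℃, the sketch gives 100 × 0.5 × (1 − 0.08) = 46 kW.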

the photovoltaic power generation system equation can be expressed as: Q^PV = GHσ;

in the formula: H is the area of the photovoltaic panel, and σ is the conversion efficiency of the photovoltaic panel; G is the solar irradiation intensity, whose probability distribution is expressed by a beta distribution function:

in the formula:

G_max, χ_BETA and ρ_BETA respectively represent the maximum deviation value, the average deviation value and the standard deviation value of the solar irradiation intensity;
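A minimal sketch of sampling the photovoltaic output from Q^PV = GHσ with a beta-distributed irradiation intensity is given below. It assumes that G/G_max follows a beta distribution with shape parameters a and b; these shape parameters, and how they would be obtained from χ_BETA and ρ_BETA, are assumptions, as are the function name and the numeric values.

```python
import random


def sample_pv_output(n, g_max, a, b, area, efficiency, seed=0):
    """Draw n samples of Q^PV = G * H * sigma with G = g_max * Beta(a, b).

    a, b are hypothetical beta shape parameters; area is H and efficiency is sigma.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        g = g_max * rng.betavariate(a, b)      # irradiation intensity in [0, g_max]
        outputs.append(g * area * efficiency)  # Q^PV = G * H * sigma
    return outputs
```

Every sample lies between 0 and G_max·H·σ, and the sample mean approaches G_max·H·σ·a/(a + b) as n grows, which gives a quick sanity check on the fitted parameters.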

from the above equations, the probability density of Q^PV can be derived as:

wherein,

acquiring, from an Energy Management System (EMS), historical and real-time data sets of the power load and the natural gas supply quantity of a given location over one year, and constructing a load data matrix of the power load and the natural gas supply quantity in time period t from the acquired data:

wherein:

characterizes the power load demand in the t-th time period; the system power load

has the probability density function

expressed as follows:

in the formula:

is the power load;

are respectively the expected value and the standard deviation of the power load;

characterizes the natural gas supply quantity in the t-th time period;

the system natural gas supply quantity

has the probability density function

expressed as follows:

in the formula:

is the natural gas supply quantity;

are respectively the expected value and the standard deviation of the natural gas supply quantity; T is the total number of divided time periods per day, t = 1, 2, …, T; the number n of data sets is 2.

4. The wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning according to claim 1, characterized in that: the step S2 specifically includes:

smoothing the wind power and photovoltaic output with a pumped storage unit, and adopting a multi-objective chance-constrained programming model whose objectives are to stabilize the fluctuation of the wind and photovoltaic output to the maximum extent and to minimize the system operation cost;

in the formula:

are respectively the average values of the joint wind-storage output and the joint photovoltaic-storage output of node i over the T time periods of one day; P_P(i, t) is the pumping or generating power of the pumped storage unit at node i in the t-th time period; N_TP and N_AP are respectively the total number of thermal power generating units and the total number of natural gas source nodes; ω_1k, ω_2k and ω_3k are the consumption characteristic curve coefficients of the k-th thermal power generating unit; P_k,TP(t) is the output of the k-th thermal power generating unit in the t-th time period;

5. The wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning according to claim 1, characterized in that: the constraints in the step S3 are specifically expressed as follows:

S301, system power balance constraint:

wherein P_r{·} represents the probability that the event in {·} holds; P^WT(i, t) represents the wind power connected to node i in time period t, i = 1, 2, …, n_node, where n_node is the total number of nodes; P^PV(i, t) represents the photovoltaic power connected to node i in time period t; P_P(i, t) is the pumping or generating power of the pumped storage unit at node i in the t-th time period;

characterizes the power load demand in the t-th time period;

characterizes the natural gas supply quantity in the t-th time period;

in the formula, β₁ represents the confidence level at which the chance constraint is satisfied;
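One way to check a constraint of the form P_r{event} ≥ β₁ numerically is Monte Carlo sampling, sketched below. This is an illustrative estimator and not the solution method of the claims; the helper name, the Gaussian load model and all numeric values are assumptions.

```python
import random


def chance_constraint_holds(event, sampler, confidence, n_samples=10000, seed=0):
    """Estimate Pr{event} by sampling and compare it with the confidence level.

    event      -- function mapping one sampled scenario to True/False
    sampler    -- function taking a random.Random and returning one scenario
    confidence -- required confidence level (beta_1 in the text)
    Returns (constraint_satisfied, estimated_probability).
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if event(sampler(rng)))
    estimate = hits / n_samples
    return estimate >= confidence, estimate


# Hypothetical usage: the load is assumed Gaussian (mean 100, std 5) and the
# available supply is fixed at 110, so the event is "load is covered".
ok, prob = chance_constraint_holds(
    event=lambda load: load <= 110.0,
    sampler=lambda rng: rng.gauss(100.0, 5.0),
    confidence=0.95,
)
```

The estimate carries sampling error of order 1/√n_samples, so a confidence level very close to the estimated probability would call for more samples before the constraint is declared satisfied.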

S302, thermal power generating unit output constraint:

in the formula:

are respectively the minimum and maximum output of the k-th thermal power generating unit;

S303, ramp rate constraint of the thermal power generating unit:

d_kΔt ≤ P_k,TP(t+1) − P_k,TP(t) ≤ u_kΔt; in the formula: d_k and u_k are respectively the ramp-down rate and the ramp-up rate of the output of the k-th thermal power generating unit; Δt is the duration of a time period; P_k,TP(t+1) is the output of the k-th thermal power generating unit in the (t+1)-th time period;
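The ramp rate inequality above can be checked against a dispatch schedule as in the following sketch. It applies the inequality literally, so d_k is treated as a signed lower bound on the change in output (negative for a decrease); that sign convention, the function name and the sample values are assumptions.

```python
def ramp_violations(outputs, d_k, u_k, dt):
    """Return the period indices t at which
    d_k * dt <= outputs[t + 1] - outputs[t] <= u_k * dt fails.

    outputs -- output of one thermal unit per time period, P_k,TP(t)
    d_k     -- ramp-down rate as a signed lower bound (assumed <= 0)
    u_k     -- ramp-up rate (upper bound)
    dt      -- duration of one time period
    """
    violations = []
    for t in range(len(outputs) - 1):
        step = outputs[t + 1] - outputs[t]
        if not (d_k * dt <= step <= u_k * dt):
            violations.append(t)
    return violations
```

For a hypothetical schedule `[100, 120, 150, 140]` with d_k = −20, u_k = 25 and Δt = 1, only the step from 120 to 150 exceeds the ramp-up limit, so the function returns `[1]`.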

S304, line power constraint:

in the formula:

are respectively the upper and lower power limits of line link; P_link is the power of line link;

S305, natural gas supply quantity constraint at the gas source points of the natural gas pipeline network:

in the formula:

are respectively the upper and lower limits of the natural gas flow supplied by natural gas source node l in the t-th time period;

is the natural gas flow supplied by natural gas source node l in the t-th time period;

S306, reservoir capacity variation and reservoir capacity constraints caused by pumped storage:

in the formula:

are respectively the minimum storage capacities of the upper and lower reservoirs;

are respectively the maximum storage capacities of the upper and lower reservoirs;

S307, pumping/generating power constraint of the pumped storage unit:

or:

P_P(i,t)＝0

in the formula:

are respectively the minimum and maximum pumping power of the pumped storage unit; the pumping-generation balance constraint over one period is as follows:

Q_P＝η_Pη_DQ_D

in the formula: Q_P and Q_D are respectively the total power generation and the total water pumping of the pumped storage unit in one period; Ω_WT and Ω_PV are respectively the sets of nodes connected to wind turbines and to photovoltaics;

S308, system spinning reserve constraint:

in the formula: S^U(t) and S^D(t) are respectively the positive and negative spinning reserve requirements of the system in period t; β₂ and β₃ are respectively the confidence levels at which the positive and negative spinning reserve constraints must be satisfied;

are respectively the maximum output values of the k-th thermal power generating unit; P_k,TP(t) is the output of the k-th thermal power generating unit in the t-th time period; N_TP is the total number of thermal power generating units; k = 1, 2, …, N_TP.