Disclosure of Invention
The invention aims to provide a low-voltage distribution network voltage control method and system based on deep reinforcement learning. By introducing a Transformer network and the SAC (Soft Actor-Critic) algorithm, the method solves the problem that voltage is difficult to regulate when new energy is connected to the low-voltage distribution network on a large scale. Specifically, global features are extracted by the Transformer network and a control strategy is optimized with the SAC algorithm, thereby realizing optimal control of the low-voltage distribution network voltage. As a powerful sequence modeling tool, the Transformer network can extract effective global features from high-dimensional data and overcomes the shortcomings of traditional deep learning methods in handling complex low-voltage distribution network states. The SAC algorithm enhances the exploration capability and stability of the system by optimizing policy entropy and expected return, further improves the accuracy and robustness of voltage control, and provides a more efficient solution for voltage control of the low-voltage distribution network.
The invention is realized by at least one of the following technical schemes.
The low-voltage power distribution network voltage control method based on deep reinforcement learning comprises the following steps:
Embedding the trained Transformer-SAC model into the voltage control system of the low-voltage power distribution network, and enabling the Transformer-SAC model to interact with the environment of the low-voltage power distribution network to form an action strategy;
the action strategy is issued to the execution equipment to control the power of the distributed power supply equipment and the switching quantity of the reactive compensators;
the execution equipment executes the action after receiving the command, the low-voltage distribution network forms a new state variable, and the new state variable is fed back to the Transformer-SAC model in real time for optimization;
the Transformer-SAC model and the environment of the low-voltage power distribution network interact to form an action strategy, and the method comprises the following steps:
S1, extracting an associated feature matrix from the state variables of the low-voltage power distribution network by using a Transformer network, and outputting a feature matrix after the associated feature matrix is processed by the Transformer network;
S2, inputting the feature matrix output by the Transformer network into the SAC algorithm to generate an action strategy.
Further, in step S1, a Transformer network is used to extract an associated feature matrix from the state variables of the low-voltage power distribution network, and the associated feature matrix is processed by the Transformer network to output a feature matrix, which comprises the following steps:
S1.1, extracting the node voltage, active power load, reactive power load and node voltage sensitivity in the state variables of the low-voltage power distribution network as an associated feature matrix;
S1.2, calculating the attention weight of each attention head from the associated feature matrix through a multi-head attention mechanism;
S1.3, splicing the weights of all attention heads to obtain a spliced feature matrix, and performing linear mapping; adding the spliced feature matrix and the associated feature matrix to realize residual connection, and carrying out normalization processing to obtain a normalized feature matrix;
S1.4, inputting the normalized feature matrix into a feedforward neural network to obtain the feature matrix processed by the feedforward neural network.
Further, in step S2, the SAC algorithm includes:
1) The policy optimization objective of the SAC algorithm, which is to maximize the expected return and the policy entropy;
2) Experience replay, namely storing the interaction data of the agent with the environment, where the interaction data include the states, actions and rewards fed back by the environment;
3) Updating the Q network by minimizing the Bellman error of the Q function;
4) Updating the policy network by maximizing the policy entropy and the Q value.
Further, the reward function $r_t(s_t,a_t)$ is calculated by the following formula:

$$r_t(s_t,a_t) = -\big(\lambda_1 f_{V}(t) + \lambda_2 f_{C}(t) + \lambda_3 f_{P}(t) + \lambda_4 f_{S}(t)\big)$$

where $f_{V}(t)$ is the voltage out-of-limit penalty value at time $t$, $f_{C}(t)$ is the control cost at time $t$, $f_{P}(t)$ is the power loss penalty of the low-voltage distribution network at time $t$, $f_{S}(t)$ is the node voltage sensitivity penalty value at time $t$, and $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are the weights of the corresponding sub-items; $a_t$ is the decision action at time $t$ and $s_t$ is the state variable of the low-voltage distribution network at time $t$.
Further, training of the Transformer-SAC model comprises the following steps:
(1) Constructing a virtual environment of the low-voltage power distribution network in a computer based on the historical data of the low-voltage power distribution network; the Transformer network acquires state variables from the virtual environment, extracts the associated feature matrix and generates a new state variable;
(2) Inputting the new state variable into the SAC algorithm to obtain an action strategy and output an action;
(3) Executing the action by the execution equipment to change the state of the virtual environment of the low-voltage power distribution network, and feeding the changed state variables back to the Transformer-SAC model;
(4) Optimizing the Transformer-SAC model by using the changed state variables;
(5) Updating the action strategy with the optimized Transformer-SAC model;
Repeating steps (3) to (5) to carry out the learning iterations of the Transformer-SAC model until convergence, completing the training of the Transformer-SAC model.
Further, the optimization of the Transformer-SAC model includes the optimization of the parameters of the Transformer network: a loss function that minimizes the mean square error is used, gradients are back-propagated through the chain rule, and the parameters of the Transformer network are updated by an adaptive moment estimation optimizer.
Further, embedding the trained Transformer-SAC model into the voltage control system of the low-voltage power distribution network, and enabling the Transformer-SAC model to interact with the environment of the low-voltage power distribution network to form an action strategy, comprises the following steps:
1.1, generating an action strategy with the trained Transformer-SAC model based on the real-time power grid state, and issuing action instructions to the execution equipment of the low-voltage power distribution network;
1.2, executing the selected control actions by the execution equipment to change the state of the actual low-voltage power distribution network environment, and feeding the changed power grid state back to the trained Transformer-SAC model;
1.3, performing feature processing on the input power grid state by the Transformer network and optimizing the weights and bias terms of the Transformer network; the Transformer network extracts the latest associated feature matrix from the low-voltage power distribution network state updated in real time and inputs it into the SAC algorithm to generate a control strategy, completing the real-time iterative training of the Transformer-SAC model and the voltage regulation control of the low-voltage power distribution network.
The system for realizing the low-voltage distribution network voltage control method based on deep reinforcement learning comprises the voltage control system of the low-voltage distribution network and a Transformer-SAC control module;
the Transformer-SAC control module comprises a Transformer-SAC model for extracting features from the power grid state and generating a control strategy, wherein the Transformer network is used for extracting features and optimizing the input features of the control strategy, and the SAC algorithm is used for generating an optimized control strategy and is trained through a policy network and a Q network;
The voltage control system of the low-voltage power distribution network comprises a power distribution network state monitoring system, a central controller and execution equipment;
The power distribution network state monitoring system monitors the state of the power grid in real time through sensors, and the sensors feed real-time data back to the Transformer-SAC model for evaluating the effect of the control strategy;
the central controller sends the control instructions formed by the Transformer-SAC model to the execution equipment through a communication network; after receiving a control instruction, the execution equipment adjusts its running state according to the instruction; the environment state during equipment execution is fed back to the Transformer-SAC model in real time through the sensors, and the Transformer-SAC model realizes voltage control and optimization of the Transformer network parameters in the Transformer-SAC model according to this feedback.
Further, the execution equipment comprises a reactive compensator for adjusting the reactive power output of the power grid and distributed power supply equipment for adjusting the active power output according to the control instructions.
Further, the communication network employs the IEC 61850 standardized communication protocol, and control instructions are transmitted from the central controller to each substation and forwarded by the substation to the execution devices.
The method combines a Transformer network and the SAC algorithm, solves the problem of forward and reverse voltage limit violations when massive new energy is connected to the low-voltage distribution network, and realizes optimal control of the low-voltage distribution network voltage. When handling the low-voltage distribution network voltage optimization problem, the method has obvious advantages in global feature extraction, robustness and adaptability, the real-time feedback mechanism, and adaptability to topology changes.
Compared with the prior art, the invention has the following beneficial effects:
(1) By introducing a Transformer network for global feature extraction, the self-attention mechanism of the Transformer network is used to identify complex dependency relationships among different nodes in the low-voltage power distribution network, so that global information among all nodes is captured when various factors such as voltage, power and load of the low-voltage power distribution network are considered, improving the accuracy and adaptability of the voltage control strategy.
(2) According to the invention, entropy regularization is introduced so that the agent can balance exploration and exploitation during optimization, which improves the robustness of the system; in particular, a relatively stable control effect can be maintained in the face of uncertainty in the operating conditions of the power grid.
(3) Through real-time interactive feedback with the low-voltage distribution network environment, the agent can dynamically adjust and optimize control decisions according to the immediate rewards and the new state. This adaptive learning process allows the agent to adjust the control strategy in real time as the operating conditions of the power grid continuously change, thereby ensuring the optimal voltage regulation effect and the stable operation of the power grid.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the low-voltage distribution network voltage control method based on deep reinforcement learning of the embodiment includes the following steps:
The method comprises the steps of embedding a trained Transformer-SAC model into the low-voltage distribution network, having the Transformer-SAC model interact with the environment of the low-voltage distribution network to form an action strategy, issuing the action strategy to an intelligent controller at the switch or user side to control the power of the distributed power supply equipment and the switching quantity of the reactive compensators, executing the actions by the distributed power supply equipment and the reactive compensators after receiving the commands, forming new environment data, and feeding the new environment data back to the Transformer-SAC model for self-optimization in real time.
The action strategy formed by the interaction of the Transformer-SAC model with the environment of the low-voltage power distribution network comprises the following steps:
S1, extracting an associated feature matrix from the state variables of the low-voltage power distribution network by using a Transformer network, and outputting a feature matrix after processing by the Transformer network, which specifically comprises the following steps:
S1.1, extract, from the state variable $s_t$ of the low-voltage distribution network at time $t$, the node voltage $U_t$, the active power load $P^{\mathrm{L}}_t$, the reactive power load $Q^{\mathrm{L}}_t$ and the node voltage sensitivity $S_t$ as the associated feature matrix:

$$X_t = \big[\,U_t,\ P^{\mathrm{L}}_t,\ Q^{\mathrm{L}}_t,\ S_t\,\big] \in \mathbb{R}^{N\times F}$$

where $\mathbb{R}^{N\times F}$ denotes the real space of dimension $N\times F$, $N$ is the number of nodes and $F$ is the number of features; in this embodiment there are 4 features, namely $F=4$.
The state variable $s_t$ of the low-voltage distribution network in this embodiment includes the node voltage $U_t$, the active power load $P^{\mathrm{L}}_t$, the reactive power load $Q^{\mathrm{L}}_t$, the network topology connection matrix $A$, the node voltage sensitivity $S_t$, the distributed energy generation power $P^{\mathrm{DG}}_t$ and the capacitor switching amount $Q^{\mathrm{C}}_t$.
The node voltage is $U_t=[U_{1,t},\dots,U_{N,t}]$, expressed in per unit, where $U_{i,t}$ is the voltage value of node $i$ at time $t$ and $N$ is the number of nodes; converted to per unit, each node voltage lies in the range 0.95–1.05.
The active power load is $P^{\mathrm{L}}_t=[P^{\mathrm{L}}_{1,t},\dots,P^{\mathrm{L}}_{N,t}]$, in kilowatts, where $P^{\mathrm{L}}_{i,t}$ is the active power load of node $i$ at time $t$; typical base loads lie in the range [20 kW, 500 kW].
The reactive power load is $Q^{\mathrm{L}}_t=[Q^{\mathrm{L}}_{1,t},\dots,Q^{\mathrm{L}}_{N,t}]$, in kilovolt-amperes, where $Q^{\mathrm{L}}_{i,t}$ is the reactive power load of node $i$ at time $t$.
The distributed energy generation power is $P^{\mathrm{DG}}_t=[P^{\mathrm{DG}}_{1,t},\dots,P^{\mathrm{DG}}_{M,t}]$, in kilowatts, where $P^{\mathrm{DG}}_{j,t}$ is the generation power of distributed power supply $j$ at time $t$ and $M$ is the number of distributed power supplies; the distributed energy generation range is [0 kW, 250 kW].
The capacitor switching amount is $Q^{\mathrm{C}}_t$, in kilovolt-amperes, where $Q^{\mathrm{C}}_{k,t}$ is the reactive switching amount of capacitor $k$ at time $t$.
The network topology matrix $A$ represents the connection relations between nodes: matrix element $A_{ij}=1$ indicates that node $i$ and node $j$ are connected, and $A_{ij}=0$ indicates that node $i$ and node $j$ are not connected.
As one embodiment, the connections between the nodes in this embodiment follow the standard topology of the IEEE 33-node low-voltage distribution network, and the network topology matrix $A$ contains the connection information of the 33 nodes. The node voltage sensitivity $S_{i,t}$ denotes the sensitivity of the voltage of node $i$ to active power changes at time $t$ and is calculated by the following equation:

$$S_{i,t} = \frac{\Delta U_{i,t}}{\Delta P_{i,t}}$$

where $\Delta U_{i,t}$ is the difference in the per-unit voltage of node $i$ between the preceding and current time instants, and $\Delta P_{i,t}$ is the active power difference of node $i$ between the preceding and current time instants.
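As an illustration of step S1.1 and the sensitivity calculation above, the following NumPy sketch assembles the N×4 associated feature matrix from per-unit node voltages, active and reactive loads and the finite-difference voltage sensitivity; the array names, shapes and example value ranges are assumptions for illustration only, not values fixed by the invention.

```python
import numpy as np

def voltage_sensitivity(u_now, u_prev, p_now, p_prev, eps=1e-6):
    """Finite-difference sensitivity S_i = dU_i / dP_i between two consecutive time steps."""
    dp = p_now - p_prev
    dp = np.where(np.abs(dp) < eps, eps, dp)   # guard against division by zero
    return (u_now - u_prev) / dp

def build_feature_matrix(u_now, u_prev, p_now, p_prev, q_now):
    """Stack node voltage, active load, reactive load and sensitivity into an N x 4 matrix."""
    s = voltage_sensitivity(u_now, u_prev, p_now, p_prev)
    return np.stack([u_now, p_now, q_now, s], axis=1)   # shape (N, 4)

# Example with N = 33 nodes (the IEEE 33-node case used in this embodiment)
N = 33
u_prev, u_now = np.full(N, 1.0), np.random.uniform(0.95, 1.05, N)             # per-unit voltages
p_prev, p_now = np.random.uniform(20, 500, N), np.random.uniform(20, 500, N)  # active loads, kW
q_now = np.random.uniform(10, 250, N)                                          # reactive loads
X_t = build_feature_matrix(u_now, u_prev, p_now, p_prev, q_now)
print(X_t.shape)  # (33, 4)
```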
S1.2, the associated feature matrix $X_t$ is passed through a multi-head attention mechanism to calculate the attention weight of each attention head:

$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{\mathbf{Q}_i\mathbf{K}_i^{\top}}{\sqrt{d_k}}\right)\mathbf{V}_i$$

$$\mathbf{Q}_i = X_t W_i^{Q},\qquad \mathbf{K}_i = X_t W_i^{K},\qquad \mathbf{V}_i = X_t W_i^{V}$$

where $\mathrm{head}_i$ is the weight of the $i$-th attention head, $i=1,\dots,h$, and $h$ is the number of attention heads; $\mathbf{Q}_i$, $\mathbf{K}_i$ and $\mathbf{V}_i$ respectively denote the query matrix, key matrix and value matrix, $\mathbf{Q}_i,\mathbf{K}_i,\mathbf{V}_i\in\mathbb{R}^{N\times d_k}$, where $\mathbb{R}^{N\times d_k}$ denotes the real space of dimension $N\times d_k$, $d_k$ is the dimension of each attention head, and they are obtained by linear transformation of the input features $X_t$ at time $t$; $\mathrm{softmax}(\cdot)$ denotes the normalization operation; $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are the weight matrices of the linear transformations, obtained adaptively as parameters during training of the Transformer network; $X_t\in\mathbb{R}^{N\times F}$, where $F$ is the number of features.
S1.3, the outputs of all attention heads are spliced to obtain the spliced feature matrix $X'_t$, and a linear mapping is applied:

$$X'_t = \mathrm{Concat}\big(\mathrm{head}_1,\dots,\mathrm{head}_h\big)\,W^{O}$$

where the $\mathrm{Concat}(\cdot)$ function transversely splices the output matrices of the attention heads along the channel dimension to form the complete spliced feature matrix, $h$ is the total number of attention heads, $\mathrm{head}_i$ is the $i$-th attention head, and $W^{O}$ is the output weight matrix of the attention layer. The spliced feature matrix $X'_t$ is added to the input associated feature matrix $X_t$ to realize a residual connection, followed by normalization:

$$X^{\mathrm{norm}}_t = \mathrm{LayerNorm}\big(X_t + X'_t\big)$$

where $\mathrm{LayerNorm}(\cdot)$ is the normalization operation and $X^{\mathrm{norm}}_t$ denotes the normalized feature matrix.
S1.4, the normalized feature matrix $X^{\mathrm{norm}}_t$ is passed through a feed-forward neural network:

$$H_t = \mathrm{ReLU}\big(X^{\mathrm{norm}}_t W_1 + b_1\big)$$

$$F_t = H_t W_2 + b_2$$

where $H_t$ is the feature matrix after the first feed-forward layer, $F_t$ is the output matrix processed by the feed-forward neural network, $W_1$ and $W_2$ are the weight matrices of the two layers of the feed-forward network, $d_{\mathrm{ff}}$ is the middle-layer dimension of the feed-forward network, $W_1\in\mathbb{R}^{F\times d_{\mathrm{ff}}}$ and $W_2\in\mathbb{R}^{d_{\mathrm{ff}}\times F}$; $b_1$ is the bias of the first feed-forward layer, of size $d_{\mathrm{ff}}$, applied to the linear transformation result of the first layer; $b_2$ is the bias of the second feed-forward layer, of size $F$, applied to the linear transformation result of the second layer; $\mathrm{ReLU}(\cdot)$ is the activation function, a rectified linear unit applied to the linear transformation result. The matrix $F_t$ processed by the feed-forward neural network and $X^{\mathrm{norm}}_t$ are then normalized:

$$Z_t = \mathrm{LayerNorm}\big(X^{\mathrm{norm}}_t + F_t\big)$$

where $Z_t$ represents the output features of the Transformer network.
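A minimal PyTorch sketch of the feature-extraction block described in S1.2–S1.4: multi-head self-attention, a residual connection with layer normalization, and a two-layer feed-forward sub-layer with a second residual connection and normalization. An input embedding is added so that the model dimension divides evenly across the attention heads; this embedding and all layer sizes are implementation assumptions rather than values stated in the text.

```python
import torch
import torch.nn as nn

class FeatureEncoderBlock(nn.Module):
    """One Transformer encoder block operating on the (N, F) associated feature matrix."""
    def __init__(self, num_features=4, d_model=64, num_heads=4, d_ff=128):
        super().__init__()
        self.embed = nn.Linear(num_features, d_model)        # lift the F=4 node features to d_model
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                             # two-layer feed-forward network, ReLU activation
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                                     # x: (batch, N, F)
        h = self.embed(x)
        attn_out, _ = self.attn(h, h, h)                      # S1.2: multi-head self-attention
        h = self.norm1(h + attn_out)                          # S1.3: residual connection + LayerNorm
        z = self.norm2(h + self.ffn(h))                       # S1.4: feed-forward + residual + LayerNorm
        return z                                              # (batch, N, d_model) output features

# Example: encode one state of a 33-node network with 4 features per node
x_t = torch.randn(1, 33, 4)
z_t = FeatureEncoderBlock()(x_t)
print(z_t.shape)  # torch.Size([1, 33, 64])
```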
S2, inputting the feature matrix output by the Transformer network into the SAC algorithm to generate an action strategy. The SAC algorithm comprises:
1) The policy optimization objective of the SAC algorithm is to maximize the expected return and the policy entropy:

$$J(\pi) = \sum_{t=0}^{\infty}\mathbb{E}_{(s_t,a_t)\sim\rho_{\pi}}\Big[\gamma^{t}\big(r(s_t,a_t)+\alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\big)\Big]$$

where $\alpha$ is the entropy regularization coefficient, $\mathcal{H}(\pi(\cdot\mid s_t))$ is the policy entropy, $r(s_t,a_t)$ is the reward function, $a_t$ is the decision action at time $t$, and $s_t$ is the state variable of the low-voltage power distribution network; $\pi$ is the policy in reinforcement learning, used to represent the rule for taking actions on the low-voltage distribution network; $\pi(\cdot\mid s_t)$ is the probability distribution over actions selected under the state variable $s_t$; $\gamma$ is the discount factor of the expected reward, ensuring the attenuation of long-term rewards; $\mathbb{E}_{(s_t,a_t)\sim\rho_{\pi}}[\cdot]$ denotes the expectation operation based on the policy $\pi$; $J(\pi)$ denotes the objective function of the policy $\pi$; the summation runs over an infinite horizon.
2) Experience replay: the interaction data of the agent with the environment, including the states, actions and rewards fed back by the environment, are stored. During training, the agent interacts with the environment of the low-voltage power distribution network and collects the states, actions, rewards and other information fed back by the environment, which form the experience learning samples. Within a training period, the agent stores the experience learning samples in an experience replay pool and samples them randomly (i.e., takes samples from the storage pool for learning), which increases the diversity of the samples and avoids training bias in the reinforcement learning process; a minimal sketch of this buffer is given after item 4) below.
3) The Q network is updated by minimizing the Bellman error of the Q function:

$$J_Q = \mathbb{E}_{(s_t,a_t)\sim\mathcal{D}}\bigg[\frac{1}{2}\Big(Q(s_t,a_t)-\big(r_t+\gamma\,\mathbb{E}_{a_{t+1}\sim\pi}\big[Q(s_{t+1},a_{t+1})-\alpha\log\pi(a_{t+1}\mid s_{t+1})\big]\big)\Big)^{2}\bigg]$$

where $\gamma$ is the discount factor of the Q function; $Q(s_t,a_t)$ denotes the Q value of executing action $a_t$ under the state variable $s_t$ at time $t$, reflecting the expected return obtained after taking the action; $\log\pi(a_{t+1}\mid s_{t+1})$ is the log-probability of the next policy action $a_{t+1}$; $J_Q$ is the error-weighted average of the Q network; $\mathbb{E}_{(s_t,a_t)\sim\mathcal{D}}[\cdot]$ denotes the weighted average of the network error over the experience replay pool $\mathcal{D}$; $s_t$ is the state variable given by the output features $Z_t$ of the Transformer network at time $t$, $s_{t+1}$ is the state variable given by the Transformer network output at time $t+1$, $r_t$ is the reward at time $t$, $a_t$ is the action executed by the agent at time $t$, and $a_{t+1}$ is the action taken by the agent at time $t+1$.
4) The policy network is updated by maximizing the policy entropy and the Q value:

$$J_{\pi} = \mathbb{E}_{s_t\sim\mathcal{D}}\Big[\mathbb{E}_{a_t\sim\pi}\big[\alpha\log\pi(a_t\mid s_t)-Q(s_t,a_t)\big]\Big]$$

where $J_{\pi}$ is the loss function of the policy network, $\mathbb{E}_{s_t\sim\mathcal{D}}[\cdot]$ denotes the weighted average over the state variables $s_t$, $Q(s_t,a_t)$ is the expected value of the action $a_t$ selected based on the policy $\pi$, and $\pi(a_t\mid s_t)$ is the probability of selecting action $a_t$ under the state variable $s_t$ and the policy $\pi$.
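The following is a minimal PyTorch sketch of items 2)–4) above: a uniform replay buffer, one critic update minimizing the soft Bellman error, and one policy update maximizing the entropy-regularized Q value. It is a simplified illustration under stated assumptions, not the invention's implementation: twin critics, target networks and automatic temperature tuning common in SAC implementations are omitted, transitions are assumed to be stored as tensors, and `policy.sample` is an assumed helper returning a reparameterized action and its log-probability.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tensors and samples them uniformly."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)                 # oldest samples are discarded automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=256):
        batch = random.sample(self.buffer, batch_size)        # random sampling increases sample diversity
        return tuple(map(torch.stack, zip(*batch)))           # batched tensors (requires tensor transitions)

    def __len__(self):
        return len(self.buffer)

def critic_update(q_net, policy, batch, q_optimizer, gamma=0.99, alpha=0.2):
    """One gradient step on the Q network: minimize the soft Bellman error (item 3)."""
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        next_actions, next_log_probs = policy.sample(next_states)       # a_{t+1}, log pi(a_{t+1}|s_{t+1})
        soft_q_next = q_net(next_states, next_actions) - alpha * next_log_probs
        target = rewards + gamma * (1.0 - dones) * soft_q_next           # soft Bellman target
    q_loss = F.mse_loss(q_net(states, actions), target)                  # squared Bellman error
    q_optimizer.zero_grad()
    q_loss.backward()
    q_optimizer.step()
    return q_loss.item()

def policy_update(q_net, policy, batch, policy_optimizer, alpha=0.2):
    """One gradient step on the policy network: maximize Q value plus policy entropy (item 4)."""
    states = batch[0]
    actions, log_probs = policy.sample(states)                # reparameterized a_t ~ pi(.|s_t)
    policy_loss = (alpha * log_probs - q_net(states, actions)).mean()
    policy_optimizer.zero_grad()
    policy_loss.backward()
    policy_optimizer.step()
    return policy_loss.item()
```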
The reward function $r_t(s_t,a_t)$ is calculated by the following formula:

$$r_t(s_t,a_t) = -\big(\lambda_1 f_{V}(t) + \lambda_2 f_{C}(t) + \lambda_3 f_{P}(t) + \lambda_4 f_{S}(t)\big)$$

where $f_{V}(t)$ is the voltage out-of-limit penalty value at time $t$, $f_{C}(t)$ is the control cost at time $t$, $f_{P}(t)$ is the power loss penalty of the low-voltage distribution network at time $t$, $f_{S}(t)$ is the node voltage sensitivity penalty value at time $t$, and $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ are the weights of the corresponding sub-items; $a_t$ is the decision action at time $t$ and $s_t$ is the state variable of the low-voltage distribution network at time $t$.
The voltage out-of-limit penalty value penalizes node voltages that exceed the allowable range, and its calculation formula is:

$$f_{V}(t) = \sum_{i=1}^{N}\max\big(0,\ \lvert U_{i,t}-U_{\mathrm{ref}}\rvert-\Delta U_{\max}\big)$$

where $f_{V}(t)$ is the penalty value, $U_{\mathrm{ref}}$ is the reference voltage, $\Delta U_{\max}$ is the maximum allowed deviation, $U_{i,t}$ is the per-unit voltage value of node $i$ at time $t$, and $N$ is the number of nodes.
The control cost penalizes the execution cost of the control actions, and its calculation formula is:

$$f_{C}(t) = \sum_{j} c_{j}\,\lvert a_{j,t}\rvert$$

where $c_{j}$ is the cost coefficient of control action $a_{j,t}$ and $f_{C}(t)$ is the total execution cost.
The power loss penalty of the low-voltage distribution network is calculated as:

$$f_{P}(t) = \sum_{i=1}^{N} P^{\mathrm{loss}}_{i,t}$$

where $P^{\mathrm{loss}}_{i,t}$ is the power loss of node $i$ and $f_{P}(t)$ is the total power loss value.
The node voltage sensitivity penalty guides the agent to preferentially adjust the nodes with the greatest influence on the overall voltage, and its calculation formula is:

$$f_{S}(t) = \sum_{i=1}^{N} S_{i,t}\,\lvert\Delta Q_{i,t}\rvert$$

where $\Delta Q_{i,t}$ is the reactive power variation of node $i$, $S_{i,t}$ is the sensitivity of node $i$ at time $t$, and $f_{S}(t)$ is the voltage sensitivity penalty.
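A NumPy sketch of the reward computation as reconstructed above. The weighted-sum form, the example weight values and the exact per-term expressions are assumptions consistent with the four penalty terms described in the text, not a verbatim implementation of the invention.

```python
import numpy as np

def reward(u, p_loss, action_cost, delta_q, sensitivity,
           u_ref=1.0, dev_max=0.05, weights=(1.0, 0.1, 0.5, 0.2)):
    """Negative weighted sum of the four penalty terms f_V, f_C, f_P, f_S."""
    lam1, lam2, lam3, lam4 = weights
    f_v = np.maximum(0.0, np.abs(u - u_ref) - dev_max).sum()   # voltage out-of-limit penalty
    f_c = action_cost                                           # execution cost of the control actions
    f_p = p_loss.sum()                                          # total network power loss
    f_s = (sensitivity * np.abs(delta_q)).sum()                 # sensitivity-weighted reactive adjustment
    return -(lam1 * f_v + lam2 * f_c + lam3 * f_p + lam4 * f_s)

# Example for a 33-node network
u = np.random.uniform(0.93, 1.07, 33)        # per-unit node voltages
p_loss = np.random.uniform(0.0, 5.0, 33)     # per-node losses
print(reward(u, p_loss, action_cost=2.0, delta_q=np.zeros(33), sensitivity=np.zeros(33)))
```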
This embodiment trains the Transformer-SAC model based on the load history data, topology parameters, distributed power supply parameters and capacitor parameters of the low-voltage power distribution network, and comprises the following steps:
(1) Constructing a virtual environment of the low-voltage power distribution network based on the historical data of the low-voltage power distribution network; the Transformer network acquires the state variables from the virtual environment, extracts the associated feature matrix and generates a new state variable.
As one embodiment, PyTorch/TensorFlow is used to construct the virtual environment of the low-voltage power distribution network; the IEEE 33-node low-voltage distribution network is used for simulation, and different load fluctuations and distributed energy access conditions are simulated. The historical data of the low-voltage distribution network are relevant historical data of the actual engineering environment, including time-series load history data, network topology matrix parameters, distributed power supply parameters, capacitor parameters and the like. The simulation environment is formed by mathematical modeling of the topology matrix and the historical active and reactive load information, and the operating states of the simulation environment, including voltages and currents, under different instructions or control strategies given by deep reinforcement learning (DRL) are then obtained through power flow calculation.
The virtual environment provides the state variables of the low-voltage distribution network required by the Transformer-SAC model and receives the execution result of the action $a_t$ under the control strategy $\pi$ generated by the Transformer-SAC model.
(2) Based on the initial virtual environment state, the Transformer-SAC model outputs an action instruction $a_t$ according to the action strategy obtained by the SAC algorithm; the action $a_t$ includes discrete control actions for adjusting the reactive compensator and continuous control actions for adjusting the generation power of the distributed energy sources.
(3) The reactive compensator and the distributed power supply execute the actions to change the state of the virtual environment of the low-voltage power distribution network, and the changed state variables are fed back to the Transformer-SAC model for iterative optimization.
(4) The Transformer-SAC model is optimized using the changed state variables, and the training is iterated through loss calculation, back-propagation and parameter updating.
(5) The optimized Transformer-SAC model updates the action strategy.
Steps (3) to (5) are repeated to carry out the learning iterations of the Transformer-SAC model until convergence, which completes the training of the Transformer-SAC model; a minimal sketch of this interaction loop is given below.
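A minimal sketch of the training interaction loop in steps (1)–(5). `GridEnv` is a placeholder for the power-flow-based virtual environment (e.g. built around the IEEE 33-node case), and `agent` is assumed to bundle the Transformer feature encoder, the SAC networks and the updates sketched above; all class and method names are illustrative assumptions, not part of the invention.

```python
class GridEnv:
    """Placeholder for the virtual low-voltage distribution network environment.

    reset() returns the initial state variables; step(action) applies the capacitor
    switching and DG power commands, runs a power flow, and returns
    (next_state, reward, done, info).
    """
    def reset(self): ...
    def step(self, action): ...

def train(env, agent, buffer, episodes=200, batch_size=256, warmup=1000):
    """Iterate steps (1)-(5): act, observe the new state, store it, and update until convergence."""
    for _ in range(episodes):
        state = env.reset()                                   # step (1): state from the virtual environment
        done = False
        while not done:
            action = agent.act(state)                         # step (2): action from the SAC policy
            next_state, reward, done, _ = env.step(action)    # step (3): environment state changes
            buffer.push(state, action, reward, next_state, done)
            if len(buffer) > warmup:
                agent.update(buffer.sample(batch_size))       # steps (4)-(5): optimize model, refresh policy
            state = next_state
```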
The optimization of the Transformer-SAC model includes the parameter optimization of the Transformer network: a loss function that minimizes the mean square error is used, gradients are back-propagated through the chain rule, and the parameters of the Transformer network are updated with an adaptive moment estimation optimizer.
The loss function minimizes the mean square error:

$$L = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{y}_i - y_i\big)^{2}$$

where $L$ is the loss function, $\hat{y}_i$ is the feature prediction value of the $i$-th node, $y_i$ is the true feature value of the $i$-th node, and $N$ is the number of nodes. Back-propagation calculates the gradient of the loss function with respect to the Transformer network parameters (including the query, key and value weight matrices and the biases) through the chain rule:

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial y^{(l)}}\cdot\frac{\partial y^{(l)}}{\partial \theta}$$

where $y^{(l)}$ is the output of each layer of the Transformer network; the gradient is propagated backwards through each layer of the Transformer network and the gradient of every parameter is calculated; $\theta$ denotes the parameters, including $W^{Q}$, $W^{K}$, $W^{V}$, $W^{O}$, $W_1$ and $W_2$.
The adaptive moment estimation optimizer updates the parameters of the Transformer network according to the calculated gradients, with the following update rule:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t,\qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^{2}$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^{t}},\qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}},\qquad \theta_{t+1} = \theta_t - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}$$

where $m_t$ and $v_t$ are the momentum of the gradient and of the squared gradient at step $t$, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected values at step $t$, $g_t$ is the gradient at step $t$, $\eta$ is the learning rate, $\epsilon$ is a constant that prevents division-by-zero errors, and $\beta_1$, $\beta_2$ are the momentum decay coefficients of the adaptive moment estimation optimizer.
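The update above is the standard adaptive moment estimation (Adam) rule, which in practice can be applied through a library optimizer. A PyTorch sketch of one optimization step on a stand-in module follows; the learning rate, β1, β2 and ε values are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)                               # stand-in for the Transformer network parameters
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3, betas=(0.9, 0.999), eps=1e-8)   # eta, beta1, beta2, epsilon

x, target = torch.randn(33, 4), torch.randn(33, 4)    # node features and their "true" values
loss = nn.functional.mse_loss(model(x), target)       # mean-square-error loss over the nodes
optimizer.zero_grad()
loss.backward()                                       # back-propagation of gradients through the chain rule
optimizer.step()                                      # Adam step: m_t, v_t, bias correction, parameter update
```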
Embedding the trained Transformer-SAC model into the voltage control system of the low-voltage distribution network, where the Transformer-SAC model interacts with the environment to form an action strategy, comprises the following steps:
1.1, inputting the real-time power grid state variables into the trained Transformer-SAC model to generate an action strategy, and issuing action instructions to the distributed power supplies and the reactive compensators of the low-voltage power distribution network.
1.2, the reactive compensators and the distributed power supplies execute the selected control actions to change the state of the actual low-voltage distribution network environment, and the changed power grid state is fed back to the trained Transformer-SAC model.
1.3, the Transformer network performs feature processing on the input power grid state and optimizes the weights and bias terms of the Transformer network based on step (4); the Transformer network extracts the latest associated feature matrix from the low-voltage power distribution network state updated in real time and inputs it into the SAC algorithm to generate a control strategy, completing the real-time iterative training of the Transformer-SAC model and the voltage regulation control of the low-voltage power distribution network.
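A sketch of the on-line control loop of steps 1.1–1.3. `read_grid_state` and `dispatch` are hypothetical placeholders for the monitoring system and the communication layer toward the execution equipment, and `agent.observe` stands in for the on-line fine-tuning of the Transformer-SAC model; none of these names come from the invention itself.

```python
import time

def control_loop(agent, read_grid_state, dispatch, period_s=60.0):
    """Closed-loop operation: measure the grid, generate actions, dispatch them, learn from feedback."""
    state = read_grid_state()                       # real-time grid state from the sensors
    while True:
        action = agent.act(state)                   # 1.1: Transformer-SAC model generates the action strategy
        dispatch(action)                            # 1.1: instructions to DG units and reactive compensators
        time.sleep(period_s)                        # wait for the devices to act and the grid to settle
        next_state = read_grid_state()              # 1.2: changed grid state fed back to the model
        agent.observe(state, action, next_state)    # 1.3: on-line iterative optimization of the model
        state = next_state
```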
This embodiment also provides a system for realizing the low-voltage distribution network voltage control method based on deep reinforcement learning, which comprises the voltage control system of the low-voltage distribution network and a Transformer-SAC control module.
The Transformer-SAC control module comprises a Transformer-SAC model composed of a Transformer network and the SAC algorithm and is used for extracting features from the power grid state and generating a control strategy. The Transformer network is used for extracting features and optimizing the input features of the control strategy. The SAC algorithm is used for generating an optimized control strategy and is trained through a policy network and a Q network.
The voltage control system of the low-voltage distribution network comprises:
The power distribution network state monitoring system monitors the state of the power grid (including voltage, power and the like) in real time through sensors (such as voltage sensors and current sensors), and the sensors feed the real-time data back to the Transformer-SAC model for evaluating the effect of the control strategy and guiding the next action.
The central controller, arranged in the control system on the low-voltage distribution side, acquires the control strategy of the power grid based on the Transformer-SAC model and issues control instructions through a communication protocol.
The communication network adopts the IEC 61850 standardized communication protocol and transmits the control instructions from the central controller to each substation, which forwards them to the execution equipment.
The execution equipment comprises a reactive compensator for adjusting the reactive power output of the power grid and distributed power supply equipment for adjusting the active power output according to the control instructions.
The working flow of the system for realizing the low-voltage distribution network voltage control method based on deep reinforcement learning is as follows:
The distribution-side central controller sends the control instructions formed by the Transformer-SAC model (comprising the capacitor switching adjustment quantity of the reactive compensator in the low-voltage power distribution network and the active power output quantity of the distributed power supply equipment) to all substations of the low-voltage power distribution network through the standardized communication protocol IEC 61850; the substations serve as an intermediate layer and are responsible for forwarding the received control instructions to the corresponding reactive compensator and distributed power supply equipment;
After receiving the control command, the reactive power compensator and the distributed power equipment adjust the running state according to the command, and change the power output of the distributed power equipment and the capacitance switching quantity of the reactive power compensator;
The Transformer-SAC model realizes voltage control according to the environmental feedback and self-optimization of the Transformer network parameters within the Transformer-SAC model.
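As an illustration of how a control instruction formed by the model (capacitor switching adjustment of the reactive compensator and active power output of the distributed power supply equipment, as in the workflow above) can be mapped to device commands, the following sketch splits the action vector into discrete capacitor steps and continuous generation setpoints; the action layout, step size and power limits are assumptions for illustration only.

```python
import numpy as np

def split_action(action, num_caps=2, num_dg=3, cap_step_kvar=50.0, dg_max_kw=250.0):
    """Map the SAC action vector to device commands: discrete capacitor steps + continuous DG setpoints."""
    cap_part = action[:num_caps]
    dg_part = action[num_caps:num_caps + num_dg]
    cap_steps_kvar = np.round(cap_part).astype(int) * cap_step_kvar   # discrete reactive switching
    dg_setpoints_kw = np.clip(dg_part, 0.0, 1.0) * dg_max_kw          # continuous active power output
    return cap_steps_kvar, dg_setpoints_kw

# Example: an action produced by the policy for 2 capacitors and 3 distributed generators
a_t = np.array([1.2, -0.4, 0.8, 0.3, 0.95])
print(split_action(a_t))
```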
The control effect of the invention is shown in the accompanying drawings: the reward function curve of the Transformer-SAC model is shown in fig. 2 and the cost function loss curve in fig. 3, which show that the Transformer-SAC model begins to converge at iteration 82. The node voltage control effect is shown in fig. 4, where the node limit violations disappear after iteration 63.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other modifications, substitutions, combinations, and simplifications without departing from the spirit and principles of the present invention should be made in the equivalent manner, and are included in the scope of the present invention.