CN109784545A


Info

Publication number: CN109784545A
Application number: CN201811581070.5A
Authority: CN
Inventors: 吴新; 史军; 林子钊; 程韧俐; 马伟哲; 郑晓辉; 黄双; 余涛; 陈俊斌; 张孝顺
Original assignee: South China University of Technology SCUT; Shenzhen Power Supply Bureau Co Ltd
Current assignee: South China University of Technology SCUT; Shenzhen Power Supply Bureau Co Ltd
Priority date: 2018-12-24
Filing date: 2018-12-24
Publication date: 2019-05-21

Abstract

The present invention provides a multi-agent-based scheduling method for distributed energy hubs. The method comprises: S1, setting the hub that outputs the most types of energy carriers as the electricity seller agent and the remaining hubs as electricity buyer agents, and determining the scheduling objective function; S2, the electricity buyer agent determining whether to accept the current optimal joint action strategy determined by the electricity seller agent and, if not, executing S3; S3, the electricity buyer agent determining its energy output; S4, the electricity buyer agent calculating the action value corresponding to its energy output, forming an output-action pair for each electricity buyer agent; S5, the electricity buyer agent calculating the reward function of the output-action pair and updating the knowledge matrix according to the reward function; S6, the electricity buyer agent updating its action strategy according to the updated knowledge matrix and playing the game with the electricity seller agent. The invention can effectively find the equilibrium point in a distributed energy hub system and can effectively improve the accuracy of the optimal solution.

Description

Translated from Chinese

A Multi-Agent-Based Scheduling Method for Distributed Energy Hubs

Technical Field

The present invention relates to the technical field of distributed energy scheduling, and in particular to a multi-agent-based scheduling method for distributed energy hubs.

Background

An energy system should supply all types of users with safe, reliable electric energy that meets quality standards, and should satisfy the demand of power users (the load) at all times. While meeting user demand, it should also raise energy utilization, reduce carbon emissions and make energy use more flexible. Against this background the concept of the energy hub was proposed: an energy hub can convert, store and dispatch energy among different energy carriers. On this basis, this patent proposes an economic dispatch method for distributed energy hubs based on a multi-agent bargaining-game learning algorithm. Most existing dispatch optimization methods are centralized optimization algorithms, which tend to place a heavy computational burden on the processor. Moreover, as system scale and complexity grow, the optimal solution becomes hard to find.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a multi-agent scheduling method for distributed energy hubs that can effectively find the equilibrium point in a distributed energy hub system and can effectively improve the accuracy of the optimal solution.

To solve the above technical problem, the present invention provides a multi-agent-based scheduling method for distributed energy hubs, comprising the following steps:

S1. Set the hub that outputs the most types of energy carriers as the electricity seller agent and the remaining hubs as electricity buyer agents, and determine the scheduling objective function;

S2. The electricity buyer agent determines whether to accept the current optimal joint action strategy determined by the electricity seller agent and, if not, executes step S3;

S3. The electricity buyer agent determines its energy output;

S4. The electricity buyer agent calculates the action value corresponding to its energy output, forming an output-action pair for each electricity buyer agent;

S5. The electricity buyer agent calculates the reward function of the output-action pair and updates the knowledge matrix of the controllable variables according to the reward function;

S6. The electricity buyer agent updates its action strategy according to the updated knowledge matrix and plays the game with the electricity seller agent.

Wherein, the objective function determined in S1 is:

where f_I(x) is the generation cost and f_C(x) is the energy loss; x denotes the controllable variables of the entire energy system, including the output of each energy carrier and each dispatch factor; x_m is the controllable-variable vector of the m-th energy hub; the subscripts m and p denote the m-th energy hub and the p-th energy carrier, respectively; M is the total number of energy hubs; P is the set of energy carriers; […] is the demand for the p-th energy carrier of the energy system; n_m^p is the number of energy sources associated with the p-th input energy carrier of the m-th energy hub; n_m^e is the number of generators with a valve-point effect in the m-th energy hub; […], […] and […] are the first, second and third cost coefficients of the j-th energy source; […] and […] are the first and second cost coefficients of the additional rectified sinusoidal component that accounts for the generator valve-point effect; […] is the input of the j-th energy source; […] is the lower output limit of the generator of the j-th energy source; and […] and […] are, respectively, the energy input and output of the p-th energy source for the m-th energy hub.

Wherein, the optimal joint action strategy determined by the electricity seller agent in step S2 is:

where k is the iteration number; x_k^* is the optimal joint action strategy at the k-th iteration; […] is the bargaining action strategy of the i-th electricity buyer agent; […] is the joint action strategy, at the (k-1)-th iteration, of all electricity buyer agents other than the i-th; […] is the joint game strategy of the electricity buyer agents at the (k-1)-th iteration; U_i is the utility function of the i-th electricity buyer agent; n is the number of electricity buyer agents; and U_s is the utility function of the electricity seller agent.

Wherein, step S3 specifically comprises:

where […] is the energy output of the i-th electricity buyer agent; […] and […] are, respectively, the lower and upper bounds of the i-th electricity buyer agent in the v-th state; […] and […] are, respectively, the upper and lower input bounds of the j-th electricity buyer agent; and […] is the current energy output of the p-th input energy carrier to the m-th energy hub.

Wherein, step S4 specifically comprises:

where […] is the knowledge matrix of the h-th controllable variable of the i-th electricity buyer agent at the k-th iteration; q₀ is a random value in [0,1]; ε is the exploitation rate; a_rand denotes a random action; […] is, for the i-th agent, the optimal value of the h-th variable in the d-th interval; […] and […] are, respectively, the upper and lower bounds of the d-th interval; […] and […] are, respectively, the upper and lower bounds of the h-th variable; A_ih is the action space of x_ih; Δ(k,y) is a decay function of the iteration number, with y as its input variable; r is a random value in [0,1]; b is a system parameter that characterizes the degree of non-uniformity; k_max is the maximum number of iterations; […] is the action range of the h-th controllable variable of the i-th electricity buyer agent; and […] is the action value of the h-th controllable variable of the i-th electricity buyer agent.

Wherein, the reward function calculated in step S5 is:

where F_i^kj is the fitness function of the j-th agent at the k-th iteration; p_m is a positive coefficient; SA_i^Best is the optimal action set of the i-th agent at the k-th iteration; f is the aforementioned penalty function; NC_i is the number of constraints on the i-th electricity buyer agent; PF_i^u is the penalty function for the u-th constraint on the i-th electricity buyer agent; χ is the penalty factor; Z_i^u is the u-th constraint on the i-th electricity buyer agent; and Z_i^u,lim is the constraint limit corresponding to Z_i^u.

Wherein, updating the knowledge matrix of the controllable variables according to the reward function in step S5 specifically comprises:

where Q_ih is the knowledge matrix of the h-th variable of the i-th electricity buyer agent; ΔQ is the increment of knowledge; α is the knowledge learning rate; γ is the discount factor; […] is the state-action pair executed by the j-th agent for the controllable variable x_ih; R(s^k,s^k+1,a^k) is the immediate reward received when action a^k is selected and the state moves from s^k to s^k+1; a_ih is any selectable action strategy; A_ih is the action set of x_ih; n_i is the number of controllable variables of the i-th electricity buyer agent; and J is the population size of the cooperative swarm.

Wherein, step S6 specifically comprises:

where i = 1, 2, ..., n.

The beneficial effects of the embodiments of the present invention are as follows. A game model with one electricity seller and N electricity buyers is adopted. The electricity seller agent first determines the current optimal joint action strategy; if the electricity buyer agents do not accept it, each electricity buyer agent determines the state-action pair of each controllable variable, calculates the reward function of each state-action pair, and updates the knowledge matrix according to the reward function, thereby updating its action strategy for the game. With this one-seller, N-buyer game model the method can effectively find the equilibrium point in a distributed energy hub system. The invention uses associative memory and swarm intelligence, which accelerate the convergence of the knowledge matrix, while the exploration mechanism effectively improves the accuracy of the optimal solution.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a multi-agent-based scheduling method for distributed energy hubs according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments refers to the accompanying drawings and illustrates specific embodiments in which the present invention can be practiced.

Referring to FIG. 1, Embodiment 1 of the present invention provides a multi-agent-based scheduling method for distributed energy hubs, which comprises the following steps:

S1. Set the hub that outputs the most types of energy carriers as the electricity seller agent and the remaining hubs as electricity buyer agents, and determine the objective function.

Specifically, the hub that outputs the most types of energy carriers is selected as the electricity seller agent, and the remaining hubs are the electricity buyer agents.
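The role assignment in S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `assign_roles`, the hub names, the carrier labels and the alphabetical tie-breaking rule are all assumptions introduced here.

```python
from typing import Dict, List, Set, Tuple


def assign_roles(hub_outputs: Dict[str, Set[str]]) -> Tuple[str, List[str]]:
    """Return (seller, buyers) for a mapping hub name -> set of output carriers.

    The seller is the hub that outputs the most types of energy carriers;
    every other hub becomes a buyer.
    """
    if not hub_outputs:
        raise ValueError("at least one hub is required")
    names = sorted(hub_outputs)
    # max() keeps the first maximum it meets, so ties fall to the
    # alphabetically first hub (a hypothetical rule, not from the source).
    seller = max(names, key=lambda name: len(hub_outputs[name]))
    buyers = [name for name in names if name != seller]
    return seller, buyers


# Hypothetical three-hub system.
hubs = {
    "hub_a": {"electricity", "heat"},
    "hub_b": {"electricity", "heat", "cooling"},
    "hub_c": {"electricity"},
}
seller, buyers = assign_roles(hubs)
```

With this toy input, `hub_b` becomes the seller because it outputs three carrier types, and the other two hubs become buyers.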

The objective function is a composite function that accounts for both the generation cost and the energy loss, where f_I(x) is the generation cost and f_C(x) is the energy loss; x denotes the controllable variables of the entire energy system, including the output of each energy carrier and each dispatch factor; x_m is the controllable-variable vector of the m-th energy hub; the subscripts m and p denote the m-th energy hub and the p-th energy carrier, respectively; M is the total number of energy hubs; P is the set of energy carriers; […] is the demand for the p-th energy carrier of the energy system; and f_C^m and f_L^m are, respectively, the generation cost and the energy loss of the m-th energy hub, calculated as follows:

where n_m^p is the number of energy sources associated with the p-th input energy carrier of the m-th energy hub; n_m^e is the number of generators with a valve-point effect in the m-th energy hub; […], […] and […] are the cost coefficients of the j-th energy source; […] and […] are the cost coefficients of the additional rectified sinusoidal component that accounts for the generator valve-point effect; […] is the input of the j-th energy source; […] is the lower output limit of the j-th generator; and […] and […] are, respectively, the energy input and output of the p-th energy source for the m-th energy hub.
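The cost formula itself is not reproduced in this text. The definitions above (three polynomial cost coefficients, two coefficients of a rectified sinusoid, and a lower output limit) match the valve-point cost model commonly used in economic dispatch, so the sketch below assumes that standard form. The parameter names `a`, `b`, `c`, `d`, `e` and `p_min` are placeholders chosen here, not symbols taken from the patent.

```python
import math


def valve_point_cost(p: float, a: float, b: float, c: float,
                     d: float, e: float, p_min: float) -> float:
    """Cost of one energy source at input level p (assumed standard form).

    a + b*p + c*p**2        quadratic fuel cost
    |d * sin(e*(p_min-p))|  rectified sinusoid modelling the valve-point effect
    """
    quadratic = a + b * p + c * p ** 2
    ripple = abs(d * math.sin(e * (p_min - p)))
    return quadratic + ripple


def hub_generation_cost(inputs, params) -> float:
    """Sum the cost over all energy sources of one hub.

    inputs: list of input levels p_j
    params: list of (a, b, c, d, e, p_min) tuples, one per source
    """
    return sum(valve_point_cost(p, *coeffs) for p, coeffs in zip(inputs, params))
```

At the lower output limit the sinusoidal term vanishes, so the cost reduces to the quadratic part; sources without a valve-point effect can be modelled by setting `d` to zero.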

S2. The electricity buyer agent determines whether to accept the current optimal joint action strategy determined by the electricity seller agent and, if not, executes step S3.

Specifically, the electricity seller agent determines the current optimal joint strategy according to the following formula:

where k is the iteration number; x_k^* is the optimal joint action strategy at the k-th iteration; […] is the bargaining action strategy of the i-th electricity buyer agent; […] is the joint action strategy, at the (k-1)-th iteration, of all electricity buyer agents other than the i-th; […] is the joint game strategy of all electricity buyers at the (k-1)-th iteration; U_i is the utility function of the i-th electricity buyer agent; n is the number of electricity buyer agents; and U_s is the utility function of the electricity seller agent.

S3. The electricity buyer agent determines its energy output.

Specifically, if the electricity buyer agents accept the strategy of the electricity seller agent, the iteration ends; otherwise, each electricity buyer agent determines the state of its first controllable variable, namely its energy output, according to the following formula.

Here, […] is the state of the k-th variable of the i-th electricity buyer, that is, the energy output of each electricity buyer agent; […] and […] are, respectively, the lower and upper bounds of the i-th energy source in the v-th state; […] and […] are, respectively, the upper and lower input bounds of the j-th energy source; and […] is the current energy output of the p-th input energy carrier to the m-th energy hub.

S4. The electricity buyer agent calculates the action value corresponding to its energy output, forming an output-action pair for each electricity buyer agent.

Specifically, each electricity buyer agent first selects an action strategy for each controllable variable according to the corresponding knowledge matrix, and then, starting from the local optimum of the selected interval, computes the precise value with a non-uniform mutation operator.

More specifically, each electricity buyer agent selects the range of an action strategy for each controllable variable according to the following formula:
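The selection formula is not reproduced in this text. Given the quantities it is defined with (a knowledge matrix, a random value q₀ in [0,1], an exploitation rate ε and a random action a_rand), a plausible reading is the usual ε-greedy rule: exploit the best-known interval when q₀ ≤ ε, otherwise explore at random. The sketch below assumes that rule; `select_interval` and its arguments are hypothetical names.

```python
import random
from typing import Optional, Sequence


def select_interval(q_row: Sequence[float], epsilon: float,
                    rng: Optional[random.Random] = None) -> int:
    """Pick the index of an action interval from one knowledge-matrix row.

    q_row:   knowledge values of the candidate intervals for the current state
    epsilon: exploitation rate; larger values mean more greedy choices
    """
    if not q_row:
        raise ValueError("q_row must not be empty")
    rng = rng or random.Random()
    q0 = rng.random()  # random value in [0, 1)
    if q0 <= epsilon:
        # Exploit: take the interval with the largest knowledge value.
        return max(range(len(q_row)), key=lambda d: q_row[d])
    # Explore: take a uniformly random interval (the role of a_rand).
    return rng.randrange(len(q_row))
```

Setting `epsilon` to 1.0 makes the choice fully greedy, while smaller values leave room for the exploration that the description credits with improving solution accuracy.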

More specifically, computing the precise value of the action with the non-uniform mutation operator, starting from the local optimum of the corresponding interval, comprises:

where […] is the knowledge matrix of the h-th controllable variable of the i-th electricity buyer agent at the k-th iteration; q₀ is a random value in [0,1]; ε is the exploitation rate; a_rand denotes a random action; […] is, for the i-th agent, the optimal value of the h-th variable in the d-th interval; […] and […] are, respectively, the upper and lower bounds of the d-th interval; […] and […] are, respectively, the upper and lower bounds of the h-th variable; A_ih is the action space of x_ih; Δ(k,y) is a decay function of the iteration number, with y as its input variable; r is a random value in [0,1]; b is a system parameter that characterizes the degree of non-uniformity; and k_max is the maximum number of iterations.
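The mutation formula is likewise missing from this text. The symbols Δ(k,y), r, b and k_max match the classical non-uniform mutation operator, in which Δ(k,y) = y·(1 − r^((1 − k/k_max)^b)). The sketch below assumes that classical form and assumes the mutation is confined to the selected interval; the function names and the equal chance of moving up or down are choices made here.

```python
import random
from typing import Optional


def decay(k: int, y: float, k_max: int, b: float, r: float) -> float:
    """Delta(k, y): a step in [0, y] that shrinks to 0 as k approaches k_max."""
    if not 0 <= k <= k_max:
        raise ValueError("k must lie in [0, k_max]")
    return y * (1.0 - r ** ((1.0 - k / k_max) ** b))


def non_uniform_mutation(x_best: float, lower: float, upper: float,
                         k: int, k_max: int, b: float = 2.0,
                         rng: Optional[random.Random] = None) -> float:
    """Perturb the local optimum x_best while staying inside [lower, upper]."""
    rng = rng or random.Random()
    r = rng.random()
    if rng.random() < 0.5:
        # Move towards the upper bound of the interval.
        return x_best + decay(k, upper - x_best, k_max, b, r)
    # Move towards the lower bound of the interval.
    return x_best - decay(k, x_best - lower, k_max, b, r)
```

Early iterations can jump almost anywhere in the interval, whereas near `k_max` the step collapses to zero, so the search moves from exploration to fine tuning. A larger `b` makes that transition sharper.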

S5. The electricity buyer agent calculates the reward function of the output-action pair and updates the knowledge matrix of the controllable variables according to the reward function.

Specifically, each electricity buyer agent calculates the reward function of the state-action pair of each controllable variable according to the following formula:

where F_i^kj is the fitness function of the j-th agent at the k-th iteration; p_m is a positive coefficient; SA_i^Best is the optimal action set of the i-th agent at the k-th iteration; f is the aforementioned penalty function; NC_i is the number of constraints on the i-th electricity buyer agent; PF_i^u is the penalty function for the u-th constraint on the i-th electricity buyer agent; χ is the penalty factor; Z_i^u is the u-th constraint on the i-th electricity buyer agent; and Z_i^u,lim is the constraint limit corresponding to Z_i^u.
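Because the reward and penalty formulas are not reproduced here, the following is only one plausible instantiation of the quantities defined above. It assumes a quadratic penalty χ·(Z − Z_lim)² for each violated upper limit, a fitness equal to the objective plus all penalties, and a reward of p_m divided by the fitness for pairs in the best action set and zero otherwise. None of these specific forms is confirmed by the source.

```python
from typing import Iterable, Tuple


def penalty(z: float, z_lim: float, chi: float) -> float:
    """Quadratic penalty for exceeding an upper limit (assumed form)."""
    violation = max(0.0, z - z_lim)
    return chi * violation ** 2


def fitness(objective: float,
            constraints: Iterable[Tuple[float, float]],
            chi: float) -> float:
    """Objective value plus the penalties of all (z, z_lim) constraints."""
    return objective + sum(penalty(z, z_lim, chi) for z, z_lim in constraints)


def reward(fit: float, p_m: float, in_best_set: bool) -> float:
    """Reward only the pairs that belong to the best action set.

    Assumes fit > 0, so a lower (better) fitness gives a larger reward.
    """
    return p_m / fit if in_best_set else 0.0
```

Under these assumptions a feasible solution is scored by its objective alone, and each violated limit raises the fitness and therefore lowers the reward.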

Specifically, Q-learning is used to update the knowledge matrix, and associative memory is used to store the knowledge in order to avoid the curse of dimensionality.

Following the principle of reward maximization, the electricity buyer agent updates the knowledge matrix of each controllable variable according to the following formula:

where Q_ih is the knowledge matrix of the h-th variable of the i-th electricity buyer agent; ΔQ is the increment of knowledge; α is the knowledge learning rate; γ is the discount factor; […] is the state-action pair executed by the j-th individual for the controllable variable x_ih; R(s^k,s^k+1,a^k) is the immediate reward received when action a^k is selected and the state moves from s^k to s^k+1; a_ih is any selectable action strategy; A_ih is the action set of x_ih; n_i is the number of controllable variables of the i-th electricity buyer agent; and J is the population size of the cooperative swarm.
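The update formula is missing from this text, but the description names Q-learning, so the sketch below uses the textbook rule ΔQ = R + γ·max Q(s', a') − Q(s, a) followed by Q(s, a) ← Q(s, a) + α·ΔQ. It handles one knowledge matrix and one individual at a time; how the patent aggregates the J individuals of the cooperative swarm is not shown in the source and is not modelled. The name `q_update` is hypothetical.

```python
from typing import List


def q_update(q: List[List[float]], s: int, a: int, s_next: int,
             r: float, alpha: float, gamma: float) -> float:
    """Apply one Q-learning step in place and return the increment delta_q.

    q:      knowledge matrix, indexed as q[state][action]
    s, a:   state-action pair that was executed
    s_next: state reached after executing a
    r:      immediate reward R(s, s_next, a)
    """
    delta_q = r + gamma * max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta_q
    return delta_q


# Hypothetical 2-state, 2-action knowledge matrix.
knowledge = [[0.0, 0.0], [1.0, 2.0]]
step = q_update(knowledge, s=0, a=1, s_next=1, r=1.0, alpha=0.5, gamma=0.9)
```

Keeping one small matrix per controllable variable, as the Q_ih notation suggests, avoids a single table over the joint action space, which is the dimensionality problem the description refers to.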

S6. The electricity buyer agent updates its strategy according to the updated knowledge matrix and plays the game with the electricity seller agent.

Specifically, the electricity buyer agent updates its strategy according to the following formula:

where i = 1, 2, ..., n.

The multi-agent-based scheduling method for distributed energy hubs according to this embodiment adopts a game model with one electricity seller and N electricity buyers. The electricity seller agent first determines the current optimal joint action strategy; if the electricity buyer agents do not accept it, each electricity buyer agent determines the state-action pair of each controllable variable, calculates the reward function of each state-action pair, and updates the knowledge matrix according to the reward function, thereby updating its action strategy for the game. With this one-seller, N-buyer game model the method can effectively find the equilibrium point in a distributed energy hub system. The invention uses associative memory and swarm intelligence, which accelerate the convergence of the knowledge matrix, while the exploration mechanism effectively improves the accuracy of the optimal solution.
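The control flow of steps S2 to S6 can be summarized by the skeleton below. It is a sketch under stated assumptions rather than the patented procedure: the three callables `propose`, `accepts` and `learn` are hypothetical stand-ins for the seller's joint-strategy formula, the buyers' acceptance test and the buyers' learning steps S3 to S6, and the loop is assumed to stop once every buyer accepts or `k_max` iterations have run.

```python
from typing import Any, Callable, List, Tuple


def run_bargaining(propose: Callable[[List[Any]], Any],
                   accepts: Callable[[int, Any], bool],
                   learn: Callable[[int, Any, int], Any],
                   n_buyers: int, k_max: int) -> Tuple[Any, int]:
    """Iterate the seller/buyer game and return (final offer, iterations used)."""
    if k_max < 1:
        raise ValueError("k_max must be at least 1")
    strategies: List[Any] = [None] * n_buyers  # no buyer strategy before round 1
    offer = None
    for k in range(1, k_max + 1):
        # S2: the seller proposes a joint action strategy from the
        # buyers' strategies of the previous iteration.
        offer = propose(strategies)
        if all(accepts(i, offer) for i in range(n_buyers)):
            return offer, k  # all buyers accept: the iteration ends
        # S3-S6: each buyer learns and updates its own strategy.
        strategies = [learn(i, offer, k) for i in range(n_buyers)]
    return offer, k_max
```

In a fuller version, `learn` would chain the interval selection, mutation, reward and knowledge-matrix update of steps S3 to S5 for every controllable variable of the buyer.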

The above discloses only preferred embodiments of the present invention and certainly cannot be used to limit the scope of the rights of the invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the invention.