CN114340016B

Movatterモバイル変換

Info

Publication number: CN114340016B
Application number: CN202210255851.5A
Authority: CN
Inventors: 丰雷; 周凡钦; 杨洋; 杨志祥; 喻鹏; 李文璟; 李阳阳
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2022-03-16
Filing date: 2022-03-16
Publication date: 2022-07-26
Anticipated expiration: 2042-03-16
Also published as: CN114340016A

Abstract

The invention provides a method and a system for power grid edge calculation unloading distribution, wherein the method comprises the following steps: acquiring network state information of each power terminal in the smart power grid at the current moment; inputting network state information corresponding to a target power terminal into a power grid edge calculation unloading distribution model to obtain an edge calculation unloading distribution strategy of a to-be-processed calculation task in the target power terminal; and according to the edge computing unloading distribution strategy, the computing tasks to be processed are segmented, and the segmented computing tasks to be processed are cached to the corresponding power terminal and/or the mobile edge computing server so as to carry out edge computing unloading on the computing tasks to be processed. The invention uses the multi-agent reinforcement learning solving algorithm to carry out the edge computing unloading distribution decision, fully utilizes the cache and the computing resources of the power terminal equipment, obtains a more accurate and efficient edge computing unloading distribution scheme, and can effectively reduce the transmission delay of the remote mobile edge computing server.

Description

Translated fromChinese

一种电网边缘计算卸载分配方法及系统A method and system for offloading and distributing power grid edge computing

技术领域technical field

本发明涉及移动边缘计算技术领域，尤其涉及一种电网边缘计算卸载分配方法及系统。The present invention relates to the technical field of mobile edge computing, in particular to a method and system for offloading and distributing power grid edge computing.

背景技术Background technique

随着融合5G的电网快速发展，电力业务终端的数量和产生的流量越来越大，这对于现有电网架构提出了很大的挑战。边缘计算作为一种新的计算模式，使数据在源头附近就能得到及时有效的处理，从而为解决电网中海量数据处理提供了新的解决方案。With the rapid development of the 5G-integrated power grid, the number of power service terminals and the traffic generated are increasing, which poses a great challenge to the existing power grid architecture. As a new computing mode, edge computing enables data to be processed in a timely and effective manner near the source, thus providing a new solution for massive data processing in the power grid.

为保证电网服务质量，目前大多数优化方案，采用将计算任务卸载到移动边缘计算（Mobile Edge Computing，简称MEC）服务器的方式，这显著减轻了核心网络负载压力，大大减少用户请求的传输距离。然而，当面临海量电力用户终端请求时，仅依靠MEC服务器进行任务处理，将导致因计算缓存资源不足所带来的额外的排队延迟，以及多用户终端对通信资源的竞争，仍然使得MEC的内容传输面临网络拥塞的挑战。In order to ensure the quality of grid service, most of the current optimization schemes use the method of offloading computing tasks to Mobile Edge Computing (MEC) servers, which significantly reduces the load pressure on the core network and greatly reduces the transmission distance of user requests. However, when faced with massive power user terminal requests, relying only on the MEC server for task processing will lead to additional queuing delays caused by insufficient computing cache resources, as well as multi-user terminal competition for communication resources. Transport faces the challenge of network congestion.

随着电力终端设备能力的提升，终端辅助计算成为了很有潜力的解决方案。终端间距离较近，明显降低远距离传输延迟，此外，多节点并行计算，计算效率显著提升。但是，现有针对临近终端的边缘计算卸载分配方案还不够完善，无法得到较为准确的边缘计算卸载分配方案。因此，现在亟需一种电网边缘计算卸载分配方法及系统来解决上述问题。With the improvement of power terminal equipment capabilities, terminal-assisted computing has become a promising solution. The distance between terminals is relatively short, which significantly reduces the long-distance transmission delay. In addition, multi-node parallel computing, the computing efficiency is significantly improved. However, the existing edge computing offloading allocation scheme for adjacent terminals is not perfect enough, and a more accurate edge computing offloading allocation scheme cannot be obtained. Therefore, there is an urgent need for a method and system for offloading and distributing power grid edge computing to solve the above problems.

发明内容SUMMARY OF THE INVENTION

针对现有技术存在的问题，本发明提供一种电网边缘计算卸载分配方法及系统。In view of the problems existing in the prior art, the present invention provides a method and system for offloading and distributing power grid edge computing.

本发明提供一种电网边缘计算卸载分配方法，包括：The present invention provides a method for offloading and distributing power grid edge computing, comprising:

获取智能电网中每个电力终端在当前时刻的网络状态信息；Obtain the network status information of each power terminal in the smart grid at the current moment;

将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略；Inputting the network status information corresponding to the target power terminal into the power grid edge computing offloading distribution model to obtain the edge computing offloading allocation strategy of the computing task to be processed in the target power terminal;

根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；According to the edge computing offloading and allocation strategy, the to-be-processed computing tasks are divided, and the divided to-be-processed computing tasks are cached in the corresponding power terminal and/or mobile edge computing server, so as to dispose of the to-be-processed computing tasks. Perform edge computing offloading;

其中，所述电网边缘计算卸载分配模型是由样本网络状态信息和所述样本网络状态信息对应的任务缓存比例和任务卸载位置，对多智能体强化学习网络进行训练得到的。The power grid edge computing offload distribution model is obtained by training a multi-agent reinforcement learning network based on the sample network state information and the task cache ratio and task offload position corresponding to the sample network state information.

根据本发明提供的一种电网边缘计算卸载分配方法，所述电网边缘计算卸载分配模型通过以下步骤训练得到：According to a grid edge computing offloading distribution method provided by the present invention, the grid edge computing offloading distribution model is obtained by training through the following steps:

基于每个电力终端的历史网络状态信息，构建各个电力终端对应智能体的样本网络状态信息，并根据所述样本网络状态信息，构建第一样本观测状态；Based on the historical network status information of each power terminal, construct sample network status information of the agent corresponding to each power terminal, and construct a first sample observation status according to the sample network status information;

获取所述样本网络状态信息对应的任务缓存比例和任务卸载位置，并根据所述任务缓存比例和所述任务卸载位置，构建每个智能体的动作；Obtain the task cache ratio and the task unloading position corresponding to the sample network state information, and construct the action of each agent according to the task cache ratio and the task unloading position;

基于每个电力终端在进行边缘计算卸载时的能耗和时延，以每个电力终端的能耗最小化为优化目标，构建智能体的奖励；Based on the energy consumption and delay of each power terminal when the edge computing is offloaded, the optimization goal is to minimize the energy consumption of each power terminal, and the reward of the agent is constructed;

根据所述第一样本观测状态、所述动作和所述奖励，构建训练样本集；constructing a training sample set according to the first sample observation state, the action and the reward;

通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型。Through the training sample set, the multi-agent reinforcement learning network is trained to obtain a power grid edge computing offloading distribution model.

根据本发明提供的一种电网边缘计算卸载分配方法，所述基于每个电力终端在进行边缘计算卸载时的能耗和时延，以每个电力终端的能耗最小化为优化目标，构建智能体的奖励，包括：According to a method for offloading and distributing power grid edge computing provided by the present invention, based on the energy consumption and time delay of each power terminal when performing edge computing offloading, and with the optimization goal of minimizing the energy consumption of each power terminal, an intelligent Physical rewards, including:

根据每个电力终端的计算能耗和传输能耗，获取每个电力终端在进行边缘计算卸载时的能耗；According to the computing energy consumption and transmission energy consumption of each power terminal, the energy consumption of each power terminal when the edge computing is offloaded is obtained;

根据每个电力终端的传输时延和计算时延，获取每个电力终端在进行边缘计算卸载时的时延；According to the transmission delay and calculation delay of each power terminal, obtain the delay of each power terminal when performing edge computing offloading;

将每个电力终端在进行边缘计算卸载时的时延作为约束条件，以每个电力终端的能耗最小化为优化目标，构建电力终端边缘计算卸载能耗优化模型；Taking the time delay of each power terminal during edge computing offloading as a constraint, and taking the minimization of the energy consumption of each power terminal as the optimization goal, an optimization model of power terminal edge computing offloading energy consumption is constructed;

基于所述电力终端边缘计算卸载能耗优化模型，将每一轮训练过程中电力终端的能耗相反数作为对应智能体的奖励。Based on the power terminal edge computing offloading energy consumption optimization model, the inverse energy consumption of the power terminal in each round of training process is used as the reward for the corresponding agent.

根据本发明提供的一种电网边缘计算卸载分配方法，在所述通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型之前，所述方法还包括：According to a grid edge computing offloading distribution method provided by the present invention, before the multi-agent reinforcement learning network is trained through the training sample set to obtain a grid edge computing offloading distribution model, the method further includes:

将所述样本网络状态信息输入到生成对抗网络，输出第二样本观测状态；inputting the sample network state information into the generative adversarial network, and outputting the second sample observation state;

根据所述第二样本观测状态，对所述训练样本集进行更新，得到更新后的训练样本集；updating the training sample set according to the second sample observation state to obtain an updated training sample set;

所述通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型，包括：The multi-agent reinforcement learning network is trained through the training sample set to obtain a power grid edge computing offloading distribution model, including:

通过所述更新后的训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型。Through the updated training sample set, the multi-agent reinforcement learning network is trained to obtain a power grid edge computing offloading distribution model.

根据本发明提供的一种电网边缘计算卸载分配方法，所述电力终端边缘计算卸载能耗优化模型的公式为：According to a power grid edge computing offloading distribution method provided by the present invention, the formula of the power terminal edge computing offloading energy consumption optimization model is:

其中，

表示第i个电力终端在t时刻进行边缘计算卸载时的能耗，

表示第i个电力终端的待计算任务在第j个电力终端的缓存比例；

为任务卸载动作，表示第i个电力终端的待计算任务在第j个电力终端的计算动作；in,

represents the energy consumption of the i-th power terminal when the edge computing is unloaded at time t,

Indicates the cache ratio of the task to be calculated at the i-th power terminal in the j-th power terminal;

is the task unloading action, indicating the computing action of the task to be calculated at the i-th power terminal at the j-th power terminal;

约束条件为：The constraints are:

；

;

；

;

；

;

；

;

；

;

；

;

；

;

；

;

；

;

其中，

表示第i个电力终端与第j个电力终端之间的网络连接状态，

表示完成第i个电力终端的待计算任务的边缘计算卸载和传输的时延，

表示预设时延阈值，

表示第i个电力终端的待计算任务中需要被缓存的任务量，

表示移动边缘计算服务器的缓存总容量，

表示任意电力终端的缓存总容量。in,

represents the network connection status between the i-th power terminal and the j-th power terminal,

represents the delay of edge computing offloading and transmission to complete the task to be computed of the i-th power terminal,

represents the preset delay threshold,

represents the amount of tasks that need to be cached in the tasks to be calculated in the i-th power terminal,

represents the total cache capacity of the mobile edge computing server,

Indicates the total cache capacity of any power terminal.

根据本发明提供的一种电网边缘计算卸载分配方法，所述样本网络状态信息包括网络连接状态、计算能力、缓存能力、待缓存卸载任务计算量和缓存卸载后任务传输量。According to a grid edge computing offloading distribution method provided by the present invention, the sample network state information includes network connection state, computing capability, cache capability, the computation amount of tasks to be cached offloaded, and the task transmission amount after cache offloading.

本发明还提供一种电网边缘计算卸载分配系统，包括：The present invention also provides a grid edge computing offloading distribution system, including:

电力终端网络状态采集模块，用于获取智能电网中每个电力终端在当前时刻的网络状态信息；The power terminal network status acquisition module is used to obtain the network status information of each power terminal in the smart grid at the current moment;

电网边缘计算卸载分配策略生成模块，用于将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略；The grid edge computing offloading allocation strategy generation module is used for inputting the network status information corresponding to the target power terminal into the grid edge computing offloading allocation model to obtain the edge computing offloading allocation strategy of the computing task to be processed in the target power terminal;

边缘计算卸载模块，用于根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；An edge computing offloading module, configured to divide the to-be-processed computing task according to the edge computing offloading allocation strategy, and cache the divided to-be-processed computing tasks to the corresponding power terminal and/or mobile edge computing server, so as to performing edge computing offload on the to-be-processed computing task;

根据本发明提供的一种电网边缘计算卸载分配系统，所述系统还包括：According to a grid edge computing offloading distribution system provided by the present invention, the system further includes:

样本构建模块，用于基于每个电力终端的历史网络状态信息，构建各个电力终端对应智能体的样本网络状态信息，并根据所述样本网络状态信息，构建第一样本观测状态；a sample construction module, configured to construct sample network status information of the agent corresponding to each power terminal based on the historical network status information of each power terminal, and construct a first sample observation status according to the sample network status information;

动作构建模块，用于获取所述样本网络状态信息对应的任务缓存比例和任务卸载位置，并根据所述任务缓存比例和所述任务卸载位置，构建每个智能体的动作；an action building module, configured to obtain the task cache ratio and the task unloading position corresponding to the sample network state information, and construct the action of each agent according to the task cache ratio and the task unloading position;

智能体奖励构建模块，用于基于每个电力终端在进行边缘计算卸载时的能耗和时延，以每个电力终端的能耗最小化为优化目标，构建智能体的奖励；The agent reward building module is used to build the reward of the agent based on the energy consumption and delay of each power terminal when the edge computing is offloaded, with the optimization goal of minimizing the energy consumption of each power terminal;

训练集生成模块，用于根据所述第一样本观测状态、所述动作和所述奖励，构建训练样本集；a training set generation module, configured to construct a training sample set according to the first sample observation state, the action and the reward;

训练模块，用于通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型。The training module is used to train the multi-agent reinforcement learning network through the training sample set to obtain a power grid edge computing offloading distribution model.

本发明还提供一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述程序时实现如上述任一种所述电网边缘计算卸载分配方法的步骤。The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, the processor implementing the power grid edge computing as described above when the processor executes the program Steps to uninstall the distribution method.

本发明还提供一种非暂态计算机可读存储介质，其上存储有计算机程序，该计算机程序被处理器执行时实现如上述任一种所述电网边缘计算卸载分配方法的步骤。The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of any of the above-mentioned grid edge computing offloading and distribution methods.

本发明提供的一种电网边缘计算卸载分配方法及系统，通过构建移动边缘计算服务器与电力终端协作的混合式缓存与卸载框架，使用多智能体强化学习求解算法进行边缘计算卸载分配决策，充分利用电力终端设备的缓存和计算资源，得到更为准确且高效的边缘计算卸载分配方案，从而解决以往多任务请求时，单一依靠移动边缘计算服务器进行边缘计算，而面临的资源不足与网络拥塞等问题，并且终端间的近距离协作，可有效降低远距离移动边缘计算服务器的传输时延。The invention provides a method and system for offloading and distributing power grid edge computing. By constructing a hybrid caching and offloading framework in which a mobile edge computing server and power terminals cooperate, a multi-agent reinforcement learning algorithm is used to make edge computing offloading and distribution decisions. The cache and computing resources of the power terminal equipment are used to obtain a more accurate and efficient edge computing offloading allocation scheme, so as to solve the problems such as insufficient resources and network congestion when relying solely on the mobile edge computing server for edge computing in the past multi-task requests. , and the close cooperation between terminals can effectively reduce the transmission delay of the long-distance mobile edge computing server.

附图说明Description of drawings

为了更清楚地说明本发明或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图进行简单地介绍，显而易见地，下面描述中的附图是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to explain the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that are required in the description of the embodiments or the prior art. Obviously, the drawings in the following description are of the present invention. For some embodiments of the present invention, for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

图1为本发明提供的电网边缘计算卸载分配方法的流程示意图；1 is a schematic flowchart of a method for offloading and distributing power grid edge computing provided by the present invention;

图2为本发明提供的电网边缘计算卸载分配系统的结构示意图；2 is a schematic structural diagram of a grid edge computing offloading and distribution system provided by the present invention;

图3为本发明提供的电子设备的结构示意图。FIG. 3 is a schematic structural diagram of an electronic device provided by the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案和优点更加清楚，下面将结合本发明中的附图，对本发明中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are part of the embodiments of the present invention. , not all examples. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

随着电力终端设备能力的提升，临近终端辅助计算成为了很有潜力的解决方案。由于每个电力终端之间的距离较近，可明显降低远距离传输延迟；此外，多个电力终端作为节点并行计算，计算效率显著提升，使得电力终端终端设备从资源消耗者变为资源提供者，改善了系统资源利用率。同时，为了平衡计算与通信资源消耗，电力终端设备以及MEC服务器可预先缓存相关资源，在后续进行边缘计算卸载时，可以直接对这些资源进行计算处理，并将计算结果传输到发起边缘计算卸载任务的目标电力终端。With the improvement of power terminal equipment capabilities, near-terminal assisted computing has become a promising solution. Due to the short distance between each power terminal, the long-distance transmission delay can be significantly reduced; in addition, multiple power terminals are used as nodes for parallel computing, and the computing efficiency is significantly improved, making the power terminal equipment change from resource consumers to resource providers. , which improves system resource utilization. At the same time, in order to balance the consumption of computing and communication resources, the power terminal equipment and the MEC server can cache related resources in advance, and when the edge computing offloading is performed subsequently, these resources can be directly calculated and processed, and the calculation results can be transmitted to initiating the edge computing offloading task. target power terminal.

在现有方案中，主要是将一些固有的新技术引入边缘计算任务卸载分配中，但是没有考虑在电网中设备计算与存储资源，以及服务请求动态变化的环境，大部分的决策过程都是以整个电网的最低能耗作为目标，导致实际的边缘计算任务卸载决策中，出现计算卸载分配不均匀，可能会出现部分节点分配较多的计算卸载任务，而有些节点分配到较少的计算卸载任务。因此，现有方案在进行边缘计算任务卸载决策时，算法的收敛速度和性能还有待得到进一步提高。In the existing scheme, some inherent new technologies are mainly introduced into the offloading assignment of edge computing tasks, but the computing and storage resources of equipment in the power grid and the dynamically changing environment of service requests are not considered. Most of the decision-making processes are based on The minimum energy consumption of the entire power grid is the target, which leads to uneven distribution of computing offloading in the actual edge computing task offloading decision. Some nodes may be allocated more computing offloading tasks, while some nodes are allocated less computing offloading tasks. . Therefore, the convergence speed and performance of the algorithm need to be further improved when the existing solutions make edge computing task offloading decisions.

针对现有技术中存在的问题，本发明提供了一种智能电网中多智能体终端与MEC协作的计算缓存联合优化机制。在本发明中，依据电力终端用户请求和网络状态信息，对于任意一个电力终端设备中待处理的计算任务内容，可预先在MEC服务器、临近用户终端（即临近电力终端，与待进行计算任务的电力终端之间网络连接）和本地节点（即待进行计算任务的电力终端）等三个位置进行缓存分割、计算内容传输（包括待进行处理的计算任务以及已完成计算的计算任务结果）以及计算处理，其缓存计算策略共分为：本地节点（缓存）和本地节点（计算），MEC服务器（缓存）和MEC服务器/本地节点（计算），临近终端（缓存）和临近终端/本地节点（计算）。本发明采用多智能体强化学习，求解智能电网中任意一个电力终端的计算任务对应的最优缓存卸载策略，该问题可以归结为资源分配和任务卸载的联合优化，是一个合作与竞争的混合问题。此外，本发明为了使多智能体框架能够考虑全面以及极端情况，高效应对不同环境状态下的策略制定，提出了一种基于生成对抗网络（GenerativeAdversarial Network，简称GAN）的有经验智能体训练方法，从而进一步提高算法的收敛速度和性能。Aiming at the problems existing in the prior art, the present invention provides a computing cache joint optimization mechanism in which a multi-agent terminal and an MEC cooperate in a smart grid. In the present invention, according to the user request of the power terminal and the network status information, for the content of the computing task to be processed in any power terminal device, the content of the calculation task to be processed in the MEC server, the adjacent user terminal (that is, the adjacent power terminal, and the calculation task to be performed can be preliminarily stored in the MEC server. The network connection between the power terminals) and the local node (that is, the power terminal to be performed the computing task) and other three locations for cache segmentation, calculation content transmission (including the calculation tasks to be processed and the calculation task results that have been completed), and calculation processing, its cache computing strategy is divided into: local node (cache) and local node (computing), MEC server (cache) and MEC server/local node (computing), adjacent terminal (cache) and adjacent terminal/local node (computing) ). The invention adopts multi-agent reinforcement learning to solve the optimal cache unloading strategy corresponding to the computing task of any power terminal in the smart grid. This problem can be attributed to the joint optimization of resource allocation and task unloading, which is a mixed problem of cooperation and competition. . In addition, the present invention proposes an experienced agent training method based on Generative Adversarial Network (GAN) in order to enable the multi-agent framework to consider comprehensive and extreme situations and efficiently deal with policy formulation in different environmental states. Thereby, the convergence speed and performance of the algorithm are further improved.

图1为本发明提供的电网边缘计算卸载分配方法的流程示意图，如图1所示，本发明提供了一种电网边缘计算卸载分配方法，包括：1 is a schematic flowchart of a method for offloading and distributing power grid edge computing provided by the present invention. As shown in FIG. 1 , the present invention provides a method for offloading and allocating power grid edge computing, including:

步骤101，获取智能电网中每个电力终端在当前时刻的网络状态信息。Step 101: Obtain network status information of each power terminal in the smart grid at the current moment.

在本发明中，实时获取电力终端的网络状态信息，其中，网络状态信息包括电力终端之间的网络连接状态，电力终端的CPU频率（即该电力终端的计算能力），电力终端的缓存容量，电力终端的待缓存卸载任务计算量（即电力终端中待处理的计算任务需要被缓存的计算内容大小），电力终端的卸载后任务传输量（即电力终端i缓存到其他节点的卸载任务，在其他节点完成计算后传输到电力终端i的计算任务结果总量）。需要说明的是，本发明除了获取电力终端之间的网络状态信息时，还可以获取电力终端与MEC服务器之间的网络状态信息，即获取电力终端与MEC服务器之间的网络连接状态，以及MEC服务器的CPU频率等。In the present invention, the network status information of the power terminal is acquired in real time, wherein the network status information includes the network connection status between the power terminals, the CPU frequency of the power terminal (ie the computing capability of the power terminal), the cache capacity of the power terminal, The calculation amount of the unloaded tasks to be cached by the power terminal (that is, the size of the calculation content that needs to be cached for the computing tasks to be processed in the power terminal), and the amount of post-unloaded task transmission of the power terminal (that is, the unloaded tasks cached by the power terminal i to other nodes, in The total amount of computing task results transmitted to power terminal i after other nodes have completed their calculations). It should be noted that, in addition to obtaining the network status information between the power terminals, the present invention can also obtain the network status information between the power terminal and the MEC server, that is, the network connection status between the power terminal and the MEC server, and the MEC server. The server's CPU frequency, etc.

由于本发明提供的电网边缘计算卸载分配模型在进行边缘计算卸载分配时，同时也考虑了MEC服务器的网络状态。因此，该模型在训练过程时，结合了电力终端和MEC服务器的网络状态信息，使得模型根据电力终端在当前时刻的网络状态信息，在考虑是否将计算任务卸载到临近电力终端（也可能将计算任务在本地终端处理完成）的同时，还考虑是否需要将计算任务卸载到MEC服务器处理。Because the power grid edge computing offload distribution model provided by the present invention also takes the network state of the MEC server into consideration when performing the edge computing offload distribution. Therefore, during the training process of the model, the network status information of the power terminal and the MEC server is combined, so that the model is considering whether to offload the computing task to the adjacent power terminal (or may also calculate the calculation task according to the current network status information of the power terminal). When the task is processed at the local terminal), it is also considered whether the computing task needs to be offloaded to the MEC server for processing.

步骤102，将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略。Step 102: Input the network status information corresponding to the target power terminal into the grid edge computing offloading distribution model to obtain the edge computing offloading allocation strategy of the computing tasks to be processed in the target power terminal.

在本发明中，将拥有计算任务的电力终端作为目标电力终端，通过上述实施例获取到目标电力终端对应的网络状态信息之后，将该网络状态信息输入到电网边缘计算卸载模型中，其中，在本发明中，目标电力终端对应的网络状态信息，除了目标电力终端自身的网络状态信息之外，还包括与目标电力终端临近的其他电力终端的网络状态信息（这些临近的电力终端的网络状态信息主要包括网络连接状态，电力终端的CPU频率等）。该电网边缘计算卸载模型用于实现电网中任务最优缓存与卸载策略，涉及到缓存以及计算等资源的合理分配。模型从体验质量（Quality of Experience，简称QoE）和设备功率有限的角度出发，选择延迟和能耗作为优化目标。因此，在实际应用中，每个电力终端在进行边缘计算卸载时，只关注自身的业务质量（即时延）和能源消耗。In the present invention, a power terminal with computing tasks is used as a target power terminal, and after obtaining the network status information corresponding to the target power terminal through the above embodiment, the network status information is input into the grid edge computing offloading model, wherein, in In the present invention, the network status information corresponding to the target power terminal, in addition to the network status information of the target power terminal itself, also includes network status information of other power terminals adjacent to the target power terminal (the network status information of these adjacent power terminals). It mainly includes the network connection status, the CPU frequency of the power terminal, etc.). The grid edge computing offloading model is used to realize the optimal cache and offload strategy of tasks in the grid, which involves the rational allocation of resources such as cache and computing. The model selects latency and energy consumption as optimization goals from the perspective of Quality of Experience (QoE) and limited device power. Therefore, in practical applications, each power terminal only pays attention to its own service quality (ie delay) and energy consumption when offloading edge computing.

进一步地，电网边缘计算卸载模型基于目标电力终端的网络状态信息，决策出最优的边缘计算卸载分配策略，该分配策略涉及到目标电力终端中待进行计算任务的任务分割比例，以及待缓存的节点位置（即分割后的任务需要缓存到哪些临近电力终端进行任务处理）。Further, the grid edge computing offloading model decides the optimal edge computing offloading allocation strategy based on the network state information of the target power terminal, and the allocation strategy involves the task division ratio of the computing tasks to be performed in the target power terminal, and the amount of the data to be cached. Node location (that is, to which adjacent power terminals the split task needs to be cached for task processing).

步骤103，根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；Step 103: According to the edge computing offloading allocation strategy, the to-be-processed computing tasks are divided, and the divided to-be-processed computing tasks are cached in the corresponding power terminal and/or mobile edge computing server, so that the to-be-processed computing tasks are cached. Processing computing tasks for edge computing offloading;

在本发明中，根据目标电力终端与其他节点（包括MEC服务器以及至少1个临近电力终端）的网路连接状态、计算能力、缓存能力和任务特征等信息，采用多智能体强化学习（Multi-Agent Deep Deterministic Policy Gradient，简称MADDPG）框架生成的高效资源分配和任务卸载决策，实现每个电力终端在进行边缘计算卸载传输的能耗最小化目标，从而确定目标电力终端中的待处理计算任务，在对应临近电力终端和/或MEC服务器中的缓存比例，以进行任务的缓存与卸载计算。In the present invention, multi-agent reinforcement learning (Multi-agent reinforcement learning) is adopted according to the network connection status, computing capability, caching capability and task characteristics of the target power terminal and other nodes (including the MEC server and at least one adjacent power terminal). The efficient resource allocation and task offloading decisions generated by the Agent Deep Deterministic Policy Gradient (MADDPG) framework can achieve the goal of minimizing the energy consumption of each power terminal during edge computing offload transmission, thereby determining the pending computing tasks in the target power terminal. The cache ratio in the corresponding adjacent power terminal and/or MEC server to perform the task cache and offload calculation.

本发明提供的电网边缘计算卸载分配方法，通过构建移动边缘计算服务器与电力终端协作的混合式缓存与卸载框架，使用多智能体强化学习求解算法进行边缘计算卸载分配决策，充分利用电力终端设备的缓存和计算资源，得到更为准确且高效的边缘计算卸载分配方案，从而解决以往多任务请求时，单一依靠移动边缘计算服务器进行边缘计算，而面临的资源不足与网络拥塞等问题，并且终端间的近距离协作，可有效降低远距离移动边缘计算服务器的传输时延。The method for offloading and distributing power grid edge computing provided by the present invention constructs a hybrid caching and offloading framework in which the mobile edge computing server and the power terminal cooperate, uses a multi-agent reinforcement learning algorithm to make edge computing offloading and distribution decisions, and makes full use of the power terminal equipment. Cache and computing resources, get a more accurate and efficient edge computing offloading allocation scheme, so as to solve the problems such as insufficient resources and network congestion faced by relying solely on the mobile edge computing server for edge computing in the past multi-task requests, and between terminals. It can effectively reduce the transmission delay of long-distance mobile edge computing servers.

在上述实施例的基础上，所述电网边缘计算卸载分配模型通过以下步骤训练得到：On the basis of the above embodiment, the grid edge computing offloading distribution model is obtained by training the following steps:

基于每个电力终端的历史网络状态信息，构建各个电力终端对应智能体的样本网络状态信息，并根据所述样本网络状态信息，构建第一样本观测状态；所述样本网络状态信息包括网络连接状态、计算能力、缓存能力、待缓存卸载任务计算量和缓存卸载后任务传输量；Based on the historical network status information of each power terminal, construct sample network status information of the agent corresponding to each power terminal, and construct a first sample observation status according to the sample network status information; the sample network status information includes network connections Status, computing power, cache capacity, the amount of task calculation to be unloaded from the cache, and the amount of task transfer after the cache is unloaded;

通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型。Through the training sample set, the multi-agent reinforcement learning network is trained to obtain a power grid edge computing offload distribution model.

在本发明中，MADDPG网络以集中学习和分散执行的方式工作，即每个智能体根据自身策略得到当前状态执行的动作，并与环境交互，从而得到经验存入自身的经验缓存池。待所有智能体与环境交互后，每个智能体从经验池（即训练样本集）中随机抽取经验，训练各自的神经网络。在本发明中，MADDPG网络中的多智能体、状态、动作以及奖励函数的设计如下：In the present invention, the MADDPG network works in the mode of centralized learning and decentralized execution, that is, each agent obtains the actions performed by the current state according to its own strategy, and interacts with the environment, so as to obtain experience and store it in its own experience buffer pool. After all agents interact with the environment, each agent randomly selects experience from the experience pool (ie, the training sample set) to train its own neural network. In the present invention, the design of the multi-agent, state, action and reward function in the MADDPG network is as follows:

智能体：所有用户终端，即电力终端。Agent: All user terminals, namely power terminals.

动作：根据电力终端任务的缓存比例

，即待处理计算任务经过分割之后分配到每个节点的缓存比例；以及卸载位置

，即任务是否在对应节点进行计算，构成动作

。因为MADDPG用于解决连续变量求解，但模型中卸载位置

为离散变量，所以，本发明将变量转化为：Action: According to the cache ratio of the power terminal task

, that is, the proportion of the cache allocated to each node after the to-be-processed computing task is divided; and the unloading location

, that is, whether the task is calculated at the corresponding node and constitutes an action

. Because MADDPG is used to solve continuous variables, but the unloaded position in the model

is a discrete variable, so the present invention converts the variable into:

；

;

因此，

。therefore,

.

状态：每个智能体的本地状态（即网络状态信息）包括终端间的网络连接状态

、计算能力

、缓存能力

缓存内容大小

（目标电力终端中待缓存到其他节点进行计算的卸载任务计算量）以及计算内容大小

（目标电力终端缓存到其他节点的卸载任务，在完成计算后通过返回到目标电力终端进行整合，得到的缓存卸载任务计算结果，即通过边缘计算卸载后传输的结果数据的大小），即：State: The local state of each agent (that is, network state information) includes the state of network connections between terminals

, Calculate ability

, cache capability

cache content size

(The amount of offloading tasks in the target power terminal to be cached to other nodes for calculation) and the size of the calculation content

(The unloading task cached by the target power terminal to other nodes is integrated by returning to the target power terminal after the calculation is completed, and the calculation result of the cache unloading task is obtained, that is, the size of the result data transmitted after unloading through edge computing), namely:

。

.

所有智能体在时隙t，即第t个时刻的联合状态为：The joint state of all agents at time slot t, that is, the t-th time, is:

。

.

奖励：为实现能耗最小化，进行边端协作优化任务缓存与卸载的目标，将每个智能体的奖励r设为其对应终端用户的能耗相反数，即

。Reward: In order to minimize energy consumption and optimize task caching and offloading by side-end collaboration, the reward r of each agent is set as the opposite number of energy consumption of its corresponding end user, namely

.

进一步地，在训练过程中，为加速智能体的学习过程，在本发明中，Critic网络的输入主要包括其他智能体的观察状态和采取的动作，通过最小化损失以更新Critic 网络参数，进而通过梯度下降法计算更新动作网络的参数。Further, in the training process, in order to accelerate the learning process of the agent, in the present invention, the input of the Critic network mainly includes the observation states and actions taken by other agents, and the Critic network parameters are updated by minimizing the loss, and then through The gradient descent method computes the parameters of the updated action network.

具体地，在MADDPG算法中，智能体i的连续策略

通过关于

的目标函数梯度进行优化：Specifically, in the MADDPG algorithm, the continuous policy of agent i

by about

The objective function gradient is optimized:

；

;

其中，

是集中式的actor-value函数；

为动作，

为奖励；

为所有智能体的新状态，即下一轮训练过程中的智能体对应的观测状态；

表示经验存储，该元组被存储在经验回放池中，即构建用于训练的样本集；

表示

个智能体的策略集合；

表示

个智能体策略的参数；

表示所有智能体的观测状态。每个智能体可以根据本地观测状态，做出独立的决策，即

。in,

is a centralized actor-value function;

for action,

for reward;

is the new state of all agents, that is, the observed state corresponding to the agent in the next round of training;

Represents experience storage, the tuple is stored in the experience playback pool, that is, the sample set for training is constructed;

express

The set of policies of an agent;

express

parameters of an agent's strategy;

Represents the observed state of all agents. Each agent can make independent decisions based on the local observation state, namely

.

因此，每个Critic网络就可以获得所有智能体的状态和动作行为。然后，根据损失函数更新智能体i的集中动作值函数, 即 Critic网络的训练通过如下Loss函数：Therefore, each critical network can obtain the states and actions of all agents. Then, the centralized action value function of agent i is updated according to the loss function, that is, the training of the Critic network passes the following Loss function:

；

;

。

.

在上述实施例的基础上，所述基于每个电力终端在进行边缘计算卸载时的能耗和时延，以每个电力终端的能耗最小化为优化目标，构建智能体的奖励，包括：On the basis of the above embodiment, based on the energy consumption and delay of each power terminal when performing edge computing offloading, and with the optimization goal of minimizing the energy consumption of each power terminal, the reward for constructing the agent includes:

在本发明中，在对多智能体强化学习网络进行训练的场景中，该场景包含1个MEC服务器

，以及多个电力终端

。具体地，本发明设计了三种缓存卸载模式：模式1，本地缓存/卸载，即计算任务在本地终端进行处理；模式2，临近终端缓存/卸载，即将计算任务在相邻的1个或多个邻近终端进行处理；模式3，MEC缓存/卸载，将计算任务在终端所属的MEC服务器进行处理。在本发明中，为了实现资源的合理利用，减少任务处理时延，每个电力终端的计算任务将动态划分为不同比例进行缓存和卸载最优模式选择。In the present invention, in the scenario of training a multi-agent reinforcement learning network, the scenario includes one MEC server

, and multiple power terminals

. Specifically, the present invention designs three cache unloading modes: mode 1, local cache/unload, that is, computing tasks are processed at the local terminal; Each adjacent terminal performs processing; Mode 3, MEC caching/unloading, processes computing tasks on the MEC server to which the terminal belongs. In the present invention, in order to realize rational utilization of resources and reduce task processing delay, the computing tasks of each power terminal are dynamically divided into different proportions for optimal mode selection of buffering and unloading.

具体地，在电网边缘计算卸载分配模型的应用场景中，

，表示第i个电力终端（为了方便描述，第i个电力终端可作为目标电力终端）的任务在第j个节点（节点可以是本地电力终端或其他临近电力终端，也可以是MEC服务器）的缓存比例，因此，第i个电力终端计算任务在其他节点的缓存比例可表示为：Specifically, in the application scenario of the grid edge computing offload distribution model,

, indicating that the task of the i-th power terminal (for the convenience of description, the i-th power terminal can be used as the target power terminal) is at the j-th node (the node can be a local power terminal or other nearby power terminals, or it can be an MEC server). cache ratio, therefore, the cache ratio of the i-th power terminal computing task in other nodes can be expressed as:

进一步地，

表示第i个电力终端的计算任务在节点j计算动作：further,

Indicates that the computing task of the ith power terminal computes the action at node j:

其中，

表示第i个电力终端的计算任务在节点j计算；否则，

。in,

Indicates that the computing task of the ith power terminal is computed at node j; otherwise,

.

表示节点之间的网络连接状态，

表示第i个电力终端与节点j连接，可以进行任务的缓存与卸载计算；否则，

。

Represents the network connection status between nodes,

Indicates that the i-th power terminal is connected to node j, and can perform task caching and offloading calculations; otherwise,

.

进一步地，构建训练场景中的缓存模型：Further, build the cached model in the training scene:

假设第i个电力终端的任务需要被缓存的内容为

，缓存内容经计算后形成的输出内容为

，对于需要被缓存的内容

，具体约束条件为：Assume that the task of the i-th power terminal needs to be cached as

, the output content formed after the cache content is calculated is

, for content that needs to be cached

, the specific constraints are:

；

;

；

;

其中，

表示MEC服务器的最大缓存容量，

表示电力终端的最大缓存容量，以上公式表示目标电力终端缓存到其他节点的计算内容，不能超过MEC服务器和电力终端的最大缓存容量。in,

Indicates the maximum cache capacity of the MEC server,

Represents the maximum cache capacity of the power terminal. The above formula represents the calculation content cached by the target power terminal to other nodes, which cannot exceed the maximum cache capacity of the MEC server and the power terminal.

进一步地，

，该公式表示第i个电力终端的任务所对应的计算内容被系统（即用于计算卸载的节点，包括本地终端、临近终端和MEC服务器）完整缓存。further,

, this formula indicates that the computing content corresponding to the task of the ith power terminal is completely cached by the system (ie, the node used for computing offload, including the local terminal, adjacent terminals, and MEC server).

进一步地，构建训练场景中每个节点的计算模型：Further, build a computational model for each node in the training scene:

表示MEC服务器的CPU频率（单位：cycle/s）；

表示电力终端的CPU频率。节点i（即第i个电力终端）的任务将由MEC服务器以及临近终端协作完成，其计算能耗由对应的各部分计算能耗组成，表示如下：

Indicates the CPU frequency of the MEC server (unit: cycle/s);

Indicates the CPU frequency of the power terminal. The task of node i (i.e. the i-th power terminal) will be completed by the MEC server and the adjacent terminals.

；

;

其中，

表示每cycle消耗的能耗，k为与CPU相关常数，

；

为常数，表示计算每bit需要多少cycle。in,

Indicates the energy consumption per cycle, k is a constant related to the CPU,

;

is a constant, indicating how many cycles are required to calculate each bit.

对于节点i的任务，在进行边缘计算卸载时，计算时延的公式表示如下：For the task of node i, when edge computing is offloaded, the formula for computing delay is as follows:

；

;

进一步地，构建训练场景中节点的通信模型：Further, build the communication model of the nodes in the training scene:

第i个电力终端的任务缓存内容，或计算完成内容将由MEC服务器和临近终端进行内容传输。节点j与节点i之间的传输速率计算如下：The content of the task cache of the i-th power terminal, or the content after the calculation is completed, will be transmitted by the MEC server and the adjacent terminal. The transmission rate between node j and node i is calculated as follows:

；

;

；

;

其中，

表示节点j与节点i之间的传输带宽，

表示节点j与节点i之间的信干噪比，

表示节点j的发射功率，

表示节点j与节点i之间信道增益，

为白噪声。in,

represents the transmission bandwidth between node j and node i,

represents the signal-to-interference-noise ratio between node j and node i,

represents the transmit power of node j,

represents the channel gain between node j and node i,

is white noise.

如果

，

表示节点j无缓存无计算，所以不会产生内容传输，无传输能耗；如果

，

表示节点j有缓存无计算，所以缓存内容将传输给本地节点i计算，那么对于节点i卸载的任务，节点j的传输能耗计算公式如下：if

,

Indicates that node j has no cache and no calculation, so there will be no content transmission and no transmission energy consumption; if

,

Indicates that node j has cache and no calculation, so the cache content will be transmitted to local node i for calculation, then for the task unloaded by node i, the calculation formula of node j's transmission energy consumption is as follows:

；

;

如果

，无论

值为多少，节点j计算形成的内容都将传输给本地节点i进行整合，传输能耗如下所示：if

,regardless

What is the value, the content calculated by node j will be transmitted to the local node i for integration, and the transmission energy consumption is as follows:

；

;

综上所述，节点i的在进行边缘计算卸载时，传输能耗为：To sum up, when node i performs edge computing offloading, the transmission energy consumption is:

；

;

节点i的传输时延取决于并行传输过程中最长的时延，具体公式如下所示：The transmission delay of node i depends on the longest delay in the parallel transmission process, and the specific formula is as follows:

；

;

进一步地，节点i总能耗由计算能耗和传输能耗组成，公式如下：Further, the total energy consumption of node i is composed of computing energy consumption and transmission energy consumption, and the formula is as follows:

；

;

节点i总时延为所有处理其任务的节点j的传输时延与计算时延之和的最大值，公式为：The total delay of node i is the maximum value of the sum of the transmission delay and calculation delay of all nodes j that process its tasks. The formula is:

；

;

进一步地，为了实现计算任务的缓存与卸载优化，以及分布式网络中的终端节能。本发明以每个业务终端的能耗最小化为目标，在上述实施例的基础上，构建电力终端边缘计算卸载能耗优化模型，所述电力终端边缘计算卸载能耗优化模型的公式为：Further, in order to realize the optimization of caching and offloading of computing tasks, and the terminal energy saving in the distributed network. The present invention aims to minimize the energy consumption of each service terminal. On the basis of the above embodiment, an optimization model of energy consumption for power terminal edge computing offloading is constructed. The formula of the power terminal edge computing offloading energy consumption optimization model is:

其中，

表示第i个电力终端在t时刻进行边缘计算卸载时的能耗，

约束条件为：The constraints are:

；公式（1）

;Formula 1)

公式（2）

Formula (2)

；公式（3）

; Equation (3)

公式（4）

Formula (4)

公式（5）

Formula (5)

公式（6）

Formula (6)

，

公式（7）

,

Formula (7)

公式（8）

Formula (8)

公式（9）

Formula (9)

其中，

表示第i个电力终端与第j个电力终端之间的网络连接状态，

表示预设时延阈值，

表示第i个电力终端的待计算任务中需要被缓存的任务量，

表示移动边缘计算服务器的缓存总容量，

表示任意电力终端的缓存总容量。具体地，在上述约束条件中，公式（1）表示第i个电力终端的待计算任务被完整缓存，公式（2）表示待计算任务只能缓存卸载到与本地节点存在网络连接的节点，公式（3）表示任意一个电力终端i都有至少一个节点进行任务计算，公式（4）表示节点j无缓存内容时将不进行计算，公式（5）表示保证每个节点最多处理一个任务，公式（6）表示每个任务的传输时延与处理时延不能超过预设时延阈值，公式（7）表示约束变量取值范围，公式（8）和公式（9）表示所有任务的缓存不能超过总缓存容量。in,

represents the preset delay threshold,

represents the total cache capacity of the mobile edge computing server,

Indicates the total cache capacity of any power terminal. Specifically, in the above constraints, formula (1) indicates that the task to be calculated of the ith power terminal is completely cached, and formula (2) indicates that the task to be calculated can only be cached and offloaded to the node that has a network connection with the local node, the formula (3) means that any power terminal i has at least one node for task calculation, formula (4) means that node j will not perform calculation when there is no cache content, formula (5) means that each node is guaranteed to process at most one task, formula ( 6) Indicates that the transmission delay and processing delay of each task cannot exceed the preset delay threshold, formula (7) indicates the value range of the constraint variable, and formulas (8) and (9) indicate that the cache of all tasks cannot exceed the total. cache capacity.

最后，为实现每个电力终端能耗进行边端协作优化任务缓存与卸载的能耗最小化，将每个智能体的奖励设为其对应终端用户的能耗相反数，即

。Finally, in order to minimize the energy consumption of the side-end collaborative optimization task caching and unloading of the energy consumption of each power terminal, the reward of each agent is set as the opposite of the energy consumption of its corresponding end user, namely

.

在上述实施例的基础上，在所述通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型之前，所述方法还包括：On the basis of the above embodiment, before the multi-agent reinforcement learning network is trained through the training sample set to obtain the grid edge computing offloading distribution model, the method further includes:

通过所述更新后的训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型。Through the updated training sample set, the multi-agent reinforcement learning network is trained to obtain a power grid edge computing offload distribution model.

GAN的主要结构包括一个生成器G（Generator）和一个判别器D（Discriminator），其中，生成器G用于生成数据，其分布类似于真实数据分布Z；鉴别器D 用于尝试区分样本是来自生成器G生成的数据，还是真实数据分布Z。为减少实际应用中经验学习不均衡，使得MADDPG算法中每个智能体能够充分学习到全面的不同状态下的经验，即在不同网络连接状态以及计算能力等状态下，终端任务的缓存与卸载决策。因此，本发明提出了基于分布式GAN-MADDPG的框架，通过GAN网络使用MADDPG经验池中的观测状态（包含电力网络连接状态以及网络资源信息的真实数据集），生成包含极端状态的合成状态；然后，将合成状态对应的合成经验（即第二观测状态）与真实经验（即第一观测状态）共同输入MADDPG的智能体进行训练，通过利用GAN来学习极端事件和消除数据集偏差，对智能体观察状态进行增强，以训练更有经验的智能体，创建一个有全面经验的多智能体代理，从而高效应对不同环境状态下的策略制定，具有快速收敛速度和良好性能等优点。The main structure of GAN includes a generator G (Generator) and a discriminator D (Discriminator), where the generator G is used to generate data whose distribution is similar to the real data distribution Z; the discriminator D is used to try to distinguish samples from The data generated by the generator G is still the real data distribution Z. In order to reduce the imbalance of experience learning in practical applications, each agent in the MADDPG algorithm can fully learn a comprehensive experience in different states, that is, under different network connection states and computing power, the caching and unloading decisions of terminal tasks. . Therefore, the present invention proposes a framework based on distributed GAN-MADDPG, and uses the observed state in the MADDPG experience pool (the real data set containing the power network connection state and network resource information) through the GAN network to generate a synthetic state containing extreme states; Then, the synthetic experience corresponding to the synthetic state (ie the second observation state) and the real experience (ie the first observation state) are jointly input into the MADDPG agent for training. The body observation state is enhanced to train more experienced agents, creating a multi-agent agent with comprehensive experience, so as to efficiently deal with policy formulation in different environmental states, with the advantages of fast convergence speed and good performance.

具体地，每个智能体在MADDPG的Actor-Critic架构的基础上增加有一个GAN网络，用于对其观测状态

（即第二样本观测状态）进行生成，其生成的观测状态，与由MADDPG网络的Actor网络生成相应的动作

、奖励

及下一时隙观测状态

组成完整经验存入经验回放池，这使得经验池存储的经验更加全面，用于智能体的训练。因此，GAN的目标即优化发生器G和鉴别器D，用公式表示如下:Specifically, each agent adds a GAN network based on the Actor-Critic architecture of MADDPG to observe its state

(that is, the second sample observation state) is generated, and the generated observation state corresponds to the corresponding action generated by the Actor network of the MADDPG network

,award

and the next time slot observation status

The complete experience is stored in the experience playback pool, which makes the experience stored in the experience pool more comprehensive and used for the training of the agent. Therefore, the goal of GAN is to optimize the generator G and the discriminator D, which is formulated as follows:

；

;

其中，

表示真实样本与生成样本之间的差异程度；

表示固定生成器G，尽可能地让判别器能够最大化地判别出样本来自于真实数据还是生成的数据；令

，

表示在固定判别器D的条件下得到生成器G，这个G要求能够最小化真实样本与生成样本的差异。通过上述min max的博弈过程，使得收敛于生成分布，拟合于真实分布，从而在智能体训练过程中，使用GAN网络对电力网络环境状态进行模拟，通过不同状态下的经验增强智能体的经验，进而有效保证了电网中最优缓存与卸载策略，实现终端节能的优化目标。in,

Represents the degree of difference between the real sample and the generated sample;

Represents a fixed generator G, so that the discriminator can determine as much as possible whether the sample comes from real data or generated data; let

,

Indicates that the generator G is obtained under the condition of a fixed discriminator D, and this G is required to minimize the difference between the real sample and the generated sample. Through the above game process of min max, it converges to the generated distribution and fits to the real distribution, so that in the training process of the agent, the GAN network is used to simulate the environmental state of the power network, and the experience of the agent is enhanced through the experience in different states. , and then effectively ensure the optimal caching and unloading strategy in the power grid, and achieve the optimization goal of terminal energy saving.

本发明提出了一种基于GAN-MADDPG的有经验智能体训练机制，将每个电力终端作为智能体以实现自身能耗最小化为目标，采用MADDPG多智能体强化学习算法求解最优任务缓存与卸载决策；然后，使用GAN网络对智能体的观测状态进行生成，相应的合成经验有效弥补了真实经验分布不均的缺点，使得训练出更有经验的智能体，在面对从未遇到的网络与资源状态，能够高效准确的给出优化策略，具有更快收敛速度和更高的样本效率。The invention proposes an experienced agent training mechanism based on GAN-MADDPG, which takes each power terminal as an agent to minimize its own energy consumption as the goal, and adopts MADDPG multi-agent reinforcement learning algorithm to solve the optimal task cache and Unloading decision; then, the GAN network is used to generate the observed state of the agent, and the corresponding synthetic experience can effectively make up for the disadvantage of uneven distribution of real experience, so that more experienced agents can be trained, in the face of never encountered The network and resource status can give the optimization strategy efficiently and accurately, with faster convergence speed and higher sample efficiency.

下面对本发明提供的电网边缘计算卸载分配系统进行描述，下文描述的电网边缘计算卸载分配系统与上文描述的电网边缘计算卸载分配方法可相互对应参照。The grid edge computing offloading and distribution system provided by the present invention is described below. The grid edge computing offloading and distribution system described below and the grid edge computing offloading and distribution method described above can be referred to each other correspondingly.

图2为本发明提供的电网边缘计算卸载分配系统的结构示意图，如图2所示，本发明提供了一种电网边缘计算卸载分配系统，包括电力终端网络状态采集模块201、电网边缘计算卸载分配策略生成模块202和边缘计算卸载模块203，其中，电力终端网络状态采集模块201用于获取智能电网中每个电力终端在当前时刻的网络状态信息；电网边缘计算卸载分配策略生成模块202用于将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略；边缘计算卸载模块203用于根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；FIG. 2 is a schematic structural diagram of a grid edge computing offload distribution system provided by the present invention. As shown in FIG. 2 , the present invention provides a grid edge computing offload distribution system, including a power terminal networkstate acquisition module 201, a grid edge computing offload distribution system Thestrategy generation module 202 and the edgecomputing offload module 203, wherein, the power terminal networkstate acquisition module 201 is used to obtain the network status information of each power terminal in the smart grid at the current moment; the grid edge computing offload distributionstrategy generation module 202 is used to The network status information corresponding to the target power terminal is input into the grid edge computing offloading distribution model to obtain the edge computing offloading allocation strategy for the computing tasks to be processed in the target power terminal; the edgecomputing offloading module 203 is used for the edge computing offloading and allocation according to the edge computing. a strategy, dividing the to-be-processed computing task, and buffering the divided to-be-processed computing task to the corresponding power terminal and/or mobile edge computing server, so as to perform edge computing offloading of the to-be-processed computing task;

本发明提供的电网边缘计算卸载分配系统，通过构建移动边缘计算服务器与电力终端协作的混合式缓存与卸载框架，使用多智能体强化学习求解算法进行边缘计算卸载分配决策，充分利用电力终端设备的缓存和计算资源，得到更为准确且高效的边缘计算卸载分配方案，从而解决以往多任务请求时，单一依靠移动边缘计算服务器进行边缘计算，而面临的资源不足与网络拥塞等问题，并且终端间的近距离协作，可有效降低远距离移动边缘计算服务器的传输时延。The grid edge computing offloading and distribution system provided by the present invention, by constructing a hybrid caching and offloading framework in which the mobile edge computing server and the power terminal cooperate, uses the multi-agent reinforcement learning algorithm to make edge computing offloading and distribution decisions, and makes full use of the power terminal equipment. Cache and computing resources, get a more accurate and efficient edge computing offloading allocation scheme, so as to solve the problems such as insufficient resources and network congestion faced by relying solely on the mobile edge computing server for edge computing in the past multi-task requests, and between terminals. It can effectively reduce the transmission delay of long-distance mobile edge computing servers.

在上述实施例的基础上，所述系统还包括样本构建模块、动作标签标记模块、智能体奖励构建模块、训练集生成模块和训练模块，其中，样本构建模块用于基于每个电力终端的历史网络状态信息，构建各个电力终端对应智能体的样本网络状态信息，并根据所述样本网络状态信息，构建第一样本观测状态；动作标签标记模块用于获取所述样本网络状态信息对应的任务缓存比例和任务卸载位置，并根据所述任务缓存比例和所述任务卸载位置，构建每个智能体的动作；智能体奖励构建模块用于基于每个电力终端在进行边缘计算卸载时的能耗和时延，以每个电力终端的能耗最小化为优化目标，构建智能体的奖励；训练集生成模块用于根据所述第一样本观测状态、所述动作和所述奖励，构建训练样本集；训练模块用于通过所述训练样本集，对多智能体强化学习网络进行训练，得到电网边缘计算卸载分配模型。On the basis of the above-mentioned embodiment, the system further includes a sample building module, an action label marking module, an agent reward building module, a training set generation module and a training module, wherein the sample building module is used based on the history of each power terminal Network status information, construct the sample network status information of the agent corresponding to each power terminal, and construct the first sample observation status according to the sample network status information; the action tag marking module is used to obtain the task corresponding to the sample network status information Cache ratio and task unloading position, and construct the action of each agent according to the task cache ratio and the task unloading position; the agent reward building module is used for the energy consumption of each power terminal when performing edge computing unloading and delay, and the optimization goal is to minimize the energy consumption of each power terminal to construct the reward of the agent; the training set generation module is used to construct a training set based on the observed state, the action and the reward of the first sample. A sample set; the training module is used to train a multi-agent reinforcement learning network through the training sample set to obtain a power grid edge computing offloading distribution model.

本发明提供的系统是用于执行上述各方法实施例的，具体流程和详细内容请参照上述实施例，此处不再赘述。The system provided by the present invention is used to execute the above-mentioned method embodiments. For specific procedures and details, please refer to the above-mentioned embodiments, which will not be repeated here.

图3为本发明提供的电子设备的结构示意图，如图3所示，该电子设备可以包括：处理器（Processor）301、通信接口（Communications Interface）302、存储器（Memory）303和通信总线304，其中，处理器301，通信接口302，存储器303通过通信总线304完成相互间的通信。处理器301可以调用存储器303中的逻辑指令，以执行电网边缘计算卸载分配方法，该方法包括：获取智能电网中每个电力终端在当前时刻的网络状态信息；将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略；根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；其中，所述电网边缘计算卸载分配模型是由样本网络状态信息和所述样本网络状态信息对应的任务缓存比例和任务卸载位置，对多智能体强化学习网络进行训练得到的。FIG. 3 is a schematic structural diagram of an electronic device provided by the present invention. As shown in FIG. 3 , the electronic device may include: a processor (Processor) 301, a communication interface (Communications Interface) 302, a memory (Memory) 303 and acommunication bus 304, Theprocessor 301 , thecommunication interface 302 , and thememory 303 communicate with each other through thecommunication bus 304 . Theprocessor 301 can call the logic instructions in thememory 303 to execute the grid edge computing offloading distribution method, the method includes: acquiring the network status information of each power terminal in the smart grid at the current moment; , input into the grid edge computing offloading distribution model to obtain the edge computing offloading allocation strategy of the computing task to be processed in the target power terminal; according to the edge computing offloading allocation strategy, the to-be-processed computing task is divided, and the segmentation The subsequent pending computing tasks are cached to the corresponding power terminal and/or mobile edge computing server, so as to perform edge computing offloading for the pending computing tasks; wherein, the grid edge computing offloading allocation model is determined by the sample network state information and The task cache ratio and task unloading position corresponding to the sample network state information are obtained by training a multi-agent reinforcement learning network.

此外，上述的存储器303中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（ROM，Read-Only Memory）、随机存取存储器（RAM，Random Access Memory）、磁碟或者光盘等各种可以存储程序代码的介质。In addition, the above-mentioned logic instructions in thememory 303 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product in essence, or the part that contributes to the prior art or the part of the technical solution. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes .

另一方面，本发明还提供一种计算机程序产品，所述计算机程序产品包括存储在非暂态计算机可读存储介质上的计算机程序，所述计算机程序包括程序指令，当所述程序指令被计算机执行时，计算机能够执行上述各方法所提供的电网边缘计算卸载分配方法，该方法包括：获取智能电网中每个电力终端在当前时刻的网络状态信息；将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略；根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；其中，所述电网边缘计算卸载分配模型是由样本网络状态信息和所述样本网络状态信息对应的任务缓存比例和任务卸载位置，对多智能体强化学习网络进行训练得到的。In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, when the program instructions are executed by a computer When executing, the computer can execute the grid edge computing offloading and distribution method provided by the above methods, and the method includes: acquiring the network status information of each power terminal in the smart grid at the current moment; inputting the network status information corresponding to the target power terminal into Go to the grid edge computing offloading distribution model to obtain the edge computing offloading distribution strategy of the computing task to be processed in the target power terminal; according to the edge computing offloading allocation strategy, the to-be-processed computing task is divided, The to-be-processed computing tasks are cached in the corresponding power terminal and/or the mobile edge computing server, so as to perform edge computing offloading on the to-be-processed computing tasks; wherein, the grid edge computing offloading allocation model is determined by the sample network state information and the The task cache ratio and task unloading position corresponding to the sample network state information are obtained by training the multi-agent reinforcement learning network.

又一方面，本发明还提供一种非暂态计算机可读存储介质，其上存储有计算机程序，该计算机程序被处理器执行时实现以执行上述各实施例提供的电网边缘计算卸载分配方法，该方法包括：获取智能电网中每个电力终端在当前时刻的网络状态信息；将目标电力终端对应的网络状态信息，输入到电网边缘计算卸载分配模型，得到所述目标电力终端中待处理计算任务的边缘计算卸载分配策略；根据所述边缘计算卸载分配策略，将所述待处理计算任务进行分割，并将分割后的待处理计算任务缓存到对应的电力终端和/或移动边缘计算服务器，以对所述待处理计算任务进行边缘计算卸载；其中，所述电网边缘计算卸载分配模型是由样本网络状态信息和所述样本网络状态信息对应的任务缓存比例和任务卸载位置，对多智能体强化学习网络进行训练得到的。In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, it is implemented to execute the grid edge computing offloading and distribution methods provided by the above embodiments, The method includes: acquiring the network status information of each power terminal in the smart grid at the current moment; inputting the network status information corresponding to the target power terminal into the grid edge computing offloading distribution model to obtain the to-be-processed computing task in the target power terminal according to the edge computing offloading allocation strategy; according to the edge computing offloading allocation strategy, the to-be-processed computing tasks are divided, and the divided to-be-processed computing tasks are cached in the corresponding power terminals and/or mobile edge computing servers to Perform edge computing offloading on the to-be-processed computing tasks; wherein, the power grid edge computing offloading allocation model is based on the sample network state information and the task cache ratio and task offloading position corresponding to the sample network state information to strengthen the multi-agent The learning network is trained.

以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下，即可以理解并实施。The device embodiments described above are only illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in One place, or it can be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件。基于这样的理解，上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品可以存储在计算机可读存储介质中，如ROM/RAM、磁碟、光盘等，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行各个实施例或者实施例的某些部分所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above-mentioned technical solutions can be embodied in the form of software products in essence or the parts that make contributions to the prior art, and the computer software products can be stored in computer-readable storage media, such as ROM/RAM, magnetic A disc, an optical disc, etc., includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in various embodiments or some parts of the embodiments.

最后应说明的是：以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still be The technical solutions described in the foregoing embodiments are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A grid edge computing offload distribution method is characterized by comprising the following steps:

acquiring network state information of each power terminal in the smart grid at the current moment;

inputting network state information corresponding to a target power terminal into a power grid edge calculation unloading distribution model to obtain an edge calculation unloading distribution strategy of a to-be-processed calculation task in the target power terminal;

according to the edge computing unloading distribution strategy, the computing tasks to be processed are segmented, and the segmented computing tasks to be processed are cached to the corresponding power terminal and/or the mobile edge computing server so as to carry out edge computing unloading on the computing tasks to be processed;

the power grid edge calculation unloading distribution model is obtained by training a multi-agent reinforcement learning network according to sample network state information and a task cache proportion and a task unloading position corresponding to the sample network state information;

the power grid edge calculation unloading distribution model is obtained by training through the following steps:

based on historical network state information of each power terminal, constructing sample network state information of an agent corresponding to each power terminal, and constructing a first sample observation state according to the sample network state information;

acquiring a task cache proportion and a task unloading position corresponding to the sample network state information, and constructing the action of each intelligent agent according to the task cache proportion and the task unloading position;

constructing rewards of intelligent agents by taking minimization of the energy consumption of each power terminal as an optimization target based on the energy consumption and time delay of each power terminal during edge calculation unloading;

constructing a training sample set according to the first sample observation state, the action and the reward;

training the multi-agent reinforcement learning network through the training sample set to obtain a power grid edge calculation unloading distribution model;

before the training the multi-agent reinforcement learning network through the training sample set to obtain a power grid edge computing unloading distribution model, the method further includes:

inputting the sample network state information into a generation countermeasure network, and outputting a second sample observation state;

updating the training sample set according to the observation state of the second sample to obtain an updated training sample set;

the training of the multi-agent reinforcement learning network through the training sample set to obtain the power grid edge calculation unloading distribution model comprises the following steps:

and training the multi-agent reinforcement learning network through the updated training sample set to obtain a power grid edge calculation unloading distribution model.

2. The grid edge computing offload distribution method according to claim 1, wherein constructing the reward of the agent with the energy consumption of each power terminal minimized as an optimization objective based on the energy consumption and time delay of each power terminal when performing the edge computing offload comprises:

acquiring the energy consumption of each power terminal when performing edge calculation unloading according to the calculation energy consumption and transmission energy consumption of each power terminal;

acquiring the time delay of each power terminal when performing edge calculation unloading according to the transmission time delay and the calculation time delay of each power terminal;

taking the time delay of each power terminal when performing edge calculation unloading as a constraint condition, and constructing an optimization model of the edge calculation unloading energy consumption of the power terminal by taking the energy consumption minimization of each power terminal as an optimization target;

and calculating an unloading energy consumption optimization model based on the edge of the power terminal, and taking the energy consumption opposite number of the power terminal in each round of training as the reward of the corresponding intelligent agent.

3. The grid edge computing offloading distribution method of claim 2, wherein a formula of the power terminal edge computing offloading energy consumption optimization model is:

；

wherein,

represents the energy consumption of the ith power terminal when the edge calculation is unloaded at the moment t,

the cache proportion of the task to be calculated of the ith electric power terminal at the jth electric power terminal is represented;

for task offload actions, tablesShowing the calculation action of the task to be calculated of the ith electric power terminal on the jth electric power terminal;

the constraint conditions are as follows:

；

；

；

；

；

；

；

；

；

wherein,

indicating a network connection state between the ith power terminal and the jth power terminal,

the time delay of the unloading and transmission of the edge calculation of the task to be calculated of the ith power terminal is shown,

which represents a pre-set time delay threshold value,

the task amount required to be cached in the tasks to be calculated of the ith power terminal is represented,

representing the total cache capacity of the mobile edge compute server,

representing the total cache capacity of any power terminal.

4. The power grid edge computing offload distribution method according to any of claims 1 to 3, wherein the sample network state information includes a network connection state, a computing capacity, a buffering capacity, a task computation amount to be buffered and offloaded, and a task transmission amount after buffering and offloading.

5. A grid edge computing offload distribution system, comprising:

the power terminal network state acquisition module is used for acquiring network state information of each power terminal in the intelligent power grid at the current moment;

the power grid edge calculation unloading distribution strategy generation module is used for inputting network state information corresponding to a target power terminal into a power grid edge calculation unloading distribution model to obtain an edge calculation unloading distribution strategy of a calculation task to be processed in the target power terminal;

the edge computing unloading module is used for segmenting the computing tasks to be processed according to the edge computing unloading distribution strategy and caching the segmented computing tasks to be processed to the corresponding power terminal and/or the mobile edge computing server so as to perform edge computing unloading on the computing tasks to be processed;

the system further comprises:

the sample construction module is used for constructing sample network state information of an intelligent agent corresponding to each power terminal based on historical network state information of each power terminal, and constructing a first sample observation state according to the sample network state information;

the action construction module is used for acquiring a task cache proportion and a task unloading position corresponding to the sample network state information and constructing the action of each intelligent agent according to the task cache proportion and the task unloading position;

the intelligent agent reward building module is used for building the reward of the intelligent agent based on the energy consumption and time delay of each power terminal during the edge calculation unloading and with the energy consumption minimization of each power terminal as an optimization target;

the training set generating module is used for constructing a training sample set according to the first sample observation state, the action and the reward;

the training module is used for training the multi-agent reinforcement learning network through the training sample set to obtain a power grid edge calculation unloading distribution model;

the system is further configured to:

the training module is further configured to:

6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the grid edge computing offload distribution method according to any of claims 1 to 4.

7. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the grid edge computing offload distribution method according to any of claims 1 to 4.