CN119483722B


Info

Publication number: CN119483722B
Application number: CN202510038082.7A
Authority: CN
Inventors: 宫永康; 于东晓; 邹逸飞; 成秀珍
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2025-01-10
Filing date: 2025-01-10
Publication date: 2025-03-28
Anticipated expiration: 2045-01-10
Also published as: CN119483722A

Abstract

The invention discloses a multi-task offloading method, device, medium and equipment for a satellite-ground fusion network, and relates to the technical field of satellite-ground fusion networks. The method first determines the uplink transmission rate of each ground user when offloading to a low-orbit satellite, according to the path loss between each ground user and each low-orbit satellite and the transmit power of each ground user. Taking the task offloading ratio of each ground user as a variable, it then constructs a communication model of the data transmission rate of each ground user's multi-modal tasks. Based on this communication model, three multi-modal learning methods are provided: computation-intensive tasks are handled by a centralized Actor-Critic algorithm to improve the data transmission rate; delay-sensitive tasks are handled by a distributed multi-agent deep deterministic policy gradient (MADDPG) to reduce network delay; and privacy-preserving tasks are handled by a quantization-based federated learning algorithm to protect private data. The different requirements of the multi-modal tasks are thereby met, and better resource scheduling is achieved.

Description

Multi-task offloading method, device, medium and equipment for a satellite-ground fusion network

Technical Field

The invention relates to the technical field of satellite-ground fusion networks, and in particular to a multi-task offloading method, device, medium and equipment for a satellite-ground fusion network.

Background

At present, the satellite-ground fusion network is a promising network architecture: it can relieve the load on the ground network, provide massive access capability, and support dense task offloading. However, because the satellite-ground fusion network is hierarchical, heterogeneous and dynamic in three dimensions, conventional resource management methods are difficult to apply directly. In addition, the large amount of multi-modal, multi-task information, including time-varying channels and low-orbit satellite positions, hinders improvement of the quality of service of the satellite network.

In the prior art, resource allocation for task offloading is usually assisted by random scheduling, greedy offloading or deep Q-learning (DQN). Random scheduling randomly selects a local execution ratio or a remote satellite offloading ratio for each ground user in order to optimize the data transmission rate of the original optimization problem. Greedy offloading allocates CPU cycle frequency and transmit power evenly to each ground user across the task types. Deep Q-learning adapts to dynamic low-orbit satellite orbits and time-varying channel gains; it has two structures, a current Q network and a target Q network, and handles the CPU cycle frequency, the transmit power and the offloading ratio in a centralized manner.

However, randomly selecting an offloading ratio, or evenly allocating CPU cycle frequency and transmit power as in greedy offloading, rarely achieves good resource scheduling for task offloading. Deep Q-learning trains its neural network in a centralized manner, so training takes too long and causes a large transmission delay for delay-sensitive tasks.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a multi-task offloading method, device, medium and equipment for a satellite-ground fusion network.

The invention adopts the following technical scheme:

the invention provides a multi-task offloading method for a satellite-ground fusion network. The method is applied to a satellite-ground fusion network comprising a plurality of ground users, a plurality of low-orbit satellites and a satellite cloud platform, wherein each ground user at least partially offloads locally generated multi-modal tasks to the low-orbit satellites, the satellite cloud platform is used to assist task offloading between the ground users and the low-orbit satellites, and the multi-modal tasks comprise computation-intensive tasks, delay-sensitive tasks and privacy-preserving tasks. The method comprises the following steps:

according to the path loss between each ground user and each low-orbit satellite and the transmit power of each ground user, determining the uplink transmission rate expression for each ground user offloading to a low-orbit satellite under time-varying channel gain and dynamic low-orbit satellite orbits;

taking the task offloading ratio of each ground user as a variable, and determining the data transmission rate expression of the multi-modal tasks of each ground user according to the CPU cycle frequency at which each ground user processes tasks locally and the uplink transmission rate expression for offloading to the low-orbit satellite;

for computation-intensive tasks, adopting a Markov decision process through the satellite cloud platform, centrally determining, according to the network state of the satellite-ground fusion network, the actions to take in the corresponding network state, and optimizing the actions with the goal of maximizing the data transmission rate corresponding to the actions, wherein the actions comprise the CPU cycle frequency, transmit power and task offloading ratio of the ground users;

for delay-sensitive tasks, based on the agents deployed on each ground user, adopting a distributed multi-agent deep deterministic policy gradient (MADDPG), taking the local network environment faced by each ground user as its network state, determining rewards according to the data transmission rate after actions are taken, and optimizing the actions of each ground user in a distributed manner;

and for privacy-preserving tasks, quantizing the local model that each ground user uses to process the privacy-preserving task, uploading the quantized local model parameters to the satellite cloud platform through the low-orbit satellites, and having the satellite cloud platform aggregate the quantized local model parameters through federated learning and update the local model of each ground user.

The invention provides a satellite-ground fusion network multitask unloading device, which comprises:

The device is applied to a satellite-ground fusion network formed by a plurality of ground users, a plurality of low-orbit satellites and a satellite cloud platform, in which the multi-modal tasks include privacy-preserving tasks;

the offloading rate determining module is used for determining, according to the path loss between each ground user and each low-orbit satellite and the transmit power of each ground user, the uplink transmission rate expression for each ground user offloading to a low-orbit satellite under time-varying channel gain and dynamic low-orbit satellite orbits;

the transmission rate determining module is used for taking the task offloading ratio of each ground user as a variable, and determining the data transmission rate expression of the multi-modal tasks of each ground user according to the CPU cycle frequency at which each ground user processes tasks locally and the uplink transmission rate expression for offloading to the low-orbit satellite;

the intensive task processing module is used for, for computation-intensive tasks, adopting a Markov decision process through the satellite cloud platform, centrally determining, according to the network state of the satellite-ground fusion network, the actions to take in the corresponding network state, and optimizing the actions with the goal of maximizing the data transmission rate corresponding to the actions, wherein the actions comprise the CPU cycle frequency, transmit power and task offloading ratio of the ground users;

the delay task processing module is used for, for delay-sensitive tasks, adopting a distributed multi-agent deep deterministic policy gradient (MADDPG) based on the agents deployed on each ground user, taking the local network environment faced by each ground user as its network state, determining rewards according to the data transmission rate after actions are taken, and optimizing the actions of each ground user in a distributed manner;

the privacy task processing module is used for, for privacy-preserving tasks, quantizing the local model that each ground user uses to process the privacy-preserving task, uploading the quantized local model parameters to the satellite cloud platform through the low-orbit satellites, and having the satellite cloud platform aggregate the quantized local model parameters through federated learning and update the local model of each ground user.

The invention provides a computer readable storage medium storing a computer program which when executed by a processor implements the above-described satellite-to-ground fusion network multitasking offload method.

The invention provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the satellite-ground fusion network multitasking unloading method when executing the program.

The at least one technical scheme adopted by the invention can achieve the following beneficial effects:

The method first determines the uplink transmission rate of each ground user when offloading to a low-orbit satellite under time-varying channel gain and dynamic low-orbit satellite orbits, according to the path loss between each ground user and each low-orbit satellite and the transmit power of each ground user. Taking the task offloading ratio of each ground user as a variable, it then constructs the data transmission rate expression of each ground user's multi-modal tasks, so that the data transmission rate can be optimized in different ways by three multi-modal learning methods matched to the different requirements of the multi-modal tasks. Computation-intensive tasks are handled by a centralized Actor-Critic algorithm, which improves the data transmission rate globally. Delay-sensitive tasks are handled by a distributed multi-agent deep deterministic policy gradient (MADDPG), which reduces network delay through distributed local optimization. Privacy-preserving tasks are handled by a quantized federated learning algorithm, which protects private data. The different requirements of the multi-modal tasks are thereby met, and better resource scheduling is achieved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a schematic flow diagram of the multi-task offloading method for a satellite-ground fusion network provided by the present invention;

FIG. 2 is a schematic diagram of the multi-modal, multi-task scenario of the satellite-ground fusion network provided by the present invention;

FIG. 3 is a schematic diagram of the centralized Actor-Critic network framework provided by the present invention;

FIG. 4 is a schematic diagram of the distributed multi-agent deep deterministic policy gradient (MADDPG) framework provided by the present invention;

FIG. 5 is a schematic diagram of the quantization-federated learning framework provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments of the present invention and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Currently, the satellite-ground fusion network is a promising network architecture that can relieve the load on the ground network, provide massive access capability, and support dense task offloading. However, because the satellite-ground fusion network is hierarchical, heterogeneous and dynamic in three dimensions, conventional resource management methods are difficult to apply directly. In addition, the large amount of multi-modal, multi-task information degrades the quality of service of the satellite network.

Therefore, the invention designs a multi-task fused computation offloading model to process multi-modal network information, including time-varying channel gain and dynamic low-orbit satellite orbits, which can effectively improve the data transmission rate and the level of privacy protection. Further, the invention proposes three multi-modal learning methods, namely a centralized Actor-Critic algorithm, a distributed multi-agent deep deterministic policy gradient (MADDPG), and a quantization-based federated learning algorithm, to handle computation-intensive, delay-sensitive and privacy-preserving tasks respectively. These methods further optimize the local execution and low-orbit (LEO) satellite offloading ratios, the CPU cycle frequency, and the transmit power.

The following describes in detail the technical solutions provided by the embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 is a schematic flow chart of the multi-task offloading method for a satellite-ground fusion network in the present invention, which specifically includes the following steps:

S101, according to the path loss between each ground user and each low-orbit satellite and the transmit power of each ground user, determining the uplink transmission rate expression for each ground user offloading to a low-orbit satellite under time-varying channel gain and dynamic low-orbit satellite orbits.

S102, taking the task offloading ratio of each ground user as a variable, and determining the data transmission rate expression of the multi-modal tasks of each ground user according to the CPU cycle frequency at which each ground user processes tasks locally and the uplink transmission rate expression for offloading to the low-orbit satellite.

S103, for computation-intensive tasks, adopting a Markov decision process through the satellite cloud platform, centrally determining, according to the network state of the satellite-ground fusion network, the actions to take in the corresponding network state, and optimizing the actions with the goal of maximizing the data transmission rate corresponding to the actions, wherein the actions comprise the CPU cycle frequency, transmit power and task offloading ratio of the ground users.

S104, for delay-sensitive tasks, adopting a distributed multi-agent deep deterministic policy gradient (MADDPG) based on the agents deployed on each ground user, taking the local network environment faced by each ground user as its network state, determining rewards, and optimizing the actions of each ground user in a distributed manner.

S105, for privacy-preserving tasks, quantizing the local model that each ground user uses to process the privacy-preserving task, uploading the quantized local model parameters to the satellite cloud platform through the low-orbit satellites, and having the satellite cloud platform aggregate the quantized local model parameters through federated learning and update the local model of each ground user.
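As a rough illustration of how steps S103 to S105 assign one learning method to each task type, the following minimal Python sketch routes a task to a method label. The enum members and the string labels are hypothetical names chosen for this sketch; they do not appear in the patent.

```python
from enum import Enum


class TaskType(Enum):
    """The three multi-modal task types named in the description."""
    COMPUTE_INTENSIVE = "compute_intensive"
    DELAY_SENSITIVE = "delay_sensitive"
    PRIVACY_PRESERVING = "privacy_preserving"


# Hypothetical labels for the learning methods of S103, S104 and S105.
METHOD_FOR_TASK = {
    TaskType.COMPUTE_INTENSIVE: "centralized_actor_critic",
    TaskType.DELAY_SENSITIVE: "distributed_maddpg",
    TaskType.PRIVACY_PRESERVING: "quantized_federated_learning",
}


def select_method(task_type: TaskType) -> str:
    """Return the label of the learning method that handles this task type."""
    return METHOD_FOR_TASK[task_type]
```

In this sketch, centralized methods would run on the satellite cloud platform while the distributed method would run on the agents deployed at the ground users, as the steps above describe.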

FIG. 2 is a schematic diagram of the multi-modal, multi-task scenario of the satellite-ground fusion network in the present invention. As can be seen from FIG. 2, a multi-task processing system is formed by a plurality of low-orbit satellites, a satellite cloud platform and ground users, which together support effective task arrival, offloading and processing. Each ground user may at least partially offload locally generated multi-modal tasks to the low-orbit satellites; these tasks may include computation-intensive tasks, delay-sensitive tasks and privacy-preserving tasks. The satellite cloud platform is used to assist task offloading between the ground users and the low-orbit satellites.

In particular, a low-orbit satellite may provide global coverage for ground users whose tasks are scaled down to multiple low-orbit satellites to achieve certain performance metrics. The set of low-orbit satellites is defined asFurther, the corresponding satellite cloud platform will help handle the ground user demand of the low orbit satellite platform, and then train the neural network parameters to accelerate the model convergence rate. Assuming that the total duration isA time slot, which is defined as. Meanwhile, the multi-modal network information may be represented as channel gains between satellites and terrestrial users, dynamic low-orbit satellite orbital positions, and complex mission types. Based on the corresponding task requirements, the task types are classified into a computationally intensive type, a delay sensitive type and a privacy protection type, and the ground user set is expressed as。

In a satellite-ground fusion network, assume that each ground user generates a computation-intensive task, a delay-sensitive task or a privacy-preserving task, depending on the value of its task-generation variable. Based on this task-generation information, the path loss between ground user i and low-orbit satellite j is defined from the following quantities: the carrier frequency, the speed of light, the horizontal distance between ground user i and low-orbit satellite j, the vertical distance between them, the additive path loss of the line-of-sight link between them, and the additive path loss of the non-line-of-sight link between them.

A further term, which relates to the line-of-sight link, is governed by two constant parameters determined by the dynamic low-orbit satellite.

The uplink transmission rate between ground user i and low-orbit satellite j is then expressed in terms of the transmit power from ground user i to low-orbit satellite j, the transmission bandwidth, and the channel noise power.
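The formulas themselves are not reproduced in this text, so the following Python sketch only illustrates one common way to combine the quantities listed above. It assumes the widely used elevation-angle sigmoid for the line-of-sight probability, a free-space term plus probability-weighted additive losses for the path loss, and a Shannon-capacity form for the uplink rate. All function names, parameter names and the exact functional forms are assumptions of this sketch, not the patent's expressions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def los_probability(d_horizontal, d_vertical, a, b):
    """Assumed sigmoid line-of-sight probability in the elevation angle (degrees)."""
    elevation_deg = math.degrees(math.atan2(d_vertical, d_horizontal))
    return 1.0 / (1.0 + a * math.exp(-b * (elevation_deg - a)))


def path_loss_db(d_horizontal, d_vertical, carrier_hz,
                 eta_los_db, eta_nlos_db, a, b):
    """Free-space loss plus LoS/NLoS additive losses weighted by the LoS probability."""
    distance = math.hypot(d_horizontal, d_vertical)
    free_space = 20.0 * math.log10(
        4.0 * math.pi * carrier_hz * distance / SPEED_OF_LIGHT)
    p_los = los_probability(d_horizontal, d_vertical, a, b)
    return free_space + p_los * eta_los_db + (1.0 - p_los) * eta_nlos_db


def uplink_rate_bps(tx_power_w, bandwidth_hz, noise_power_w, loss_db):
    """Shannon-type uplink rate, with the channel gain taken as the inverse path loss."""
    gain = 10.0 ** (-loss_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain / noise_power_w)
```

Under these assumptions, a higher transmit power or a lower path loss raises the uplink rate, which is why the transmit power appears among the optimized actions later in the description.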

The number of bits processed locally by a ground user is expressed in terms of the CPU cycle frequency at which each ground user processes tasks locally, the interval between adjacent time slots, and the number of CPU cycles required for a ground user to process 1 bit of a task.
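A minimal sketch of the local-processing quantity, assuming the natural reading that the bits processed in one slot equal the CPU cycles available in that slot divided by the cycles needed per bit. The function and parameter names are hypothetical.

```python
def local_bits(cpu_hz, slot_seconds, cycles_per_bit):
    """Bits a ground user can process locally in one time slot (assumed form)."""
    if cycles_per_bit <= 0:
        raise ValueError("cycles_per_bit must be positive")
    # Cycles available in the slot, divided by the cycles each bit costs.
    return cpu_hz * slot_seconds / cycles_per_bit
```

For example, a 1 GHz CPU, a 0.1 s slot and 1000 cycles per bit would give about 100,000 bits per slot under this assumption.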

For each ground user, the invention aims to achieve optimal task scheduling in the satellite-ground fusion network, which maximizes the number of bits processed. Furthermore, the invention defines a task offloading ratio for each ground user, whose two components represent the local execution ratio and the remote low-orbit satellite offloading ratio. Computation-intensive tasks require high-speed data transmission because of their larger task size. Delay-sensitive tasks are highly sensitive to network delay, so they require low-orbit satellites to cooperate in offloading tasks and reducing network delay. Privacy-preserving tasks involve users who are unwilling to share private data, so they require an offloading method based on federated learning to protect that data. The data transmission rate for the above tasks is expressed accordingly.

The three tasks described above are combined into a joint optimization problem.

In this problem, the local execution and remote low-orbit satellite offloading ratios should be less than 1, the local CPU cycle frequency should be smaller than its maximum value, the transmit power should be smaller than its maximum value, and I is the total number of ground users.

FIG. 3 is a schematic diagram of the centralized Actor-Critic network framework of the present invention. As shown in FIG. 3, because computation-intensive tasks are insensitive to network delay, the invention proposes a centralized closed-loop network architecture that adapts to dynamic network conditions such as channel gain, dynamic low-orbit satellite position, and complex task types. Since the decision set includes both discrete and continuous variables, the proposed learning framework needs to handle multiple task types. However, the traditional DQN method can only handle discrete decisions, which makes it difficult to optimize continuous task strategies. The invention therefore introduces policy-based offloading decision and resource scheduling methods. In particular, the invention models the closed-loop network architecture as a Markov decision process, in which the network state includes the channel gains, the dynamic low-orbit satellite positions, and the complex task set. The global state set is expressed in terms of these quantities, including the task-generation variable of the i-th ground user.

Next, the action set contains several execution actions, namely the CPU cycle frequency, the transmit power, and the local-execution and low-orbit satellite offloading ratios.

The action set can be divided into an offloading-ratio part and a resource-scheduling part. When the satellite cloud platform interacts with the network environment, it obtains a corresponding reward.

Based on the state, action and reward functions defined above, a state-action pair is input to the critic neural network, which generates the corresponding Q function. The temporal-difference (TD) error is then computed from the reward and the Q values.

The TD error uses a discount factor. Next, the invention updates the critic network parameters with a mean-square loss function, using the learning rate of the critic network.

The invention likewise updates the actor neural network parameters.

This update uses the learning rate of the actor neural network, and m is the number of training samples selected.
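The critic side of such an update can be sketched as follows. This is a minimal illustration that assumes a linear critic over hand-made state-action features and the standard one-step TD error; the patent uses neural networks, and its exact update expressions are not reproduced in this text. All names are hypothetical.

```python
def q_value(weights, features):
    """Linear critic: Q(s, a) is the dot product of weights and features of (s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def critic_td_step(weights, features, reward, next_features, gamma, lr):
    """One semi-gradient step on the squared TD error.

    delta = reward + gamma * Q(s', a') - Q(s, a)
    The target term is treated as a constant, so the gradient of
    0.5 * delta**2 with respect to the weights is -delta * features.
    """
    delta = (reward
             + gamma * q_value(weights, next_features)
             - q_value(weights, features))
    new_weights = [w + lr * delta * x for w, x in zip(weights, features)]
    return new_weights, delta
```

Repeating the step on the same sample shrinks the TD error, which is the behaviour the mean-square critic loss is meant to produce.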

FIG. 4 is a schematic diagram of the distributed multi-agent deep deterministic policy gradient (MADDPG) framework of the present invention. As shown in FIG. 4, for delay-sensitive tasks the framework includes a current actor network, a target actor network, a current critic network, and a target critic network. First, each agent interacts with the dynamic network environment, collects its private state, and inputs that state to the current actor network. Under the current actor network policy, this generates a corresponding action. The framework then interacts with the dynamic network environment, obtains a reward, and transitions to the next state. Meanwhile, based on the storage queues obtained in this way, the invention collects a global storage queue that holds the network state of, and the action taken by, each of the I agents. The agents exchange information through the reward function. The invention then performs a temporal-difference update of the current critic network.

This update involves the model parameters of the target critic network and the target actor network, the model parameters of the current critic network and the current actor network, and a discount factor. The gradient update of the current critic network uses the learning rate of the current critic network. Since the current actor network follows a deterministic policy, its loss gradient is expressed as a deterministic policy gradient.

To simplify the parameter update, the invention takes a larger Q value to mean a smaller loss. The actor loss is therefore simply the negative of the Q value.

Finally, based on the current actor network policy and the current critic network parameters described above, the target critic network and target actor network parameters are updated, each with its corresponding parameter update rate, and the result for the actor defines the target actor network policy.
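Two pieces of this procedure can be sketched in plain Python: the actor loss taken as the negative Q value, and the target-network update. The sketch assumes the soft (Polyak) update that is standard in deterministic policy gradient methods, with parameters held in flat lists; the patent's exact update formulas are not reproduced in this text, and the names below are hypothetical.

```python
def actor_loss(q_values):
    """Actor loss as the negative mean Q value: a larger Q gives a smaller loss."""
    return -sum(q_values) / len(q_values)


def soft_update(target_params, current_params, tau):
    """Assumed soft update: target <- tau * current + (1 - tau) * target."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]
```

A small update rate `tau` makes the target networks trail the current networks slowly, which is the usual way to stabilize the TD targets.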

FIG. 5 is a schematic diagram of the quantization-federated learning framework of the present invention. For privacy-preserving tasks, ground users are reluctant to share private data, so a new learning framework is needed to protect user privacy. As shown in FIG. 5, the invention proposes a quantization-federated learning framework that transmits model parameters while protecting data privacy. Each ground user receives a set of tasks made up of a number of task blocks, and the global task set covers the tasks of all ground users. Based on the task set, the model loss of the federated learning is defined.

The model parameter has d dimensions, and the loss is the model's computed loss. Further, the invention divides the learning process into T rounds, and each ground user can receive the global weight parameters from the remote satellite cloud platform. For any iteration round, each ground user starts from the parameters of the satellite cloud platform's remote server and applies a number of gradient update steps with a given learning rate to obtain its updated model parameters.

Each ground user then transmits the updated weight parameters to the satellite cloud platform.

The satellite cloud platform combines them according to the corresponding task weight ratio of each ground user to obtain the weight parameters of that round.
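The aggregation step can be illustrated with a short sketch. It assumes the usual federated-averaging rule in which each ground user's parameters are weighted by its share of the total task blocks; the patent's exact aggregation formula is not reproduced in this text, and the names are hypothetical.

```python
def aggregate(local_weights, task_counts):
    """Weighted average of per-user parameter vectors.

    local_weights: list of equal-length parameter lists, one per ground user.
    task_counts:   number of task blocks held by each ground user.
    """
    total = sum(task_counts)
    dim = len(local_weights[0])
    return [
        sum(n / total * w[k] for w, n in zip(local_weights, task_counts))
        for k in range(dim)
    ]
```

Only parameter vectors are exchanged in this scheme; the task data never leaves the ground users, which is the privacy property the framework relies on.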

Transmitting a large number of model parameters to the satellite cloud platform can consume considerable storage space and energy, and floating-point operations incur longer computational delays than integer operations. The invention therefore introduces a quantization strategy to accelerate model convergence. Specifically, each ground user passes its weight parameters through a quantization function before transmitting them to the remote server.

First, the invention sorts the weight parameters and determines their minimum and maximum values. Given the number of quantization bits, the range between the minimum and the maximum is divided into quantization intervals.

The endpoints of these intervals form a parameter sequence, and each endpoint is identified by its sequence index.

When a weight parameter falls into an interval of the parameter sequence, the quantization function assigns it a quantized value according to a probability.

Here sgn is the sign function and w.p. denotes "with probability". Since each weight parameter is quantized with the given number of bits, the total number of quantized bits follows from the number of weight parameters, and a conversion method is applied when the model parameters are transmitted to the remote server.
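Because the quantization formulas are not reproduced in this text, the sketch below shows one common scheme that matches the description: weight magnitudes are placed on a uniform grid between their minimum and maximum, each magnitude is rounded at random to one of the two neighbouring grid points with probability proportional to proximity, and the sign is restored afterwards. The grid size of 2^bits - 1 intervals, the use of magnitudes, and all names are assumptions of this sketch.

```python
import math
import random


def stochastic_quantize(weights, bits, rng):
    """Stochastically round weight magnitudes onto a uniform grid, keeping signs."""
    magnitudes = [abs(w) for w in weights]
    lo, hi = min(magnitudes), max(magnitudes)
    if hi == lo:
        return list(weights)  # a single magnitude: nothing to quantize
    levels = 2 ** bits - 1  # number of intervals between lo and hi
    step = (hi - lo) / levels
    quantized = []
    for w, m in zip(weights, magnitudes):
        k = min(int((m - lo) / step), levels - 1)  # index of the enclosing interval
        lower = lo + k * step
        upper = lower + step
        p_up = (m - lower) / step  # the closer to upper, the likelier to round up
        q = upper if rng.random() < p_up else lower
        quantized.append(math.copysign(q, w))
    return quantized


def total_quantized_bits(num_weights, bits):
    """Payload size if every weight is sent with the same number of bits."""
    return num_weights * bits
```

Rounding up with probability `p_up` keeps the quantized value equal to the original in expectation, so the quantization error averages out over many parameters and rounds.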

In the multi-task offloading method for a satellite-ground fusion network shown in FIG. 1, the uplink transmission rate of each ground user when offloading to a low-orbit satellite is first determined according to the path loss between each ground user and each low-orbit satellite and the transmit power of each ground user. A communication model of the data transmission rate of each ground user's multi-modal tasks is then constructed, taking the task offloading ratio of each ground user as a variable. Based on this communication model, three multi-modal learning methods are provided: computation-intensive tasks are handled by a centralized Actor-Critic algorithm to improve the data transmission rate, delay-sensitive tasks are handled by a distributed multi-agent deep deterministic policy gradient (MADDPG) to reduce network delay, and privacy-preserving tasks are handled by a quantized federated learning algorithm to protect private data. The different requirements of the multi-modal tasks are thereby met, and better resource scheduling is achieved.

The invention designs a multi-task fused computation offloading model to process multi-modal network information, including time-varying channel gain and dynamic low-orbit satellite orbits, which can effectively improve the data transmission rate and the level of privacy protection. Three multi-modal learning methods are presented, namely a centralized Actor-Critic algorithm, a distributed multi-agent deep deterministic policy gradient (MADDPG), and a quantization-based federated learning algorithm, to handle computation-intensive, delay-sensitive and privacy-preserving tasks respectively. These methods further optimize the local execution and low-orbit (LEO) satellite offloading ratios, the CPU cycle frequency, and the transmit power.

When the multi-task offloading method provided by the invention is applied, the steps need not be executed in the order shown in FIG. 1. The specific execution order can be determined as needed, and the invention is not limited in this respect.

Based on the same idea as the multi-task offloading method provided by one or more embodiments above, the invention further provides a corresponding multi-task offloading device for a satellite-ground fusion network.

For the specific limitations of the multi-task offloading device, reference may be made to the limitations of the multi-task offloading method above, which are not repeated here. Each module of the multi-task offloading device can be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored as software in a memory of the computer device, so that the processor can call them and execute the corresponding operations.

The present invention also provides a computer readable storage medium storing a computer program operable to perform the above-described method for offloading satellite-to-ground fusion network multitasking provided in fig. 1.

The invention also provides a computer device, which comprises a processor, an internal bus, a network interface, a memory and a nonvolatile memory, and can also comprise hardware needed by other services. The processor reads the corresponding computer program from the nonvolatile memory to the memory and then runs the computer program to realize the satellite-ground fusion network multitasking unloading method provided by the figure 1.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of the present invention.

Claims

Translated from Chinese

1. A multitask offloading method for a satellite-ground fusion network, characterized in that the method is applied to a satellite-ground fusion network comprising multiple ground users, multiple low-orbit satellites and a satellite cloud platform; each ground user offloads at least part of its locally generated multimodal tasks to a low-orbit satellite, and the satellite cloud platform assists task offloading between the ground users and the low-orbit satellites; the multimodal tasks include computation-intensive tasks, delay-sensitive tasks and privacy-preserving tasks; the method comprises:

determining, according to the path loss between each ground user and each low-orbit satellite and the transmission power of each ground user, an expression for the uplink transmission rate of each ground user when offloading to a low-orbit satellite under time-varying channel gain and dynamic low-orbit satellite orbits;

taking the task offloading ratio of each ground user as a variable, and determining an expression for the data transmission rate of the multimodal tasks of each ground user according to the CPU cycle frequency with which each ground user processes tasks locally and the expression for the uplink transmission rate when offloading to a low-orbit satellite;

for computation-intensive tasks, using a Markov decision process on the satellite cloud platform to determine, in a centralized manner and according to the network state of the satellite-ground fusion network, the action to be taken in the corresponding network state, and optimizing the action with the goal of maximizing the data transmission rate corresponding to the action; the action includes the CPU cycle frequency, transmission power and task offloading ratio of the ground users;

for delay-sensitive tasks, using a distributed multi-agent deep deterministic policy gradient based on agents deployed at the ground users, taking the local network environment faced by each ground user as its network state, determining the reward from the data transmission rate obtained after the action is taken, and optimizing the actions of the ground users in a distributed manner;

for privacy-preserving tasks, performing quantized model analysis on the local model of each ground user, and uploading the quantized local model parameters to the satellite cloud platform via the low-orbit satellites, so that the satellite cloud platform aggregates the quantized local model parameters through federated learning and updates the local model of each ground user; the local model is used to optimize the CPU cycle frequency, transmission power and task offloading ratio of the ground user when executing privacy-preserving tasks;

wherein determining, according to the path loss between each ground user and each low-orbit satellite and the transmission power of each ground user, the expression for the uplink transmission rate of each ground user when offloading to a low-orbit satellite under time-varying channel gain and dynamic low-orbit satellite orbits specifically includes:

determining the path loss between each ground user and each low-orbit satellite by the following formula:

determining the uplink transmission rate of each ground user when offloading to a low-orbit satellite by the following formula, according to the path loss between each ground user and each low-orbit satellite and the transmission power of each ground user:

where […] is the path loss between ground user i and low-orbit satellite j, f_c is the carrier frequency, c is the speed of light, m_i,j is the horizontal distance between ground user i and low-orbit satellite j, n_i,j is the vertical distance between ground user i and low-orbit satellite j, […] is the additive path loss of the line-of-sight link between ground user i and low-orbit satellite j, […] is the additive path loss of the non-line-of-sight link between ground user i and low-orbit satellite j, x₁ and x₂ are constant parameters determined by the low-orbit satellite, R_i,j is the uplink transmission rate when ground user i offloads to low-orbit satellite j, B_i,j is the transmission bandwidth between ground user i and low-orbit satellite j, r_i,j is the transmission power from ground user i to low-orbit satellite j, and σ² is the channel noise power;

the data transmission rate of the multimodal tasks of each ground user is expressed as:

where S_i,j is the data transmission rate from ground user i to low-orbit satellite j, s is the overall task generation variable, α_i is the local execution ratio of ground user i, D_i is the number of bits processed by the local user, τ is the interval between adjacent time slots, […] is the offloading ratio of ground user i to low-orbit satellite j, f_i^t is the CPU cycle frequency with which each ground user processes tasks locally, and […] is the number of CPU cycles a ground user needs to process 1 bit of a task.
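The formulas referenced in claim 1 are not reproduced in this text. Purely as an illustration of how the listed symbols could fit together, the sketch below assumes a commonly used probabilistic line-of-sight path-loss model (free-space loss plus a LoS/NLoS-weighted excess loss, with x1 and x2 shaping the LoS probability as a function of elevation angle in degrees) and a Shannon-capacity uplink rate. The function names, the form of the LoS probability and the dB-to-linear conversion are assumptions and should not be read as the claimed formulas.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light c, in m/s


def path_loss_db(m, n, f_c, eta_los, eta_nlos, x1, x2):
    """Assumed path loss in dB between a ground user and a low-orbit satellite.

    m, n: horizontal and vertical distance in metres; f_c: carrier frequency in Hz;
    eta_los, eta_nlos: additive LoS / NLoS losses in dB; x1, x2: constant parameters.
    """
    distance = math.hypot(m, n)                    # slant range
    elevation = math.degrees(math.atan2(n, m))     # elevation angle in degrees
    # Assumed sigmoid LoS probability (not taken from the patent text).
    p_los = 1.0 / (1.0 + x1 * math.exp(-x2 * (elevation - x1)))
    free_space = 20.0 * math.log10(4.0 * math.pi * f_c * distance / C_LIGHT)
    return free_space + p_los * eta_los + (1.0 - p_los) * eta_nlos


def uplink_rate(bandwidth, tx_power, noise_power, loss_db):
    """Assumed Shannon-type uplink rate R = B * log2(1 + r * g / sigma^2)."""
    gain = 10.0 ** (-loss_db / 10.0)               # channel gain from path loss
    return bandwidth * math.log2(1.0 + tx_power * gain / noise_power)


if __name__ == "__main__":
    loss = path_loss_db(1e5, 5e5, 2e9, 1.0, 20.0, 9.61, 0.16)
    print(uplink_rate(1e6, 1.0, 1e-13, loss))
```

Under these assumptions, a larger slant range or a lower elevation angle raises the path loss, which lowers the achievable uplink rate for a fixed transmission power.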

2. The multitask offloading method for a satellite-ground fusion network according to claim 1, characterized in that determining, in a centralized manner and according to the network state of the satellite-ground fusion network, the action to be taken in the corresponding network state, and optimizing the action with the goal of maximizing the data transmission rate corresponding to the action, specifically includes:

taking the path loss between each ground user and each low-orbit satellite, the dynamic positions of the low-orbit satellites, and the set of computation-intensive tasks of each ground user as the network state of the satellite-ground fusion network;

determining the action to be taken through an action neural network according to the network state of the satellite-ground fusion network, determining the value function of the action in that network state through a value neural network according to the network state and the action, and determining from the value function, by the following formula, the temporal-difference error used to update the value neural network:

δ = R_t + γ Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t),

updating the parameters of the value neural network according to the temporal-difference error by the following formula:

updating the parameters of the action neural network according to the value function by the following formula:

where δ is the temporal-difference error used to update the value neural network, γ is the discount factor, S_t is the network state of the satellite-ground fusion network at time t, A_t is the action taken in network state S_t, S_{t+1} is the network state at time t+1 reached from S_t after action A_t, A_{t+1} is the action taken in network state S_{t+1} at time t+1, Q(S_t, A_t) is the value function of action A_t taken in network state S_t, R_t is the reward for action A_t taken in network state S_t, s_i is the task generation variable of the i-th ground user, I is the total number of ground users, α_i is the local execution ratio of ground user i, D_i is the number of bits processed by the local user, […] is the offloading ratio of ground user i to low-orbit satellite j, R_i,j is the uplink transmission rate when ground user i offloads to low-orbit satellite j, w denotes the parameters of the value neural network, α is the learning rate of the value neural network, θ denotes the parameters of the action neural network, β is the learning rate of the action neural network, and m is the number of selected training samples.
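The parameter-update formulas of claim 2 are not reproduced in this text. As a hedged illustration, the sketch below computes the temporal-difference error δ exactly as stated in the claim and then applies one semi-gradient step to a linear critic Q = w · φ, averaged over m samples. The linear critic, the feature vectors φ and the function names are assumptions introduced for the example; the claimed method uses a value neural network.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_error(reward, gamma, q_next, q_current):
    """delta = R_t + gamma * Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)."""
    return reward + gamma * q_next - q_current


def critic_step(w, batch, alpha, gamma):
    """One averaged semi-gradient update of an assumed linear critic.

    batch: list of (phi_t, reward, phi_next) tuples, where phi_* are feature
    vectors for (state, action) pairs; alpha is the critic learning rate.
    """
    m = len(batch)                                 # number of selected samples
    grad = [0.0] * len(w)
    for phi, reward, phi_next in batch:
        delta = td_error(reward, gamma, dot(w, phi_next), dot(w, phi))
        for k, x in enumerate(phi):
            grad[k] += delta * x / m               # gradient of Q w.r.t. w is phi
    return [w_k + alpha * g for w_k, g in zip(w, grad)]


if __name__ == "__main__":
    w = [0.0, 0.0]
    batch = [([1.0, 2.0], 1.0, [0.0, 0.0])]
    print(critic_step(w, batch, alpha=0.1, gamma=0.9))
```

A positive δ means the observed return exceeded the critic's estimate, so the step raises Q for the sampled state-action pair; a negative δ lowers it.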

3. The multitask offloading method for a satellite-ground fusion network according to claim 1, characterized in that using a distributed multi-agent deep deterministic policy gradient based on agents deployed at the ground users, taking the local network environment faced by each ground user as its network state, determining the reward from the data transmission rate obtained after the action is taken, and optimizing the actions of the ground users in a distributed manner, specifically includes:

using a distributed multi-agent deep deterministic policy gradient based on agents deployed at the ground users, wherein, for the agent deployed at each ground user, the agent interacts with the dynamic network environment and collects the network state S_t, and S_t is input into the current actor network to obtain the corresponding action A_t; the corresponding reward R_t is determined according to the action A_t taken by the agent;

performing a temporal-difference update of the current critic network by the following formula: Δ_i = R_t + ε Q'_target(S'_total, A'_total | w'_target) - Q_current(S_total, A_total | w_current),

the policy gradient of the current critic network is:

the loss gradient of the current actor network is:

the parameters of the target critic network and the target actor network are: w'_target = ν w_current + (1 - ν) w'_target,

where Q'_target is the model parameter corresponding to the target critic network, w'_target is the model parameter corresponding to the target actor network, Q_current is the model parameter corresponding to the current critic network, w_current is the model parameter corresponding to the current actor network, ε is the discount factor, […] is the network state of the I-th agent, […] is the action taken by the I-th agent, λ is the learning rate of the gradient update of the current critic network, π₁(A_t|S_t) is the current actor network policy, π₂(A_t'|S_t') is the target actor network policy, ν is the parameter update rate corresponding to the target critic network, and […] is the parameter update rate corresponding to the actor network.
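The two formulas that survive in claim 3, the critic temporal-difference error and the target-parameter update, can be sketched as below. Parameters are represented as plain lists of floats, and the function names are assumptions introduced for the example; the gradient formulas, which are not reproduced in this text, are deliberately left out.

```python
def critic_td_error(reward, eps, q_target_next, q_current):
    """Delta_i = R_t + eps * Q'_target(S', A' | w'_target) - Q_current(S, A | w_current)."""
    return reward + eps * q_target_next - q_current


def soft_update(w_target, w_current, nu):
    """w'_target = nu * w_current + (1 - nu) * w'_target, element by element."""
    return [nu * c + (1.0 - nu) * t for t, c in zip(w_target, w_current)]


if __name__ == "__main__":
    target = [0.0, 1.0]
    current = [1.0, 1.0]
    # Repeated soft updates pull the target parameters toward the current ones.
    for _ in range(50):
        target = soft_update(target, current, nu=0.1)
    print(target)
    print(critic_td_error(reward=1.0, eps=0.9, q_target_next=2.0, q_current=1.5))
```

With a small ν the target networks change slowly, which is the usual reason for keeping separate target and current networks: the bootstrap target in Δ_i moves less between updates.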

4. A multitask offloading device for a satellite-ground fusion network, characterized by comprising:

a satellite-ground fusion network formed by multiple ground users, multiple low-orbit satellites and a satellite cloud platform; each ground user offloads at least part of its locally generated multimodal tasks to a low-orbit satellite, and the satellite cloud platform assists task offloading between the ground users and the low-orbit satellites; the multimodal tasks include computation-intensive tasks, delay-sensitive tasks and privacy-preserving tasks;

an offloading rate determination module, configured to determine the path loss between each ground user and each low-orbit satellite by the following formula: and to determine the uplink transmission rate of each ground user when offloading to a low-orbit satellite by the following formula, according to the path loss between each ground user and each low-orbit satellite and the transmission power of each ground user:

a transmission rate determination module, configured to take the task offloading ratio of each ground user as a variable and to determine the expression for the data transmission rate of the multimodal tasks of each ground user according to the CPU cycle frequency with which each ground user processes tasks locally and the expression for the uplink transmission rate when offloading to a low-orbit satellite:

an intensive task processing module, configured, for computation-intensive tasks, to use a Markov decision process on the satellite cloud platform to determine, in a centralized manner and according to the network state of the satellite-ground fusion network, the action to be taken in the corresponding network state, and to optimize the action with the goal of maximizing the data transmission rate corresponding to the action; the action includes the CPU cycle frequency, transmission power and task offloading ratio of the ground users;

a delay task processing module, configured, for delay-sensitive tasks, to use a distributed multi-agent deep deterministic policy gradient based on agents deployed at the ground users, to take the local network environment faced by each ground user as its network state, to determine the reward from the data transmission rate obtained after the action is taken, and to optimize the actions of the ground users in a distributed manner;

a privacy task processing module, configured, for privacy-preserving tasks, to perform quantized model analysis on the local model that each ground user uses to process privacy-preserving tasks, and to upload the quantized local model parameters to the satellite cloud platform via the low-orbit satellites, so that the satellite cloud platform aggregates the quantized local model parameters through federated learning and updates the local model of each ground user;

where […] is the path loss between ground user i and low-orbit satellite j, f_c is the carrier frequency, c is the speed of light, m_i,j is the horizontal distance between ground user i and low-orbit satellite j, n_i,j is the vertical distance between ground user i and low-orbit satellite j, […] is the additive path loss of the line-of-sight link between ground user i and low-orbit satellite j, […] is the additive path loss of the non-line-of-sight link between ground user i and low-orbit satellite j, x₁ and x₂ are constant parameters determined by the low-orbit satellite, B_i,j is the transmission bandwidth between ground user i and low-orbit satellite j, r_i,j is the transmission power from ground user i to low-orbit satellite j, σ² is the channel noise power, S_i,j is the data transmission rate from ground user i to low-orbit satellite j, s is the overall task generation variable, α_i is the local execution ratio of ground user i, D_i is the number of bits processed by the local user, τ is the interval between adjacent time slots, […] is the offloading ratio of ground user i to low-orbit satellite j, R_i,j is the uplink transmission rate when ground user i offloads to low-orbit satellite j, f_i^t is the CPU cycle frequency with which each ground user processes tasks locally, and […] is the number of CPU cycles a ground user needs to process 1 bit of a task.
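The claims do not state how the local model parameters are quantized or how the satellite cloud platform weights them during aggregation. As one possible reading, the sketch below assumes uniform min-max quantization to a fixed number of bits on each ground user and an equally weighted average on the platform. The function names, the quantizer and the equal weighting are all assumptions made for illustration.

```python
def quantize(params, bits=8):
    """Assumed uniform min-max quantizer: returns integer codes plus offset and scale."""
    lo, hi = min(params), max(params)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((p - lo) / scale) for p in params]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate parameter values."""
    return [lo + c * scale for c in codes]


def aggregate(uploads):
    """Equally weighted average of dequantized client parameters (assumed weighting).

    uploads: list of (codes, lo, scale) tuples, one per ground user.
    """
    models = [dequantize(codes, lo, scale) for codes, lo, scale in uploads]
    n = len(models)
    return [sum(values) / n for values in zip(*models)]


if __name__ == "__main__":
    user_a = quantize([-1.0, 0.25, 1.0])
    user_b = quantize([0.0, 0.5, 2.0])
    print(aggregate([user_a, user_b]))
```

In this sketch only the integer codes, the offset and the scale leave the ground user, so the uplink carries roughly `bits` per parameter instead of a full floating-point value, at the cost of a rounding error of at most half a quantization step per parameter.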

5. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 3.

6. A computer device, characterized by comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 3 when executing the program.