CN120499018B

Movatterモバイル変換

Info

Publication number: CN120499018B
Application number: CN202510976135.XA
Authority: CN
Inventors: 贾吾财; 唐潇; 杨海滨; 赵轩; 李岩; 刘清涛
Original assignee: CRSC Communication and Information Group Co Ltd CRSCIC
Current assignee: CRSC Communication and Information Group Co Ltd CRSCIC
Priority date: 2025-07-16
Filing date: 2025-07-16
Publication date: 2025-09-26
Anticipated expiration: 2045-07-16
Also published as: CN120499018A

Abstract

Description

Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium

Technical Field

The invention relates to the field of Internet of things, in particular to a multi-domain Internet of things task allocation optimization model training method, device, equipment and medium.

Background

With the development of scientific technology, the technology of the internet of things is continuously improved.

Currently, the explosive growth and complex applications of the internet of things have spawned numerous computationally intensive and delay sensitive tasks. These tasks may involve multiple independent but interrelated domains. Where each domain may deploy a large number of heterogeneous devices such as sensors, cameras, and actuators and generate different types of data such as structured data, images, and video streams. The computing resources of the Internet of things equipment are limited, the energy consumption is limited, and the task processing efficiency is low. The related art improves task processing efficiency by setting an edge server with a relatively high processing capacity.

The processing requirements of different tasks may be different, and when the processing requirements of the tasks are low-delay and the calculated amount is small, the local calculation can be directly performed by the internet of things. When the processing requirement of the task is high reliability and the delay requirement is not high, the task can be unloaded to an edge server with idle resources for processing.

Therefore, in order to meet the processing requirements of different tasks and the diversified requirements of reasonable allocation of computing resources, an effective task allocation optimization mode is needed in the multi-domain internet of things.

Disclosure of Invention

The invention provides a multi-domain internet of things task allocation optimization model training method, device, equipment and medium, which are used for solving the defect that the multi-domain internet of things task allocation cannot meet diversified requirements in the related technology, optimizing the multi-domain internet of things task allocation and meeting the diversified requirements.

In a first aspect, the present invention provides a multi-domain internet of things task allocation optimization model training method, including:

Generating a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and an edge server group, wherein the single-domain Internet of things is used for allocating corresponding tasks to be processed to local equipment or the edge server group for processing;

Inputting the first multi-dimensional state tensor into a task allocation optimization model to be trained, so that the task allocation optimization model to be trained executes task allocation optimization based on the first multi-dimensional state tensor to obtain an allocation optimization strategy;

optimizing the allocation of at least one single-domain Internet of things to the task to be processed based on the allocation optimization strategy, and generating a second multidimensional state tensor based on the optimized task allocation information, the operation performance index, the task attribute information and the task communication quality index;

Determining an optimized rewarding value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the rewarding function;

Updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy to obtain a trained task allocation optimization model.

Optionally, the generating a first multidimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of the plurality of single-domain internet of things and the edge server group includes:

Generating a task allocation identification matrix, a node characteristic matrix, a task characteristic matrix and a three-dimensional network characteristic tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of the plurality of single-domain Internet of things and the edge server group;

And taking the task allocation identification matrix, the node characteristic matrix, the task characteristic matrix and the three-dimensional network characteristic tensor as a whole as the first multidimensional state tensor.

Optionally, the edge server group includes a plurality of edge servers, the task attribute information includes key attribute information of each task to be processed, and the task communication quality index includes estimated communication quality indexes of any task to be processed in the corresponding single-domain internet of things and each edge server respectively.

Optionally, the generating a task allocation identifier matrix, a node feature matrix, a task feature matrix and a three-dimensional network feature tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of the plurality of single-domain internet of things and the edge server group includes:

Based on task allocation information of the single-domain Internet of things and the edge servers, constructing a task allocation identification matrix, wherein each row of data in the task allocation identification matrix is used for identifying the single-domain Internet of things to allocate the task to be processed to local equipment or the edge servers;

Constructing the node characteristic matrix by using the operation performance indexes of each single-domain Internet of things and each edge server, wherein each row of data of the node characteristic matrix comprises the operation performance indexes of the single-domain Internet of things or the edge servers;

constructing a task feature matrix based on the key attribute information of each task to be processed, wherein each row of data of the task feature matrix comprises the key attribute information of the task to be processed;

And constructing a three-dimensional network characteristic tensor by using the estimated communication quality indexes of each task to be processed in the corresponding single-domain Internet of things and each edge server, wherein each row of data of the three-dimensional network characteristic tensor comprises the estimated communication quality indexes of the task to be processed in the corresponding single-domain Internet of things and each edge server.

Optionally, the determining an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function includes:

determining a first task processing cost according to the task allocation information and the task processing cost objective function, and determining a second task processing cost according to the optimized task allocation information and the task processing cost objective function;

Subtracting the first task processing cost from the second task processing cost to obtain a cost difference value;

and inputting the cost difference value into the reward function to perform reward calculation so as to determine the optimized reward value.

Optionally, the inputting the cost difference value into the reward function to perform a reward calculation to determine the optimized reward value includes:

inputting the cost difference value into the reward function so that the reward function determines a first set value greater than 0 as the optimized reward value when the cost difference value is greater than 0, determines a second set value less than 0 as the optimized reward value when the cost difference value is equal to 0, and determines a third set value less than 0 as the optimized reward value when the cost difference value is less than 0;

the absolute value of the third set value is equal to the first set value, and the absolute value of the third set value is larger than the absolute value of the second set value.

Optionally, when the task allocation optimization model to be trained is a deep Q network DQN model, the task allocation optimization model to be trained includes an estimation network, a target network and an error function;

The updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy to obtain a trained task allocation optimization model comprises the following steps:

Inputting the first multidimensional state tensor into the estimation network to perform action benefit prediction to obtain a first benefit value; inputting the second multidimensional state tensor into the target network to predict expected benefits, and obtaining a second benefit value;

the first benefit value, the second benefit value and the optimized reward value are input into the error function, so that the error function performs weighted summation on the optimized reward value and the second benefit value based on set weights to obtain a target benefit value, and a loss function value is determined according to the difference between the target benefit value and the first benefit value;

updating the task allocation optimization model to be trained based on the loss function value to obtain a trained task allocation optimization model.

In a second aspect, the present invention provides a multi-domain internet of things task allocation optimization model training device, including:

The system comprises a first generation unit, a second generation unit and a third generation unit, wherein the first generation unit is used for generating a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and an edge server group, and the single-domain Internet of things is used for allocating corresponding tasks to be processed to local equipment or the edge server group for processing;

the input unit is used for inputting the first multi-dimensional state tensor into a task allocation optimization model to be trained so that the task allocation optimization model to be trained executes task allocation optimization based on the first multi-dimensional state tensor to obtain an allocation optimization strategy;

The optimizing unit is used for optimizing the distribution of at least one single-domain Internet of things to the task to be processed based on the distribution optimizing strategy;

The second generating unit is used for generating a second multidimensional state tensor based on the optimized task allocation information, the running performance index, the task attribute information and the task communication quality index;

the determining unit is used for determining an optimized rewarding value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the rewarding function;

and the updating unit is used for updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy so as to obtain a trained task allocation optimization model.

In a third aspect, the present invention provides a computer device, including a memory and a processor, where the memory and the processor are communicatively connected to each other, and the memory stores computer instructions, and the processor executes the computer instructions, so as to execute the multi-domain internet of things task allocation optimization model training method according to the first aspect or any implementation manner corresponding to the first aspect.

In a fourth aspect, the present invention provides a computer readable storage medium, where computer instructions are stored on the computer readable storage medium, where the computer instructions are configured to cause a computer to execute the multi-domain internet of things task allocation optimization model training method according to the first aspect or any implementation manner corresponding to the first aspect.

Drawings

In order to more clearly illustrate the invention or the technical solutions in the related art, the following description will briefly explain the drawings used in the embodiments or the related art description, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for those skilled in the art.

FIG. 1 is a flowchart of a multi-domain Internet of things task allocation optimization model training method provided by an embodiment of the invention;

FIG. 2 is a flowchart of another multi-domain Internet of things task allocation optimization model training method provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a multi-domain task allocation optimization model training device for internet of things according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The multi-domain internet of things task allocation optimization model training method of the invention is described below with reference to fig. 1-2.

As shown in fig. 1, the present embodiment proposes a first multi-domain internet of things task allocation optimization model training method, which may include the following steps:

S101, generating a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and edge server groups, wherein the single-domain Internet of things is used for allocating corresponding tasks to be processed to local equipment or the edge server groups for processing.

Wherein, a plurality of internet of things devices can be deployed in each single-domain internet of things. The different single-domain Internet of things are independent and associated with each other. It should be noted that, a plurality of independent and interrelated single-domain internet of things form a multi-domain internet of things.

Specifically, the edge server group can be used for performing edge calculation on tasks of the multi-domain internet of things. The edge server set may include a plurality of edge servers, each of which may be used to handle the task of single domain internet of things offloading.

The task allocation information may include information that the tasks generated by each single-domain internet of things are allocated to a local device or a specific edge server for processing. It can be understood that the local device is an internet of things device for processing tasks in the corresponding single-domain internet of things. Specifically, a certain single-domain internet of things distributes tasks to local equipment for processing, which means that the single-domain internet of things distributes tasks to the internet of things equipment for processing tasks in the single-domain internet of things for processing.

The operation performance index may include index data related to the operation performance of each single-domain internet of things and each edge server, such as CPU utilization, memory, queuing task number, signal-to-noise ratio, bandwidth, transmission rate, and the like.

Optionally, in the method for training the task allocation optimization model of the other multi-domain internet of things provided in this embodiment, the edge server group includes a plurality of edge servers, the task attribute information includes key attribute information of each task to be processed, and the task communication quality index includes estimated communication quality indexes of any task to be processed in the corresponding single-domain internet of things and each edge server respectively.

Specifically, the task attribute information may include related attribute information of each task generated by each single-domain internet of things, such as a calculation requirement, a data amount, a deadline, and the like of the task.

Specifically, the task communication quality index may include estimated communication quality indexes of any task in each single-domain internet of things and each edge server, such as signal-to-noise ratio, round trip delay, and the like.

Specifically, the embodiment may construct the first multidimensional state tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of each single-domain internet of things and each edge server.

Optionally, step S101 may include:

Generating a task allocation identification matrix, a node characteristic matrix, a task characteristic matrix and a three-dimensional network characteristic tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and edge server groups;

And taking the task allocation identification matrix, the node characteristic matrix, the task characteristic matrix and the three-dimensional network characteristic tensor as a whole as a first multidimensional state tensor.

The task allocation identification matrix is a matrix for describing task allocation information, the node characteristic matrix is a matrix comprising operation performance indexes of each single-domain Internet of things and each edge server, the task characteristic matrix is a matrix comprising task attribute information of each task, and the three-dimensional network characteristic tensor comprises estimated communication quality indexes of any task in each single-domain Internet of things and each edge server.

Optionally, generating the task allocation identifier matrix, the node feature matrix, the task feature matrix and the three-dimensional network feature tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of the plurality of single-domain internet of things and the edge server group includes:

based on task allocation information of a plurality of single-domain Internet of things and a plurality of edge servers, constructing a task allocation identification matrix, wherein each row of data in the task allocation identification matrix is used for identifying the single-domain Internet of things to allocate a task to be processed to local equipment or the edge servers;

constructing a node characteristic matrix by using the operation performance indexes of each single-domain Internet of things and each edge server, wherein each row of data of the node characteristic matrix comprises the operation performance indexes of the single-domain Internet of things or the edge servers;

And constructing a three-dimensional network characteristic tensor by using the estimated communication quality indexes of each task to be processed in the corresponding single-domain Internet of things and each edge server respectively, wherein each row of data of the three-dimensional network characteristic tensor comprises the estimated communication quality indexes of the task to be processed in the corresponding single-domain Internet of things and each edge server respectively.

In particular, the task allocation identity matrix may be expressed asWherein M represents the total number of tasks of the multi-domain Internet of things, N represents the total number of nodes available for calculation, and each single-domain Internet of things and each edge server are included. I.e. whenWhen the current task is executed in node n, otherwise. The matrixing representation not only can intuitively reflect the distribution situation of tasks, but also is convenient for realizing efficient state update through matrix operation, thereby reducing the computational complexity.

In particular, the task feature matrix may be expressed as. Attribute vector for task m for each row of dataIncluding calculating critical parameters such as demand, data volume, deadline, etc.

In particular, the node feature matrix may be expressed as. Each data represents a real-time state vector of node n, which may include performance metrics such as CPU utilization, memory, available bandwidth, etc.

Wherein the three-dimensional network feature tensor can be expressed as. Three-dimensional matrix elementThe estimated communication quality between task m and node n, such as signal-to-noise ratio, round trip delay, is described.

Specifically, in this embodiment, the constructed task allocation identifier matrix, node feature matrix, task feature matrix, and three-dimensional network feature tensor may be integrally used as the first multidimensional state tensor.

Wherein the first multidimensional state tensor may be represented as。

S102, inputting the first multidimensional state tensor into a task allocation optimization model to be trained, so that the task allocation optimization model to be trained executes task allocation optimization based on the first multidimensional state tensor, and an allocation optimization strategy is obtained.

The task allocation optimization model to be trained can be a model with a deep reinforcement network architecture, and deep reinforcement learning is performed to achieve the task allocation optimization capability.

Alternatively, the task allocation optimization model to be trained may be a Deep Q-Network (DQN) model, or may be another type of Deep-enhanced Network architecture model.

Specifically, the embodiment may input the constructed first multidimensional state tensor into the task allocation optimization model to be trained, so that the task allocation optimization model performs task allocation optimization based on the first multidimensional state tensor, and obtains and outputs an allocation optimization strategy.

S103, optimizing the allocation of the tasks to be processed by at least one single-domain Internet of things based on an allocation optimization strategy.

Specifically, in this embodiment, task allocation of each single-domain internet of things may be optimized according to an allocation optimization policy, that is, some tasks are adjusted from a local device to an edge server for processing, and some tasks are adjusted from the edge server to the local device for processing.

S104, generating a second multidimensional state tensor based on the optimized task allocation information, the running performance index, the task attribute information and the task communication quality index.

Specifically, in this embodiment, after task allocation of the multi-domain internet of things is optimized based on an allocation optimization policy, task allocation information, operation performance indexes, task attribute information and task communication quality indexes of the multi-domain internet of things and an edge server group after allocation optimization are obtained, and a specific process of generating a first multi-dimensional state tensor in step S101 is referred to, and a corresponding second multi-dimensional state tensor is generated based on the task allocation information, the operation performance indexes, the task attribute information and the task communication quality indexes of the multi-domain internet of things and the edge server group after allocation optimization.

It is understood that the second multidimensional state tensor is also integrally formed with the task assignment identification matrix, the node feature matrix, the task feature matrix, and the three-dimensional network feature tensor.

S105, determining an optimized rewarding value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the rewarding function.

Specifically, the embodiment can construct a task processing cost objective function and a reward function according to actual situations.

Specifically, the embodiment can determine the corresponding optimized reward value according to the task allocation information of the multi-domain internet of things, the optimized task allocation information, the task processing cost objective function and the reward function.

Optionally, step S105 may include:

The cost difference is input into a bonus function for bonus calculation to determine an optimized bonus value.

Specifically, in this embodiment, the task processing cost objective function may be used to calculate based on the task allocation information, so as to obtain the first task processing cost. And calculating based on the optimized task allocation information by using the task processing cost objective function to obtain a second task processing cost.

Optionally, the inputting the cost difference value into the reward function to perform a reward calculation to determine an optimized reward value includes:

Inputting the cost difference into the bonus function so that the bonus function can determine that a first set value larger than 0 is an optimized bonus value when the cost difference is larger than 0, a second set value smaller than 0 is an optimized bonus value when the cost difference is equal to 0, and a third set value smaller than 0 is an optimized bonus value when the cost difference is smaller than 0;

And S106, updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy so as to obtain a trained task allocation optimization model.

Optionally, in the method for training the task allocation optimization model of the other multi-domain internet of things provided in this embodiment, when the task allocation optimization model to be trained is the deep Q network DQN model, the task allocation optimization model to be trained includes an estimation network, a target network and an error function. At this time, step S106 may include:

inputting the first multi-dimensional state tensor into an estimation network to perform action profit prediction to obtain a first profit value;

inputting the first benefit value, the second benefit value and the optimized reward value into an error function so that the error function performs weighted summation on the optimized reward value and the second benefit value based on set weights to obtain a target benefit value, and determining a loss function value according to the difference between the target benefit value and the first benefit value;

Specifically, the task allocation optimization model to be trained is an DQN model, and the task allocation optimization model comprises an estimation network, a target network and an DQN error function.

Specifically, the embodiment may consider the whole of the multi-domain internet of things and the edge server group as an environment, and input the network state of the environment at the time t, that is, the first multi-dimensional state tensor, into the estimation network to perform action benefit prediction, so as to obtain the first benefit value output by the estimation network. And inputting the network state of the environment at the time t+1, namely a second multidimensional state tensor, into the target network to perform expected profit prediction, and obtaining a second profit value.

Specifically, the first benefit value, the second benefit value and the optimized prize value may be input into the DQN error function, so that the DQN error function may perform weighted summation on the optimized prize value and the second benefit value based on the set weight, obtain the target benefit value, and calculate the corresponding loss function value according to the difference between the target benefit value and the first benefit value and according to the set loss function. And then, when the loss function value is not smaller than the set threshold value, the embodiment can update parameters of the task allocation optimization model to be trained to obtain an updated task allocation optimization model.

Based on fig. 1, another multi-domain internet of things task allocation optimization model training method is provided in this embodiment, and according to the multi-domain internet of things task allocation scene, a task processing cost objective function and a reward function can be constructed.

Specifically, in the process of establishing the task processing cost objective function, the network model may be configured to be composed of D single-domain internet of things and E edge servers, where different single-domain internet of things may be represented asThe edge servers are represented as. Setting each single-domain internet of things generates individual tasks that can be offloaded to an edge server.

Apparatus and method for controlling the operation of a deviceGenerating tasksWhereinRepresenting the input value of the task,Representing the resources required to process the task, such as the different number of CPU cycles.

The total energy consumption for performing a task can be expressed as:

;

Wherein, theFor a fixed frequency of the CPU, i.e. the CPU cycle frequency required to process 1bit data,And (5) corresponding energy consumption proportion for each CPU.Is the task size.

The local computation delay is expressed as:

;

Wherein, theRepresenting single-domain Internet of thingsThe amount of data that can be processed per cycle by the CPU of (c).

The cost function is expressed as:

;

Wherein, theRepresenting the weight of the delay cost.

In an edge offload task, the communication latency may be expressed as:

;

Wherein, theIs the single-domain internet of thingsAnd edge serverThe time delay of the communication between them,Is the single-domain internet of thingsAnd edge serverA bandwidth channel therebetween.Delegate tasks assigned to edge servers. W is the shared bandwidth between the hypothetical edge server and the connected single-domain internet of things.Representative ofChannel gain of (i.e. single domain internet of things)And edge serverCommunication gap between them.Representing single-domain Internet of thingsTo edge serversIs provided.Representing the use of complex gaussian models to process noise present in the data.Representing the signal to noise ratio.

The transmission delay between the device and the server can be expressed as:

;

Wherein, theRepresenting single domain internet of thingsTo edge serversIs a data amount of (a) in the data stream.

With respect to the task computing model, when an edge server receives two or more offloaded tasks from different single-domain Internet of things, it is assumed that computing resources are shared equally between tasks. At this time, the number of resources allocated to each single-domain internet of things for calculation may be expressed as:

;

。

Wherein, theRepresentative edge serverThe task processing capacities of all servers are set to be uniform.Representing the CPU cycles required by the edge server to handle a task,Representing the power consumption per CPU cycle.Representing the frequency of the CPU and,To calculate the cost.

The latency of an edge server performing a task can be expressed as:

;

Wherein, theRepresenting the frequency of the CPU.

The transmission delay of the single-domain internet of things to transmit tasks to the edge server can be expressed as:

。

Wherein whenWhen the device is used, the device of the Internet of things is representedSuccessful offloading of tasks toCalculating on a server, and meeting the constraint:

;

Finally, the cost function of the edge computation mode is expressed as

;

Wherein P represents the transmission energy consumption of the device, and specifically refers to the fixed transmission power of the device during communication.

The cost function can be expressed as:

。

Penalty functionIs suitable for task execution failure caused by violation of computational constraint.

Optimizing an objective function, and constructing a task processing cost objective function:

。

Specifically, the embodiment may calculate the first task processing cost of the multi-domain internet of things and the edge server group based on the task processing cost objective function. And the second task processing cost of the multi-domain internet of things and the edge server after the task allocation optimization is performed can be calculated based on the task processing cost objective function after the task allocation is adjusted according to the allocation optimization strategy. Subtracting the first task processing cost from the second task processing cost to obtain a cost difference value。

Specifically, the reward function constructed in this embodiment is:

。

Wherein, theAndAre all larger than 0 and are the optimal rewarding valuesGreater than。

In particular, the present embodiment can be based on the cost differenceAnd a prize function, determining a corresponding optimized prize value.

Furthermore, it is considered that it is difficult to employ conventional Q learning due to complexity and continuous states of edge computation, because the speed of model convergence and obtaining an optimal strategy is greatly reduced as the search space increases. Therefore, the design adopts the deep convolutional neural network to estimate the Q value, so that the convergence speed can be increased and the method can be used for dimension reduction.

It will be appreciated that the present embodiment may be used in conjunction with a distributed deep reinforcement learning approach to training the DQN model.

As shown in fig. 2, the estimation network, the target network and the DQN error function are included in the DQN model to be trained. Both the estimation network and the target network may be deep convolutional neural networks. In this embodiment, the multi-domain internet of things and the edge server group may be regarded as an environment as a whole, and the constructed first multi-dimensional state tensor may be regarded as a network state of the environment at time t. In this embodiment, the first multidimensional state tensor is input into the DQN model to perform task allocation optimization, an allocation optimization strategy is obtained and returned to the environment, the allocation optimization strategy is regarded as an action, the environment responds to the action, and a corresponding second multidimensional state tensor, namely, the network state of the environment at time t+1 is generated. The first multidimensional state tensor and the second multidimensional state tensor are both stored in an experience pool. The present embodiment may store the calculated optimized prize value r_t also in the experience pool.

In the embodiment, a first multidimensional state tensor and a second multidimensional state tensor in an experience pool are respectively input into an estimation network and a target network, the estimation network performs action gain prediction based on the first multidimensional state tensor to obtain a first gain value, namely an estimated Q value, and the target network performs expected gain prediction based on the second multidimensional state tensor to obtain a second gain value, namely a target Q value. And inputting the first benefit value, the second benefit value and the determined optimized rewards value r_t into the DQN error function for calculation to obtain a corresponding loss function value. And updating parameters in the estimated network according to the loss function value, and periodically updating the parameters in the estimated network to the target network synchronously. Then, the present embodiment may continue to train the updated DQN model until a trained DQN model is obtained.

According to the embodiment, the Internet of things can be modeled as an agent through the distributed deep reinforcement learning framework, and the task unloading strategy is optimized through the Markov decision process. Using DQN model and convolutional neural network as function approximators, convergence is accelerated and high-dimensional state space is handled. Dynamic resource allocation policies are designed to take into account device computing power, bandwidth, energy state, etc. According to the embodiment, the energy-saving distributed task unloading system is designed, the unloading knowledge model is constructed, and load decisions can be made in real time and efficiently, so that idle resources and response time are reduced to the greatest extent.

The embodiment can realize multi-domain IoT network architecture design and support dynamic task offloading. The distributed deep reinforcement learning algorithm allows devices to independently make decisions without global information. And the learning efficiency and the convergence rate are improved through the deep convolutional neural network compression state space.

According to the method and the system, efficient and self-adaptive task unloading can be achieved through distributed deep reinforcement learning under the dynamic and resource-limited environment of the multi-domain Internet of things, difficulties such as high-dimensional decision making, multi-objective optimization and cross-domain coordination are overcome, and finally the edge intelligent system with low delay, low energy consumption and high reliability is achieved.

According to the multi-domain internet of things task allocation optimization model training method, a distributed deep reinforcement learning task unloading optimization mode is provided, local calculation and edge unloading can be dynamically balanced through a built knowledge model, and lower time delay, higher energy efficiency and faster convergence are achieved. And an extensible and self-adaptive scheme idea is provided for landing in a complex Internet of things scene.

As shown in fig. 3, this embodiment proposes a multi-domain task allocation optimization model training device for internet of things, where the device may include:

The first generating unit 301 is configured to generate a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain internet of things and an edge server group, where the single-domain internet of things is configured to allocate a corresponding task to be processed to a local device or the edge server group for processing;

The input unit 302 is configured to input the first multidimensional state tensor into a task allocation optimization model to be trained, so that the task allocation optimization model to be trained performs task allocation optimization based on the first multidimensional state tensor, and an allocation optimization strategy is obtained;

an optimizing unit 303, configured to optimize allocation of tasks to be processed by at least one single-domain internet of things based on an allocation optimization policy;

A second generating unit 304, configured to generate a second multidimensional state tensor based on the optimized task allocation information, the running performance index, the task attribute information, and the task communication quality index;

a determining unit 305, configured to determine an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function;

The updating unit 306 is configured to update the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor, and the deep reinforcement learning strategy, so as to obtain a trained task allocation optimization model.

It should be noted that, the processing procedures of the first generating unit 301, the input unit 302, the optimizing unit 303, the second generating unit 304, the determining unit 305, and the updating unit 306 and the beneficial effects thereof may refer to steps S101 to S105 in fig. 1, respectively, and are not described again.

Optionally, the first generating unit 301 is further configured to:

Optionally, the edge server group comprises a plurality of edge servers, the task attribute information comprises key attribute information of each task to be processed, and the task communication quality index comprises estimated communication quality indexes of any task to be processed in the corresponding single-domain Internet of things and each edge server respectively.

Optionally, the first generating unit 301 is further configured to:

Optionally, the determining unit 305 is further configured to:

The updating unit 306 is further configured to:

The multi-domain internet of things task allocation optimization model training device in this embodiment is presented in the form of functional units, where the units are ASIC (Application SPECIFIC INTEGRATED Circuit) circuits, processors and memories that execute one or more software or firmware programs, and/or other devices that can provide the above functions.

The embodiment of the invention also provides computer equipment, which is provided with the multi-domain internet of things task allocation optimization model training device shown in the figure 3.

Referring to FIG. 4, an alternative embodiment of the present invention provides a schematic structural diagram of a computer device, which includes one or more processors 10, a memory 20, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the computer device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 4.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.

Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.

The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area. The storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The memory 20 may include volatile memory, such as random access memory. The memory may also include non-volatile memory, such as flash memory, a hard disk, or a solid state disk. The memory 20 may also comprise a combination of the above types of memories.

The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.

The embodiments of the present invention also provide a computer readable storage medium, and the method according to the embodiments of the present invention described above may be implemented in hardware, firmware, or as a computer code which may be recorded on a storage medium, or as original stored in a remote storage medium or a non-transitory machine readable storage medium downloaded through a network and to be stored in a local storage medium, so that the method described herein may be stored on such software process on a storage medium using a general purpose computer, a special purpose processor, or programmable or special purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random-access memory, a flash memory, a hard disk, a solid state disk, or the like, and further, the storage medium may further include a combination of the above types of memories. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.

It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims

1. A multi-domain Internet of things task allocation optimization model training method is characterized by comprising the following steps:

2. The method of claim 1, wherein generating the first multi-dimensional state tensor based on the task allocation information, the operation performance index, the task attribute information, and the task communication quality index of the plurality of single-domain internet of things and the edge server group comprises:

3. The method of claim 2, wherein the edge server group comprises a plurality of edge servers, the task attribute information comprises key attribute information of each task to be processed, and the task communication quality index comprises estimated communication quality indexes of any task to be processed in the corresponding single-domain internet of things and each edge server respectively.

4. The method of claim 3, wherein generating the task allocation identification matrix, the node feature matrix, the task feature matrix, and the three-dimensional network feature tensor based on the task allocation information, the operation performance index, the task attribute information, and the task communication quality index of the plurality of single-domain internet of things and the edge server group comprises:

Constructing a task allocation identification matrix based on task allocation information of the plurality of single-domain Internet of things and the plurality of edge servers, wherein each row of data in the task allocation identification matrix is used for identifying the single-domain Internet of things to allocate the task to be processed to local equipment or the edge servers;

And constructing the three-dimensional network characteristic tensor by using the estimated communication quality indexes of each task to be processed in the corresponding single-domain Internet of things and each edge server respectively, wherein each row of data of the three-dimensional network characteristic tensor comprises the estimated communication quality indexes of the task to be processed in the corresponding single-domain Internet of things and each edge server respectively.

5. The method of claim 1, wherein the determining an optimized prize value based on the task allocation information, the optimized task allocation information, the constructed task processing cost objective function, and the prize function comprises:

6. The method of claim 5, wherein said inputting the cost difference into the bonus function for bonus calculation to determine the optimized bonus value comprises:

7. The method according to any one of claims 1 to 6, wherein when the task allocation optimization model to be trained is a deep Q network DQN model, an estimation network, a target network, and an error function are included in the task allocation optimization model to be trained;

8. Multi-domain internet of things task allocation optimization model training device is characterized by comprising:

9. A computer device, comprising:

The system comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, so as to execute the multi-domain internet of things task allocation optimization model training method according to any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the multi-domain internet of things task allocation optimization model training method according to any one of claims 1 to 7.