CN120499018B - Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium - Google Patents

Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium

Info

Publication number
CN120499018B
CN120499018B (application CN202510976135.XA)
Authority
CN
China
Prior art keywords
task
task allocation
things
value
optimization model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510976135.XA
Other languages
Chinese (zh)
Other versions
CN120499018A (en)
Inventor
贾吾财
唐潇
杨海滨
赵轩
李岩
刘清涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRSC Communication and Information Group Co Ltd CRSCIC
Original Assignee
CRSC Communication and Information Group Co Ltd CRSCIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRSC Communication and Information Group Co Ltd (CRSCIC)
Priority claimed to CN202510976135.XA
Publication of CN120499018A
Application granted
Publication of CN120499018B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the field of the Internet of things and discloses a multi-domain Internet of things task allocation optimization model training method, device, equipment and medium. A first multidimensional state tensor is generated according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and an edge server group, and is input into a task allocation optimization model to be trained, so that the model executes task allocation optimization based on the first multidimensional state tensor to obtain an allocation optimization strategy; the model to be trained is then updated based on the allocation optimization strategy to obtain a trained task allocation optimization model. The trained task allocation optimization model can optimize task allocation of the multi-domain Internet of things while taking into account diversified requirements such as the processing requirements of different tasks and the reasonable allocation of computing resources, thereby effectively achieving these diversified targets.

Description

Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium
Technical Field
The invention relates to the field of Internet of things, in particular to a multi-domain Internet of things task allocation optimization model training method, device, equipment and medium.
Background
With the development of scientific technology, the technology of the internet of things is continuously improved.
Currently, the explosive growth and complex applications of the internet of things have spawned numerous computation-intensive and delay-sensitive tasks. These tasks may involve multiple independent but interrelated domains, where each domain may deploy a large number of heterogeneous devices, such as sensors, cameras and actuators, and generate different types of data, such as structured data, images and video streams. Internet of things devices have limited computing resources and constrained energy consumption, so their task processing efficiency is low. The related art improves task processing efficiency by deploying edge servers with relatively high processing capacity.
The processing requirements of different tasks may differ. When a task requires low delay and involves a small amount of computation, it can be computed directly on the local Internet of things device. When a task requires high reliability and its delay requirement is not strict, it can be offloaded to an edge server with idle resources for processing.
Therefore, in order to meet the processing requirements of different tasks and the diversified requirements of reasonable allocation of computing resources, an effective task allocation optimization mode is needed in the multi-domain internet of things.
Disclosure of Invention
The invention provides a multi-domain internet of things task allocation optimization model training method, device, equipment and medium, which are used for solving the defect that the multi-domain internet of things task allocation cannot meet diversified requirements in the related technology, optimizing the multi-domain internet of things task allocation and meeting the diversified requirements.
In a first aspect, the present invention provides a multi-domain internet of things task allocation optimization model training method, including:
Generating a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and an edge server group, wherein the single-domain Internet of things is used for allocating corresponding tasks to be processed to local equipment or the edge server group for processing;
Inputting the first multi-dimensional state tensor into a task allocation optimization model to be trained, so that the task allocation optimization model to be trained executes task allocation optimization based on the first multi-dimensional state tensor to obtain an allocation optimization strategy;
optimizing the allocation of at least one single-domain Internet of things to the task to be processed based on the allocation optimization strategy, and generating a second multidimensional state tensor based on the optimized task allocation information, the operation performance index, the task attribute information and the task communication quality index;
Determining an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function;
Updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy to obtain a trained task allocation optimization model.
Optionally, the generating a first multidimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of the plurality of single-domain internet of things and the edge server group includes:
Generating a task allocation identification matrix, a node characteristic matrix, a task characteristic matrix and a three-dimensional network characteristic tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of the plurality of single-domain Internet of things and the edge server group;
And taking the task allocation identification matrix, the node characteristic matrix, the task characteristic matrix and the three-dimensional network characteristic tensor as a whole as the first multidimensional state tensor.
Optionally, the edge server group includes a plurality of edge servers, the task attribute information includes key attribute information of each task to be processed, and the task communication quality index includes estimated communication quality indexes of any task to be processed in the corresponding single-domain internet of things and each edge server respectively.
Optionally, the generating a task allocation identifier matrix, a node feature matrix, a task feature matrix and a three-dimensional network feature tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of the plurality of single-domain internet of things and the edge server group includes:
Based on task allocation information of the single-domain Internet of things and the edge servers, constructing a task allocation identification matrix, wherein each row of data in the task allocation identification matrix is used for identifying the single-domain Internet of things to allocate the task to be processed to local equipment or the edge servers;
Constructing the node characteristic matrix by using the operation performance indexes of each single-domain Internet of things and each edge server, wherein each row of data of the node characteristic matrix comprises the operation performance indexes of the single-domain Internet of things or the edge servers;
constructing a task feature matrix based on the key attribute information of each task to be processed, wherein each row of data of the task feature matrix comprises the key attribute information of the task to be processed;
And constructing a three-dimensional network characteristic tensor by using the estimated communication quality indexes of each task to be processed in the corresponding single-domain Internet of things and each edge server, wherein each row of data of the three-dimensional network characteristic tensor comprises the estimated communication quality indexes of the task to be processed in the corresponding single-domain Internet of things and each edge server.
Optionally, the determining an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function includes:
determining a first task processing cost according to the task allocation information and the task processing cost objective function, and determining a second task processing cost according to the optimized task allocation information and the task processing cost objective function;
Subtracting the second task processing cost from the first task processing cost to obtain a cost difference value;
and inputting the cost difference value into the reward function to perform reward calculation so as to determine the optimized reward value.
Optionally, the inputting the cost difference value into the reward function to perform a reward calculation to determine the optimized reward value includes:
inputting the cost difference value into the reward function so that the reward function determines a first set value greater than 0 as the optimized reward value when the cost difference value is greater than 0, determines a second set value less than 0 as the optimized reward value when the cost difference value is equal to 0, and determines a third set value less than 0 as the optimized reward value when the cost difference value is less than 0;
the absolute value of the third set value is equal to the first set value, and the absolute value of the third set value is larger than the absolute value of the second set value.
Optionally, when the task allocation optimization model to be trained is a deep Q network DQN model, the task allocation optimization model to be trained includes an estimation network, a target network and an error function;
The updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy to obtain a trained task allocation optimization model comprises the following steps:
Inputting the first multidimensional state tensor into the estimation network to perform action benefit prediction to obtain a first benefit value; inputting the second multidimensional state tensor into the target network to predict expected benefits, and obtaining a second benefit value;
the first benefit value, the second benefit value and the optimized reward value are input into the error function, so that the error function performs weighted summation on the optimized reward value and the second benefit value based on set weights to obtain a target benefit value, and a loss function value is determined according to the difference between the target benefit value and the first benefit value;
updating the task allocation optimization model to be trained based on the loss function value to obtain a trained task allocation optimization model.
In a second aspect, the present invention provides a multi-domain internet of things task allocation optimization model training device, including:
The system comprises a first generation unit, a second generation unit and a third generation unit, wherein the first generation unit is used for generating a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and an edge server group, and the single-domain Internet of things is used for allocating corresponding tasks to be processed to local equipment or the edge server group for processing;
the input unit is used for inputting the first multi-dimensional state tensor into a task allocation optimization model to be trained so that the task allocation optimization model to be trained executes task allocation optimization based on the first multi-dimensional state tensor to obtain an allocation optimization strategy;
The optimizing unit is used for optimizing the distribution of at least one single-domain Internet of things to the task to be processed based on the distribution optimizing strategy;
The second generating unit is used for generating a second multidimensional state tensor based on the optimized task allocation information, the running performance index, the task attribute information and the task communication quality index;
the determining unit is used for determining an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function;
and the updating unit is used for updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy so as to obtain a trained task allocation optimization model.
In a third aspect, the present invention provides a computer device, including a memory and a processor, where the memory and the processor are communicatively connected to each other, and the memory stores computer instructions, and the processor executes the computer instructions, so as to execute the multi-domain internet of things task allocation optimization model training method according to the first aspect or any implementation manner corresponding to the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium, where computer instructions are stored on the computer readable storage medium, where the computer instructions are configured to cause a computer to execute the multi-domain internet of things task allocation optimization model training method according to the first aspect or any implementation manner corresponding to the first aspect.
According to the multi-domain Internet of things task allocation optimization model training method, device, equipment and medium, a first multidimensional state tensor can be generated according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and an edge server group, and input into a task allocation optimization model to be trained, so that the model executes task allocation optimization based on the first multidimensional state tensor to obtain an allocation optimization strategy. The allocation of tasks to be processed by at least one single-domain Internet of things is optimized based on the allocation optimization strategy, a second multidimensional state tensor is generated based on the optimized task allocation information, and an optimized reward value is determined according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function. The task allocation optimization model to be trained is then updated based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy to obtain a trained task allocation optimization model. The trained model can optimize task allocation of the multi-domain Internet of things while taking into account diversified requirements such as the processing requirements of different tasks and the reasonable allocation of computing resources, thereby effectively achieving these diversified targets.
Drawings
In order to more clearly illustrate the invention or the technical solutions in the related art, the following description will briefly explain the drawings used in the embodiments or the related art description, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a flowchart of a multi-domain Internet of things task allocation optimization model training method provided by an embodiment of the invention;
FIG. 2 is a flowchart of another multi-domain Internet of things task allocation optimization model training method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a multi-domain task allocation optimization model training device for internet of things according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The multi-domain internet of things task allocation optimization model training method of the invention is described below with reference to fig. 1-2.
As shown in fig. 1, the present embodiment proposes a first multi-domain internet of things task allocation optimization model training method, which may include the following steps:
S101, generating a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and edge server groups, wherein the single-domain Internet of things is used for allocating corresponding tasks to be processed to local equipment or the edge server groups for processing.
Wherein, a plurality of internet of things devices can be deployed in each single-domain internet of things. The different single-domain Internet of things are independent and associated with each other. It should be noted that, a plurality of independent and interrelated single-domain internet of things form a multi-domain internet of things.
Specifically, the edge server group can be used to perform edge computing for tasks of the multi-domain internet of things. The edge server group may include a plurality of edge servers, each of which may be used to process tasks offloaded by a single-domain internet of things.
The task allocation information may include information that the tasks generated by each single-domain internet of things are allocated to a local device or a specific edge server for processing. It can be understood that the local device is an internet of things device for processing tasks in the corresponding single-domain internet of things. Specifically, a certain single-domain internet of things distributes tasks to local equipment for processing, which means that the single-domain internet of things distributes tasks to the internet of things equipment for processing tasks in the single-domain internet of things for processing.
The operation performance index may include index data related to the operation performance of each single-domain internet of things and each edge server, such as CPU utilization, memory, queuing task number, signal-to-noise ratio, bandwidth, transmission rate, and the like.
Optionally, in the method for training the task allocation optimization model of the other multi-domain internet of things provided in this embodiment, the edge server group includes a plurality of edge servers, the task attribute information includes key attribute information of each task to be processed, and the task communication quality index includes estimated communication quality indexes of any task to be processed in the corresponding single-domain internet of things and each edge server respectively.
Specifically, the task attribute information may include related attribute information of each task generated by each single-domain internet of things, such as a calculation requirement, a data amount, a deadline, and the like of the task.
Specifically, the task communication quality index may include estimated communication quality indexes of any task in each single-domain internet of things and each edge server, such as signal-to-noise ratio, round trip delay, and the like.
Specifically, the embodiment may construct the first multidimensional state tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of each single-domain internet of things and each edge server.
Optionally, step S101 may include:
Generating a task allocation identification matrix, a node characteristic matrix, a task characteristic matrix and a three-dimensional network characteristic tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and edge server groups;
And taking the task allocation identification matrix, the node characteristic matrix, the task characteristic matrix and the three-dimensional network characteristic tensor as a whole as a first multidimensional state tensor.
The task allocation identification matrix is a matrix for describing task allocation information, the node characteristic matrix is a matrix comprising operation performance indexes of each single-domain Internet of things and each edge server, the task characteristic matrix is a matrix comprising task attribute information of each task, and the three-dimensional network characteristic tensor comprises estimated communication quality indexes of any task in each single-domain Internet of things and each edge server.
Optionally, generating the task allocation identifier matrix, the node feature matrix, the task feature matrix and the three-dimensional network feature tensor according to the task allocation information, the operation performance index, the task attribute information and the task communication quality index of the plurality of single-domain internet of things and the edge server group includes:
based on task allocation information of a plurality of single-domain Internet of things and a plurality of edge servers, constructing a task allocation identification matrix, wherein each row of data in the task allocation identification matrix is used for identifying the single-domain Internet of things to allocate a task to be processed to local equipment or the edge servers;
constructing a node characteristic matrix by using the operation performance indexes of each single-domain Internet of things and each edge server, wherein each row of data of the node characteristic matrix comprises the operation performance indexes of the single-domain Internet of things or the edge servers;
Constructing a task feature matrix based on the key attribute information of each task to be processed, wherein each row of data of the task feature matrix comprises the key attribute information of the task to be processed;
And constructing a three-dimensional network characteristic tensor by using the estimated communication quality indexes of each task to be processed in the corresponding single-domain Internet of things and each edge server respectively, wherein each row of data of the three-dimensional network characteristic tensor comprises the estimated communication quality indexes of the task to be processed in the corresponding single-domain Internet of things and each edge server respectively.
In particular, the task allocation identity matrix may be expressed as A ∈ {0,1}^{M×N}, wherein M represents the total number of tasks of the multi-domain Internet of things and N represents the total number of nodes available for computation, including each single-domain Internet of things and each edge server. That is, when a_{m,n} = 1, the current task m is executed in node n; otherwise a_{m,n} = 0. This matrixized representation not only intuitively reflects the distribution of tasks, but also enables efficient state updates through matrix operations, thereby reducing computational complexity.
In particular, the task feature matrix may be expressed as T ∈ R^{M×d_T}. Each row of data is the attribute vector t_m of task m, including key parameters such as computing demand, data volume and deadline.
In particular, the node feature matrix may be expressed as F ∈ R^{N×d_F}. Each row of data represents the real-time state vector f_n of node n, which may include performance metrics such as CPU utilization, memory and available bandwidth.
Wherein the three-dimensional network feature tensor may be expressed as C ∈ R^{M×N×d_C}. The tensor element c_{m,n} describes the estimated communication quality between task m and node n, such as signal-to-noise ratio and round-trip delay.
Specifically, in this embodiment, the constructed task allocation identifier matrix, node feature matrix, task feature matrix, and three-dimensional network feature tensor may be integrally used as the first multidimensional state tensor.
Wherein the first multidimensional state tensor may be represented as s = (A, T, F, C).
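As an illustrative sketch only (not taken from the patent: the array names, dimensions, feature widths and random values below are all assumptions), the four components of such a state could be assembled with NumPy as follows:

```python
import numpy as np

M, N = 6, 4                        # assumed: 6 pending tasks, 4 compute nodes (IoT domains + edge servers)
D_TASK, D_NODE, D_LINK = 3, 3, 2   # assumed feature widths

rng = np.random.default_rng(0)

# Task allocation identification matrix: A[m, n] = 1 iff task m is executed on node n.
A = np.zeros((M, N), dtype=np.int8)
A[np.arange(M), rng.integers(0, N, size=M)] = 1

# Task feature matrix: each row holds a task's key attributes
# (e.g. computing demand, data volume, deadline).
T = rng.random((M, D_TASK))

# Node feature matrix: each row holds a node's real-time state
# (e.g. CPU utilization, memory, available bandwidth).
F = rng.random((N, D_NODE))

# Three-dimensional network feature tensor: C[m, n] holds the estimated
# communication quality between task m and node n (e.g. SNR, round-trip delay).
C = rng.random((M, N, D_LINK))

# The "first multidimensional state tensor" is the four parts taken as a whole.
state = (A, T, F, C)

# Each task is assigned to exactly one node, so every row of A sums to 1.
assert A.sum(axis=1).tolist() == [1] * M
```

Representing the allocation as a 0/1 matrix, as the text notes, lets the state be refreshed with cheap vectorized operations instead of per-task bookkeeping.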
S102, inputting the first multidimensional state tensor into a task allocation optimization model to be trained, so that the task allocation optimization model to be trained executes task allocation optimization based on the first multidimensional state tensor, and an allocation optimization strategy is obtained.
The task allocation optimization model to be trained may be a model with a deep reinforcement learning network architecture, trained through deep reinforcement learning to acquire the task allocation optimization capability.
Alternatively, the task allocation optimization model to be trained may be a Deep Q-Network (DQN) model, or another type of deep reinforcement learning network model.
Specifically, the embodiment may input the constructed first multidimensional state tensor into the task allocation optimization model to be trained, so that the task allocation optimization model performs task allocation optimization based on the first multidimensional state tensor, and obtains and outputs an allocation optimization strategy.
S103, optimizing the allocation of the tasks to be processed by at least one single-domain Internet of things based on an allocation optimization strategy.
Specifically, in this embodiment, task allocation of each single-domain internet of things may be optimized according to an allocation optimization policy, that is, some tasks are adjusted from a local device to an edge server for processing, and some tasks are adjusted from the edge server to the local device for processing.
S104, generating a second multidimensional state tensor based on the optimized task allocation information, the running performance index, the task attribute information and the task communication quality index.
Specifically, in this embodiment, after task allocation of the multi-domain internet of things is optimized based on the allocation optimization policy, the task allocation information, operation performance indexes, task attribute information and task communication quality indexes of the multi-domain internet of things and the edge server group after allocation optimization are obtained, and, with reference to the process of generating the first multi-dimensional state tensor in step S101, a corresponding second multi-dimensional state tensor is generated from them.
It is understood that the second multidimensional state tensor is also integrally formed with the task assignment identification matrix, the node feature matrix, the task feature matrix, and the three-dimensional network feature tensor.
S105, determining an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function.
Specifically, the embodiment can construct a task processing cost objective function and a reward function according to actual situations.
Specifically, the embodiment can determine the corresponding optimized reward value according to the task allocation information of the multi-domain internet of things, the optimized task allocation information, the task processing cost objective function and the reward function.
Optionally, step S105 may include:
determining a first task processing cost according to the task allocation information and the task processing cost objective function, and determining a second task processing cost according to the optimized task allocation information and the task processing cost objective function;
Subtracting the second task processing cost from the first task processing cost to obtain a cost difference value;
The cost difference value is input into the reward function for reward calculation to determine the optimized reward value.
Specifically, in this embodiment, the task processing cost objective function may be used to calculate based on the task allocation information, so as to obtain the first task processing cost. And calculating based on the optimized task allocation information by using the task processing cost objective function to obtain a second task processing cost.
Optionally, the inputting of the cost difference value into the reward function to perform a reward calculation to determine the optimized reward value includes:
Inputting the cost difference value into the reward function, so that the reward function determines a first set value larger than 0 as the optimized reward value when the cost difference is larger than 0, a second set value smaller than 0 as the optimized reward value when the cost difference is equal to 0, and a third set value smaller than 0 as the optimized reward value when the cost difference is smaller than 0;
The absolute value of the third set value is equal to the first set value, and the absolute value of the third set value is larger than the absolute value of the second set value.
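The three-branch reward rule above can be sketched as follows; the concrete set values a = 1.0 (first set value), −b = −0.1 (second set value) and −a = −1.0 (third set value) are illustrative assumptions chosen only to satisfy the stated absolute-value relations:

```python
def optimized_reward(cost_diff, a=1.0, b=0.1):
    """Map the cost difference (first cost minus second cost) to a reward.

    a and b are illustrative set values with a > b > 0, so that
    |third set value| == first set value and |third| > |second|.
    """
    if cost_diff > 0:       # allocation optimization reduced the total cost
        return a            # first set value
    elif cost_diff == 0:    # cost unchanged: mild negative reward
        return -b           # second set value
    else:                   # cost increased: strongest penalty
        return -a           # third set value
```

With these values, any cost reduction earns the full positive reward, while leaving the cost unchanged is penalized only mildly, which nudges the model away from do-nothing strategies.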
And S106, updating the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy so as to obtain a trained task allocation optimization model.
Optionally, in the method for training the task allocation optimization model of the other multi-domain internet of things provided in this embodiment, when the task allocation optimization model to be trained is the deep Q network DQN model, the task allocation optimization model to be trained includes an estimation network, a target network and an error function. At this time, step S106 may include:
inputting the first multidimensional state tensor into the estimation network to perform action benefit prediction to obtain a first benefit value, and inputting the second multidimensional state tensor into the target network to perform expected benefit prediction to obtain a second benefit value;
inputting the first benefit value, the second benefit value and the optimized reward value into an error function so that the error function performs weighted summation on the optimized reward value and the second benefit value based on set weights to obtain a target benefit value, and determining a loss function value according to the difference between the target benefit value and the first benefit value;
updating the task allocation optimization model to be trained based on the loss function value to obtain a trained task allocation optimization model.
Specifically, the task allocation optimization model to be trained is a DQN model, which comprises an estimation network, a target network and a DQN error function.
Specifically, the embodiment may consider the whole of the multi-domain internet of things and the edge server group as an environment, and input the network state of the environment at the time t, that is, the first multi-dimensional state tensor, into the estimation network to perform action benefit prediction, so as to obtain the first benefit value output by the estimation network. And inputting the network state of the environment at the time t+1, namely a second multidimensional state tensor, into the target network to perform expected profit prediction, and obtaining a second profit value.
Specifically, the first benefit value, the second benefit value and the optimized reward value may be input into the DQN error function, so that the DQN error function performs a weighted summation of the optimized reward value and the second benefit value based on the set weight to obtain the target benefit value, and calculates the corresponding loss function value from the difference between the target benefit value and the first benefit value according to the set loss function. Then, when the loss function value is not smaller than the set threshold value, this embodiment may update the parameters of the task allocation optimization model to be trained to obtain an updated task allocation optimization model.
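As a minimal numeric sketch of this update, assuming the set weight is a discount factor gamma and the set loss function is the squared temporal-difference error (both common DQN choices, not mandated by the text):

```python
import numpy as np

def dqn_loss(q_estimate, action, q_target_next, reward, gamma=0.9):
    """Squared TD error between the target benefit value and the first benefit value."""
    # Target benefit value: weighted summation of the optimized reward value
    # and the second benefit value (the target network's best next-state Q).
    target_value = reward + gamma * np.max(q_target_next)
    first_value = q_estimate[action]  # estimated benefit of the chosen action
    return float((target_value - first_value) ** 2)
```

For example, with q_estimate = [0.2, 0.5], action 1, q_target_next = [1.0, 0.0] and reward 1.0, the target benefit value is 1.0 + 0.9 × 1.0 = 1.9 and the loss is (1.9 − 0.5)² = 1.96.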
Then, the embodiment may use the updated task allocation optimization model as the task allocation optimization model to be trained, and return to execute step S101, and continue to update the task allocation optimization model until the calculated loss function value is smaller than the set threshold, or the iteration number meets the requirement, and obtain the latest task allocation optimization model. After that, the embodiment can test and verify the latest task allocation optimization model, and when the test and verification pass, the latest task allocation optimization model can be determined as a trained task allocation optimization model. And when the test and verification are not passed, training and updating the latest task allocation optimization model can be continued until a trained task allocation optimization model is obtained.
According to the multi-domain internet of things task allocation optimization model training method, a first multidimensional state tensor can be generated according to the task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain internet of things and an edge server group; the first multidimensional state tensor is input into the task allocation optimization model to be trained, so that the model executes task allocation optimization based on the first multidimensional state tensor to obtain an allocation optimization strategy; the allocation of the tasks to be processed of at least one single-domain internet of things is optimized based on the allocation optimization strategy; a second multidimensional state tensor is generated based on the optimized task allocation information, the operation performance indexes, the task attribute information and the task communication quality indexes; an optimized reward value is determined according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function; and the task allocation optimization model to be trained is updated based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor and the deep reinforcement learning strategy, so as to obtain the trained task allocation optimization model. The trained task allocation optimization model can optimize task allocation of the multi-domain internet of things while considering diversified requirements such as the processing requirements of different tasks and the reasonable allocation of computing resources, effectively achieving these diversified targets.
Based on fig. 1, another multi-domain internet of things task allocation optimization model training method is provided in this embodiment, and according to the multi-domain internet of things task allocation scene, a task processing cost objective function and a reward function can be constructed.
Specifically, in the process of establishing the task processing cost objective function, the network model may be configured to consist of D single-domain internet of things and E edge servers, where the single-domain internet of things may be represented as $\mathcal{D} = \{1, 2, \dots, D\}$ and the edge servers as $\mathcal{E} = \{1, 2, \dots, E\}$. Each single-domain internet of things is set to generate individual tasks that can be offloaded to an edge server.
Device $d \in \mathcal{D}$ generates task $T_d = (I_d, C_d)$, where $I_d$ represents the input data size of the task and $C_d$ represents the resources required to process the task, such as the number of CPU cycles.
The total energy consumption for performing a task locally can be expressed as:
$E_d^{loc} = \epsilon \, c \, I_d$;
where $c$ is a fixed property of the CPU, i.e., the number of CPU cycles required to process 1 bit of data, $\epsilon$ is the energy consumption per CPU cycle, and $I_d$ is the task size.
The local computation delay is expressed as:
$t_d^{loc} = \dfrac{c \, I_d}{f_d}$;
where $f_d$ represents the CPU frequency of single-domain internet of things $d$, i.e., the number of CPU cycles it can process per second.
The local cost function is expressed as:
$Cost_d^{loc} = \lambda \, t_d^{loc} + (1 - \lambda) \, E_d^{loc}$;
where $\lambda$ represents the weight of the delay cost.
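A small sketch of the local-computation cost model just described; the parameter names (cycles_per_bit, energy_per_cycle, cpu_freq) and the weighted delay-energy combination with weight lam are illustrative assumptions consistent with the definitions above:

```python
def local_cost(task_bits, cycles_per_bit, energy_per_cycle, cpu_freq, lam=0.5):
    """Weighted sum of local delay and local energy for one task (a sketch)."""
    energy = energy_per_cycle * cycles_per_bit * task_bits  # E_loc
    delay = cycles_per_bit * task_bits / cpu_freq           # t_loc
    return lam * delay + (1.0 - lam) * energy               # delay-weighted cost
```

For instance, a 10^6-bit task at 1000 cycles/bit on a 1 GHz CPU consuming 10^-9 J per cycle gives a delay of 1 s, an energy of 1 J, and a cost of 1.0 under equal weighting.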
In edge task offloading, the channel rate that determines the communication latency may be expressed as:
$B_{d,e} = \dfrac{W}{N_e}$;
$r_{d,e} = B_{d,e} \log_2\!\left(1 + \dfrac{p_d \, h_{d,e}}{\sigma^2}\right)$;
where $r_{d,e}$ governs the communication delay between single-domain internet of things $d$ and edge server $e$, $B_{d,e}$ is the bandwidth of the channel between single-domain internet of things $d$ and edge server $e$, $N_e$ is the number of tasks assigned to edge server $e$, and $W$ is the shared bandwidth between the edge server and the connected single-domain internet of things. $h_{d,e}$ represents the channel gain between single-domain internet of things $d$ and edge server $e$, $p_d$ represents the transmission power of single-domain internet of things $d$ to edge server $e$, and $\sigma^2$ models the complex Gaussian noise present in the channel; the ratio $p_d h_{d,e} / \sigma^2$ is the signal-to-noise ratio.
The transmission delay between the device and the server can be expressed as:
$t_{d,e}^{trans} = \dfrac{I_d}{r_{d,e}}$;
where $I_d$ represents the amount of data transmitted from single-domain internet of things $d$ to edge server $e$.
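The bandwidth sharing and Shannon-rate reasoning above can be sketched as follows; the equal split of the shared bandwidth W among the tasks connected to a server, and all parameter names, are assumptions for illustration:

```python
import math

def uplink_rate(shared_bw, n_tasks, tx_power, channel_gain, noise_power):
    """Achievable uplink rate under an equal split of the shared bandwidth."""
    bandwidth = shared_bw / n_tasks              # per-device channel bandwidth B
    snr = tx_power * channel_gain / noise_power  # signal-to-noise ratio
    return bandwidth * math.log2(1.0 + snr)      # bits per second

def transmission_delay(task_bits, rate):
    """Time to push the task input over the channel."""
    return task_bits / rate
```

For example, a 2 MHz shared band split between 2 tasks with SNR = 3 yields 1 MHz × log2(4) = 2 Mbit/s, so a 4 Mbit task takes 2 s to transmit.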
With respect to the task computing model, when an edge server receives two or more offloaded tasks from different single-domain internet of things, it is assumed that computing resources are shared equally between the tasks. The computing resources allocated to each single-domain internet of things can then be expressed as:
$f_{d,e} = \dfrac{F_e}{N_e}$;
where $F_e$ represents the task processing capacity (CPU frequency) of edge server $e$, with the capacities of all servers set to be uniform, and $N_e$ is the number of tasks offloaded to edge server $e$. As in the local model, $c$ denotes the CPU cycles the edge server requires per bit of task data, and $\epsilon_e$ denotes its energy consumption per CPU cycle.
The latency of an edge server executing a task can be expressed as:
$t_{d,e}^{exec} = \dfrac{c \, I_d}{f_{d,e}}$;
where $f_{d,e}$ represents the CPU frequency allocated to the task.
The total delay for the single-domain internet of things to offload a task to the edge can be expressed as:
$t_d^{edge} = \sum_{e=1}^{E} x_{d,e} \left( t_{d,e}^{trans} + t_{d,e}^{exec} \right)$;
where $x_{d,e} \in \{0, 1\}$; when $x_{d,e} = 1$, internet of things device $d$ successfully offloads its task to server $e$ for computation, subject to the constraint:
$\sum_{e=1}^{E} x_{d,e} \le 1$;
Finally, the cost function of the edge computation mode is expressed as:
$Cost_d^{edge} = \lambda \, t_d^{edge} + (1 - \lambda) \, P \sum_{e=1}^{E} x_{d,e} \, t_{d,e}^{trans}$;
where $P$ represents the transmission energy consumption of the device, specifically the fixed transmission power of the device during communication, so that $P \, t_{d,e}^{trans}$ is the energy spent on transmission.
The overall cost function can be expressed as:
$Cost_d = \left(1 - \sum_{e=1}^{E} x_{d,e}\right) Cost_d^{loc} + \sum_{e=1}^{E} x_{d,e} \, Cost_d^{edge} + \varphi_d$;
where the penalty term $\varphi_d$ applies when task execution fails owing to violation of the computational constraint.
Optimizing the objective function, the task processing cost objective function is constructed as:
$\min_{\{x_{d,e}\}} \sum_{d=1}^{D} Cost_d$.
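The edge-mode cost and the overall objective can be sketched as below; the equal sharing of server capacity, the transmission-energy term (transmit power times transmission delay), and the per-device selection between local and edge cost are modeled on the description above, with illustrative parameter names:

```python
def edge_cost(task_bits, cycles_per_bit, server_capacity, n_offloaded,
              trans_delay, tx_power, lam=0.5):
    """Weighted delay-energy cost of executing one offloaded task at the edge."""
    f_share = server_capacity / n_offloaded           # equal resource sharing F_e / N_e
    exec_delay = cycles_per_bit * task_bits / f_share  # execution delay at the server
    trans_energy = tx_power * trans_delay              # energy spent on transmission
    return lam * (trans_delay + exec_delay) + (1.0 - lam) * trans_energy

def total_cost(local_costs, edge_costs, offload_flags, penalty=0.0):
    """Objective value: each device pays its local or edge cost per its indicator x_d."""
    return sum(e if x else l
               for l, e, x in zip(local_costs, edge_costs, offload_flags)) + penalty
```

The penalty argument stands in for the constraint-violation term in the overall cost function; a scheduler would sum this objective over candidate allocations and keep the cheapest.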
Specifically, the embodiment may calculate the first task processing cost $C_1$ of the multi-domain internet of things and the edge server group based on the task processing cost objective function, and, after the task allocation has been adjusted according to the allocation optimization strategy, calculate the second task processing cost $C_2$ of the multi-domain internet of things and the edge server group based on the same objective function. Subtracting the second task processing cost from the first yields the cost difference value $\Delta C = C_1 - C_2$.
Specifically, the reward function constructed in this embodiment is:
$r_t = \begin{cases} a, & \Delta C > 0 \\ -b, & \Delta C = 0 \\ -a, & \Delta C < 0 \end{cases}$;
where $a$ and $b$ are both larger than 0 and the reward magnitude $a$ is greater than $b$.
Specifically, the present embodiment can determine the corresponding optimized reward value based on the cost difference $\Delta C$ and the reward function.
In the related art, cloud computing suffers from high latency in remote task processing, and edge computing addresses this problem by bringing computing tasks close to the data sources. The dynamics and heterogeneity of the multi-domain internet of things (e.g., smart agriculture, smart cities) make task offloading an NP-hard problem. The key challenge is how to trade off local computing against edge offloading in real time in a resource-constrained, dynamically diverse environment, so as to simultaneously meet the demands of low latency, high energy efficiency and low cost. According to the research of the inventor of this embodiment, in view of the complexity and dynamics of the multi-domain internet of things task offloading problem, related technologies struggle to process its high-dimensional state space (such as equipment load, network delay, energy consumption and the like), whereas the DQN automatically extracts key features by approximating the Q-value function with a deep neural network, realizing end-to-end self-adaptive decisions. The experience replay mechanism breaks data correlation and improves training stability, which is particularly suitable for cross-domain experience sharing, while the target network relieves the Q-value overestimation problem and ensures stable optimization of the strategy across multiple objectives such as time delay and energy consumption. In addition, the DQN has lower calculation cost, can be deployed on edge equipment through lightweight design, and can learn a long-term optimal strategy without preset rules. The present embodiment therefore uses the DQN model.
Furthermore, conventional Q-learning is difficult to employ here owing to the complexity and continuous state space of edge computing, because the speed of model convergence and of obtaining an optimal strategy drops sharply as the search space grows. Therefore, this design adopts a deep convolutional neural network to estimate the Q value, which accelerates convergence and serves as a form of dimension reduction.
It will be appreciated that the present embodiment may be used in conjunction with a distributed deep reinforcement learning approach to training the DQN model.
As shown in fig. 2, the DQN model to be trained includes the estimation network, the target network and the DQN error function. Both the estimation network and the target network may be deep convolutional neural networks. In this embodiment, the multi-domain internet of things and the edge server group may be regarded as an environment as a whole, and the constructed first multidimensional state tensor may be regarded as the network state of the environment at time t. The first multidimensional state tensor is input into the DQN model to perform task allocation optimization, an allocation optimization strategy is obtained and returned to the environment; the allocation optimization strategy is regarded as an action, and the environment responds to the action and generates the corresponding second multidimensional state tensor, namely the network state of the environment at time t+1. The first multidimensional state tensor and the second multidimensional state tensor are both stored in an experience pool. This embodiment may also store the calculated optimized reward value rt in the experience pool.
In this embodiment, the first multidimensional state tensor and the second multidimensional state tensor in the experience pool are respectively input into the estimation network and the target network: the estimation network performs action benefit prediction based on the first multidimensional state tensor to obtain the first benefit value, namely the estimated Q value, and the target network performs expected benefit prediction based on the second multidimensional state tensor to obtain the second benefit value, namely the target Q value. The first benefit value, the second benefit value and the determined optimized reward value rt are input into the DQN error function for calculation to obtain the corresponding loss function value. The parameters in the estimation network are updated according to the loss function value, and the parameters in the estimation network are periodically synchronized to the target network. Then, the present embodiment may continue to train the updated DQN model until a trained DQN model is obtained.
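The replay-and-sync loop described above can be sketched with scalar stand-ins for the two deep convolutional networks; the toy transition, learning rate, synchronization period and discount factor are all illustrative assumptions:

```python
import random
from collections import deque

random.seed(0)
replay_pool = deque(maxlen=1000)   # experience pool
est_w, tgt_w = 0.0, 0.0            # scalar stand-ins for estimation / target networks
SYNC_EVERY, GAMMA, LR = 10, 0.9, 0.1

for step in range(1, 101):
    replay_pool.append((1.0, 1.0, 1.0))        # store (state, reward, next state)
    s, r, s_next = random.choice(replay_pool)  # sample, breaking data correlation
    target = r + GAMMA * tgt_w * s_next        # target network's expected benefit
    td_error = target - est_w * s              # difference vs. estimation network
    est_w += LR * td_error * s                 # update the estimation network only
    if step % SYNC_EVERY == 0:                 # periodic synchronization
        tgt_w = est_w
```

The estimation weight drifts toward the fixed point 1/(1 − γ) = 10, while the target copy changes only at the synchronization steps, which keeps the regression target stable between syncs.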
According to the embodiment, the Internet of things can be modeled as an agent through the distributed deep reinforcement learning framework, and the task unloading strategy is optimized through the Markov decision process. Using DQN model and convolutional neural network as function approximators, convergence is accelerated and high-dimensional state space is handled. Dynamic resource allocation policies are designed to take into account device computing power, bandwidth, energy state, etc. According to the embodiment, the energy-saving distributed task unloading system is designed, the unloading knowledge model is constructed, and load decisions can be made in real time and efficiently, so that idle resources and response time are reduced to the greatest extent.
The embodiment can realize multi-domain IoT network architecture design and support dynamic task offloading. The distributed deep reinforcement learning algorithm allows devices to independently make decisions without global information. And the learning efficiency and the convergence rate are improved through the deep convolutional neural network compression state space.
According to the method and the system, efficient and self-adaptive task unloading can be achieved through distributed deep reinforcement learning under the dynamic and resource-limited environment of the multi-domain Internet of things, difficulties such as high-dimensional decision making, multi-objective optimization and cross-domain coordination are overcome, and finally the edge intelligent system with low delay, low energy consumption and high reliability is achieved.
According to the multi-domain internet of things task allocation optimization model training method, a distributed deep reinforcement learning task unloading optimization mode is provided, local calculation and edge unloading can be dynamically balanced through a built knowledge model, and lower time delay, higher energy efficiency and faster convergence are achieved. And an extensible and self-adaptive scheme idea is provided for landing in a complex Internet of things scene.
As shown in fig. 3, this embodiment proposes a multi-domain task allocation optimization model training device for internet of things, where the device may include:
The first generating unit 301 is configured to generate a first multi-dimensional state tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain internet of things and an edge server group, where the single-domain internet of things is configured to allocate a corresponding task to be processed to a local device or the edge server group for processing;
The input unit 302 is configured to input the first multidimensional state tensor into a task allocation optimization model to be trained, so that the task allocation optimization model to be trained performs task allocation optimization based on the first multidimensional state tensor, and an allocation optimization strategy is obtained;
an optimizing unit 303, configured to optimize allocation of tasks to be processed by at least one single-domain internet of things based on an allocation optimization policy;
A second generating unit 304, configured to generate a second multidimensional state tensor based on the optimized task allocation information, the running performance index, the task attribute information, and the task communication quality index;
a determining unit 305, configured to determine an optimized reward value according to the task allocation information, the optimized task allocation information, the constructed task processing cost objective function and the reward function;
The updating unit 306 is configured to update the task allocation optimization model to be trained based on the optimized reward value, the second multidimensional state tensor, the first multidimensional state tensor, and the deep reinforcement learning strategy, so as to obtain a trained task allocation optimization model.
It should be noted that the processing procedures of the first generating unit 301, the input unit 302, the optimizing unit 303, the second generating unit 304, the determining unit 305 and the updating unit 306, together with their beneficial effects, may refer to steps S101 to S106 in fig. 1, respectively, and are not described again.
Optionally, the first generating unit 301 is further configured to:
Generating a task allocation identification matrix, a node characteristic matrix, a task characteristic matrix and a three-dimensional network characteristic tensor according to task allocation information, operation performance indexes, task attribute information and task communication quality indexes of a plurality of single-domain Internet of things and edge server groups;
And taking the task allocation identification matrix, the node characteristic matrix, the task characteristic matrix and the three-dimensional network characteristic tensor as a whole as a first multidimensional state tensor.
Optionally, the edge server group comprises a plurality of edge servers, the task attribute information comprises key attribute information of each task to be processed, and the task communication quality index comprises estimated communication quality indexes of any task to be processed in the corresponding single-domain Internet of things and each edge server respectively.
Optionally, the first generating unit 301 is further configured to:
based on task allocation information of a plurality of single-domain Internet of things and a plurality of edge servers, constructing a task allocation identification matrix, wherein each row of data in the task allocation identification matrix is used for identifying the single-domain Internet of things to allocate a task to be processed to local equipment or the edge servers;
constructing a node characteristic matrix by using the operation performance indexes of each single-domain Internet of things and each edge server, wherein each row of data of the node characteristic matrix comprises the operation performance indexes of the single-domain Internet of things or the edge servers;
Constructing a task feature matrix based on the key attribute information of each task to be processed, wherein each row of data of the task feature matrix comprises the key attribute information of the task to be processed;
And constructing a three-dimensional network characteristic tensor by using the estimated communication quality indexes of each task to be processed in the corresponding single-domain Internet of things and each edge server respectively, wherein each row of data of the three-dimensional network characteristic tensor comprises the estimated communication quality indexes of the task to be processed in the corresponding single-domain Internet of things and each edge server respectively.
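The four components above can be sketched as numpy arrays; all dimensions (D devices, E servers, and the K/M/Q feature counts) and the one-hot assignment layout are illustrative assumptions:

```python
import numpy as np

D, E, K, M, Q = 3, 2, 4, 3, 2  # devices, servers, feature counts (assumed)

# Task allocation identification matrix: row d is one-hot over
# [local device, server_1, ..., server_E].
allocation = np.zeros((D, E + 1))
allocation[0, 0] = 1  # device 0 processes its task on the local device
allocation[1, 1] = 1  # device 1 offloads to edge server 0
allocation[2, 2] = 1  # device 2 offloads to edge server 1

node_features = np.random.rand(D + E, K)  # operation performance indexes per node
task_features = np.random.rand(D, M)      # key attributes of each task to be processed
network_tensor = np.random.rand(D, E, Q)  # estimated communication quality per (task, server)

# The four parts are taken together as the (first or second) multidimensional state.
state = (allocation, node_features, task_features, network_tensor)
```

Stacking the four parts as one tuple mirrors the description that the identification matrix, node feature matrix, task feature matrix and three-dimensional network feature tensor are used "as a whole" as the state tensor.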
Optionally, the determining unit 305 is further configured to:
determining a first task processing cost according to the task allocation information and the task processing cost objective function, and determining a second task processing cost according to the optimized task allocation information and the task processing cost objective function;
Subtracting the second task processing cost from the first task processing cost to obtain a cost difference value;
The cost difference value is input into the reward function for reward calculation to determine the optimized reward value.
Optionally, the determining unit 305 is further configured to:
Inputting the cost difference value into the reward function, so that the reward function determines a first set value larger than 0 as the optimized reward value when the cost difference is larger than 0, a second set value smaller than 0 as the optimized reward value when the cost difference is equal to 0, and a third set value smaller than 0 as the optimized reward value when the cost difference is smaller than 0;
The absolute value of the third set value is equal to the first set value, and the absolute value of the third set value is larger than the absolute value of the second set value.
Optionally, when the task allocation optimization model to be trained is a deep Q network DQN model, the task allocation optimization model to be trained includes an estimation network, a target network and an error function;
The updating unit 306 is further configured to:
inputting the first multidimensional state tensor into the estimation network to perform action benefit prediction to obtain a first benefit value, and inputting the second multidimensional state tensor into the target network to perform expected benefit prediction to obtain a second benefit value;
inputting the first benefit value, the second benefit value and the optimized reward value into an error function so that the error function performs weighted summation on the optimized reward value and the second benefit value based on set weights to obtain a target benefit value, and determining a loss function value according to the difference between the target benefit value and the first benefit value;
updating the task allocation optimization model to be trained based on the loss function value to obtain a trained task allocation optimization model.
The multi-domain internet of things task allocation optimization model training device provided by the embodiment can generate a first multi-dimensional state tensor according to the task allocation information, the operation performance indexes, the task attribute information and the task communication quality indexes of the plurality of single-domain internet of things and the edge server group, and input the first multi-dimensional state tensor into the task allocation optimization model to be trained, so that the task allocation optimization model to be trained executes task allocation optimization based on the first multi-dimensional state tensor to obtain an allocation optimization strategy, and updates the task allocation optimization model to be trained based on the allocation optimization strategy to obtain the trained task allocation optimization model. The trained task allocation optimization model can optimize task allocation of the multi-domain Internet of things under the condition of considering processing requirements of different tasks, reasonable allocation of computing resources and other diversified requirements, and effectively achieves diversified targets of different task processing requirements, reasonable allocation of computing resources and the like.
The multi-domain internet of things task allocation optimization model training device in this embodiment is presented in the form of functional units, where the units may be ASIC (Application Specific Integrated Circuit) circuits, processors and memories that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
The embodiment of the invention also provides computer equipment, which is provided with the multi-domain internet of things task allocation optimization model training device shown in the figure 3.
Referring to FIG. 4, an alternative embodiment of the present invention provides a schematic structural diagram of a computer device, which includes one or more processors 10, a memory 20, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the computer device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 4.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area. The storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The memory 20 may include volatile memory, such as random access memory. The memory may also include non-volatile memory, such as flash memory, a hard disk, or a solid state disk. The memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that may be recorded on a storage medium, or that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium and is downloaded through a network to be stored in a local storage medium, so that the method described herein may be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random-access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also include a combination of the above types of memories. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (10)

CN202510976135.XA (filed 2025-07-16, priority 2025-07-16) — Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium — Active — CN120499018B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202510976135.XACN120499018B (en)2025-07-162025-07-16Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium

Publications (2)

Publication Number: CN120499018A; Publication Date: 2025-08-15
Publication Number: CN120499018B; Publication Date: 2025-09-26

Family

Family ID: 96679232

Family Applications (1)

Application Number: CN202510976135.XA; Status: Active; Publication: CN120499018B (en); Priority Date: 2025-07-16; Filing Date: 2025-07-16; Title: Multi-domain Internet of things task allocation optimization model training method, device, equipment and medium

Country Status (1)

Country: CN; Publication: CN120499018B (en)

Citations (1)

* Cited by examiner, † Cited by third party

CN117032928A *; Priority Date: 2023-08-17; Publication Date: 2023-11-10; Assignee: Shanghai Jiao Tong University; Title: Self-adaptive multi-type task scheduling and multi-domain resource allocation joint design method

Family Cites Families (1)

* Cited by examiner, † Cited by third party

US20180165602A1 *; Priority Date: 2016-12-14; Publication Date: 2018-06-14; Assignee: Microsoft Technology Licensing, LLC; Title: Scalability of reinforcement learning by separation of concerns




Legal Events

Code: PB01; Title: Publication
Code: SE01; Title: Entry into force of request for substantive examination
Code: GR01; Title: Patent grant
