Disclosure of Invention
The invention addresses the problem of improving GPU resource scheduling efficiency and GPU resource utilization in a multi-operating-system scenario.
To solve this problem, the invention provides a GPU scheduling system, a GPU scheduling method, and an electronic device.
In a first aspect, a GPU scheduling system of the present invention includes:
an allocation running module, configured to acquire device information of a user's GPU device, generate a GPU resource pool of the GPU device according to the device information, allocate GPU resources to a system instance from the GPU resource pool in combination with a creation request when the user's request to create the system instance is received, and execute user tasks through the GPU resources in the system instance;
a monitoring module, configured to acquire, in real time, the system instance's usage of the current GPU resources, together with the task type, system load, and user priority information in the task queue of user tasks executed by the system instance;
a dynamic scheduling module, configured to perform offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and to dynamically adjust the GPU resources accordingly;
and a switching module, configured to, when a switching request for the current GPU resources is received, allocate the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
Optionally, the allocation running module is specifically configured to:
acquire the device information of the user's GPU device, load a driver corresponding to the GPU device, and generate a GPU resource pool of the GPU device;
when the user's request to create the system instance is received, determine a type identifier and GPU resource demand parameters of the system instance according to the creation request;
determine a GPU driver corresponding to the system instance according to the type identifier;
allocate GPU resources from the GPU resource pool to the system instance according to the GPU resource demand parameters, through the GPU driver;
and execute user tasks through the GPU resources in the system instance.
Optionally, the dynamic scheduling module is specifically configured to:
perform offline measurement through the GPU scheduling algorithm according to the task types in the task queue, to obtain the kernel execution time corresponding to each task type;
divide the tasks in the task queue according to preset categories, to obtain a category label for each task;
use each category label as the corresponding task type in the task queue;
and execute the tasks of each task type on the GPU a preset number of times, performing the offline measurement to obtain the kernel execution time.
Optionally, the dynamic scheduling module is further specifically configured to:
generate an idle-gap threshold corresponding to the task type according to the kernel execution time, and determine GPU idle periods according to the idle-gap threshold;
determine the priority of the tasks under each task type according to the task type, the system load, and the user priority information;
sort the tasks by priority to obtain a priority sequence under the task type;
and fill tasks to be executed whose priority is higher than that of the currently executing task into the GPU idle periods according to the priority sequence, performing priority inheritance between the currently executing task and the tasks to be executed through the rt_mutex mechanism.
Optionally, the dynamic scheduling module is further specifically configured to:
each time the filling of a GPU idle period is completed, acquire the real-time GPU utilization and the task completion times of all tasks in the filled GPU idle period;
determine an urgency score for the filled GPU idle period according to the real-time GPU utilization and the task completion times;
and determine an adjustment strategy for the GPU resources according to a comparison of the urgency score with a preset threshold interval, dynamically adjusting the GPU resources through the adjustment strategy.
Optionally, the dynamic scheduling module is further specifically configured to:
when the urgency score is above the preset threshold interval, reallocate the GPU resources according to preset credit-value weights through the ACCCREDIT strategy;
when the urgency score is below the preset threshold interval, divide the tasks in the GPU idle period into several task groups and cooperatively allocate the GPU resources to those task groups through the CoSched strategy;
when the urgency score is within the preset threshold interval and the system instance's preset fairness requirement outweighs its preset real-time requirement, allocate the GPU resources to tasks whose priority exceeds a preset threshold through the AugC strategy;
and when the urgency score is within the preset threshold interval and the system instance is subject to service level agreement constraints, allocate the GPU resources according to task priority through the SLAF strategy.
Optionally, the switching module is specifically configured to:
when a switching request for the current GPU resources is received, perform a snapshot operation on the context of the current GPU resources to generate snapshot data, the snapshot data comprising the GPU register state, video memory content, and command queue of the GPU device;
migrate the GPU register state, the video memory content, and the command queue from the current system instance to the target system instance through a VFIO secure channel;
and restore the GPU context and the video memory content in the target system instance according to the GPU register state, the video memory content, and the command queue, continuing the task that was executing when the switching request was received.
Optionally, the system further comprises a resource release module, configured to:
reclaim released GPU resources and return them to the GPU resource pool when any system instance is detected releasing GPU resources.
In a second aspect, the present invention provides a GPU scheduling method, including:
when a user's request to create a system instance is received, allocating GPU resources to the system instance from a GPU resource pool in combination with the creation request, and executing user tasks through the GPU resources in the system instance;
acquiring, in real time, the system instance's usage of the current GPU resources and the task type, system load, and user priority information in the task queue of user tasks executed by the system instance;
performing offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and dynamically adjusting the GPU resources;
and when a switching request for the current GPU resources is received, allocating the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
In a third aspect, an electronic device of the present invention includes a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to implement the above GPU scheduling method when executing the computer program.
With the GPU scheduling system, method, and electronic device described above, the allocation running module accurately acquires device information of the GPU device and generates a GPU resource pool accordingly. When a user requests creation of a system instance, GPU resources are allocated to the system instance from the resource pool in combination with the creation request, so that the GPU resources in the system instance can execute user tasks. The monitoring module tracks the system instance's usage of GPU resources in real time while acquiring the task type, system load, and user priority information in the system instance's task queue, providing the data basis for subsequent dynamic scheduling. The dynamic scheduling module then performs offline measurement, priority filling, and multi-strategy coordination in sequence using a GPU scheduling algorithm on the information collected by the monitoring module, dynamically adjusting GPU resources and achieving optimal resource configuration. When a switching request for the current GPU resources is received, the switching module quickly and efficiently allocates the current GPU resources to the target system instance by successively performing context snapshot, channel migration, and target-end snapshot recovery, reducing resource waste and performance loss during switching. In the invention, the GPU resource pool generated by the allocation running module provides the basic resources for the whole scheduling system; the monitoring module collects the running data of system instances in real time and provides the decision basis for the dynamic scheduling module; the dynamic scheduling module optimizes resources according to that data; and the switching module ensures efficient switching of GPU resources between different system instances. The modules cooperate to jointly improve the scheduling efficiency and utilization of GPU resources.
In terms of improving scheduling efficiency, the GPU scheduling algorithm of the dynamic scheduling module can quickly adjust GPU resource allocation according to real-time resource usage and task requirements, reducing resource idling and waste and improving overall GPU resource scheduling efficiency. In terms of improving resource utilization, flexible allocation and switching of GPU resources in a multi-operating-system scenario enable GPU resources to better meet the needs of different users and tasks, raising GPU resource utilization. In terms of enhancing system flexibility, the cooperative work of the modules allows the system to adapt to complex multi-operating-system, multi-tenant environments and to changes in task types and user priorities, strengthening the system's flexibility and adaptability. In terms of reducing switching overhead, the switching module's processing of context snapshot, channel migration, and target-end snapshot recovery reduces the time and resource overhead of switching between virtual machines and containers, improving switching efficiency and, in turn, the usage efficiency of GPU resources.
Detailed Description
In order that the above objects, features, and advantages of the invention may be more readily understood, the invention is described in further detail below with reference to specific embodiments illustrated in the accompanying drawings. While certain embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the invention will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the invention are for illustration only and are not intended to limit the scope of the invention.
It should be understood that the various steps recited in the method embodiments of the invention may be performed in a different order and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the invention is not limited in this respect.
The term "comprising" and variations thereof as used herein are open-ended, i.e., "including but not limited to"; "based on" means "based at least in part on"; "one embodiment" means "at least one embodiment"; "another embodiment" means "at least one additional embodiment"; "some embodiments" means "at least some embodiments"; and "optional" means "in an optional embodiment". Related definitions of other terms are given in the description below. It should be noted that the concepts of "first", "second", etc. mentioned in this disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of, or interdependence between, the functions performed by these devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
In view of the above problems in the related art, the present embodiments provide a GPU scheduling system, a GPU scheduling method, and an electronic device.
Referring to fig. 1, a GPU scheduling system provided in an embodiment of the present invention includes:
an allocation running module, configured to acquire device information of a user's GPU device and generate a GPU resource pool of the GPU device according to the device information; when the user's request to create a system instance is received, allocate GPU resources to the system instance from the GPU resource pool in combination with the creation request; and execute user tasks through the GPU resources in the system instance.
Specifically, the allocation running module first communicates with the user's GPU device, using the system's device-management API or a hardware-abstraction-layer interface to obtain detailed information about the GPU device, such as the GPU model, video memory capacity, number of computing cores, and supported instruction sets. It then groups GPU devices of the same model and similar performance according to this device information and preset rules and policies, building a GPU resource pool. When the user requests creation of a system instance, the module parses the task type, required amount of GPU resources, performance requirements, and other information contained in the request, then applies a resource allocation algorithm against the established GPU resource pool to allocate appropriate GPU resources to the corresponding system instance, so that the system instance can efficiently execute user tasks with the allocated resources.
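By way of a non-limiting illustration, the pool construction just described might be sketched as follows; the `GpuDevice` fields, the grouping-by-model rule, and the `acquire`/`release` interface are assumptions chosen for exposition, not features recited by the embodiment:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class GpuDevice:
    """Device information queried from the device-management API (illustrative fields)."""
    pci_id: str
    model: str
    vram_mb: int
    compute_cores: int
    in_use: bool = False

class GpuResourcePool:
    """Groups devices of the same model into a pool, as described above."""
    def __init__(self, devices):
        self.groups = defaultdict(list)
        for dev in devices:
            self.groups[dev.model].append(dev)

    def acquire(self, min_vram_mb, min_cores):
        """Return the first free device satisfying the demand parameters, or None."""
        for devices in self.groups.values():
            for dev in devices:
                if (not dev.in_use and dev.vram_mb >= min_vram_mb
                        and dev.compute_cores >= min_cores):
                    dev.in_use = True
                    return dev
        return None

    def release(self, dev):
        dev.in_use = False
```

A call such as `pool.acquire(min_vram_mb=8192, min_cores=2048)` then corresponds to screening out a device from the pool that satisfies the demand parameters of a creation request.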
The monitoring module is configured to acquire, in real time, the system instance's usage of the current GPU resources, together with the task type, system load, and user priority information in the task queue of user tasks executed by the system instance.
Specifically, the monitoring module collects, in real time, usage metrics of the system instance on the current GPU resources, such as the GPU's video memory occupancy, computing core utilization, and texture mapping unit usage, by means of a monitoring agent preset in the system instance or a performance monitoring tool of the operating system. It obtains the task type by parsing task description files and querying a task state database, distinguishing deep learning training tasks, graphics rendering tasks, high-performance computing tasks, and so on. It collects system load information including CPU utilization, memory occupancy, the number of currently running tasks, and the length of the waiting-task queue, and looks up the corresponding user priority information in the user-priority rule base configured in the system according to the user's identity and the urgency and importance of the task. In this way, the running state and task demands of the system instance are fully captured.
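As a minimal sketch of the data the monitoring module might hand to the scheduler, assuming a hypothetical `instance.query_gpu_metrics()` hook in place of the monitoring agent or OS performance tool mentioned above:

```python
import time
from dataclasses import dataclass

@dataclass
class MonitorSample:
    """One real-time sample from the monitoring module (illustrative fields)."""
    timestamp: float
    vram_usage_pct: float        # video memory occupancy
    core_utilization_pct: float  # computing core utilization
    task_type: str               # e.g. "dl_training", "rendering", "hpc"
    system_load: float           # e.g. normalized run-queue length
    user_priority: int           # from the user-priority rule base

def collect_sample(instance) -> MonitorSample:
    """Poll one sample; `instance.query_gpu_metrics()` is an assumed hook."""
    m = instance.query_gpu_metrics()
    return MonitorSample(time.time(), m["vram_pct"], m["core_pct"],
                         m["task_type"], m["load"], m["priority"])
```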
The dynamic scheduling module is configured to perform offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and to dynamically adjust the GPU resources accordingly.
Specifically, the dynamic scheduling module starts the GPU scheduling algorithm on the basis of the rich data provided by the monitoring module. In the offline measurement stage, the performance of various GPU resources under different task types and system load conditions is tested and recorded in advance, and a performance model database is established. The algorithm then enters the priority filling stage: according to the task type, system load, and user priority information, high-performance GPU resources are allocated preferentially under the configured priority rules (for example, tasks of high-priority users are served first and urgent tasks receive extra resources), and a preliminary GPU resource share is assigned to each task. Finally, multi-strategy coordination is performed: the preliminary allocation is dynamically adjusted and optimized by comprehensively weighing several strategic factors, such as fairness of resource use, task urgency, and overall system performance, ensuring that GPU resources are allocated reasonably and efficiently to the tasks in each system instance and adapt to continuously changing running environments and task demands.
The switching module is configured to, when a switching request for the current GPU resources is received, allocate the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
Specifically, when the switching module receives a switching request for the current GPU resources, it promptly starts the switching flow. It first executes a context snapshot operation, completely saving the GPU's current register state, video memory content, pending instruction sequences, and other context information, using a state-save interface provided by the GPU device or software emulation, to form a snapshot file. It then performs channel migration, adapting and migrating the data transmission channels, communication interfaces, and other facilities associated with the GPU resources according to the architecture and running environment of the target system instance, so that the GPU resources can communicate and interact normally in the new system instance. Finally, it performs target-end snapshot recovery: in the target system instance, it calls the GPU's state-restore interface to accurately restore the previously saved context snapshot to the target GPU device, so that the GPU resources can resume their previous working state in the target system instance. Switching of GPU resources between different system instances is thus completed efficiently, reducing interruption time and resource loss during the switch.
With the GPU scheduling system, method, and electronic device described above, the allocation running module accurately acquires device information of the GPU device and generates a GPU resource pool accordingly. When a user requests creation of a system instance, GPU resources are allocated to the system instance from the resource pool in combination with the creation request, so that the GPU resources in the system instance can execute user tasks. The monitoring module tracks the system instance's usage of GPU resources in real time while acquiring the task type, system load, and user priority information in the system instance's task queue, providing the data basis for subsequent dynamic scheduling. The dynamic scheduling module then performs offline measurement, priority filling, and multi-strategy coordination in sequence using a GPU scheduling algorithm on the information collected by the monitoring module, dynamically adjusting GPU resources and achieving optimal resource configuration. When a switching request for the current GPU resources is received, the switching module quickly and efficiently allocates the current GPU resources to the target system instance by successively performing context snapshot, channel migration, and target-end snapshot recovery, reducing resource waste and performance loss during switching. In the invention, the GPU resource pool generated by the allocation running module provides the basic resources for the whole scheduling system; the monitoring module collects the running data of system instances in real time and provides the decision basis for the dynamic scheduling module; the dynamic scheduling module optimizes resources according to that data; and the switching module ensures efficient switching of GPU resources between different system instances. The modules cooperate to jointly improve the scheduling efficiency and utilization of GPU resources.
In terms of improving scheduling efficiency, the GPU scheduling algorithm of the dynamic scheduling module can quickly adjust GPU resource allocation according to real-time resource usage and task requirements, reducing resource idling and waste and improving overall GPU resource scheduling efficiency. In terms of improving resource utilization, flexible allocation and switching of GPU resources in a multi-operating-system scenario enable GPU resources to better meet the needs of different users and tasks, raising GPU resource utilization. In terms of enhancing system flexibility, the cooperative work of the modules allows the system to adapt to complex multi-operating-system, multi-tenant environments and to changes in task types and user priorities, strengthening the system's flexibility and adaptability. In terms of reducing switching overhead, the switching module's processing of context snapshot, channel migration, and target-end snapshot recovery reduces the time and resource overhead of switching between virtual machines and containers, improving switching efficiency and, in turn, the usage efficiency of GPU resources.
Optionally, the allocation running module is specifically configured to:
acquire the device information of the user's GPU device, load a driver corresponding to the GPU device, and generate a GPU resource pool of the GPU device;
when the user's request to create the system instance is received, determine a type identifier and GPU resource demand parameters of the system instance according to the creation request;
determine a GPU driver corresponding to the system instance according to the type identifier;
allocate GPU resources from the GPU resource pool to the system instance according to the GPU resource demand parameters, through the GPU driver;
and execute user tasks through the GPU resources in the system instance.
Specifically, when the allocation running module obtains the device information of the user's GPU device, it reads the detailed specifications of the GPU device, such as the GPU model, video memory size, number of computing cores, and supported instruction sets, through the system's device-management API or a tool interface provided by the device vendor. At the same time, the module automatically loads the driver matching the GPU device, typically by querying the device's unique identifier (e.g., PCI ID) and looking up the corresponding version in the driver library preset in the system. After the driver is loaded, the module generates a GPU resource pool according to the characteristics of the device and the available resources; the pool stores detailed information and the current state of every available GPU device in a data structure. When a system instance creation request sent by a user is received, the module first parses the request packet and extracts the type identifier of the system instance, which may identify a virtual machine, a container, or another custom environment. It also identifies the GPU resource demand parameters, which may include the requested video memory size, compute capability level, and specific features required (for example, whether ray tracing or tensor core support is needed).
According to the type identifier of the system instance, the module looks up the corresponding GPU driver requirement in a pre-configured mapping table, since different types of system instances may require different driver support. For example, a container environment may use a lightweight driver, while a virtual machine may require a fully featured driver. The module then calls the corresponding GPU driver interface, screens out GPU devices meeting the conditions from the GPU resource pool according to the resource demand parameters, and performs resource allocation; this may involve partitioning video memory, assigning computing cores, and so on. After the allocation is completed, the module notifies the system-instance management component to bind the allocated GPU resources to the newly created system instance. Finally, the system instance performs user tasks with the allocated GPU resources, which typically involves starting a task process inside the system instance and invoking the GPU hardware through the GPU driver to perform computing or graphics processing. Throughout this process, the allocation running module continuously monitors the resource allocation state, ensuring that resources are used reasonably and responding in time to subsequent resource change requests.
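A non-limiting sketch of this creation flow, reusing the `GpuResourcePool` sketch above; the type identifiers, driver names, and request layout are all illustrative placeholders rather than values prescribed by the embodiment:

```python
# Hypothetical mapping from system-instance type identifier to driver support,
# mirroring the pre-configured mapping table described above.
DRIVER_BY_TYPE = {
    "container": "gpu-driver-lite",   # lightweight driver for containers
    "vm": "gpu-driver-full",          # fully featured driver for virtual machines
}

def create_instance(pool, request):
    """Parse a creation request, pick the driver, and allocate from the pool."""
    instance_type = request["type"]            # type identifier
    demand = request["gpu_demand"]             # GPU resource demand parameters
    driver = DRIVER_BY_TYPE[instance_type]     # look up the mapping table
    dev = pool.acquire(demand["vram_mb"], demand["cores"])
    if dev is None:
        raise RuntimeError("no GPU in the pool satisfies the demand parameters")
    # Binding the device to the new instance would be done by the
    # system-instance management component; here we just return the result.
    return {"type": instance_type, "driver": driver, "gpu": dev}
```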
In this embodiment of the invention, the allocation running module accurately acquires the user's GPU device information and loads the corresponding driver to generate a resource pool; it determines the type and resource requirements of the system instance in detail according to the creation request and, by matching the corresponding GPU driver, achieves reasonable allocation and binding of resources, laying the foundation for efficient execution of user tasks. The monitoring module collects the resource usage and task-related information of the system instance in real time, providing comprehensive and accurate data support for dynamic scheduling. Based on that information, the dynamic scheduling module can scientifically and reasonably adjust GPU resource allocation through offline measurement, priority filling, and multi-strategy coordination with the GPU scheduling algorithm, optimizing the allocation to meet different task requirements. When the switching module receives a switching request, it achieves efficient switching of GPU resources between different system instances by performing context snapshot, channel migration, and target-end snapshot recovery, reducing resource overhead and latency during the switch. The system thus effectively addresses the problems of low GPU resource scheduling efficiency, low resource utilization, and time-consuming switching between virtual machines and containers in multi-operating-system scenarios; it markedly improves the scheduling efficiency and utilization of GPU resources in multi-operating-system, multi-tenant environments, enhances the flexibility and adaptability of the system, meets users' needs for fine-grained sharing of GPU resources, and avoids resource waste and performance bottlenecks.
Optionally, the dynamic scheduling module is specifically configured to:
perform offline measurement through the GPU scheduling algorithm according to the task types in the task queue, to obtain the kernel execution time corresponding to each task type;
divide the tasks in the task queue according to preset categories, to obtain a category label for each task;
use each category label as the corresponding task type in the task queue;
and execute the tasks of each task type on the GPU a preset number of times, performing the offline measurement to obtain the kernel execution time.
Specifically, when the dynamic scheduling module processes the task queue of the system instance, it first divides the tasks in the queue into several categories, such as deep learning training tasks, graphics rendering tasks, and scientific computing tasks, according to preset rules based on the tasks' purpose, resource requirements, and so on, and assigns each category a category label to clearly distinguish the task types. Then, for the task type corresponding to each category label, offline measurement is performed with the GPU scheduling algorithm. During offline measurement, all tasks of the same task type are executed on the GPU a preset number of times, for example by repeatedly running the same task, while the kernel execution time of each run, that is, the time the GPU spends processing the core part of the task, is captured through the GPU device's own performance counters or a software tool. The collected kernel execution time data are then statistically analyzed, computing the mean or median or accounting for the distribution characteristics of the data, to obtain an accurate and representative kernel execution time for the task type. This provides a key basis for subsequent resource allocation and scheduling decisions, enabling precise dynamic scheduling, optimizing GPU resource utilization, and improving overall system performance.
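A minimal sketch of this offline measurement step, assuming a hypothetical `run_task(task_type)` hook that executes one task of the given type on the GPU and returns the kernel execution time read from a performance counter:

```python
import statistics

def measure_kernel_times(task_types, run_task, repeats=10):
    """Run each task type a preset number of times and keep a representative
    kernel execution time per type (the median here; a mean would also fit)."""
    kernel_time = {}
    for ttype in task_types:
        samples = [run_task(ttype) for _ in range(repeats)]
        kernel_time[ttype] = statistics.median(samples)
    return kernel_time
```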
By classifying tasks and measuring kernel execution time offline, this embodiment of the invention obtains in advance the actual GPU resource demands of different tasks and provides accurate data support for dynamic scheduling. GPU resource allocation therefore becomes more reasonable and precise, over-allocation and shortages are effectively avoided, resource utilization is improved, and the system's flexibility and adaptability under multi-task processing are enhanced, ensuring that high-priority and urgent tasks receive timely and sufficient resources and improving overall system performance and user experience.
Optionally, the dynamic scheduling module is further specifically configured to:
generate an idle-gap threshold corresponding to the task type according to the kernel execution time, and determine GPU idle periods according to the idle-gap threshold;
determine the priority of the tasks under each task type according to the task type, the system load, and the user priority information;
sort the tasks by priority to obtain a priority sequence under the task type;
and fill tasks to be executed whose priority is higher than that of the currently executing task into the GPU idle periods according to the priority sequence, performing priority inheritance between the currently executing task and the tasks to be executed through the rt_mutex mechanism.
Specifically, after the kernel execution time corresponding to a task type is acquired, the dynamic scheduling module generates an idle-gap threshold from that time, based on analysis of the task's execution pattern and resource occupancy characteristics. For example, a smaller idle-gap threshold is set for tasks with shorter execution times, to respond quickly to task switching, while a larger threshold is set for tasks with longer execution times, to reduce the overhead of frequent switching. By monitoring the real-time usage of the GPU, a period in which GPU utilization stays below a certain threshold for longer than the idle-gap threshold is identified as a GPU idle period. Meanwhile, the module comprehensively considers the task type (such as compute-intensive or memory-intensive), the system load (the current resource occupancy of the system), and the user priority (for example, paying users or important projects rank higher), and assigns priorities to the tasks under each task type by applying a preset weighting formula or priority calculation policy. For instance, compute-intensive tasks may be given higher priority when the system load is high, and under the same conditions tasks of higher-priority users are raised accordingly. The module then sorts the tasks from high to low priority to generate a priority sequence. According to this sequence, when a GPU idle period arrives, tasks to be executed whose priority is higher than that of the currently executing task are filled into the idle period. To avoid priority inversion, the rt_mutex mechanism is used to realize priority inheritance between the currently executing task and the tasks to be executed: when a high-priority task is blocked by a low-priority task, the low-priority task inherits the priority of the high-priority task until the blocking ends, ensuring that the system responds to high-priority tasks in time and improving overall execution efficiency.
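A sketch of the filling step under stated assumptions: tasks wait in a max-heap keyed by priority, each entry carries an estimated run time, and the idle-gap threshold has already been derived from the measured kernel execution time. The rt_mutex priority inheritance happens inside the kernel's locking primitive and is therefore outside this user-space sketch:

```python
import heapq

def fill_idle_period(idle_gap_s, gap_threshold_s, waiting, current_priority):
    """Fill one GPU idle period with higher-priority waiting tasks.

    `waiting` is a heap of (-priority, task_id, est_time_s) tuples, so the
    highest-priority task sits at the top. Returns the task ids filled in."""
    if idle_gap_s < gap_threshold_s:
        return []                              # gap too short to switch into
    filled, remaining = [], idle_gap_s
    while waiting and -waiting[0][0] > current_priority:
        neg_prio, task_id, est = heapq.heappop(waiting)
        if est > remaining:                    # would overrun the idle period;
            heapq.heappush(waiting, (neg_prio, task_id, est))
            break                              # stop, keeping the priority order
        filled.append(task_id)
        remaining -= est
    return filled
```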
By reasonably determining GPU idle periods, accurately assigning task priorities, filling tasks efficiently, and adopting this locking mechanism, the embodiment effectively improves the utilization and scheduling flexibility of GPU resources in multi-task scenarios, ensures that high-priority tasks are executed in time, reduces task waiting time, and improves the overall throughput and responsiveness of the system.
Optionally, the dynamic scheduling module is further specifically configured to:
each time the filling of a GPU idle period is completed, acquire the real-time GPU utilization and the task completion times of all tasks in the filled GPU idle period;
determine an urgency score for the filled GPU idle period according to the real-time GPU utilization and the task completion times;
and determine an adjustment strategy for the GPU resources according to a comparison of the urgency score with a preset threshold interval, dynamically adjusting the GPU resources through the adjustment strategy.
Specifically, each time the dynamic scheduling module completes task filling for a GPU idle period, it acquires the GPU utilization and the task completion times of the filled tasks in real time through a performance monitoring tool built into the system or a hardware counter of the GPU device; these data may be recorded in a system log or performance monitoring database for later analysis. According to a preset urgency scoring model, the GPU utilization and the task completion times are combined in a weighted calculation to obtain a score representing the urgency of task execution in the current GPU idle period. The scoring model may integrate task sensitivity to time (tasks with high timeliness requirements are weighted higher) and resource occupancy efficiency (tasks with high GPU utilization may score lower, since they use resources efficiently and their urgency is relatively low). The computed urgency score is compared with the preset threshold interval: if the score is above the upper threshold, the current task combination is judged to face a resource bottleneck and GPU resources need to be increased; if the score is below the lower threshold, resources may be over-allocated and wasted, and reclaiming part of them should be considered. Based on this comparison, the module dynamically adjusts GPU resource allocation, for example by assigning more computing cores or video memory to highly urgent tasks or adjusting task priorities, to optimize overall resource utilization efficiency and task execution.
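The scoring model itself is not specified by the embodiment; the following is one possible weighted form, with the weights, the deadline-based lateness term, and the threshold interval all invented for illustration:

```python
def urgency_score(gpu_util_pct, completion_time_s, deadline_s,
                  w_util=0.4, w_time=0.6):
    """Weighted urgency: slow completion relative to the deadline raises the
    score; high GPU utilization lowers it (resources are already well used)."""
    lateness = min(completion_time_s / deadline_s, 2.0)   # cap the ratio
    return w_time * lateness + w_util * (1.0 - gpu_util_pct / 100.0)

def pick_adjustment(score, low=0.5, high=1.2):
    """Compare the score with a preset threshold interval [low, high]."""
    if score > high:
        return "increase_resources"    # resource bottleneck
    if score < low:
        return "reclaim_resources"     # over-allocation, reclaim some
    return "tune_within_interval"      # handled by the in-interval strategies
```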
The embodiment can thus feed back the utilization of GPU resources and the task execution state in real time and, by quantitatively evaluating urgency and dynamically adjusting resource allocation accordingly, achieve fine-grained management of GPU resources. In multi-task concurrent scenarios this improves the precision of resource allocation, avoids resource idling or over-allocation, responds quickly to changes in task priority, improves the system's overall throughput and task response speed, ensures that highly urgent tasks are processed in time, and strengthens the system's adaptability and stability under complex, variable loads.
Optionally, the dynamic scheduling module is further specifically configured to:
when the urgency score is above the preset threshold interval, reallocate the GPU resources according to preset credit-value weights through the ACCCREDIT strategy;
when the urgency score is below the preset threshold interval, divide the tasks in the GPU idle period into several task groups and cooperatively allocate the GPU resources to those task groups through the CoSched strategy;
when the urgency score is within the preset threshold interval and the system instance's preset fairness requirement outweighs its preset real-time requirement, allocate the GPU resources to tasks whose priority exceeds a preset threshold through the AugC strategy;
and when the urgency score is within the preset threshold interval and the system instance is subject to service level agreement constraints, allocate the GPU resources according to task priority through the SLAF strategy.
Specifically, when the urgency score is above the preset threshold interval, the current GPU resources are seriously insufficient to satisfy the urgent demands of tasks. The dynamic scheduling module therefore adopts the ACCCREDIT strategy and reallocates the GPU resources according to preset credit-value weights. The credit-value weights are preset according to the type of system instance, user priority, task type, and so on; for example, high-priority system instances and urgent tasks are given higher credit values. The module calculates the credit value of each system instance or task and reallocates GPU resources in proportion to those values, so that holders of higher credit values obtain more resources. When the urgency score is below the preset threshold interval, the GPU resources are idle to some extent; to use them better, the module divides the tasks in the GPU idle period into several task groups, grouped by task type, user, project, and so on, and then applies the CoSched strategy to cooperatively allocate GPU resources according to the characteristics and needs of each group, for example assigning a contiguous video memory region or the same set of computing cores to the tasks of one group, to improve resource sharing and utilization efficiency. When the urgency score is within the preset threshold interval and the system instance's preset fairness requirement outweighs its preset real-time requirement, the module adopts the AugC strategy, taking the fairness requirement of the system instance into account while allocating GPU resources to high-priority tasks, so as to satisfy their resource demands as far as possible on the premise of guaranteed fairness. And when the urgency score is within the preset threshold interval and the system instance is subject to service level agreement constraints, the module allocates GPU resources to tasks in strict priority order according to the SLAF strategy, ensuring that high-priority tasks obtain resources first and that the service level agreement's requirements on task execution order and response time are met.
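The choice among the four strategies can be condensed into a dispatch over the conditions enumerated above; the fall-through default when neither fairness nor SLA constraints apply is an assumption, since the embodiment does not state one:

```python
def select_strategy(score, low, high, fairness_over_realtime, has_sla):
    """Map an urgency score and instance attributes to a scheduling strategy."""
    if score > high:
        return "ACCCREDIT"   # reallocate by preset credit-value weights
    if score < low:
        return "CoSched"     # group tasks, co-allocate GPU resources
    if fairness_over_realtime:
        return "AugC"        # favor tasks above the priority threshold
    if has_sla:
        return "SLAF"        # strict priority order under SLA constraints
    return "AugC"            # assumed default; not specified by the embodiment
```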
Through the flexible application of these scheduling strategies, the embodiment adapts to different urgency scores and system instance requirements and effectively improves the rationality of GPU resource allocation across scenarios. When resources are tight, system instances and tasks with high credit values obtain resources first, guaranteeing the execution of critical tasks; when resources are relatively abundant, task grouping and cooperative allocation improve resource utilization efficiency and overall system performance; and fairness and service level agreement constraints are taken into account throughout, balancing the interests of different system instances and ensuring stable operation and user satisfaction.
Optionally, the switching module is specifically configured to:
when a switching request for the current GPU resources is received, perform a snapshot operation on the context of the current GPU resources to generate snapshot data, the snapshot data comprising the GPU register state, video memory content, and command queue of the GPU device;
migrate the GPU register state, the video memory content, and the command queue from the current system instance to the target system instance through a VFIO secure channel;
and restore the GPU context and the video memory content in the target system instance according to the GPU register state, the video memory content, and the command queue, continuing the task that was executing when the switching request was received.
Specifically, when a switching request for the current GPU resources is received, the switching module first performs a snapshot operation on the context of the current GPU resources. This involves reading and recording the critical state information of the GPU device: the GPU register state, which holds the GPU's control and configuration information; the video memory content, i.e., the data and intermediate computation results the GPU keeps in memory; and the command queue, which contains the instruction sequences waiting to be executed. These data are integrated into snapshot data so that the GPU's current working state is preserved completely during the switch. Next, the module migrates the data through a VFIO (Virtual Function I/O) secure channel. VFIO is a framework that allows user-space programs to access hardware devices directly and provides a secure device sharing and migration mechanism; through it, the GPU register state, video memory content, and command queue can be transferred securely from the current system instance to the target system instance, with the integrity and security of the data ensured in transit, preventing leakage and corruption. In the target system instance, the switching module restores the GPU context and video memory content from the received snapshot data: the GPU registers are restored to their previous state, the video memory content is reloaded, and the instructions in the command queue are made ready to continue executing. In this way the GPU can continue, in the target system instance, the task it was processing when the switching request arrived, achieving seamless task migration.
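A sketch of the three-stage switch under stated assumptions: the `gpu.*` methods stand in for the device's state save/restore interface, and `VfioChannel` is an in-memory stub that only models the send/receive ordering of the real VFIO transport:

```python
from collections import deque

class VfioChannel:
    """Stub for the VFIO secure channel (the real one is a kernel framework)."""
    def __init__(self):
        self._queue = deque()
    def send(self, snapshot):
        self._queue.append(snapshot)
    def receive(self):
        return self._queue.popleft()

def snapshot_and_send(gpu, channel):
    """Source side: context snapshot, then channel migration."""
    channel.send({
        "registers": gpu.save_registers(),   # GPU register state
        "vram": gpu.dump_vram(),             # video memory content
        "commands": gpu.drain_commands(),    # pending command queue
    })

def receive_and_restore(gpu, channel):
    """Target side: target-end snapshot recovery, then resume the task."""
    snap = channel.receive()
    gpu.load_registers(snap["registers"])
    gpu.restore_vram(snap["vram"])
    gpu.replay_commands(snap["commands"])    # continue where execution stopped
```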
Through the combination of snapshot operations and a VFIO secure channel, the switching mechanism of this embodiment achieves efficient switching of GPU resources between system instances while preserving task continuity and data security. This approach effectively reduces service interruption time during switching and improves the system's responsiveness and resource utilization, making it particularly suitable for data-center environments in which GPU resources must be allocated dynamically across multiple operating systems or tenants. In this way, the system can flexibly adapt to different workload demands, optimize resource allocation, and improve overall performance.
Optionally, the system further comprises a resource release module, configured to:
reclaim released GPU resources and return them to the GPU resource pool when any system instance is detected releasing GPU resources.
Specifically, the resource release module monitors the resource usage of system instances in real time and, on detecting that a system instance has released GPU resources, immediately starts the reclamation flow. The module first verifies the legitimacy of the release request, ensuring that the operation was initiated by an authorized system instance and does not affect other instances still using the resources. It then detaches the GPU resources from the system instance and marks them as "to be reclaimed", preventing other modules from reallocating them prematurely. Next, it checks and cleans the state of the resources to be reclaimed, resetting the GPU registers to their default state, clearing the video memory content, and flushing leftover instructions from the command queue, to ensure the resources are in a clean, usable state. Finally, the module returns the cleaned GPU resources to the GPU resource pool so that they can be allocated again to other system instances, optimizing resource recycling and improving overall resource utilization.
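A compact sketch of this reclamation flow, assuming hypothetical `instance.gpus` ownership records and `dev.*` scrubbing hooks, and reusing the pool's `release` from the earlier sketch:

```python
def reclaim(pool, instance, dev):
    """Verify, detach, scrub, and return a released GPU to the pool."""
    if dev not in instance.gpus:             # legitimacy check: owner only
        raise PermissionError("release not initiated by the owning system instance")
    instance.gpus.remove(dev)                # detach; now "to be reclaimed"
    dev.reset_registers()                    # reset GPU registers to defaults
    dev.clear_vram()                         # clear video memory content
    dev.flush_command_queue()                # drop leftover instructions
    pool.release(dev)                        # back into the GPU resource pool
```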
Through this real-time monitoring and efficient reclamation mechanism, the embodiment ensures that GPU resources are quickly returned to the resource pool once they are no longer used, avoiding long-term occupation and waste. This raises resource turnover and shortens the time system instances wait to obtain resources, improving the performance and responsiveness of the whole system. Timely cleaning and state checking guarantee the quality and security of reclaimed resources, preventing resource pollution and data leakage and providing a reliable basis for subsequent allocation.
Referring to fig. 2, a GPU scheduling method of the present invention includes:
when a user's request to create a system instance is received, allocating GPU resources to the system instance from a GPU resource pool in combination with the creation request, and executing user tasks through the GPU resources in the system instance;
acquiring, in real time, the system instance's usage of the current GPU resources and the task type, system load, and user priority information in the task queue of user tasks executed by the system instance;
performing offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and dynamically adjusting the GPU resources;
and when a switching request for the current GPU resources is received, allocating the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
The advantages of the GPU scheduling method of the present invention over the prior art are the same as those of the GPU scheduling system described above, and are not repeated here.
The invention also provides an electronic device, which comprises a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to implement the above GPU scheduling method when executing the computer program.
The advantages of the electronic device of the present invention over the prior art are the same as those of the GPU scheduling system described above, and are not repeated here.
Although the present disclosure is described above, the scope of protection of the present disclosure is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such changes and modifications fall within the scope of protection of the disclosure.