Disclosure of Invention
The invention addresses the problem of improving GPU resource scheduling efficiency and GPU resource utilization in a multi-operating-system scenario.
To solve this problem, the invention provides a GPU scheduling system, a GPU scheduling method, and an electronic device.
In a first aspect, a GPU scheduling system of the present invention includes:
an allocation running module, configured to acquire device information of a user's GPU device, generate a GPU resource pool of the GPU device according to the device information, allocate GPU resources to a system instance from the GPU resource pool in combination with a creation request when the user's request to create the system instance is received, and execute user tasks through the GPU resources in the system instance;
a monitoring module, configured to acquire, in real time, the system instance's usage of the current GPU resources, together with the task type, system load, and user priority information in the task queue of user tasks executed by the system instance;
a dynamic scheduling module, configured to perform offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and to dynamically adjust the GPU resources accordingly;
and a switching module, configured to, when a switching request for the current GPU resources is received, allocate the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
Optionally, the allocation running module is specifically configured to:
acquire the device information of the user's GPU device, load a driver corresponding to the GPU device, and generate a GPU resource pool of the GPU device;
when the user's request to create the system instance is received, determine a type identifier and GPU resource demand parameters of the system instance according to the creation request;
determine a GPU driver corresponding to the system instance according to the type identifier;
allocate GPU resources from the GPU resource pool to the system instance according to the GPU resource demand parameters, through the GPU driver;
and execute user tasks through the GPU resources in the system instance.
Optionally, the dynamic scheduling module is specifically configured to:
perform offline measurement through the GPU scheduling algorithm according to the task types in the task queue, to obtain the kernel execution time corresponding to each task type;
divide the tasks in the task queue according to preset categories, to obtain a category label for each task;
use each category label as the corresponding task type in the task queue;
and execute the tasks of each task type on the GPU a preset number of times, performing the offline measurement to obtain the kernel execution time.
Optionally, the dynamic scheduling module is further specifically configured to:
generate an idle-gap threshold corresponding to the task type according to the kernel execution time, and determine GPU idle periods according to the idle-gap threshold;
determine the priority of the tasks under each task type according to the task type, the system load, and the user priority information;
sort the tasks by priority to obtain a priority sequence under the task type;
and fill tasks to be executed whose priority is higher than that of the currently executing task into the GPU idle periods according to the priority sequence, performing priority inheritance between the currently executing task and the tasks to be executed through the rt_mutex mechanism.
Optionally, the dynamic scheduling module is further specifically configured to:
each time the filling of a GPU idle period is completed, acquire the real-time GPU utilization and the task completion times of all tasks in the filled GPU idle period;
determine an urgency score for the filled GPU idle period according to the real-time GPU utilization and the task completion times;
and determine an adjustment strategy for the GPU resources according to a comparison of the urgency score with a preset threshold interval, dynamically adjusting the GPU resources through the adjustment strategy.
Optionally, the dynamic scheduling module is further specifically configured to:
when the urgency score is above the preset threshold interval, reallocate the GPU resources according to preset credit-value weights through the ACCCREDIT strategy;
when the urgency score is below the preset threshold interval, divide the tasks in the GPU idle period into several task groups and cooperatively allocate the GPU resources to those task groups through the CoSched strategy;
when the urgency score is within the preset threshold interval and the system instance's preset fairness requirement outweighs its preset real-time requirement, allocate the GPU resources to tasks whose priority exceeds a preset threshold through the AugC strategy;
and when the urgency score is within the preset threshold interval and the system instance is subject to service level agreement constraints, allocate the GPU resources according to task priority through the SLAF strategy.
Optionally, the switching module is specifically configured to:
when a switching request for the current GPU resources is received, perform a snapshot operation on the context of the current GPU resources to generate snapshot data, the snapshot data comprising the GPU register state, video memory content, and command queue of the GPU device;
migrate the GPU register state, the video memory content, and the command queue from the current system instance to the target system instance through a VFIO secure channel;
and restore the GPU context and the video memory content in the target system instance according to the GPU register state, the video memory content, and the command queue, continuing the task that was executing when the switching request was received.
Optionally, the system further comprises a resource release module, configured to:
reclaim released GPU resources and return them to the GPU resource pool when any system instance is detected releasing GPU resources.
In a second aspect, the present invention provides a GPU scheduling method, including:
when a user's request to create a system instance is received, allocating GPU resources to the system instance from a GPU resource pool in combination with the creation request, and executing user tasks through the GPU resources in the system instance;
acquiring, in real time, the system instance's usage of the current GPU resources and the task type, system load, and user priority information in the task queue of user tasks executed by the system instance;
performing offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and dynamically adjusting the GPU resources;
and when a switching request for the current GPU resources is received, allocating the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
In a third aspect, an electronic device of the present invention includes a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to implement the above GPU scheduling method when executing the computer program.
With the GPU scheduling system, method, and electronic device described above, the allocation running module accurately acquires device information of the GPU device and generates a GPU resource pool accordingly. When a user requests creation of a system instance, GPU resources are allocated to the system instance from the resource pool in combination with the creation request, so that the GPU resources in the system instance can execute user tasks. The monitoring module tracks the system instance's usage of GPU resources in real time while acquiring the task type, system load, and user priority information in the system instance's task queue, providing the data basis for subsequent dynamic scheduling. The dynamic scheduling module then performs offline measurement, priority filling, and multi-strategy coordination in sequence using a GPU scheduling algorithm on the information collected by the monitoring module, dynamically adjusting GPU resources and achieving optimal resource configuration. When a switching request for the current GPU resources is received, the switching module quickly and efficiently allocates the current GPU resources to the target system instance by successively performing context snapshot, channel migration, and target-end snapshot recovery, reducing resource waste and performance loss during switching. In the invention, the GPU resource pool generated by the allocation running module provides the basic resources for the whole scheduling system; the monitoring module collects the running data of system instances in real time and provides the decision basis for the dynamic scheduling module; the dynamic scheduling module optimizes resources according to that data; and the switching module ensures efficient switching of GPU resources between different system instances. The modules cooperate to jointly improve the scheduling efficiency and utilization of GPU resources.
In terms of improving scheduling efficiency, the GPU scheduling algorithm of the dynamic scheduling module can quickly adjust GPU resource allocation according to real-time resource usage and task requirements, reducing resource idling and waste and improving overall GPU resource scheduling efficiency. In terms of improving resource utilization, flexible allocation and switching of GPU resources in a multi-operating-system scenario enable GPU resources to better meet the needs of different users and tasks, raising GPU resource utilization. In terms of enhancing system flexibility, the cooperative work of the modules allows the system to adapt to complex multi-operating-system, multi-tenant environments and to changes in task types and user priorities, strengthening the system's flexibility and adaptability. In terms of reducing switching overhead, the switching module's processing of context snapshot, channel migration, and target-end snapshot recovery reduces the time and resource overhead of switching between virtual machines and containers, improving switching efficiency and, in turn, the usage efficiency of GPU resources.
Detailed Description
In order that the above objects, features, and advantages of the invention may be more readily understood, the invention is described in further detail below with reference to specific embodiments illustrated in the accompanying drawings. While certain embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the invention will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the invention are for illustration only and are not intended to limit the scope of the invention.
It should be understood that the various steps recited in the method embodiments of the invention may be performed in a different order and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the invention is not limited in this respect.
The term "comprising" and variations thereof as used herein are open-ended, i.e., "including but not limited to"; "based on" means "based at least in part on"; "one embodiment" means "at least one embodiment"; "another embodiment" means "at least one additional embodiment"; "some embodiments" means "at least some embodiments"; and "optional" means "in an optional embodiment". Related definitions of other terms are given in the description below. It should be noted that the concepts of "first", "second", etc. mentioned in this disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of, or interdependence between, the functions performed by these devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
In view of the above problems in the related art, the present embodiments provide a GPU scheduling system, a GPU scheduling method, and an electronic device.
Referring to fig. 1, a GPU scheduling system provided in an embodiment of the present invention includes:
an allocation running module, configured to acquire device information of a user's GPU device and generate a GPU resource pool of the GPU device according to the device information; when the user's request to create a system instance is received, allocate GPU resources to the system instance from the GPU resource pool in combination with the creation request; and execute user tasks through the GPU resources in the system instance.
Specifically, the allocation running module first communicates with the user's GPU device, using the system's device-management API or a hardware-abstraction-layer interface to obtain detailed information about the GPU device, such as the GPU model, video memory capacity, number of computing cores, and supported instruction sets. It then groups GPU devices of the same model and similar performance according to this device information and preset rules and policies, building a GPU resource pool. When the user requests creation of a system instance, the module parses the task type, required amount of GPU resources, performance requirements, and other information contained in the request, then applies a resource allocation algorithm against the established GPU resource pool to allocate appropriate GPU resources to the corresponding system instance, so that the system instance can efficiently execute user tasks with the allocated resources.
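By way of a non-limiting illustration, the pool construction just described might be sketched as follows; the `GpuDevice` fields, the grouping-by-model rule, and the `acquire`/`release` interface are assumptions chosen for exposition, not features recited by the embodiment:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class GpuDevice:
    """Device information queried from the device-management API (illustrative fields)."""
    pci_id: str
    model: str
    vram_mb: int
    compute_cores: int
    in_use: bool = False

class GpuResourcePool:
    """Groups devices of the same model into a pool, as described above."""
    def __init__(self, devices):
        self.groups = defaultdict(list)
        for dev in devices:
            self.groups[dev.model].append(dev)

    def acquire(self, min_vram_mb, min_cores):
        """Return the first free device satisfying the demand parameters, or None."""
        for devices in self.groups.values():
            for dev in devices:
                if (not dev.in_use and dev.vram_mb >= min_vram_mb
                        and dev.compute_cores >= min_cores):
                    dev.in_use = True
                    return dev
        return None

    def release(self, dev):
        dev.in_use = False
```

A call such as `pool.acquire(min_vram_mb=8192, min_cores=2048)` then corresponds to screening out a device from the pool that satisfies the demand parameters of a creation request.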
The monitoring module is configured to acquire, in real time, the system instance's usage of the current GPU resources, together with the task type, system load, and user priority information in the task queue of user tasks executed by the system instance.
Specifically, the monitoring module collects, in real time, usage metrics of the system instance on the current GPU resources, such as the GPU's video memory occupancy, computing core utilization, and texture mapping unit usage, by means of a monitoring agent preset in the system instance or a performance monitoring tool of the operating system. It obtains the task type by parsing task description files and querying a task state database, distinguishing deep learning training tasks, graphics rendering tasks, high-performance computing tasks, and so on. It collects system load information including CPU utilization, memory occupancy, the number of currently running tasks, and the length of the waiting-task queue, and looks up the corresponding user priority information in the user-priority rule base configured in the system according to the user's identity and the urgency and importance of the task. In this way, the running state and task demands of the system instance are fully captured.
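As a minimal sketch of the data the monitoring module might hand to the scheduler, assuming a hypothetical `instance.query_gpu_metrics()` hook in place of the monitoring agent or OS performance tool mentioned above:

```python
import time
from dataclasses import dataclass

@dataclass
class MonitorSample:
    """One real-time sample from the monitoring module (illustrative fields)."""
    timestamp: float
    vram_usage_pct: float        # video memory occupancy
    core_utilization_pct: float  # computing core utilization
    task_type: str               # e.g. "dl_training", "rendering", "hpc"
    system_load: float           # e.g. normalized run-queue length
    user_priority: int           # from the user-priority rule base

def collect_sample(instance) -> MonitorSample:
    """Poll one sample; `instance.query_gpu_metrics()` is an assumed hook."""
    m = instance.query_gpu_metrics()
    return MonitorSample(time.time(), m["vram_pct"], m["core_pct"],
                         m["task_type"], m["load"], m["priority"])
```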
The dynamic scheduling module is configured to perform offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and to dynamically adjust the GPU resources accordingly.
Specifically, the dynamic scheduling module starts the GPU scheduling algorithm on the basis of the rich data provided by the monitoring module. In the offline measurement stage, the performance of various GPU resources under different task types and system load conditions is tested and recorded in advance, and a performance model database is established. The algorithm then enters the priority filling stage: according to the task type, system load, and user priority information, high-performance GPU resources are allocated preferentially under the configured priority rules (for example, tasks of high-priority users are served first and urgent tasks receive extra resources), and a preliminary GPU resource share is assigned to each task. Finally, multi-strategy coordination is performed: the preliminary allocation is dynamically adjusted and optimized by comprehensively weighing several strategic factors, such as fairness of resource use, task urgency, and overall system performance, ensuring that GPU resources are allocated reasonably and efficiently to the tasks in each system instance and adapt to continuously changing running environments and task demands.
The switching module is configured to, when a switching request for the current GPU resources is received, allocate the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
Specifically, when the switching module receives a switching request for the current GPU resources, it promptly starts the switching flow. It first executes a context snapshot operation, completely saving the GPU's current register state, video memory content, pending instruction sequences, and other context information, using a state-save interface provided by the GPU device or software emulation, to form a snapshot file. It then performs channel migration, adapting and migrating the data transmission channels, communication interfaces, and other facilities associated with the GPU resources according to the architecture and running environment of the target system instance, so that the GPU resources can communicate and interact normally in the new system instance. Finally, it performs target-end snapshot recovery: in the target system instance, it calls the GPU's state-restore interface to accurately restore the previously saved context snapshot to the target GPU device, so that the GPU resources can resume their previous working state in the target system instance. Switching of GPU resources between different system instances is thus completed efficiently, reducing interruption time and resource loss during the switch.
With the GPU scheduling system, method, and electronic device described above, the allocation running module accurately acquires device information of the GPU device and generates a GPU resource pool accordingly. When a user requests creation of a system instance, GPU resources are allocated to the system instance from the resource pool in combination with the creation request, so that the GPU resources in the system instance can execute user tasks. The monitoring module tracks the system instance's usage of GPU resources in real time while acquiring the task type, system load, and user priority information in the system instance's task queue, providing the data basis for subsequent dynamic scheduling. The dynamic scheduling module then performs offline measurement, priority filling, and multi-strategy coordination in sequence using a GPU scheduling algorithm on the information collected by the monitoring module, dynamically adjusting GPU resources and achieving optimal resource configuration. When a switching request for the current GPU resources is received, the switching module quickly and efficiently allocates the current GPU resources to the target system instance by successively performing context snapshot, channel migration, and target-end snapshot recovery, reducing resource waste and performance loss during switching. In the invention, the GPU resource pool generated by the allocation running module provides the basic resources for the whole scheduling system; the monitoring module collects the running data of system instances in real time and provides the decision basis for the dynamic scheduling module; the dynamic scheduling module optimizes resources according to that data; and the switching module ensures efficient switching of GPU resources between different system instances. The modules cooperate to jointly improve the scheduling efficiency and utilization of GPU resources.
In terms of improving scheduling efficiency, the GPU scheduling algorithm of the dynamic scheduling module can quickly adjust GPU resource allocation according to real-time resource usage and task requirements, reducing resource idling and waste and improving overall GPU resource scheduling efficiency. In terms of improving resource utilization, flexible allocation and switching of GPU resources in a multi-operating-system scenario enable GPU resources to better meet the needs of different users and tasks, raising GPU resource utilization. In terms of enhancing system flexibility, the cooperative work of the modules allows the system to adapt to complex multi-operating-system, multi-tenant environments and to changes in task types and user priorities, strengthening the system's flexibility and adaptability. In terms of reducing switching overhead, the switching module's processing of context snapshot, channel migration, and target-end snapshot recovery reduces the time and resource overhead of switching between virtual machines and containers, improving switching efficiency and, in turn, the usage efficiency of GPU resources.
Optionally, the allocation running module is specifically configured to:
acquire the device information of the user's GPU device, load a driver corresponding to the GPU device, and generate a GPU resource pool of the GPU device;
when the user's request to create the system instance is received, determine a type identifier and GPU resource demand parameters of the system instance according to the creation request;
determine a GPU driver corresponding to the system instance according to the type identifier;
allocate GPU resources from the GPU resource pool to the system instance according to the GPU resource demand parameters, through the GPU driver;
and execute user tasks through the GPU resources in the system instance.
Specifically, when the allocation running module obtains the device information of the user's GPU device, it reads the detailed specifications of the GPU device, such as the GPU model, video memory size, number of computing cores, and supported instruction sets, through the system's device-management API or a tool interface provided by the device vendor. At the same time, the module automatically loads the driver matching the GPU device, typically by querying the device's unique identifier (e.g., PCI ID) and looking up the corresponding version in the driver library preset in the system. After the driver is loaded, the module generates a GPU resource pool according to the characteristics of the device and the available resources; the pool stores detailed information and the current state of every available GPU device in a data structure. When a system instance creation request sent by a user is received, the module first parses the request packet and extracts the type identifier of the system instance, which may identify a virtual machine, a container, or another custom environment. It also identifies the GPU resource demand parameters, which may include the requested video memory size, compute capability level, and specific features required (for example, whether ray tracing or tensor core support is needed).
According to the type identifier of the system instance, the module looks up the corresponding GPU driver requirement in a pre-configured mapping table, since different types of system instances may require different driver support. For example, a container environment may use a lightweight driver, while a virtual machine may require a fully featured driver. The module then calls the corresponding GPU driver interface, screens out GPU devices meeting the conditions from the GPU resource pool according to the resource demand parameters, and performs resource allocation; this may involve partitioning video memory, assigning computing cores, and so on. After the allocation is completed, the module notifies the system-instance management component to bind the allocated GPU resources to the newly created system instance. Finally, the system instance performs user tasks with the allocated GPU resources, which typically involves starting a task process inside the system instance and invoking the GPU hardware through the GPU driver to perform computing or graphics processing. Throughout this process, the allocation running module continuously monitors the resource allocation state, ensuring that resources are used reasonably and responding in time to subsequent resource change requests.
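A non-limiting sketch of this creation flow, reusing the `GpuResourcePool` sketch above; the type identifiers, driver names, and request layout are all illustrative placeholders rather than values prescribed by the embodiment:

```python
# Hypothetical mapping from system-instance type identifier to driver support,
# mirroring the pre-configured mapping table described above.
DRIVER_BY_TYPE = {
    "container": "gpu-driver-lite",   # lightweight driver for containers
    "vm": "gpu-driver-full",          # fully featured driver for virtual machines
}

def create_instance(pool, request):
    """Parse a creation request, pick the driver, and allocate from the pool."""
    instance_type = request["type"]            # type identifier
    demand = request["gpu_demand"]             # GPU resource demand parameters
    driver = DRIVER_BY_TYPE[instance_type]     # look up the mapping table
    dev = pool.acquire(demand["vram_mb"], demand["cores"])
    if dev is None:
        raise RuntimeError("no GPU in the pool satisfies the demand parameters")
    # Binding the device to the new instance would be done by the
    # system-instance management component; here we just return the result.
    return {"type": instance_type, "driver": driver, "gpu": dev}
```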
In this embodiment of the invention, the allocation running module accurately acquires the user's GPU device information and loads the corresponding driver to generate a resource pool; it determines the type and resource requirements of the system instance in detail according to the creation request and, by matching the corresponding GPU driver, achieves reasonable allocation and binding of resources, laying the foundation for efficient execution of user tasks. The monitoring module collects the resource usage and task-related information of the system instance in real time, providing comprehensive and accurate data support for dynamic scheduling. Based on that information, the dynamic scheduling module can scientifically and reasonably adjust GPU resource allocation through offline measurement, priority filling, and multi-strategy coordination with the GPU scheduling algorithm, optimizing the allocation to meet different task requirements. When the switching module receives a switching request, it achieves efficient switching of GPU resources between different system instances by performing context snapshot, channel migration, and target-end snapshot recovery, reducing resource overhead and latency during the switch. The system thus effectively addresses the problems of low GPU resource scheduling efficiency, low resource utilization, and time-consuming switching between virtual machines and containers in multi-operating-system scenarios; it markedly improves the scheduling efficiency and utilization of GPU resources in multi-operating-system, multi-tenant environments, enhances the flexibility and adaptability of the system, meets users' needs for fine-grained sharing of GPU resources, and avoids resource waste and performance bottlenecks.
Optionally, the dynamic scheduling module is specifically configured to:
perform offline measurement through the GPU scheduling algorithm according to the task types in the task queue, to obtain the kernel execution time corresponding to each task type;
divide the tasks in the task queue according to preset categories, to obtain a category label for each task;
use each category label as the corresponding task type in the task queue;
and execute the tasks of each task type on the GPU a preset number of times, performing the offline measurement to obtain the kernel execution time.
Specifically, when the dynamic scheduling module processes the task queue of the system instance, it first divides the tasks in the queue into several categories, such as deep learning training tasks, graphics rendering tasks, and scientific computing tasks, according to preset rules based on the tasks' purpose, resource requirements, and so on, and assigns each category a category label to clearly distinguish the task types. Then, for the task type corresponding to each category label, offline measurement is performed with the GPU scheduling algorithm. During offline measurement, all tasks of the same task type are executed on the GPU a preset number of times, for example by repeatedly running the same task, while the kernel execution time of each run, that is, the time the GPU spends processing the core part of the task, is captured through the GPU device's own performance counters or a software tool. The collected kernel execution time data are then statistically analyzed, computing the mean or median or accounting for the distribution characteristics of the data, to obtain an accurate and representative kernel execution time for the task type. This provides a key basis for subsequent resource allocation and scheduling decisions, enabling precise dynamic scheduling, optimizing GPU resource utilization, and improving overall system performance.
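A minimal sketch of this offline measurement step, assuming a hypothetical `run_task(task_type)` hook that executes one task of the given type on the GPU and returns the kernel execution time read from a performance counter:

```python
import statistics

def measure_kernel_times(task_types, run_task, repeats=10):
    """Run each task type a preset number of times and keep a representative
    kernel execution time per type (the median here; a mean would also fit)."""
    kernel_time = {}
    for ttype in task_types:
        samples = [run_task(ttype) for _ in range(repeats)]
        kernel_time[ttype] = statistics.median(samples)
    return kernel_time
```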
By classifying tasks and measuring kernel execution time offline, this embodiment of the invention obtains in advance the actual GPU resource demands of different tasks and provides accurate data support for dynamic scheduling. GPU resource allocation therefore becomes more reasonable and precise, over-allocation and shortages are effectively avoided, resource utilization is improved, and the system's flexibility and adaptability under multi-task processing are enhanced, ensuring that high-priority and urgent tasks receive timely and sufficient resources and improving overall system performance and user experience.
Optionally, the dynamic scheduling module is further specifically configured to:
generate an idle-gap threshold corresponding to the task type according to the kernel execution time, and determine GPU idle periods according to the idle-gap threshold;
determine the priority of the tasks under each task type according to the task type, the system load, and the user priority information;
sort the tasks by priority to obtain a priority sequence under the task type;
and fill tasks to be executed whose priority is higher than that of the currently executing task into the GPU idle periods according to the priority sequence, performing priority inheritance between the currently executing task and the tasks to be executed through the rt_mutex mechanism.
Specifically, after the kernel execution time corresponding to a task type is acquired, the dynamic scheduling module generates an idle-gap threshold from that time, based on analysis of the task's execution pattern and resource occupancy characteristics. For example, a smaller idle-gap threshold is set for tasks with shorter execution times, to respond quickly to task switching, while a larger threshold is set for tasks with longer execution times, to reduce the overhead of frequent switching. By monitoring the real-time usage of the GPU, a period in which GPU utilization stays below a certain threshold for longer than the idle-gap threshold is identified as a GPU idle period. Meanwhile, the module comprehensively considers the task type (such as compute-intensive or memory-intensive), the system load (the current resource occupancy of the system), and the user priority (for example, paying users or important projects rank higher), and assigns priorities to the tasks under each task type by applying a preset weighting formula or priority calculation policy. For instance, compute-intensive tasks may be given higher priority when the system load is high, and under the same conditions tasks of higher-priority users are raised accordingly. The module then sorts the tasks from high to low priority to generate a priority sequence. According to this sequence, when a GPU idle period arrives, tasks to be executed whose priority is higher than that of the currently executing task are filled into the idle period. To avoid priority inversion, the rt_mutex mechanism is used to realize priority inheritance between the currently executing task and the tasks to be executed: when a high-priority task is blocked by a low-priority task, the low-priority task inherits the priority of the high-priority task until the blocking ends, ensuring that the system responds to high-priority tasks in time and improving overall execution efficiency.
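A sketch of the filling step under stated assumptions: tasks wait in a max-heap keyed by priority, each entry carries an estimated run time, and the idle-gap threshold has already been derived from the measured kernel execution time. The rt_mutex priority inheritance happens inside the kernel's locking primitive and is therefore outside this user-space sketch:

```python
import heapq

def fill_idle_period(idle_gap_s, gap_threshold_s, waiting, current_priority):
    """Fill one GPU idle period with higher-priority waiting tasks.

    `waiting` is a heap of (-priority, task_id, est_time_s) tuples, so the
    highest-priority task sits at the top. Returns the task ids filled in."""
    if idle_gap_s < gap_threshold_s:
        return []                              # gap too short to switch into
    filled, remaining = [], idle_gap_s
    while waiting and -waiting[0][0] > current_priority:
        neg_prio, task_id, est = heapq.heappop(waiting)
        if est > remaining:                    # would overrun the idle period;
            heapq.heappush(waiting, (neg_prio, task_id, est))
            break                              # stop, keeping the priority order
        filled.append(task_id)
        remaining -= est
    return filled
```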
By reasonably determining GPU idle periods, accurately assigning task priorities, filling tasks efficiently, and adopting this locking mechanism, the embodiment effectively improves the utilization and scheduling flexibility of GPU resources in multi-task scenarios, ensures that high-priority tasks are executed in time, reduces task waiting time, and improves the overall throughput and responsiveness of the system.
Optionally, the dynamic scheduling module is further specifically configured to:
each time the filling of a GPU idle period is completed, acquire the real-time GPU utilization and the task completion times of all tasks in the filled GPU idle period;
determine an urgency score for the filled GPU idle period according to the real-time GPU utilization and the task completion times;
and determine an adjustment strategy for the GPU resources according to a comparison of the urgency score with a preset threshold interval, dynamically adjusting the GPU resources through the adjustment strategy.
Specifically, each time the dynamic scheduling module completes task filling for a GPU idle period, it acquires the GPU utilization and the task completion times of the filled tasks in real time through a performance monitoring tool built into the system or a hardware counter of the GPU device; these data may be recorded in a system log or performance monitoring database for later analysis. According to a preset urgency scoring model, the GPU utilization and the task completion times are combined in a weighted calculation to obtain a score representing the urgency of task execution in the current GPU idle period. The scoring model may integrate task sensitivity to time (tasks with high timeliness requirements are weighted higher) and resource occupancy efficiency (tasks with high GPU utilization may score lower, since they use resources efficiently and their urgency is relatively low). The computed urgency score is compared with the preset threshold interval: if the score is above the upper threshold, the current task combination is judged to face a resource bottleneck and GPU resources need to be increased; if the score is below the lower threshold, resources may be over-allocated and wasted, and reclaiming part of them should be considered. Based on this comparison, the module dynamically adjusts GPU resource allocation, for example by assigning more computing cores or video memory to highly urgent tasks or adjusting task priorities, to optimize overall resource utilization efficiency and task execution.
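The scoring model itself is not specified by the embodiment; the following is one possible weighted form, with the weights, the deadline-based lateness term, and the threshold interval all invented for illustration:

```python
def urgency_score(gpu_util_pct, completion_time_s, deadline_s,
                  w_util=0.4, w_time=0.6):
    """Weighted urgency: slow completion relative to the deadline raises the
    score; high GPU utilization lowers it (resources are already well used)."""
    lateness = min(completion_time_s / deadline_s, 2.0)   # cap the ratio
    return w_time * lateness + w_util * (1.0 - gpu_util_pct / 100.0)

def pick_adjustment(score, low=0.5, high=1.2):
    """Compare the score with a preset threshold interval [low, high]."""
    if score > high:
        return "increase_resources"    # resource bottleneck
    if score < low:
        return "reclaim_resources"     # over-allocation, reclaim some
    return "tune_within_interval"      # handled by the in-interval strategies
```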
The embodiment can thus feed back the utilization of GPU resources and the task execution state in real time and, by quantitatively evaluating urgency and dynamically adjusting resource allocation accordingly, achieve fine-grained management of GPU resources. In multi-task concurrent scenarios this improves the precision of resource allocation, avoids resource idling or over-allocation, responds quickly to changes in task priority, improves the system's overall throughput and task response speed, ensures that highly urgent tasks are processed in time, and strengthens the system's adaptability and stability under complex, variable loads.
Optionally, the dynamic scheduling module is further specifically configured to:
when the urgency score is above the preset threshold interval, reallocate the GPU resources according to preset credit-value weights through the ACCCREDIT strategy;
when the urgency score is below the preset threshold interval, divide the tasks in the GPU idle period into several task groups and cooperatively allocate the GPU resources to those task groups through the CoSched strategy;
when the urgency score is within the preset threshold interval and the system instance's preset fairness requirement outweighs its preset real-time requirement, allocate the GPU resources to tasks whose priority exceeds a preset threshold through the AugC strategy;
and when the urgency score is within the preset threshold interval and the system instance is subject to service level agreement constraints, allocate the GPU resources according to task priority through the SLAF strategy.
Specifically, when the urgency score is above the preset threshold interval, the current GPU resources are seriously insufficient to satisfy the urgent demands of tasks. The dynamic scheduling module therefore adopts the ACCCREDIT strategy and reallocates the GPU resources according to preset credit-value weights. The credit-value weights are preset according to the type of system instance, user priority, task type, and so on; for example, high-priority system instances and urgent tasks are given higher credit values. The module calculates the credit value of each system instance or task and reallocates GPU resources in proportion to those values, so that holders of higher credit values obtain more resources. When the urgency score is below the preset threshold interval, the GPU resources are idle to some extent; to use them better, the module divides the tasks in the GPU idle period into several task groups, grouped by task type, user, project, and so on, and then applies the CoSched strategy to cooperatively allocate GPU resources according to the characteristics and needs of each group, for example assigning a contiguous video memory region or the same set of computing cores to the tasks of one group, to improve resource sharing and utilization efficiency. When the urgency score is within the preset threshold interval and the system instance's preset fairness requirement outweighs its preset real-time requirement, the module adopts the AugC strategy, taking the fairness requirement of the system instance into account while allocating GPU resources to high-priority tasks, so as to satisfy their resource demands as far as possible on the premise of guaranteed fairness. And when the urgency score is within the preset threshold interval and the system instance is subject to service level agreement constraints, the module allocates GPU resources to tasks in strict priority order according to the SLAF strategy, ensuring that high-priority tasks obtain resources first and that the service level agreement's requirements on task execution order and response time are met.
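The choice among the four strategies can be condensed into a dispatch over the conditions enumerated above; the fall-through default when neither fairness nor SLA constraints apply is an assumption, since the embodiment does not state one:

```python
def select_strategy(score, low, high, fairness_over_realtime, has_sla):
    """Map an urgency score and instance attributes to a scheduling strategy."""
    if score > high:
        return "ACCCREDIT"   # reallocate by preset credit-value weights
    if score < low:
        return "CoSched"     # group tasks, co-allocate GPU resources
    if fairness_over_realtime:
        return "AugC"        # favor tasks above the priority threshold
    if has_sla:
        return "SLAF"        # strict priority order under SLA constraints
    return "AugC"            # assumed default; not specified by the embodiment
```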
Through the flexible application of these scheduling strategies, the embodiment adapts to different urgency scores and system instance requirements and effectively improves the rationality of GPU resource allocation across scenarios. When resources are tight, system instances and tasks with high credit values obtain resources first, guaranteeing the execution of critical tasks; when resources are relatively abundant, task grouping and cooperative allocation improve resource utilization efficiency and overall system performance; and fairness and service level agreement constraints are taken into account throughout, balancing the interests of different system instances and ensuring stable operation and user satisfaction.
Optionally, the switching module is specifically configured to:
when a switching request for the current GPU resources is received, perform a snapshot operation on the context of the current GPU resources to generate snapshot data, the snapshot data comprising the GPU register state, video memory content, and command queue of the GPU device;
migrate the GPU register state, the video memory content, and the command queue from the current system instance to the target system instance through a VFIO secure channel;
and restore the GPU context and the video memory content in the target system instance according to the GPU register state, the video memory content, and the command queue, continuing the task that was executing when the switching request was received.
Specifically, when a switching request for the current GPU resources is received, the switching module first performs a snapshot operation on the context of the current GPU resources. This involves reading and recording the critical state information of the GPU device: the GPU register state, which holds the GPU's control and configuration information; the video memory content, i.e., the data and intermediate computation results the GPU keeps in memory; and the command queue, which contains the instruction sequences waiting to be executed. These data are integrated into snapshot data so that the GPU's current working state is preserved completely during the switch. Next, the module migrates the data through a VFIO (Virtual Function I/O) secure channel. VFIO is a framework that allows user-space programs to access hardware devices directly and provides a secure device sharing and migration mechanism; through it, the GPU register state, video memory content, and command queue can be transferred securely from the current system instance to the target system instance, with the integrity and security of the data ensured in transit, preventing leakage and corruption. In the target system instance, the switching module restores the GPU context and video memory content from the received snapshot data: the GPU registers are restored to their previous state, the video memory content is reloaded, and the instructions in the command queue are made ready to continue executing. In this way the GPU can continue, in the target system instance, the task it was processing when the switching request arrived, achieving seamless task migration.
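A sketch of the three-stage switch under stated assumptions: the `gpu.*` methods stand in for the device's state save/restore interface, and `VfioChannel` is an in-memory stub that only models the send/receive ordering of the real VFIO transport:

```python
from collections import deque

class VfioChannel:
    """Stub for the VFIO secure channel (the real one is a kernel framework)."""
    def __init__(self):
        self._queue = deque()
    def send(self, snapshot):
        self._queue.append(snapshot)
    def receive(self):
        return self._queue.popleft()

def snapshot_and_send(gpu, channel):
    """Source side: context snapshot, then channel migration."""
    channel.send({
        "registers": gpu.save_registers(),   # GPU register state
        "vram": gpu.dump_vram(),             # video memory content
        "commands": gpu.drain_commands(),    # pending command queue
    })

def receive_and_restore(gpu, channel):
    """Target side: target-end snapshot recovery, then resume the task."""
    snap = channel.receive()
    gpu.load_registers(snap["registers"])
    gpu.restore_vram(snap["vram"])
    gpu.replay_commands(snap["commands"])    # continue where execution stopped
```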
Through the combination of snapshot operations and a VFIO secure channel, the switching mechanism of this embodiment achieves efficient switching of GPU resources between system instances while preserving task continuity and data security. This approach effectively reduces service interruption time during switching and improves the system's responsiveness and resource utilization, making it particularly suitable for data-center environments in which GPU resources must be allocated dynamically across multiple operating systems or tenants. In this way, the system can flexibly adapt to different workload demands, optimize resource allocation, and improve overall performance.
Optionally, the system further comprises a resource release module, configured to:
reclaim released GPU resources and return them to the GPU resource pool when any system instance is detected releasing GPU resources.
Specifically, the resource release module monitors the resource usage of system instances in real time and, on detecting that a system instance has released GPU resources, immediately starts the reclamation flow. The module first verifies the legitimacy of the release request, ensuring that the operation was initiated by an authorized system instance and does not affect other instances still using the resources. It then detaches the GPU resources from the system instance and marks them as "to be reclaimed", preventing other modules from reallocating them prematurely. Next, it checks and cleans the state of the resources to be reclaimed, resetting the GPU registers to their default state, clearing the video memory content, and flushing leftover instructions from the command queue, to ensure the resources are in a clean, usable state. Finally, the module returns the cleaned GPU resources to the GPU resource pool so that they can be allocated again to other system instances, optimizing resource recycling and improving overall resource utilization.
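A compact sketch of this reclamation flow, assuming hypothetical `instance.gpus` ownership records and `dev.*` scrubbing hooks, and reusing the pool's `release` from the earlier sketch:

```python
def reclaim(pool, instance, dev):
    """Verify, detach, scrub, and return a released GPU to the pool."""
    if dev not in instance.gpus:             # legitimacy check: owner only
        raise PermissionError("release not initiated by the owning system instance")
    instance.gpus.remove(dev)                # detach; now "to be reclaimed"
    dev.reset_registers()                    # reset GPU registers to defaults
    dev.clear_vram()                         # clear video memory content
    dev.flush_command_queue()                # drop leftover instructions
    pool.release(dev)                        # back into the GPU resource pool
```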
Through this real-time monitoring and efficient reclamation mechanism, the embodiment ensures that GPU resources are quickly returned to the resource pool once they are no longer used, avoiding long-term occupation and waste. This raises resource turnover and shortens the time system instances wait to obtain resources, improving the performance and responsiveness of the whole system. Timely cleaning and state checking guarantee the quality and security of reclaimed resources, preventing resource pollution and data leakage and providing a reliable basis for subsequent allocation.
Referring to fig. 2, a GPU scheduling method of the present invention includes:
when a user's request to create a system instance is received, allocating GPU resources to the system instance from a GPU resource pool in combination with the creation request, and executing user tasks through the GPU resources in the system instance;
acquiring, in real time, the system instance's usage of the current GPU resources and the task type, system load, and user priority information in the task queue of user tasks executed by the system instance;
performing offline measurement, priority filling, and multi-strategy coordination in sequence through a GPU scheduling algorithm according to the current GPU resource usage and the task type, system load, and user priority information in the task queue, and dynamically adjusting the GPU resources;
and when a switching request for the current GPU resources is received, allocating the current GPU resources to a target system instance specified in the switching request by successively performing context snapshot, channel migration, and target-end snapshot recovery on the current GPU resources.
The advantages of the GPU scheduling method of the present invention over the prior art are the same as those of the GPU scheduling system described above, and are not repeated here.
The invention also provides an electronic device, which comprises a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to implement the above GPU scheduling method when executing the computer program.
The advantages of the electronic device of the present invention over the prior art are the same as those of the GPU scheduling system described above, and are not repeated here.
Although the present disclosure is described above, the scope of protection of the present disclosure is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such changes and modifications fall within the scope of protection of the disclosure.