Disclosure of Invention
In view of the foregoing, the present application has been developed to provide a method and apparatus for GPU resource virtualization computing power scheduling that overcome, or at least partially solve, the above problems.
A method for GPU resource virtualization computing power scheduling comprises the following steps:
acquiring a task and determining the computing demand characteristics and task attributes of the task;
determining resource requirements according to the computing demand characteristics and the task attributes;
determining a task priority according to the task attributes;
and determining a scheduling result for the GPU resources according to the task priority and the resource requirements.
Further, the method further comprises the following steps:
monitoring the GPU resources;
and when the GPU resource load is unbalanced, starting cross-node task scheduling.
Further, the step of monitoring the GPU resources includes:
acquiring the usage and task execution state of the GPU;
determining the idle state of the GPU according to the usage;
and updating the scheduling result according to the task execution state and the idle state.
Further, the step of determining the resource requirements according to the computing demand characteristics and the task attributes includes:
determining the computation amount and memory requirement according to the computing demand characteristics;
determining a task type according to the task attributes;
and determining the resource requirements according to the computation amount, the memory requirement and the task type.
Further, the step of determining the task priority according to the task attributes includes:
determining a task type according to the task attributes, wherein the task types include a real-time inference task, a model training task and a data preprocessing task;
and determining the task priority according to the task type.
Further, the step of determining the scheduling result of the GPU resources according to the task priority and the resource requirements includes:
dividing a physical GPU into a plurality of virtual instances and establishing a virtualized GPU environment;
determining a computing power allocation strategy for the virtual instances according to the resource requirements;
and determining the GPU scheduling result according to the task priority and the computing power allocation strategy.
Further, the method further comprises the following steps:
predicting the GPU resources required by the task according to historical resource demand data;
and pre-allocating the GPU resources according to the prediction result.
An apparatus for GPU resource virtualization computing power scheduling, the apparatus implementing the steps of the method for GPU resource virtualization computing power scheduling described in any of the above, comprising:
an acquisition module, configured to acquire a task and determine the computing demand characteristics and task attributes of the task;
a resource requirement module, configured to determine resource requirements according to the computing demand characteristics and the task attributes;
a priority module, configured to determine a task priority according to the task attributes;
and a scheduling module, configured to determine a scheduling result for the GPU resources according to the task priority and the resource requirements.
An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method for GPU resource virtualization computing power scheduling described in any of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for GPU resource virtualization computing power scheduling described in any of the above.
The application has the following advantages:
Aiming at the defects in the prior art that GPU resources cannot be allocated efficiently, computing resources are wasted and GPU loads are unstable, the application provides a method for GPU resource virtualization computing power scheduling, which comprises: acquiring a task and determining the computing demand characteristics and task attributes of the task; determining resource requirements according to the computing demand characteristics and the task attributes; determining a task priority according to the task attributes; and determining a scheduling result for the GPU resources according to the task priority and the resource requirements. The GPU resources for a task are allocated according to its specific task attributes, which ensures the efficiency of GPU operation, balances the allocation of computing power resources, improves GPU utilization, keeps task loads balanced, and completes tasks more efficiently.
Detailed Description
In order that the manner in which the above recited objects, features and advantages of the present application are obtained will become more readily apparent, a more particular description of the application briefly described above will be rendered by reference to the appended drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Through analysis of the prior art, the inventor found that in deep learning training or inference tasks, GPU resources are often idle or used inefficiently. Some tasks cannot fully utilize the computing power of the GPU, resulting in wasted computing resources. How to schedule GPU resources effectively in a multi-user, multi-task environment while avoiding resource contention and unfair allocation remains an open problem: some tasks waste resources through over-allocation, while others face a shortage of resources. As tasks change, the GPU load becomes unstable; the computing demand of certain tasks surges momentarily, which can easily overload the system, while the GPU sits idle at other times. In a large-scale computing cluster, efficiently scheduling GPUs on different nodes, and cross-node resource scheduling in particular, faces significant technical challenges, including network delay and data transmission bottlenecks.
It should be noted that, in any embodiment of the present invention, a GPU (Graphics Processing Unit) is a specialized graphics processor: a 3D display chip that concentrates three-dimensional image and special-effect processing functions in the display chip and relies on the GPU's computing power, the so-called "hardware acceleration" function.
Referring to fig. 1, a method for GPU resource virtualization computing power scheduling according to an embodiment of the present application is shown;
S110, acquiring a task and determining the computing demand characteristics and task attributes of the task;
S120, determining resource requirements according to the computing demand characteristics and the task attributes;
S130, determining a task priority according to the task attributes;
and S140, determining a scheduling result for the GPU resources according to the task priority and the resource requirements.
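Steps S110 to S140 can be sketched as follows. This is an illustrative sketch only: every name, the priority table and the memory-only placement heuristic are assumptions for exposition, not the patent's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    task_type: str   # e.g. "inference", "training", "preprocessing" (assumed labels)
    memory_mb: int   # estimated memory requirement from the demand characteristics

# Lower number = higher priority (S130); the mapping is illustrative.
PRIORITY = {"inference": 0, "training": 1, "preprocessing": 2}

def schedule(tasks, gpu_free_mb):
    """S130/S140 sketch: order tasks by priority, then place each on the
    first GPU whose free memory covers its requirement (S120)."""
    free = dict(gpu_free_mb)  # gpu_id -> free memory in MB
    plan = {}
    for t in sorted(tasks, key=lambda t: PRIORITY[t.task_type]):
        for gpu, mem in free.items():
            if mem >= t.memory_mb:
                plan[t.task_id] = gpu
                free[gpu] -= t.memory_mb
                break
    return plan

plan = schedule(
    [Task("t1", "training", 8000), Task("t2", "inference", 2000), Task("t3", "preprocessing", 1000)],
    {"gpu0": 10000, "gpu1": 4000},
)
```

A task that fits nowhere is simply absent from the plan in this sketch; a real scheduler would queue it for a later round.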
Aiming at the defects in the prior art that GPU resources cannot be allocated efficiently, computing resources are wasted and GPU loads are unstable, the application provides a method for GPU resource virtualization computing power scheduling, which comprises: acquiring a task and determining the computing demand characteristics and task attributes of the task; determining resource requirements according to the computing demand characteristics and the task attributes; determining a task priority according to the task attributes; and determining a scheduling result for the GPU resources according to the task priority and the resource requirements. The GPU resources for a task are allocated according to its specific task attributes, which ensures the efficiency of GPU operation, balances the allocation of computing power resources, and completes tasks more efficiently.
Next, the method for GPU resource virtualization computing power scheduling in the present exemplary embodiment will be further described.
In one embodiment of the present invention, the method further comprises: monitoring the GPU resources;
and when the GPU resource load is unbalanced, starting cross-node task scheduling.
It should be noted that a monitoring function is added for the GPU resources. Through real-time monitoring of GPU load and task progress, an intelligent scheduler can migrate or redistribute tasks among multiple GPU nodes, scheduling GPU resources according to the real-time situation so that tasks complete efficiently and no node exceeds its load threshold. When the GPU load of a node is too high, the scheduler automatically migrates some of its tasks to nodes with lower load, achieving balanced resource allocation across the whole cluster.
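The migration idea above might be sketched as follows. The node names, the fixed 0.1 migration quantum and the 0.8 threshold are all illustrative assumptions, not values from the application.

```python
def rebalance(loads, threshold=0.8, quantum=0.1):
    """Suggest task migrations from the hottest to the coolest node until no
    node exceeds `threshold`. `loads` maps node -> fraction of GPU capacity
    in use; each migration is assumed to move `quantum` of load."""
    loads = dict(loads)
    moves = []
    while max(loads.values()) > threshold:
        hot = max(loads, key=loads.get)
        cold = min(loads, key=loads.get)
        if loads[cold] + quantum > threshold:  # no node can absorb more load
            break
        loads[hot] -= quantum
        loads[cold] += quantum
        moves.append((hot, cold))
    return moves, loads

# Two migrations bring node0 from 0.95 down to 0.75, below the threshold.
moves, after = rebalance({"node0": 0.95, "node1": 0.30})
```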
In one embodiment of the present invention, the specific process of the step "monitoring the GPU resources" may be further described in conjunction with the following description.
The usage and task execution state of the GPU are acquired;
the idle state of the GPU is determined according to the usage;
and the scheduling result is updated according to the task execution state and the idle state.
It is noted that if the load is balanced, tasks continue executing; if the load is unbalanced, cross-node task scheduling must be started and data transmission optimized (for example, by using an efficient network protocol), with idle GPU resources scheduled to tasks whose load exceeds, or is about to reach, the threshold. This balances the scheduling, improves task efficiency and keeps GPU scheduling balanced.
As described above in step S110, a task is acquired and the computing demand characteristics and task attributes of the task are determined.
It should be noted that the computing demand characteristics include the computation amount, memory requirement and any special requirements of the task, while the task attributes are task information such as the task type, task number and task source. After the computing demand characteristics of a task are obtained, GPU resources can be dynamically pre-allocated according to the characteristics of different tasks (such as deep learning training, inference or data preprocessing). Before a task starts, the GPU resources it requires are predicted from the task scale and historical resource demand data, and an appropriate number of GPU instances is allocated, so that the task has enough computing power while over-allocation is avoided.
As described in the above step S120, the resource requirement is determined according to the computing requirement characteristic and the task attribute.
In one embodiment of the present invention, the specific process of "determining resource requirements according to the computing requirement characteristics and task attributes" described in step S120 may be further described in conjunction with the following description.
The computation amount and memory requirement are determined according to the computing demand characteristics;
the task type is determined according to the task attributes;
and the resource requirements are determined according to the computation amount, the memory requirement and the task type.
The computing demand characteristics are analyzed to determine the computation amount and memory requirement of the task, and the resources the task needs are estimated accordingly, so that computing resources can later be allocated to the corresponding task in a targeted manner and the GPU resources are fully utilized.
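A minimal sketch of such an estimate follows, assuming illustrative per-task-type overhead factors that are not specified by the application.

```python
# Illustrative per-task-type overhead factors (assumed, not from the application).
TYPE_OVERHEAD = {"training": 1.5, "inference": 1.1, "preprocessing": 1.0}

def resource_requirement(flops, memory_mb, task_type):
    """Combine computation amount, memory requirement and task type (S120):
    scale the raw demand by a type factor and round memory up to whole GiB
    so that a virtual-instance size can be chosen."""
    factor = TYPE_OVERHEAD.get(task_type, 1.0)
    mem_mb = memory_mb * factor
    return {
        "flops": flops * factor,
        "memory_gib": int(-(-mem_mb // 1024)),  # ceiling division
    }

req = resource_requirement(flops=2e12, memory_mb=6000, task_type="training")
```

Rounding up to an instance-sized unit reflects the later virtualization step, where tasks receive whole virtual instances rather than arbitrary slivers of memory.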
As described in step S130, the task priority is determined according to the task attribute.
In one embodiment of the present invention, the specific process of determining task priority according to the task attribute in step S130 may be further described in conjunction with the following description.
The task type is determined according to the task attributes, wherein the task types include a real-time inference task, a model training task and a data preprocessing task;
and the task priority is determined according to the task type.
It should be noted that a GPU scheduling algorithm based on task priority is designed: tasks of different priorities are scheduled at different levels according to their importance, urgency and resource requirements. For example, real-time inference tasks have the highest priority, model training tasks have a medium priority, and auxiliary tasks such as data preprocessing have the lowest priority. This scheduling mechanism can reasonably balance the overall load of the system.
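The three-level scheme above could be realized with a priority queue; the numeric levels and the arrival-order tie-break below are assumptions for illustration.

```python
import heapq

# Numeric levels follow the text: inference highest, training middle,
# preprocessing lowest; the exact numbers and tie-break are assumptions.
PRIORITY = {"inference": 0, "training": 1, "preprocessing": 2}

def run_order(tasks):
    """tasks: list of (task_id, task_type) in arrival order. Returns the
    execution order: highest priority first, arrival order breaking ties."""
    heap = [(PRIORITY[kind], arrival, tid) for arrival, (tid, kind) in enumerate(tasks)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

order = run_order([
    ("a", "preprocessing"),
    ("b", "inference"),
    ("c", "training"),
    ("d", "inference"),
])
```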
As described in step S140, the scheduling result of the GPU resource is determined according to the task priority and the resource requirement.
In one embodiment of the present invention, the specific process of step S140, "determining the scheduling result of the GPU resources according to the task priority and the resource requirements", may be further described in conjunction with the following description.
The physical GPU is divided into a plurality of virtual instances, and a virtualized GPU environment is established;
the computing power allocation strategy for the virtual instances is determined according to the resource requirements;
and the GPU scheduling result is determined according to the task priority and the computing power allocation strategy.
Through GPU virtualization technology, a physical GPU is divided into a plurality of virtual instances so that multiple tasks can execute in parallel on the same GPU. When task demands are small, several small tasks can be concentrated on one GPU, making full use of its computing power.
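Packing several small tasks onto the virtual instances of one GPU can be sketched as a first-fit placement. The instance sizes and the memory-only model are assumptions; real GPU partitioning (e.g. vendor multi-instance features) constrains sizes differently.

```python
def pack(task_mem_mb, instance_mb, n_instances):
    """First-fit: place each task on the first virtual instance with enough
    free memory; None marks a task no instance can host."""
    free = [instance_mb] * n_instances
    placement = []
    for mem in task_mem_mb:
        for i in range(n_instances):
            if free[i] >= mem:
                free[i] -= mem
                placement.append(i)
                break
        else:
            placement.append(None)
    return placement

# Four small tasks packed onto three 4000 MB instances carved from one GPU.
placement = pack([3000, 2000, 3000, 5000], instance_mb=4000, n_instances=3)
```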
In one embodiment of the invention, the method further comprises: predicting the GPU resources required by the task according to historical resource demand data;
and pre-allocating the GPU resources according to the prediction result.
Before a task starts, the GPU resources it requires can be predicted from the task scale and historical resource demand data, and an appropriate number of GPU instances allocated, so that the task has enough computing power while over-allocation is avoided. GPU resources are dynamically pre-allocated according to the characteristics of different tasks (e.g. deep learning training, inference, data preprocessing). By analyzing the history, predicted data for the task are obtained and GPU resources are pre-allocated accordingly; subsequent allocation then only needs minor adjustment rather than computation from scratch. Load balancing and dynamic scheduling cooperate to achieve balanced resource allocation across the whole cluster, guaranteeing balanced allocation and improving allocation efficiency.
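A minimal sketch of history-based pre-allocation follows, using a simple moving average with headroom as a stand-in for whatever predictor the scheme actually employs; the window size and the 10% headroom are assumptions.

```python
def predict_gpu_need(history_mb, window=3, headroom=1.1):
    """Mean GPU-memory use over the most recent `window` runs, padded by
    `headroom` so the prediction errs toward sufficiency."""
    recent = history_mb[-window:]
    return sum(recent) / len(recent) * headroom

def preallocate(history_mb, instance_mb=4000):
    """Number of virtual GPU instances to reserve before the task starts."""
    need = predict_gpu_need(history_mb)
    return int(-(-int(need) // instance_mb))  # ceiling division

n = preallocate([6000, 7000, 8000])
```

The reserved count is only a starting point; per the text, it is adjusted at run time by the monitoring and rebalancing steps rather than recomputed from scratch.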
The scheme adopts an adaptive scheduling algorithm that can dynamically adjust the GPU resource allocation strategy according to the resource requirements of tasks, the current GPU load and predictions of future tasks. The scheduling algorithm can also learn from historical data to continuously optimize the efficiency of resource allocation. In a large-scale computing cluster, an improved cross-node scheduling mechanism combined with a high-speed data transmission protocol reduces the data transmission delay between GPU nodes, making cross-node scheduling more efficient. By introducing a machine learning model and combining it with historical task load data, the future GPU load trend is predicted. This predictive capability helps the system optimize resource configuration before a task executes, reducing unnecessary resource waste.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Referring to fig. 2, an apparatus for GPU resource virtualized computing power scheduling according to an embodiment of the present application is shown;
The apparatus specifically comprises the following modules:
an acquisition module 210, configured to acquire a task and determine the computing demand characteristics and task attributes of the task;
a resource requirement module 220, configured to determine resource requirements according to the computing demand characteristics and the task attributes;
a priority module 230, configured to determine a task priority according to the task attributes;
and a scheduling module 240, configured to determine a scheduling result for the GPU resources according to the task priority and the resource requirements.
In an embodiment of the present invention, the apparatus further includes:
a monitoring module, configured to monitor the GPU resources;
and a cross-node scheduling module, configured to start cross-node task scheduling when the GPU resource load is unbalanced.
In an embodiment of the present invention, the monitoring module includes:
a GPU state acquisition sub-module, configured to acquire the usage and task execution state of the GPU;
a GPU idle state sub-module, configured to determine the idle state of the GPU according to the usage;
and an update scheduling sub-module, configured to update the scheduling result according to the task execution state and the idle state.
In one embodiment of the present invention, the resource requirement module 220 includes:
a computation amount determining sub-module, configured to determine the computation amount and memory requirement according to the computing demand characteristics;
a task type sub-module, configured to determine the task type according to the task attributes;
and a resource requirement sub-module, configured to determine the resource requirements according to the computation amount, the memory requirement and the task type.
In one embodiment of the present invention, the priority module 230 includes:
a task type determining sub-module, configured to determine a task type according to the task attributes, wherein the task types include a real-time inference task, a model training task and a data preprocessing task;
and a priority determining sub-module, configured to determine the task priority according to the task type.
In one embodiment of the present invention, the scheduling module 240 includes:
a GPU environment sub-module, configured to divide a physical GPU into a plurality of virtual instances and establish a virtualized GPU environment;
an allocation strategy sub-module, configured to determine the computing power allocation strategy for the virtual instances according to the resource requirements;
and a scheduling result sub-module, configured to determine the GPU scheduling result according to the task priority and the computing power allocation strategy.
In an embodiment of the present invention, the apparatus further includes:
a prediction module, configured to predict the GPU resources required by the task according to historical resource demand data;
and a pre-allocation module, configured to pre-allocate the GPU resources according to the prediction result.
Referring to fig. 3, a computer device for implementing the method for GPU resource virtualization computing power scheduling of the present invention is shown, which may specifically include the following:
The computer device 12 is in the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components (including the system memory 28 and the processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical media may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, the program modules 42 being configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, a memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, pointing device, display 24, camera, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, the computer device 12 may also communicate with one or more networks (such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in FIG. 3, other hardware and/or software modules may be used in conjunction with the computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement a method for virtualized computing power scheduling of GPU resources provided by embodiments of the present invention.
That is, when the program is executed by the processing unit 16, the following is implemented: acquiring a task and determining the computing demand characteristics and task attributes of the task;
determining resource requirements according to the computing demand characteristics and the task attributes;
determining a task priority according to the task attributes;
and determining a scheduling result for the GPU resources according to the task priority and the resource requirements.
In an embodiment of the present application, the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for GPU resource virtualized computing power scheduling as provided in all embodiments of the present application:
That is, when the program is executed by a processor, the following is implemented: acquiring a task and determining the computing demand characteristics and task attributes of the task;
determining resource requirements according to the computing demand characteristics and the task attributes;
determining a task priority according to the task attributes;
and determining a scheduling result for the GPU resources according to the task priority and the resource requirements.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote computer case, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In this specification, each embodiment is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may refer to one another.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or terminal device. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or terminal device that comprises the element.
The method and apparatus for GPU resource virtualization computing power scheduling provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the description of the above embodiments is only intended to aid understanding of the method and its core concept. Meanwhile, those skilled in the art may, according to the concept of the present application, make changes to the specific embodiments and the scope of application; therefore, the content of this specification should not be construed as limiting the present application.