CN114168344A - GPU resource allocation method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN114168344A
CN114168344A
Authority
CN
China
Prior art keywords
gpu
instruction
resource allocation
cuda
ebpf
Prior art date: 2021-12-15
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111538135.XA
Other languages
Chinese (zh)
Inventor
陈鹏飞 (Chen Pengfei)
谢文欣 (Xie Wenxin)
郑子彬 (Zheng Zibin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2021-12-15
Filing date: 2021-12-15
Publication date: 2022-03-11
Application filed by Sun Yat-sen University
Priority to CN202111538135.XA
Publication of CN114168344A
Legal status: Pending

Abstract

The application discloses a GPU resource allocation method, apparatus, device, and readable storage medium, the method comprising: determining whether the instruction currently executed by a target program is a mounted CUDA instruction, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted; if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount; obtaining the execution audit result returned by the eBPF process, the result being determined from the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time; if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction; and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode. The method and apparatus adapt to all CUDA versions simultaneously while avoiding high performance overhead, long development cycles, high maintenance costs, and interference with upper-layer applications.

Description

GPU resource allocation method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer intelligence, and more particularly, to a method, an apparatus, a device, and a readable storage medium for allocating GPU resources.
Background
With the continuous development of AI technology, GPUs (Graphics Processing Units) are widely used in the field of deep learning. A modern GPU such as the Nvidia V100 provides more than 10 GB of video memory and thousands of computing cores, so coarse-grained GPU scheduling in which a single task holds a GPU exclusively wastes a large share of GPU resources. To improve GPU resource utilization and task throughput, GPU virtualization has gradually become a hot industry topic: a physical GPU is logically divided into several virtual GPUs, and the virtual GPU is used as the scheduling unit, so that multiple services share one physical GPU.
For GPU sharing and isolation, academia and the major vendors have proposed corresponding schemes, chiefly the following:
First, GPU resource limits are enforced by hijacking the CUDA API. In this scheme, however, CUDA instructions must be modified and replaced per CUDA version, which can affect upper-layer GPU applications; moreover, each CUDA version requires its own hijacking scheme, so development and maintenance costs are very high.
Second, GPU tasks are scheduled onto the GPU in rotating time slices so that they time-share the device, thereby sharing the GPU. This scheme also requires customized development against CUDA, with poor compatibility and maintainability. In addition, under such time-division multiplexing, the GPU context switches caused by task switching are very expensive, so the performance loss is considerable.
Third, the deep learning framework is modified so that multiple deep learning tasks run on one GPU simultaneously. However, modifying a deep learning framework is difficult to develop and maintain, a modification applies only to a particular version of a particular framework, the risk of incompatibility with user code is high, and the approach does not fit most GPU tasks; it therefore has little practical value.
Given this situation, CUDA version incompatibility greatly complicates GPU sharing and isolation. What is needed is a GPU resource allocation scheme that adapts to all CUDA versions simultaneously while avoiding high performance overhead, long development cycles, high maintenance costs, and interference with upper-layer applications.
Disclosure of Invention
In view of this, the present application provides a GPU resource allocation method, apparatus, device, and readable storage medium that adapt to different CUDA versions with low performance overhead, a short development cycle, and low maintenance cost.
To achieve the above object, the following solutions are proposed:
A GPU resource allocation method comprises the following steps:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
Preferably, the process of mounting the eBPF tag to the resource allocation CUDA instruction includes:
acquiring the current CUDA version;
reading the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version from a stored configuration file;
mounting the eBPF tag to the logical address of the resource allocation CUDA instruction.
Preferably, the process by which the eBPF process determines the returned execution audit result includes:
the eBPF process determines the current amount of idle GPU resources according to the amount of resources the GPU occupies in real time and a preset upper limit of GPU resources allowed to be used;
the eBPF process determines whether the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources;
and if so, the execution audit result returned by the eBPF process is that the audit does not pass; otherwise, the audit passes.
Preferably, limiting GPU resource allocation according to the preset isolation mode includes:
limiting GPU resource allocation according to GPU hard isolation;
or
limiting GPU resource allocation according to GPU soft isolation.
Preferably, limiting GPU resource allocation according to GPU hard isolation includes:
suspending this GPU resource allocation and prompting that GPU resources are insufficient.
Preferably, limiting GPU resource allocation according to GPU soft isolation includes:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
Preferably, the resource allocation CUDA instructions include a video memory allocation CUDA instruction, a computing power allocation CUDA instruction, and a device information CUDA instruction.
A GPU resource allocation apparatus, comprising:
the instruction determining unit is configured to determine whether the instruction currently executed by a target program is a mounted CUDA (Compute Unified Device Architecture) instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF (extended Berkeley Packet Filter) tag has been mounted;
the execution auditing unit is configured to trigger, when the instruction currently executed by the target program is a mounted CUDA instruction, an eBPF process to perform the execution audit of the GPU resource amount through the eBPF tag mounted on the instruction;
the result obtaining unit is configured to obtain the execution audit result returned by the eBPF process, the execution audit result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
the resource allocation unit is configured to allow GPU resource allocation by calling the currently executed instruction when the execution audit result is that the audit passes;
and the resource limiting unit is configured to limit GPU resource allocation according to a preset isolation mode when the execution audit result is that the audit does not pass.
A GPU resource allocation device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the GPU resource allocation method.
A readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the GPU resource allocation method described above.
According to the above technical solution, whether the instruction currently executed by the target program is a mounted CUDA instruction is determined, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted. If the currently executed instruction is a mounted CUDA instruction, an eBPF process is triggered, through the eBPF tag mounted on the instruction, to perform an execution audit of the GPU resource amount. The execution audit result returned by the eBPF process is obtained, the result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time. If the execution audit result is that the audit passes, GPU resource allocation by calling the currently executed instruction is allowed; if the audit does not pass, GPU resource allocation is limited according to the preset isolation mode.
The present application adopts an instruction-trigger mechanism: when a resource allocation CUDA instruction is executed, the eBPF process is triggered through the eBPF tag, thereby realizing the execution audit of the GPU resource amount and the subsequent limitation of GPU resource allocation. Because the eBPF process runs in the kernel, a large amount of overhead from data copying, system calls, and context switches between user space and kernel space is avoided, so the performance overhead of the method is low. Since the limitation of GPU resource allocation is realized by mounting eBPF tags on CUDA instructions, no CUDA instruction or other system code needs to be modified; and because the corresponding CUDA instructions exist in every CUDA version, the present application adapts to all CUDA versions without per-version development.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic diagram of an eBPF tag mounting process according to an embodiment of the present application;
Fig. 2 is a flowchart of a GPU resource allocation method disclosed in the present application;
Fig. 3 is a diagram illustrating exemplary GPU resource allocation;
Fig. 4 is a diagram illustrating exemplary GPU task processing;
Fig. 5 is a block diagram of a GPU resource allocation apparatus disclosed in the present application;
Fig. 6 is a block diagram of the hardware structure of a GPU resource allocation device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in those embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The technical solutions proposed by the present application are described in detail below.
Before describing the process of specifically implementing GPU resource allocation in the present application, a process of mounting an eBPF tag to a resource allocation CUDA instruction is introduced first.
The GPU resource allocation realized by the present application is based on the idea of CUDA hijacking, with eBPF technology introduced to implement the hijacking. Because the CUDA Driver library is itself a program running in user mode, an eBPF tag can be bound to the resource-allocation-related instructions in the CUDA Driver library, and GPU resource allocation is hijacked by the eBPF process, achieving the goal of controlling GPU resource allocation.
eBPF is a subsystem of the Linux kernel that can be understood, loosely, as a small virtual machine in the kernel: user space can inject a section of kernel code written in C into the kernel for execution. eBPF can flexibly modify kernel processing strategies without modifying kernel code. An eBPF process is run by the kernel when an event is triggered, a function-hook or event-driven style of programming, and an eBPF tag can be mounted at a specified mount point in the CUDA Driver dynamic library libcuda.so. One eBPF process can have multiple eBPF tags bound to multiple mount points. When the target program executes a mounted instruction, execution jumps to the eBPF process; after the eBPF process finishes, execution jumps back to the mount point and the target program continues to run. The whole process requires no invasive modification of the target program.
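As an illustration only (not part of the patent text), the following minimal sketch uses the bcc toolkit to mount an eBPF probe at a CUDA Driver API entry point; the library path, symbol name, and traced behavior are assumptions that vary by installation and CUDA version:

```python
# Minimal sketch: mount an eBPF uprobe on a CUDA Driver API call (bcc).
# Assumptions: bcc is installed; the libcuda path and exported symbol
# name match the local driver installation.
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

// Runs in the kernel whenever the mounted instruction is entered; the
// target program then resumes at the mount point, unmodified.
int on_cu_mem_alloc(struct pt_regs *ctx) {
    // cuMemAlloc(CUdeviceptr *dptr, size_t bytesize): 2nd arg = requested bytes
    u64 bytes = PT_REGS_PARM2(ctx);
    bpf_trace_printk("cuMemAlloc request: %llu bytes\n", bytes);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_uprobe(name="/usr/lib/x86_64-linux-gnu/libcuda.so.1",
                sym="cuMemAlloc_v2",   # symbol name varies across CUDA versions
                fn_name="on_cu_mem_alloc")
b.trace_print()  # stream the probe output
```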
The process of mounting an eBPF tag to a resource allocation CUDA instruction is described below with reference to Fig. 1 and may specifically include:
First, the current CUDA version is obtained.
Specifically, the resource-allocation-related instructions are the same across CUDA versions, but their logical addresses differ, so the loader needs to automatically scan the local CUDA version to determine the logical addresses of the resource-allocation-related instructions of the current CUDA version.
Second, the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version is read from the stored configuration file.
Specifically, the configuration file stores, for each CUDA version, the logical addresses of the resource-allocation-related instructions, that is, of the resource allocation CUDA instructions to be hijacked. After the loader scans and obtains the local CUDA version, the logical address information of the resource allocation CUDA instructions corresponding to the current CUDA version is read from the configuration file.
Thereafter, the eBPF tag is mounted to the logical address of the resource allocation CUDA instruction.
Specifically, the eBPF tag is mounted at the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version. After mounting, when a user program calls the corresponding instruction in the CUDA library to allocate resources, the mounted eBPF tag triggers the eBPF process, and resource allocation is brought under control.
It can be understood that, once the eBPF tag has been mounted to the resource allocation CUDA instruction, whenever a user program executes that instruction in the CUDA library to perform resource allocation, the eBPF process is triggered by the mounted eBPF tag and the allocation process is controlled, achieving the goal of resource limitation. Because the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version is read from the stored configuration file before mounting, the present application adapts to the environment of any CUDA version: it suffices to collect the logical addresses of the resource allocation CUDA instructions of each version of the CUDA library into a configuration file, store it, and read it at mount time.
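To make the loader flow concrete, here is a hedged sketch of reading a per-version configuration file and mounting probes at the stored logical addresses. The file name, JSON layout, address values, probe-source file, and the use of nvidia-smi to identify the local driver version are all illustrative assumptions, not details from the patent:

```python
# Sketch of a version-aware loader (file layout and addresses are hypothetical).
import json
import subprocess
from bcc import BPF

def local_driver_version() -> str:
    # One possible way to scan the local environment; libcuda.so ships with
    # the driver, so keying the config by driver version is plausible.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True)
    return out.strip().splitlines()[0]

# Hypothetical config: version -> {instruction name -> logical address}.
with open("cuda_hook_addresses.json") as f:
    config = json.load(f)

offsets = config[local_driver_version()]  # e.g. {"cuMemAlloc": "0x2a4f10", ...}

b = BPF(src_file="gpu_audit_probe.c")     # hypothetical eBPF program source
for instr, addr in offsets.items():
    b.attach_uprobe(name="/usr/lib/x86_64-linux-gnu/libcuda.so.1",
                    addr=int(addr, 16),   # mount at the stored logical address
                    fn_name="on_" + instr.lower())
```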
Fig. 2 is a flowchart of a GPU resource allocation method disclosed in an embodiment of the present application. Building on the above process of mounting the eBPF tag to the resource allocation CUDA instruction, the method is introduced with reference to Fig. 2 and may include:
and step S1, determining whether the instruction currently executed by the target program is a mounted CUDA instruction.
Specifically, the mount CUDA instruction is a resource allocation CUDA instruction with an eBPF tag mounted thereon. Because the process of mounting the eBPF label to the resource allocation CUDA instruction is completed before GPU resource allocation is carried out, whether the currently executed instruction of the target program is the resource allocation related instruction which needs to be limited can be determined only by determining whether the currently executed instruction of the target program is the mounted CUDA instruction.
And if the currently executed instruction is the mounted CUDA instruction, executing the step S2, and triggering the eBPF process to perform execution audit on the GPU resource amount through the eBPF label mounted with the mounted CUDA instruction.
Specifically, as shown in fig. 3, the target program sequentially calls and executes the programs in libcad. The eBPF process acquires CUDA execution parameters, and determines whether the resource allocation application passes through the current resource allocation application or not by comparing the rated GPU resource amount of the user task corresponding to the currently executed instruction with the GPU resource amount occupied currently in real time, namely, the execution audit is determined.
The eBPF process specifically comprises an eBPF program, eBPF Maps and a control program, the eBPF program can acquire CUDA execution parameters such as currently distributed display stock, applied SM number and the like and sends the parameters to the control program through the eBPF Maps, the control program determines whether the resource distribution application passes through the resource distribution application or not by comparing the rated GPU resource quantity with the resource quantity occupied by the GPU in real time, the audit result is sent to the eBPF program through the eBPF Maps, the eBPF program executes the audit result, if the audit result passes through the resource distribution application, a CUDA instruction is normally called to carry out the resource distribution, and if the audit result does not pass through the resource distribution application, the resource distribution is limited.
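A hedged sketch of this three-part structure (eBPF program, eBPF Maps, control program) follows. The map names, the fixed per-task quota, and the in-memory usage bookkeeping are illustrative assumptions; in the patent the rated amounts come from the sharing module's database:

```python
# Sketch: the eBPF program reports allocation requests through eBPF Maps
# (a perf buffer here); the user-space control program audits each request
# against the task's rated quota and writes the verdict back into a map.
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

struct alloc_req { u32 pid; u64 bytes; };
BPF_PERF_OUTPUT(alloc_events);   // channel from eBPF program to control program
BPF_HASH(verdicts, u32, u32);    // control program writes audit results here

int on_cumemalloc(struct pt_regs *ctx) {
    struct alloc_req req = {};
    req.pid = bpf_get_current_pid_tgid() >> 32;
    req.bytes = PT_REGS_PARM2(ctx);          // video memory being requested
    alloc_events.perf_submit(ctx, &req, sizeof(req));
    return 0;
}
"""

RATED_BYTES = 1600 * 2**20   # hypothetical rated quota per task (1600 M)
used = {}                    # control-program view of real-time usage per pid

def audit(cpu, data, size):
    ev = b["alloc_events"].event(data)
    free = RATED_BYTES - used.get(ev.pid, 0)
    passed = ev.bytes <= free                # rated amount vs. idle amount
    if passed:
        used[ev.pid] = used.get(ev.pid, 0) + ev.bytes
    t = b["verdicts"]
    t[t.Key(ev.pid)] = t.Leaf(int(passed))   # verdict back through eBPF Maps

b = BPF(text=prog)
b.attach_uprobe(name="/usr/lib/x86_64-linux-gnu/libcuda.so.1",
                sym="cuMemAlloc_v2", fn_name="on_cumemalloc")
b["alloc_events"].open_perf_buffer(audit)
while True:
    b.perf_buffer_poll()
```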
Step S3: obtain the execution audit result returned by the eBPF process.
Specifically, the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time, and is either that the audit passes or that it does not pass. As shown in Fig. 3, the eBPF process returns the execution audit result: if the audit passes, the CUDA instruction is allowed to be called normally for resource allocation; otherwise, resource allocation is limited.
If the execution audit result is that the audit passes, step S4 is executed: GPU resource allocation is allowed by calling the currently executed instruction.
If the execution audit result is that the audit does not pass, step S5 is executed: GPU resource allocation is limited according to a preset isolation mode.
Specifically, the isolation mode is either hard isolation or soft isolation. Hard isolation means that, once the amount of resources the GPU occupies in real time reaches the upper limit, the resource-allocation-related task is interrupted and a resource shortage is prompted. Soft isolation means that, once that upper limit is reached, it is further determined whether the GPU still has idle resources, and the GPU resource limit is relaxed appropriately.
As can be seen from the above technical solutions, the GPU resource allocation method provided in the embodiments of the present application determines whether the instruction currently executed by the target program is a mounted CUDA instruction, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted. If the currently executed instruction is a mounted CUDA instruction, an eBPF process is triggered, through the eBPF tag mounted on the instruction, to perform an execution audit of the GPU resource amount. The execution audit result returned by the eBPF process is obtained, the result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time. If the execution audit result is that the audit passes, GPU resource allocation by calling the currently executed instruction is allowed; if the audit does not pass, GPU resource allocation is limited according to the preset isolation mode.
The present application adopts an instruction-trigger mechanism: when a resource allocation CUDA instruction is executed, the eBPF process is triggered through the eBPF tag, thereby realizing the execution audit of the GPU resource amount and the subsequent limitation of GPU resource allocation. Because the eBPF process runs in the kernel, a large amount of overhead from data copying, system calls, and context switches between user space and kernel space is avoided, so the performance overhead of the method is low. Since the limitation of GPU resource allocation is realized by mounting eBPF tags on CUDA instructions, no CUDA instruction or other system code needs to be modified; and because the corresponding CUDA instructions exist in every CUDA version, the present application adapts to all CUDA versions without per-version development.
Optionally, the resource allocation CUDA instructions may include video memory allocation CUDA instructions, computing power allocation CUDA instructions, and device information CUDA instructions.
Specifically, referring to Table 1, the resource allocation CUDA instructions can be divided into video memory allocation CUDA instructions, computing power allocation CUDA instructions, and device information CUDA instructions; the CUDA Driver API column of Table 1 lists several instructions of each kind.
When a video memory allocation CUDA instruction is executed, the video memory currently used by the GPU is summed up and compared against the rated resource amount specified for the task when it was created, and a decision is made on whether to allow the resource allocation request. If the audit passes, video memory allocation is called normally to issue the resources; otherwise, allocation is limited according to the specified isolation mode.
When a computing power allocation CUDA instruction is executed, the total number of SMs used by the current GPU is summed up; if the number of SMs in use has not reached the upper limit, the kernel function is called normally to start the computation; otherwise, allocation is limited according to the specified isolation mode.
When a device-information-related API is executed, the reported total amount of GPU video memory is replaced by the GPU resource amount specified when the task was created, and the reported free video memory is replaced by the difference between the task's allocated video memory amount and its used video memory amount.
TABLE 1 (the original table image, Figure BDA0003413549440000091, listing the CUDA Driver API instructions in each of the three categories, is not reproduced in this text)
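The device-information substitution described above reduces to simple arithmetic. A sketch with hypothetical names follows; how the substituted values are written back to the caller is left abstract here:

```python
# Sketch of the device-information substitution (names are hypothetical).
def reported_device_memory(task_quota_bytes: int, task_used_bytes: int):
    """What a task is shown instead of the physical GPU totals."""
    reported_total = task_quota_bytes                   # quota, not physical total
    reported_free = task_quota_bytes - task_used_bytes  # quota minus task usage
    return reported_total, reported_free

# Example: a task rated at 1600 M that has allocated 600 M sees 1000 M free.
total, free = reported_device_memory(1600 * 2**20, 600 * 2**20)
assert free == 1000 * 2**20
```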
In some embodiments of the present application, the process in step S3 by which the eBPF process determines the returned execution audit result may specifically include:
Step S31: the eBPF process determines the current amount of idle GPU resources according to the amount of resources the GPU occupies in real time and the preset upper limit of GPU resources allowed to be used.
Specifically, the eBPF process first obtains the amount of resources the GPU occupies in real time and the preset upper limit of GPU resources allowed to be used; the difference between the upper limit and the occupied amount is the current amount of idle GPU resources.
Step S32: the eBPF process determines whether the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources.
If so, the execution audit result returned by the eBPF process is that the audit does not pass; otherwise, the audit passes.
Specifically, the rated GPU resource amount corresponding to the currently executed instruction is obtained, and it is determined whether this amount exceeds the current amount of idle GPU resources. If it does, the execution audit result is that the audit does not pass; otherwise, the audit passes.
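As a concrete illustration of steps S31 and S32 (all numbers are invented for the example):

```python
# Worked example of the audit in steps S31-S32 (numbers are hypothetical).
gpu_cap_bytes  = 8 * 2**30   # preset upper limit of GPU resources allowed
gpu_used_bytes = 6 * 2**30   # amount the GPU occupies in real time
rated_bytes    = 3 * 2**30   # rated amount for the currently executed instruction

free_bytes = gpu_cap_bytes - gpu_used_bytes   # step S31: 2 GiB idle
audit_passed = rated_bytes <= free_bytes      # step S32: 3 GiB > 2 GiB idle
assert audit_passed is False                  # the audit does not pass
```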
In some embodiments of the present application, two optional implementations are provided for limiting GPU resource allocation according to the preset isolation mode in step S5, which may specifically include:
First, GPU resource allocation is limited according to GPU hard isolation.
Specifically, under hard isolation, if the execution audit result returned by the eBPF process is that the audit does not pass, limiting GPU resource allocation by GPU hard isolation comprises suspending this GPU resource allocation and prompting that GPU resources are insufficient.
Second, GPU resource allocation is limited according to GPU soft isolation.
The process of limiting GPU resource allocation by GPU soft isolation comprises the following steps:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
Specifically, under soft isolation, if the execution audit result returned by the eBPF process is that the audit does not pass, it is further determined whether the current amount of idle GPU resources is zero. If it is zero, this GPU resource allocation is suspended. If it is not zero, then although the idle amount does not satisfy the rated GPU resource amount corresponding to the currently executed instruction, the amount of resources the GPU occupies in real time has not yet reached the preset upper limit of GPU resources allowed to be used; under soft isolation the GPU resource limit may then be relaxed appropriately, and GPU resource allocation by calling the currently executed instruction is allowed.
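A sketch of the two isolation policies, continuing the hypothetical audit above (function and mode names are assumptions):

```python
# Sketch of the two isolation modes applied after a failed audit.
def enforce(audit_passed: bool, mode: str, free_bytes: int) -> str:
    if audit_passed:
        return "allocate"           # call the CUDA instruction normally
    if mode == "hard":
        # Hard isolation: suspend this allocation and report the shortage.
        return "deny: insufficient GPU resources"
    # Soft isolation: relax the limit if the GPU still has any idle resources.
    return "allocate" if free_bytes > 0 else "suspend"

assert enforce(False, "hard", 2 * 2**30) == "deny: insufficient GPU resources"
assert enforce(False, "soft", 2 * 2**30) == "allocate"
assert enforce(False, "soft", 0) == "suspend"
```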
It can be understood that the isolation limits on GPU resource allocation in the embodiments of the present application include, but are not limited to, the two above. What the present application emphasizes is that when the execution audit result is that the audit does not pass, that is, when the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources, GPU resource allocation is isolated and limited; all methods meeting the actual preset isolation limits shall fall within the protection scope of the present application.
Next, with reference to Fig. 4, a practical application of the present application is described through a GPU task-processing example.
First, at the initial stage of system operation, the sharing module reads the local GPU information and divides a single physical GPU into 100 equal resource units, each holding 1/100 of the video memory and computing power: video memory is divided as total video memory / 100, and computing power as SM count / 100. For example, if a physical GPU has 16000 M of video memory and 80 SMs, one GPU resource unit after division represents 160 M of video memory and 0.8 SM, so a task rated at 10 units is entitled to 1600 M and 8 SMs. Since calling and executing a CUDA instruction for GPU computation essentially means designating SMs to run kernel functions, and each SM contains a large number of basic arithmetic instruction execution units (SPs) that perform the actual computation, the present application holds that GPU computing power can be quantified by the number of SMs, and the computing power share is controlled proportionally by controlling the number of SMs allocated to a task.
After a user creates a GPU task, the corresponding CUDA instructions are generated and enter libcuda.so. When the GPU task is created, the sharing module, following its GPU sharing scheduling logic, specifies the GPU ID to use, the rated GPU resource amount, the isolation mode, and so on. The GPU ID is the GPU's index in the host, numbered from 0 (GPU 0 is the first GPU card). The rated GPU resource amount is the estimated GPU resource quota the task requires; for example, using 20/100 of a GPU means 20 GPU resource units. The isolation mode is either hard isolation or soft isolation.
The sharing module determines, according to the current GPU resource situation, whether to allocate the GPU to the task being created. For example, if one GPU has 100 GPU resource units and each task applies for 20, the physical GPU can be allocated to at most 5 tasks, and no further task allocation requests for this GPU are accepted until those 5 tasks complete. After the GPU allocation completes, the sharing module archives the task ID, the bound GPU ID, the rated GPU resource amount, the isolation mode, and other information in a database.
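A sketch of the sharing module's admission bookkeeping over the 100-unit split follows; the dict stands in for the archival database, and all names are assumptions:

```python
# Sketch: admission control over 100 GPU resource units on one physical GPU.
TOTAL_UNITS = 100
allocations: dict[str, int] = {}   # task_id -> rated units (stand-in database)

def admit(task_id: str, rated_units: int) -> bool:
    if sum(allocations.values()) + rated_units > TOTAL_UNITS:
        return False               # refused until running tasks complete
    allocations[task_id] = rated_units
    return True

# With 20-unit tasks, at most five fit on one physical GPU.
assert all(admit(f"task{i}", 20) for i in range(5))
assert admit("task5", 20) is False
```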
Thus, as the instructions in libcuda.so execute one by one, when execution reaches a resource allocation CUDA instruction with a mounted eBPF tag, the eBPF process is triggered. The control program obtains the rated GPU resource amount for the current resource allocation CUDA instruction from the database, determines whether the instruction may execute, and sends the audit result to the eBPF program through the eBPF Maps. If the audit result is that the audit passes, the eBPF program allows the resource allocation CUDA instruction to execute, and subsequent GPU resource allocation is performed according to the preset GPU ID and rated GPU resource amount; if the audit does not pass, the eBPF program limits subsequent GPU resource allocation through the preset isolation mode.
The GPU resource allocation apparatus provided in the embodiments of the present application is described below; the apparatus described below and the GPU resource allocation method described above may be cross-referenced.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of a GPU resource allocation apparatus disclosed in an embodiment of the present application.
As shown in fig. 5, the apparatus may include:
the instruction determining unit is configured to determine whether the instruction currently executed by a target program is a mounted CUDA (Compute Unified Device Architecture) instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF (extended Berkeley Packet Filter) tag has been mounted;
the execution auditing unit is configured to trigger, when the instruction currently executed by the target program is a mounted CUDA instruction, an eBPF process to perform the execution audit of the GPU resource amount through the eBPF tag mounted on the instruction;
the result obtaining unit is configured to obtain the execution audit result returned by the eBPF process, the execution audit result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
the resource allocation unit is configured to allow GPU resource allocation by calling the currently executed instruction when the execution audit result is that the audit passes;
and the resource limiting unit is configured to limit GPU resource allocation according to a preset isolation mode when the execution audit result is that the audit does not pass.
As can be seen from the above technical solutions, the GPU resource allocation apparatus provided in the embodiments of the present application determines whether the instruction currently executed by the target program is a mounted CUDA instruction, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted. If the currently executed instruction is a mounted CUDA instruction, an eBPF process is triggered, through the eBPF tag mounted on the instruction, to perform an execution audit of the GPU resource amount. The execution audit result returned by the eBPF process is obtained, the result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time. If the execution audit result is that the audit passes, GPU resource allocation by calling the currently executed instruction is allowed; if the audit does not pass, GPU resource allocation is limited according to the preset isolation mode.
The present application adopts an instruction-trigger mechanism: when a resource allocation CUDA instruction is executed, the eBPF process is triggered through the eBPF tag, thereby realizing the execution audit of the GPU resource amount and the subsequent limitation of GPU resource allocation. Because the eBPF process runs in the kernel, a large amount of overhead from data copying, system calls, and context switches between user space and kernel space is avoided, so the performance overhead of the method is low. Since the limitation of GPU resource allocation is realized by mounting eBPF tags on CUDA instructions, no CUDA instruction or other system code needs to be modified; and because the corresponding CUDA instructions exist in every CUDA version, the present application adapts to all CUDA versions without per-version development.
Optionally, the GPU resource allocation apparatus may further include a tag mounting unit, configured to perform:
acquiring the current CUDA version;
reading a logical address of the resource allocation CUDA instruction corresponding to the current CUDA version from a stored configuration file;
mounting an eBPF tag to a logical address of the resource allocation CUDA instruction.
Optionally, the resource limiting unit may include a hard isolation unit or a soft isolation unit;
the hard isolation unit is used for limiting GPU resource allocation according to GPU hard isolation;
and the soft isolation unit is used for limiting GPU resource allocation according to GPU soft isolation.
Optionally, the hard isolation unit may be configured to suspend this GPU resource allocation and prompt that GPU resources are insufficient.
Optionally, the soft isolation unit may be configured to perform:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
Optionally, the resource allocation CUDA instructions may include video memory allocation CUDA instructions, computing power allocation CUDA instructions, and device information CUDA instructions.
The GPU resource allocation apparatus provided in the embodiments of the present application may be applied to a GPU resource allocation device. Optionally, Fig. 6 shows a block diagram of the hardware structure of the GPU resource allocation device, which may include: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4;
in the embodiments of the present application, there is at least one each of the processor 1, the communication interface 2, the memory 3, and the communication bus 4, and the processor 1, the communication interface 2, and the memory 3 communicate with one another through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present invention, or the like;
the memory 3 may include a high-speed RAM memory and may also include a non-volatile memory, for example at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program being configured for:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
For the detailed and extended functions of the program, reference may be made to the description above.
Embodiments of the present application further provide a readable storage medium on which a program suitable for execution by a processor may be stored, the program being configured for:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
For the detailed and extended functions of the program, reference may be made to the description above.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprises", "comprising", and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus comprising a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A GPU resource allocation method, characterized by comprising the following steps:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
2. The method of claim 1, wherein mounting the eBPF tag to a resource allocation CUDA instruction comprises:
acquiring the current CUDA version;
reading a logical address of the resource allocation CUDA instruction corresponding to the current CUDA version from a stored configuration file;
mounting an eBPF tag to a logical address of the resource allocation CUDA instruction.
3. The method of claim 1, wherein the process by which the eBPF process determines the returned execution audit result comprises:
the eBPF process determines the current amount of idle GPU resources according to the amount of resources the GPU occupies in real time and a preset upper limit of GPU resources allowed to be used;
the eBPF process determines whether the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources;
and if so, the execution audit result returned by the eBPF process is that the audit does not pass; otherwise, the audit passes.
4. The method of claim 1, wherein limiting GPU resource allocation according to the preset isolation mode comprises:
limiting GPU resource allocation according to GPU hard isolation;
or
limiting GPU resource allocation according to GPU soft isolation.
5. The method of claim 4, wherein limiting GPU resource allocation according to GPU hard isolation comprises:
suspending this GPU resource allocation and prompting that GPU resources are insufficient.
6. The method of claim 4, wherein limiting GPU resource allocation according to GPU soft isolation comprises:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
7. The method of claim 1, wherein the resource allocation CUDA instructions comprise a video memory allocation CUDA instruction, a computing power allocation CUDA instruction, and a device information CUDA instruction.
8. A GPU resource allocation apparatus, comprising:
the instruction determining unit is configured to determine whether the instruction currently executed by a target program is a mounted CUDA (Compute Unified Device Architecture) instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF (extended Berkeley Packet Filter) tag has been mounted;
the execution auditing unit is configured to trigger, when the instruction currently executed by the target program is a mounted CUDA instruction, an eBPF process to perform the execution audit of the GPU resource amount through the eBPF tag mounted on the instruction;
the result obtaining unit is configured to obtain the execution audit result returned by the eBPF process, the execution audit result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
the resource allocation unit is configured to allow GPU resource allocation by calling the currently executed instruction when the execution audit result is that the audit passes;
and the resource limiting unit is configured to limit GPU resource allocation according to a preset isolation mode when the execution audit result is that the audit does not pass.
9. A GPU resource allocation device comprising a memory and a processor;
the memory is used for storing programs;
the processor, configured to execute the program, to implement the steps of the GPU resource allocation method according to any of claims 1-7.
10. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the GPU resource allocation method according to any of claims 1-7.
CN202111538135.XA · Priority date 2021-12-15 · Filing date 2021-12-15 · GPU resource allocation method, device, equipment and readable storage medium · Pending · CN114168344A (en)

Priority Applications (1)

Application number · Priority date · Filing date · Title
CN202111538135.XA · 2021-12-15 · 2021-12-15 · GPU resource allocation method, device, equipment and readable storage medium

Publications (1)

Publication number · Publication date
CN114168344A · 2022-03-11

Family

ID=80486894

Family Applications (1)

Application number · Status · Publication · Priority date · Filing date
CN202111538135.XA · Pending · CN114168344A (en) · 2021-12-15 · 2021-12-15

Country Status (1)

Country · Link
CN · CN114168344A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
KR20190143248A (en) * · 2018-06-20 · 2019-12-30 · 한국과학기술원 (Korea Advanced Institute of Science and Technology) · Method and System to manage and schedule GPU memory resource in Container-based virtualized environment
CN111078412A (en) * · 2019-12-12 · 2020-04-28 · 中山大学 (Sun Yat-sen University) · Method for resource management of GPU through API interception
CN112000463A (en) * · 2020-07-16 · 2020-11-27 · 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.) · A CUDA-based GPU resource allocation method, system, terminal and storage medium
CN112256542A (en) * · 2020-10-19 · 2021-01-22 · 中山大学 (Sun Yat-sen University) · Microservice system performance detection method, device and system based on eBPF

Non-Patent Citations (1)

佚名 (Anonymous): "详细介绍eBPF的起源和工作原理及作用" [A detailed introduction to the origin, working principle, and role of eBPF], retrieved from the Internet: https://www.elecfans.com/d/1590845.html *

Cited By (5)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
CN115063282A (en) * · 2022-06-02 · 2022-09-16 · 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.) · GPU resource scheduling method, device, equipment and storage medium
CN115063282B (en) * · 2022-06-02 · 2025-09-02 · 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.) · GPU resource scheduling method, device, equipment and storage medium
WO2025082300A1 (en) * · 2023-10-19 · 2025-04-24 · 华为技术有限公司 (Huawei Technologies Co., Ltd.) · Resource scheduling method and apparatus, and server
CN119440808A (en) * · 2024-10-12 · 2025-02-14 · 中国联合网络通信有限公司广东省分公司 (China United Network Communications Co., Ltd., Guangdong Branch) · Computing resource allocation control method, device, electronic device and storage medium
CN119440808B (en) * · 2024-10-12 · 2025-07-11 · 中国联合网络通信有限公司广东省分公司 (China United Network Communications Co., Ltd., Guangdong Branch) · Computing resource allocation control method, device, electronic device and storage medium


Legal Events

Code · Title
PB01 · Publication
SE01 · Entry into force of request for substantive examination
