CN114168344A - GPU resource allocation method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN114168344A
CN114168344A
Authority
CN
China
Prior art keywords
gpu
instruction
resource allocation
cuda
ebpf
Prior art date: 2021-12-15
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111538135.XA
Other languages
Chinese (zh)
Inventor
陈鹏飞 (Chen Pengfei)
谢文欣 (Xie Wenxin)
郑子彬 (Zheng Zibin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2021-12-15
Filing date: 2021-12-15
Publication date: 2022-03-11
Application filed by Sun Yat-sen University
Priority to CN202111538135.XA
Publication of CN114168344A
Legal status: Pending

Abstract

The application discloses a GPU resource allocation method, apparatus, device, and readable storage medium, the method comprising: determining whether the instruction currently executed by a target program is a mounted CUDA instruction, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted; if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount; obtaining the execution audit result returned by the eBPF process, the result being determined from the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time; if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction; and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode. The method and apparatus adapt to all CUDA versions simultaneously while avoiding high performance overhead, long development cycles, high maintenance costs, and interference with upper-layer applications.

Description

GPU resource allocation method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer intelligence, and more particularly, to a method, an apparatus, a device, and a readable storage medium for allocating GPU resources.
Background
With the continuous development of AI technology, GPUs (Graphics Processing Units) are widely used in the field of deep learning. A modern GPU such as the Nvidia V100 provides more than 10 GB of video memory and thousands of computing cores, so coarse-grained GPU scheduling in which a single task holds a GPU exclusively wastes a large share of GPU resources. To improve GPU resource utilization and task throughput, GPU virtualization has gradually become a hot industry topic: a physical GPU is logically divided into several virtual GPUs, and the virtual GPU is used as the scheduling unit, so that multiple services share one physical GPU.
For GPU sharing and isolation, academia and the major vendors have proposed corresponding schemes, chiefly the following:
First, GPU resource limits are enforced by hijacking the CUDA API. In this scheme, however, CUDA instructions must be modified and replaced per CUDA version, which can affect upper-layer GPU applications; moreover, each CUDA version requires its own hijacking scheme, so development and maintenance costs are very high.
Second, GPU tasks are scheduled onto the GPU in rotating time slices so that they time-share the device, thereby sharing the GPU. This scheme also requires customized development against CUDA, with poor compatibility and maintainability. In addition, under such time-division multiplexing, the GPU context switches caused by task switching are very expensive, so the performance loss is considerable.
Third, the deep learning framework is modified so that multiple deep learning tasks run on one GPU simultaneously. However, modifying a deep learning framework is difficult to develop and maintain, a modification applies only to a particular version of a particular framework, the risk of incompatibility with user code is high, and the approach does not fit most GPU tasks; it therefore has little practical value.
Given this situation, CUDA version incompatibility greatly complicates GPU sharing and isolation. What is needed is a GPU resource allocation scheme that adapts to all CUDA versions simultaneously while avoiding high performance overhead, long development cycles, high maintenance costs, and interference with upper-layer applications.
Disclosure of Invention
In view of this, the present application provides a GPU resource allocation method, apparatus, device, and readable storage medium that adapt to different CUDA versions with low performance overhead, a short development cycle, and low maintenance cost.
To achieve the above object, the following solutions are proposed:
A GPU resource allocation method comprises the following steps:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
Preferably, the process of mounting the eBPF tag to the resource allocation CUDA instruction includes:
acquiring the current CUDA version;
reading the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version from a stored configuration file;
mounting the eBPF tag to the logical address of the resource allocation CUDA instruction.
Preferably, the process by which the eBPF process determines the returned execution audit result includes:
the eBPF process determines the current amount of idle GPU resources according to the amount of resources the GPU occupies in real time and a preset upper limit of GPU resources allowed to be used;
the eBPF process determines whether the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources;
and if so, the execution audit result returned by the eBPF process is that the audit does not pass; otherwise, the audit passes.
Preferably, limiting GPU resource allocation according to the preset isolation mode includes:
limiting GPU resource allocation according to GPU hard isolation;
or
limiting GPU resource allocation according to GPU soft isolation.
Preferably, limiting GPU resource allocation according to GPU hard isolation includes:
suspending this GPU resource allocation and prompting that GPU resources are insufficient.
Preferably, limiting GPU resource allocation according to GPU soft isolation includes:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
Preferably, the resource allocation CUDA instructions include a video memory allocation CUDA instruction, a computing power allocation CUDA instruction, and a device information CUDA instruction.
A GPU resource allocation apparatus, comprising:
the instruction determining unit is configured to determine whether the instruction currently executed by a target program is a mounted CUDA (Compute Unified Device Architecture) instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF (extended Berkeley Packet Filter) tag has been mounted;
the execution auditing unit is configured to trigger, when the instruction currently executed by the target program is a mounted CUDA instruction, an eBPF process to perform the execution audit of the GPU resource amount through the eBPF tag mounted on the instruction;
the result obtaining unit is configured to obtain the execution audit result returned by the eBPF process, the execution audit result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
the resource allocation unit is configured to allow GPU resource allocation by calling the currently executed instruction when the execution audit result is that the audit passes;
and the resource limiting unit is configured to limit GPU resource allocation according to a preset isolation mode when the execution audit result is that the audit does not pass.
A GPU resource allocation device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the GPU resource allocation method.
A readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the GPU resource allocation method described above.
According to the above technical solution, whether the instruction currently executed by the target program is a mounted CUDA instruction is determined, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted. If the currently executed instruction is a mounted CUDA instruction, an eBPF process is triggered, through the eBPF tag mounted on the instruction, to perform an execution audit of the GPU resource amount. The execution audit result returned by the eBPF process is obtained, the result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time. If the execution audit result is that the audit passes, GPU resource allocation by calling the currently executed instruction is allowed; if the audit does not pass, GPU resource allocation is limited according to the preset isolation mode.
The present application adopts an instruction-trigger mechanism: when a resource allocation CUDA instruction is executed, the eBPF process is triggered through the eBPF tag, thereby realizing the execution audit of the GPU resource amount and the subsequent limitation of GPU resource allocation. Because the eBPF process runs in the kernel, a large amount of overhead from data copying, system calls, and context switches between user space and kernel space is avoided, so the performance overhead of the method is low. Since the limitation of GPU resource allocation is realized by mounting eBPF tags on CUDA instructions, no CUDA instruction or other system code needs to be modified; and because the corresponding CUDA instructions exist in every CUDA version, the present application adapts to all CUDA versions without per-version development.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic diagram of an eBPF tag mounting process according to an embodiment of the present application;
Fig. 2 is a flowchart of a GPU resource allocation method disclosed in the present application;
Fig. 3 is a diagram illustrating exemplary GPU resource allocation;
Fig. 4 is a diagram illustrating exemplary GPU task processing;
Fig. 5 is a block diagram of a GPU resource allocation apparatus disclosed in the present application;
Fig. 6 is a block diagram of the hardware structure of a GPU resource allocation device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in those embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The technical solutions proposed by the present application are described in detail below.
Before describing the process of specifically implementing GPU resource allocation in the present application, a process of mounting an eBPF tag to a resource allocation CUDA instruction is introduced first.
The GPU resource allocation realized by the present application is based on the idea of CUDA hijacking, with eBPF technology introduced to implement the hijacking. Because the CUDA Driver library is itself a program running in user mode, an eBPF tag can be bound to the resource-allocation-related instructions in the CUDA Driver library, and GPU resource allocation is hijacked by the eBPF process, achieving the goal of controlling GPU resource allocation.
eBPF is a subsystem of the Linux kernel that can be understood, loosely, as a small virtual machine in the kernel: user space can inject a section of kernel code written in C into the kernel for execution. eBPF can flexibly modify kernel processing strategies without modifying kernel code. An eBPF process is run by the kernel when an event is triggered, a function-hook or event-driven style of programming, and an eBPF tag can be mounted at a specified mount point in the CUDA Driver dynamic library libcuda.so. One eBPF process can have multiple eBPF tags bound to multiple mount points. When the target program executes a mounted instruction, execution jumps to the eBPF process; after the eBPF process finishes, execution jumps back to the mount point and the target program continues to run. The whole process requires no invasive modification of the target program.
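As an illustration only (not part of the patent text), the following minimal sketch uses the bcc toolkit to mount an eBPF probe at a CUDA Driver API entry point; the library path, symbol name, and traced behavior are assumptions that vary by installation and CUDA version:

```python
# Minimal sketch: mount an eBPF uprobe on a CUDA Driver API call (bcc).
# Assumptions: bcc is installed; the libcuda path and exported symbol
# name match the local driver installation.
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

// Runs in the kernel whenever the mounted instruction is entered; the
// target program then resumes at the mount point, unmodified.
int on_cu_mem_alloc(struct pt_regs *ctx) {
    // cuMemAlloc(CUdeviceptr *dptr, size_t bytesize): 2nd arg = requested bytes
    u64 bytes = PT_REGS_PARM2(ctx);
    bpf_trace_printk("cuMemAlloc request: %llu bytes\n", bytes);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_uprobe(name="/usr/lib/x86_64-linux-gnu/libcuda.so.1",
                sym="cuMemAlloc_v2",   # symbol name varies across CUDA versions
                fn_name="on_cu_mem_alloc")
b.trace_print()  # stream the probe output
```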
The process of mounting an eBPF tag to a resource allocation CUDA instruction is described below with reference to Fig. 1 and may specifically include:
First, the current CUDA version is obtained.
Specifically, the resource-allocation-related instructions are the same across CUDA versions, but their logical addresses differ, so the loader needs to automatically scan the local CUDA version to determine the logical addresses of the resource-allocation-related instructions of the current CUDA version.
Second, the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version is read from the stored configuration file.
Specifically, the configuration file stores, for each CUDA version, the logical addresses of the resource-allocation-related instructions, that is, of the resource allocation CUDA instructions to be hijacked. After the loader scans and obtains the local CUDA version, the logical address information of the resource allocation CUDA instructions corresponding to the current CUDA version is read from the configuration file.
Thereafter, the eBPF tag is mounted to the logical address of the resource allocation CUDA instruction.
Specifically, the eBPF tag is mounted at the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version. After mounting, when a user program calls the corresponding instruction in the CUDA library to allocate resources, the mounted eBPF tag triggers the eBPF process, and resource allocation is brought under control.
It can be understood that, once the eBPF tag has been mounted to the resource allocation CUDA instruction, whenever a user program executes that instruction in the CUDA library to perform resource allocation, the eBPF process is triggered by the mounted eBPF tag and the allocation process is controlled, achieving the goal of resource limitation. Because the logical address of the resource allocation CUDA instruction corresponding to the current CUDA version is read from the stored configuration file before mounting, the present application adapts to the environment of any CUDA version: it suffices to collect the logical addresses of the resource allocation CUDA instructions of each version of the CUDA library into a configuration file, store it, and read it at mount time.
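To make the loader flow concrete, here is a hedged sketch of reading a per-version configuration file and mounting probes at the stored logical addresses. The file name, JSON layout, address values, probe-source file, and the use of nvidia-smi to identify the local driver version are all illustrative assumptions, not details from the patent:

```python
# Sketch of a version-aware loader (file layout and addresses are hypothetical).
import json
import subprocess
from bcc import BPF

def local_driver_version() -> str:
    # One possible way to scan the local environment; libcuda.so ships with
    # the driver, so keying the config by driver version is plausible.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True)
    return out.strip().splitlines()[0]

# Hypothetical config: version -> {instruction name -> logical address}.
with open("cuda_hook_addresses.json") as f:
    config = json.load(f)

offsets = config[local_driver_version()]  # e.g. {"cuMemAlloc": "0x2a4f10", ...}

b = BPF(src_file="gpu_audit_probe.c")     # hypothetical eBPF program source
for instr, addr in offsets.items():
    b.attach_uprobe(name="/usr/lib/x86_64-linux-gnu/libcuda.so.1",
                    addr=int(addr, 16),   # mount at the stored logical address
                    fn_name="on_" + instr.lower())
```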
Fig. 2 is a flowchart of a GPU resource allocation method disclosed in an embodiment of the present application. Building on the above process of mounting the eBPF tag to the resource allocation CUDA instruction, the method is introduced with reference to Fig. 2 and may include:
and step S1, determining whether the instruction currently executed by the target program is a mounted CUDA instruction.
Specifically, the mount CUDA instruction is a resource allocation CUDA instruction with an eBPF tag mounted thereon. Because the process of mounting the eBPF label to the resource allocation CUDA instruction is completed before GPU resource allocation is carried out, whether the currently executed instruction of the target program is the resource allocation related instruction which needs to be limited can be determined only by determining whether the currently executed instruction of the target program is the mounted CUDA instruction.
And if the currently executed instruction is the mounted CUDA instruction, executing the step S2, and triggering the eBPF process to perform execution audit on the GPU resource amount through the eBPF label mounted with the mounted CUDA instruction.
Specifically, as shown in fig. 3, the target program sequentially calls and executes the programs in libcad. The eBPF process acquires CUDA execution parameters, and determines whether the resource allocation application passes through the current resource allocation application or not by comparing the rated GPU resource amount of the user task corresponding to the currently executed instruction with the GPU resource amount occupied currently in real time, namely, the execution audit is determined.
The eBPF process specifically comprises an eBPF program, eBPF Maps and a control program, the eBPF program can acquire CUDA execution parameters such as currently distributed display stock, applied SM number and the like and sends the parameters to the control program through the eBPF Maps, the control program determines whether the resource distribution application passes through the resource distribution application or not by comparing the rated GPU resource quantity with the resource quantity occupied by the GPU in real time, the audit result is sent to the eBPF program through the eBPF Maps, the eBPF program executes the audit result, if the audit result passes through the resource distribution application, a CUDA instruction is normally called to carry out the resource distribution, and if the audit result does not pass through the resource distribution application, the resource distribution is limited.
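A hedged sketch of this three-part structure (eBPF program, eBPF Maps, control program) follows. The map names, the fixed per-task quota, and the in-memory usage bookkeeping are illustrative assumptions; in the patent the rated amounts come from the sharing module's database:

```python
# Sketch: the eBPF program reports allocation requests through eBPF Maps
# (a perf buffer here); the user-space control program audits each request
# against the task's rated quota and writes the verdict back into a map.
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

struct alloc_req { u32 pid; u64 bytes; };
BPF_PERF_OUTPUT(alloc_events);   // channel from eBPF program to control program
BPF_HASH(verdicts, u32, u32);    // control program writes audit results here

int on_cumemalloc(struct pt_regs *ctx) {
    struct alloc_req req = {};
    req.pid = bpf_get_current_pid_tgid() >> 32;
    req.bytes = PT_REGS_PARM2(ctx);          // video memory being requested
    alloc_events.perf_submit(ctx, &req, sizeof(req));
    return 0;
}
"""

RATED_BYTES = 1600 * 2**20   # hypothetical rated quota per task (1600 M)
used = {}                    # control-program view of real-time usage per pid

def audit(cpu, data, size):
    ev = b["alloc_events"].event(data)
    free = RATED_BYTES - used.get(ev.pid, 0)
    passed = ev.bytes <= free                # rated amount vs. idle amount
    if passed:
        used[ev.pid] = used.get(ev.pid, 0) + ev.bytes
    t = b["verdicts"]
    t[t.Key(ev.pid)] = t.Leaf(int(passed))   # verdict back through eBPF Maps

b = BPF(text=prog)
b.attach_uprobe(name="/usr/lib/x86_64-linux-gnu/libcuda.so.1",
                sym="cuMemAlloc_v2", fn_name="on_cumemalloc")
b["alloc_events"].open_perf_buffer(audit)
while True:
    b.perf_buffer_poll()
```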
Step S3: obtain the execution audit result returned by the eBPF process.
Specifically, the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time, and is either that the audit passes or that it does not pass. As shown in Fig. 3, the eBPF process returns the execution audit result: if the audit passes, the CUDA instruction is allowed to be called normally for resource allocation; otherwise, resource allocation is limited.
If the execution audit result is that the audit passes, step S4 is executed: GPU resource allocation is allowed by calling the currently executed instruction.
If the execution audit result is that the audit does not pass, step S5 is executed: GPU resource allocation is limited according to a preset isolation mode.
Specifically, the isolation mode is either hard isolation or soft isolation. Hard isolation means that, once the amount of resources the GPU occupies in real time reaches the upper limit, the resource-allocation-related task is interrupted and a resource shortage is prompted. Soft isolation means that, once that upper limit is reached, it is further determined whether the GPU still has idle resources, and the GPU resource limit is relaxed appropriately.
As can be seen from the above technical solutions, the GPU resource allocation method provided in the embodiments of the present application determines whether the instruction currently executed by the target program is a mounted CUDA instruction, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted. If the currently executed instruction is a mounted CUDA instruction, an eBPF process is triggered, through the eBPF tag mounted on the instruction, to perform an execution audit of the GPU resource amount. The execution audit result returned by the eBPF process is obtained, the result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time. If the execution audit result is that the audit passes, GPU resource allocation by calling the currently executed instruction is allowed; if the audit does not pass, GPU resource allocation is limited according to the preset isolation mode.
The present application adopts an instruction-trigger mechanism: when a resource allocation CUDA instruction is executed, the eBPF process is triggered through the eBPF tag, thereby realizing the execution audit of the GPU resource amount and the subsequent limitation of GPU resource allocation. Because the eBPF process runs in the kernel, a large amount of overhead from data copying, system calls, and context switches between user space and kernel space is avoided, so the performance overhead of the method is low. Since the limitation of GPU resource allocation is realized by mounting eBPF tags on CUDA instructions, no CUDA instruction or other system code needs to be modified; and because the corresponding CUDA instructions exist in every CUDA version, the present application adapts to all CUDA versions without per-version development.
Optionally, the resource allocation CUDA instructions may include video memory allocation CUDA instructions, computing power allocation CUDA instructions, and device information CUDA instructions.
Specifically, referring to Table 1, the resource allocation CUDA instructions can be divided into video memory allocation CUDA instructions, computing power allocation CUDA instructions, and device information CUDA instructions; the CUDA Driver API column of Table 1 lists several instructions of each kind.
When a video memory allocation CUDA instruction is executed, the video memory currently used by the GPU is summed up and compared against the rated resource amount specified for the task when it was created, and a decision is made on whether to allow the resource allocation request. If the audit passes, video memory allocation is called normally to issue the resources; otherwise, allocation is limited according to the specified isolation mode.
When a computing power allocation CUDA instruction is executed, the total number of SMs used by the current GPU is summed up; if the number of SMs in use has not reached the upper limit, the kernel function is called normally to start the computation; otherwise, allocation is limited according to the specified isolation mode.
When a device-information-related API is executed, the reported total amount of GPU video memory is replaced by the GPU resource amount specified when the task was created, and the reported free video memory is replaced by the difference between the task's allocated video memory amount and its used video memory amount.
TABLE 1 (the original table image, Figure BDA0003413549440000091, listing the CUDA Driver API instructions in each of the three categories, is not reproduced in this text)
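The device-information substitution described above reduces to simple arithmetic. A sketch with hypothetical names follows; how the substituted values are written back to the caller is left abstract here:

```python
# Sketch of the device-information substitution (names are hypothetical).
def reported_device_memory(task_quota_bytes: int, task_used_bytes: int):
    """What a task is shown instead of the physical GPU totals."""
    reported_total = task_quota_bytes                   # quota, not physical total
    reported_free = task_quota_bytes - task_used_bytes  # quota minus task usage
    return reported_total, reported_free

# Example: a task rated at 1600 M that has allocated 600 M sees 1000 M free.
total, free = reported_device_memory(1600 * 2**20, 600 * 2**20)
assert free == 1000 * 2**20
```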
In some embodiments of the present application, the process in step S3 by which the eBPF process determines the returned execution audit result may specifically include:
Step S31: the eBPF process determines the current amount of idle GPU resources according to the amount of resources the GPU occupies in real time and the preset upper limit of GPU resources allowed to be used.
Specifically, the eBPF process first obtains the amount of resources the GPU occupies in real time and the preset upper limit of GPU resources allowed to be used; the difference between the upper limit and the occupied amount is the current amount of idle GPU resources.
Step S32: the eBPF process determines whether the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources.
If so, the execution audit result returned by the eBPF process is that the audit does not pass; otherwise, the audit passes.
Specifically, the rated GPU resource amount corresponding to the currently executed instruction is obtained, and it is determined whether this amount exceeds the current amount of idle GPU resources. If it does, the execution audit result is that the audit does not pass; otherwise, the audit passes.
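As a concrete illustration of steps S31 and S32 (all numbers are invented for the example):

```python
# Worked example of the audit in steps S31-S32 (numbers are hypothetical).
gpu_cap_bytes  = 8 * 2**30   # preset upper limit of GPU resources allowed
gpu_used_bytes = 6 * 2**30   # amount the GPU occupies in real time
rated_bytes    = 3 * 2**30   # rated amount for the currently executed instruction

free_bytes = gpu_cap_bytes - gpu_used_bytes   # step S31: 2 GiB idle
audit_passed = rated_bytes <= free_bytes      # step S32: 3 GiB > 2 GiB idle
assert audit_passed is False                  # the audit does not pass
```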
In some embodiments of the present application, two optional implementations are provided for limiting GPU resource allocation according to the preset isolation mode in step S5, which may specifically include:
First, GPU resource allocation is limited according to GPU hard isolation.
Specifically, under hard isolation, if the execution audit result returned by the eBPF process is that the audit does not pass, limiting GPU resource allocation by GPU hard isolation comprises suspending this GPU resource allocation and prompting that GPU resources are insufficient.
Second, GPU resource allocation is limited according to GPU soft isolation.
The process of limiting GPU resource allocation by GPU soft isolation comprises the following steps:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
Specifically, under soft isolation, if the execution audit result returned by the eBPF process is that the audit does not pass, it is further determined whether the current amount of idle GPU resources is zero. If it is zero, this GPU resource allocation is suspended. If it is not zero, then although the idle amount does not satisfy the rated GPU resource amount corresponding to the currently executed instruction, the amount of resources the GPU occupies in real time has not yet reached the preset upper limit of GPU resources allowed to be used; under soft isolation the GPU resource limit may then be relaxed appropriately, and GPU resource allocation by calling the currently executed instruction is allowed.
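A sketch of the two isolation policies, continuing the hypothetical audit above (function and mode names are assumptions):

```python
# Sketch of the two isolation modes applied after a failed audit.
def enforce(audit_passed: bool, mode: str, free_bytes: int) -> str:
    if audit_passed:
        return "allocate"           # call the CUDA instruction normally
    if mode == "hard":
        # Hard isolation: suspend this allocation and report the shortage.
        return "deny: insufficient GPU resources"
    # Soft isolation: relax the limit if the GPU still has any idle resources.
    return "allocate" if free_bytes > 0 else "suspend"

assert enforce(False, "hard", 2 * 2**30) == "deny: insufficient GPU resources"
assert enforce(False, "soft", 2 * 2**30) == "allocate"
assert enforce(False, "soft", 0) == "suspend"
```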
It can be understood that the isolation limits on GPU resource allocation in the embodiments of the present application include, but are not limited to, the two above. What the present application emphasizes is that when the execution audit result is that the audit does not pass, that is, when the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources, GPU resource allocation is isolated and limited; all methods meeting the actual preset isolation limits shall fall within the protection scope of the present application.
Next, with reference to Fig. 4, a practical application of the present application is described through a GPU task-processing example.
First, at the initial stage of system operation, the sharing module reads the local GPU information and divides a single physical GPU into 100 equal resource units, each holding 1/100 of the video memory and computing power: video memory is divided as total video memory / 100, and computing power as SM count / 100. For example, if a physical GPU has 16000 M of video memory and 80 SMs, one GPU resource unit after division represents 160 M of video memory and 0.8 SM, so a task rated at 10 units is entitled to 1600 M and 8 SMs. Since calling and executing a CUDA instruction for GPU computation essentially means designating SMs to run kernel functions, and each SM contains a large number of basic arithmetic instruction execution units (SPs) that perform the actual computation, the present application holds that GPU computing power can be quantified by the number of SMs, and the computing power share is controlled proportionally by controlling the number of SMs allocated to a task.
After a user creates a GPU task, the corresponding CUDA instructions are generated and enter libcuda.so. When the GPU task is created, the sharing module, following its GPU sharing scheduling logic, specifies the GPU ID to use, the rated GPU resource amount, the isolation mode, and so on. The GPU ID is the GPU's index in the host, numbered from 0 (GPU 0 is the first GPU card). The rated GPU resource amount is the estimated GPU resource quota the task requires; for example, using 20/100 of a GPU means 20 GPU resource units. The isolation mode is either hard isolation or soft isolation.
The sharing module determines, according to the current GPU resource situation, whether to allocate the GPU to the task being created. For example, if one GPU has 100 GPU resource units and each task applies for 20, the physical GPU can be allocated to at most 5 tasks, and no further task allocation requests for this GPU are accepted until those 5 tasks complete. After the GPU allocation completes, the sharing module archives the task ID, the bound GPU ID, the rated GPU resource amount, the isolation mode, and other information in a database.
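A sketch of the sharing module's admission bookkeeping over the 100-unit split follows; the dict stands in for the archival database, and all names are assumptions:

```python
# Sketch: admission control over 100 GPU resource units on one physical GPU.
TOTAL_UNITS = 100
allocations: dict[str, int] = {}   # task_id -> rated units (stand-in database)

def admit(task_id: str, rated_units: int) -> bool:
    if sum(allocations.values()) + rated_units > TOTAL_UNITS:
        return False               # refused until running tasks complete
    allocations[task_id] = rated_units
    return True

# With 20-unit tasks, at most five fit on one physical GPU.
assert all(admit(f"task{i}", 20) for i in range(5))
assert admit("task5", 20) is False
```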
Thus, as the instructions in libcuda.so execute one by one, when execution reaches a resource allocation CUDA instruction with a mounted eBPF tag, the eBPF process is triggered. The control program obtains the rated GPU resource amount for the current resource allocation CUDA instruction from the database, determines whether the instruction may execute, and sends the audit result to the eBPF program through the eBPF Maps. If the audit result is that the audit passes, the eBPF program allows the resource allocation CUDA instruction to execute, and subsequent GPU resource allocation is performed according to the preset GPU ID and rated GPU resource amount; if the audit does not pass, the eBPF program limits subsequent GPU resource allocation through the preset isolation mode.
The GPU resource allocation apparatus provided in the embodiments of the present application is described below; the apparatus described below and the GPU resource allocation method described above may be cross-referenced.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of a GPU resource allocation apparatus disclosed in an embodiment of the present application.
As shown in fig. 5, the apparatus may include:
the instruction determining unit is configured to determine whether the instruction currently executed by a target program is a mounted CUDA (Compute Unified Device Architecture) instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF (extended Berkeley Packet Filter) tag has been mounted;
the execution auditing unit is configured to trigger, when the instruction currently executed by the target program is a mounted CUDA instruction, an eBPF process to perform the execution audit of the GPU resource amount through the eBPF tag mounted on the instruction;
the result obtaining unit is configured to obtain the execution audit result returned by the eBPF process, the execution audit result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
the resource allocation unit is configured to allow GPU resource allocation by calling the currently executed instruction when the execution audit result is that the audit passes;
and the resource limiting unit is configured to limit GPU resource allocation according to a preset isolation mode when the execution audit result is that the audit does not pass.
As can be seen from the above technical solutions, the GPU resource allocation apparatus provided in the embodiments of the present application determines whether the instruction currently executed by the target program is a mounted CUDA instruction, the mounted CUDA instruction being a resource allocation CUDA instruction to which an eBPF tag has been mounted. If the currently executed instruction is a mounted CUDA instruction, an eBPF process is triggered, through the eBPF tag mounted on the instruction, to perform an execution audit of the GPU resource amount. The execution audit result returned by the eBPF process is obtained, the result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time. If the execution audit result is that the audit passes, GPU resource allocation by calling the currently executed instruction is allowed; if the audit does not pass, GPU resource allocation is limited according to the preset isolation mode.
The present application adopts an instruction-trigger mechanism: when a resource allocation CUDA instruction is executed, the eBPF process is triggered through the eBPF tag, thereby realizing the execution audit of the GPU resource amount and the subsequent limitation of GPU resource allocation. Because the eBPF process runs in the kernel, a large amount of overhead from data copying, system calls, and context switches between user space and kernel space is avoided, so the performance overhead of the method is low. Since the limitation of GPU resource allocation is realized by mounting eBPF tags on CUDA instructions, no CUDA instruction or other system code needs to be modified; and because the corresponding CUDA instructions exist in every CUDA version, the present application adapts to all CUDA versions without per-version development.
Optionally, the GPU resource allocation apparatus may further include a tag mounting unit, configured to perform:
acquiring the current CUDA version;
reading a logical address of the resource allocation CUDA instruction corresponding to the current CUDA version from a stored configuration file;
mounting an eBPF tag to a logical address of the resource allocation CUDA instruction.
Optionally, the resource limiting unit may include a hard isolation unit or a soft isolation unit;
the hard isolation unit is used for limiting GPU resource allocation according to GPU hard isolation;
and the soft isolation unit is used for limiting GPU resource allocation according to GPU soft isolation.
Optionally, the hard isolation unit may be configured to suspend this GPU resource allocation and prompt that GPU resources are insufficient.
Optionally, the soft isolation unit may be configured to perform:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
Optionally, the resource allocation CUDA instructions may include video memory allocation CUDA instructions, computing power allocation CUDA instructions, and device information CUDA instructions.
The GPU resource allocation apparatus provided in the embodiments of the present application may be applied to a GPU resource allocation device. Optionally, Fig. 6 shows a block diagram of the hardware structure of the GPU resource allocation device, which may include: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4;
in the embodiments of the present application, there is at least one each of the processor 1, the communication interface 2, the memory 3, and the communication bus 4, and the processor 1, the communication interface 2, and the memory 3 communicate with one another through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present invention, or the like;
the memory 3 may include a high-speed RAM memory and may also include a non-volatile memory, for example at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program being configured for:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
For the detailed and extended functions of the program, reference may be made to the description above.
Embodiments of the present application further provide a readable storage medium on which a program suitable for execution by a processor may be stored, the program being configured for:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
For the detailed and extended functions of the program, reference may be made to the description above.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprises", "comprising", and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus comprising a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A GPU resource allocation method, characterized by comprising the following steps:
determining whether the instruction currently executed by a target program is a mounted CUDA instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF tag has been mounted;
if so, triggering, through the eBPF tag mounted on the instruction, an eBPF process to perform an execution audit of the GPU resource amount;
obtaining the execution audit result returned by the eBPF process, wherein the execution audit result is determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
if the execution audit result is that the audit passes, allowing GPU resource allocation by calling the currently executed instruction;
and if the execution audit result is that the audit does not pass, limiting GPU resource allocation according to a preset isolation mode.
2. The method of claim 1, wherein mounting the eBPF tag to a resource allocation CUDA instruction comprises:
acquiring the current CUDA version;
reading a logical address of the resource allocation CUDA instruction corresponding to the current CUDA version from a stored configuration file;
mounting an eBPF tag to a logical address of the resource allocation CUDA instruction.
3. The method of claim 1, wherein the process by which the eBPF process determines the returned execution audit result comprises:
the eBPF process determines the current amount of idle GPU resources according to the amount of resources the GPU occupies in real time and a preset upper limit of GPU resources allowed to be used;
the eBPF process determines whether the rated GPU resource amount corresponding to the currently executed instruction exceeds the current amount of idle GPU resources;
and if so, the execution audit result returned by the eBPF process is that the audit does not pass; otherwise, the audit passes.
4. The method of claim 1, wherein limiting GPU resource allocation according to the preset isolation mode comprises:
limiting GPU resource allocation according to GPU hard isolation;
or
limiting GPU resource allocation according to GPU soft isolation.
5. The method of claim 4, wherein limiting GPU resource allocation according to GPU hard isolation comprises:
suspending this GPU resource allocation and prompting that GPU resources are insufficient.
6. The method of claim 4, wherein limiting GPU resource allocation according to GPU soft isolation comprises:
determining whether the current amount of idle GPU resources is zero;
if so, suspending this GPU resource allocation;
and if not, allowing GPU resource allocation by calling the currently executed instruction.
7. The method of claim 1, wherein the resource allocation CUDA instructions comprise a video memory allocation CUDA instruction, a computing power allocation CUDA instruction, and a device information CUDA instruction.
8. A GPU resource allocation apparatus, comprising:
the instruction determining unit is configured to determine whether the instruction currently executed by a target program is a mounted CUDA (Compute Unified Device Architecture) instruction, wherein the mounted CUDA instruction is a resource allocation CUDA instruction to which an eBPF (extended Berkeley Packet Filter) tag has been mounted;
the execution auditing unit is configured to trigger, when the instruction currently executed by the target program is a mounted CUDA instruction, an eBPF process to perform the execution audit of the GPU resource amount through the eBPF tag mounted on the instruction;
the result obtaining unit is configured to obtain the execution audit result returned by the eBPF process, the execution audit result being determined based on the rated GPU resource amount corresponding to the currently executed instruction and the amount of resources the GPU occupies in real time;
the resource allocation unit is configured to allow GPU resource allocation by calling the currently executed instruction when the execution audit result is that the audit passes;
and the resource limiting unit is configured to limit GPU resource allocation according to a preset isolation mode when the execution audit result is that the audit does not pass.
9. A GPU resource allocation device comprising a memory and a processor;
the memory is used for storing programs;
the processor, configured to execute the program, to implement the steps of the GPU resource allocation method according to any of claims 1-7.
10. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the GPU resource allocation method according to any of claims 1-7.
CN202111538135.XA · Priority date 2021-12-15 · Filing date 2021-12-15 · GPU resource allocation method, device, equipment and readable storage medium · Pending · CN114168344A (en)

Priority Applications (1)

Application number · Priority date · Filing date · Title
CN202111538135.XA · 2021-12-15 · 2021-12-15 · GPU resource allocation method, device, equipment and readable storage medium

Publications (1)

Publication number · Publication date
CN114168344A · 2022-03-11

Family

ID=80486894

Family Applications (1)

Application number · Status · Publication · Priority date · Filing date
CN202111538135.XA · Pending · CN114168344A (en) · 2021-12-15 · 2021-12-15

Country Status (1)

Country · Link
CN · CN114168344A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
KR20190143248A (en) * · 2018-06-20 · 2019-12-30 · 한국과학기술원 (Korea Advanced Institute of Science and Technology) · Method and System to manage and schedule GPU memory resource in Container-based virtualized environment
CN111078412A (en) * · 2019-12-12 · 2020-04-28 · 中山大学 (Sun Yat-sen University) · Method for resource management of GPU through API interception
CN112000463A (en) * · 2020-07-16 · 2020-11-27 · 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.) · A CUDA-based GPU resource allocation method, system, terminal and storage medium
CN112256542A (en) * · 2020-10-19 · 2021-01-22 · 中山大学 (Sun Yat-sen University) · Microservice system performance detection method, device and system based on eBPF

Non-Patent Citations (1)

佚名 (Anonymous): "详细介绍eBPF的起源和工作原理及作用" [A detailed introduction to the origin, working principle, and role of eBPF], retrieved from the Internet: https://www.elecfans.com/d/1590845.html *

Cited By (5)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
CN115063282A (en) * · 2022-06-02 · 2022-09-16 · 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.) · GPU resource scheduling method, device, equipment and storage medium
CN115063282B (en) * · 2022-06-02 · 2025-09-02 · 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.) · GPU resource scheduling method, device, equipment and storage medium
WO2025082300A1 (en) * · 2023-10-19 · 2025-04-24 · 华为技术有限公司 (Huawei Technologies Co., Ltd.) · Resource scheduling method and apparatus, and server
CN119440808A (en) * · 2024-10-12 · 2025-02-14 · 中国联合网络通信有限公司广东省分公司 (China United Network Communications Co., Ltd., Guangdong Branch) · Computing resource allocation control method, device, electronic device and storage medium
CN119440808B (en) * · 2024-10-12 · 2025-07-11 · 中国联合网络通信有限公司广东省分公司 (China United Network Communications Co., Ltd., Guangdong Branch) · Computing resource allocation control method, device, electronic device and storage medium


Legal Events

Code · Title
PB01 · Publication
SE01 · Entry into force of request for substantive examination
