A YARN-based resource management and scheduling method for GPGPU clusters

Technical field
The invention belongs to the field of computer-cluster resource management techniques, and more particularly relates to a YARN-based resource management and scheduling method for GPGPU clusters.
Background technology
GPU stands for Graphics Processing Unit. The GPU is a concept relative to the CPU: because graphics processing has become increasingly important in modern computers (particularly in home systems and for game enthusiasts), a dedicated graphics-processing core is needed. The GPU is the "heart" of the video card, playing a role equivalent to that of the CPU in the computer, and it largely determines the class and performance of the card. Most video cards currently on the market use graphics chips from NVIDIA or ATI.
Today, the GPU is no longer limited to 3D graphics processing. General-purpose GPU computing has attracted wide attention in industry, and the facts show that for workloads such as floating-point arithmetic and parallel computation, a GPU can deliver tens or even hundreds of times the performance of a CPU; this is so-called GPGPU.
GPGPU stands for General-Purpose GPU, i.e. a graphics processor used for general-purpose computing. The first "GP" stands for General Purpose and the second for Graphics Processing; together the two mean "general-purpose graphics processing", and adding the "U" (Unit) yields a complete general-purpose processor. People have long searched for ways to accelerate image processing, but were limited by the floating-point capability of the CPU: image-processing operations that require high-density computation were traditionally implemented on the CPU, with no great progress in performance or efficiency. With the rapid growth in the performance of programmable graphics processing units (GPUs), using the GPU to accelerate image processing has increasingly become a research hotspot.
At the same time, with the maturation and development of GPU general-purpose computing standards such as OpenCL and CUDA, GPU general-purpose computing has developed greatly. However, because GPU workloads typically involve huge, high-density amounts of computation that are difficult to complete within an acceptable time on a single device, GPU clusters arose as the times demanded.
There has been much work on Hadoop-based GPU parallelization and clustering, such as Mars, MapCG and SkePU, but most of it parallelizes MapReduce over one GPU, or several GPUs on a single node, realizing parallel computation on multi-core GPU and mixed CPU/GPU platforms to some extent. Essentially, these systems implement the MapReduce process in the GPU's multi-threaded programming model by exploiting the GPU's multi-threaded computing characteristics; they do not truly combine GPUs with Hadoop clusters.
Moreover, most of this work is based on first-generation Hadoop rather than on YARN, the second-generation Hadoop, making it difficult to build a truly elastic GPU computing cluster.
YARN is the new Hadoop resource manager. Compared with Hadoop 1.0, it separates resource management from the computation framework and becomes a general resource management system that provides unified resource management and scheduling for upper-layer applications. Its introduction brings great advantages to the cluster in utilization, unified resource management and data sharing.
The architecture of native Hadoop YARN, which is responsible for Hadoop resource management, is shown in Fig. 1. It consists of the Resource Manager (RM), the Node Manager (NM), the Application Master (AM), containers (Container) and other components. Native YARN lacks resource management for GPUs.
In a certain sense, YARN should be considered a cloud operating system responsible for the cluster's resource management. All kinds of application programs can be developed on this operating system, such as batch MapReduce, streaming jobs (e.g. Storm) and real-time services. These applications can simultaneously use the computing power and rich data-storage models of the Hadoop cluster, sharing the same Hadoop cluster and the data residing on it. In addition, new frameworks can use YARN's resource manager and provide their own application-master implementations.
The current YARN system supports the management and allocation of common resources such as CPU and memory, but cannot support the management and allocation of GPU resources.
The content of the invention
In view of the above technical problems, the present invention aims to provide a YARN-based resource management and scheduling method for GPGPU clusters. It improves the YARN resource model and introduces a GPU application master, on which basis unified cluster-wide scheduling and management of GPUs is realized, achieving an elastic GPU cluster.
To achieve the above purpose, the technical scheme of the invention is as follows:
A YARN-based resource management and scheduling method for GPGPU clusters, characterized in that it comprises the following steps:
S01: the node manager reports node information to the resource manager through periodic heartbeats;
S02: the resource manager responds to the node manager and triggers the scheduler's NODE_UPDATE event;
S03: the scheduler allocates containers on the node according to the scheduling strategy, reports the allocated containers to the resource manager, and adds them to the resource allocation list; the scheduling strategy is that GPU resources are never over-committed, while allocation of CPU and memory continues even when the remaining GPU resource is zero;
S04: the GPU application master of a GPU application program sends a heartbeat to the resource manager;
S05: the resource manager receives the heartbeat, updates the resource application list, and returns the containers belonging to GPU application programs in the list to the GPU application master;
S06: the GPU application master obtains the containers and performs the second level of the two-level scheduling;
S07: the application masters of applications using resources other than GPUs send heartbeats to the resource manager;
S08: the resource manager receives the heartbeats, updates the resource application list, and returns the containers for non-GPU resources to the corresponding application masters.
Preferably, resources are represented by tuples, <x CPU, y GB, z GPU> or <x, y, z>, denoting x CPUs, y GB of memory and z GPU resources.
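As an illustrative sketch (not part of the claims; the class and field names are assumptions), the resource triple above can be modelled as a small value type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    """A resource triple <x CPU, y GB, z GPU> as described in the text."""
    cpu: int
    mem_gb: int
    gpu: int

    def __sub__(self, other: "Resource") -> "Resource":
        # Component-wise subtraction: deducting a request from a node's capacity.
        return Resource(self.cpu - other.cpu,
                        self.mem_gb - other.mem_gb,
                        self.gpu - other.gpu)

# Example from the text: a node with 12 CPUs, 24 GB and 3 GPUs,
# minus a request for <3 CPU, 12 GB, 3 GPU>.
node = Resource(12, 24, 3)
request = Resource(3, 12, 3)
remaining = node - request
```

The triple simply extends the conventional CPU/memory pair with a third GPU dimension, so existing two-dimensional bookkeeping carries over component-wise.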
Preferably, GPU resource identification and binding in step S03 comprise the following steps:
S11: the node manager maintains the running-state information of all devices on the node;
S12: a hash table represents the correspondence between GPU devices and containers at the current time; the hash table is gpu_info = Map<gpu_type, gpu_device_id, container_id>;
S13: when a new container starts, an idle device is selected from the hash table and bound to it.
Preferably, GpuClient provides the interface for the user to interact with YARN: submitting applications and querying application state and attributes. The communication steps are as follows:
S21: initialize and obtain the GPU application program ID;
S22: set the GPU application master's running environment, including environment variables, commands, parameters and resources;
S23: submit the gpuApplication Master to the resource manager;
S24: periodically obtain the state of the GPU application master, perform exception handling on errors, and exit the program on completion.
Preferably, the GPU application master communicates with the resource manager to apply for resources, interacts with the node manager to start and monitor tasks, and performs secondary allocation of GPU resources.
Preferably, the GPU application master is used to realize the startup and recovery of containers, comprising the following steps:
S31: after the GPU application master receives a new container, it requests the node manager to start the container and sends the information needed for startup, including the start command, environment variables, parameters and the various resources, especially GPU resources; the node manager is responsible for executing the command the container specifies. Meanwhile, the GPU application program continues to apply to the resource manager for the resources of subsequent tasks, and the two processes proceed continuously until all tasks are assigned;
S32: after a container starts, the node manager notifies the GPU application master; the GPU application program detects and maintains the state of the started containers and collects information to determine whether the program is complete.
Compared with the prior art, the beneficial effects of the invention are:
The invention improves the YARN resource model and introduces a GPU application master, on which basis unified cluster-wide scheduling and management of GPUs is realized, achieving an elastic GPU cluster.
Brief description of the drawings
Fig. 1 is the architecture diagram of existing YARN;
Fig. 2 is the architecture diagram of the resource management of the YARN-based GPU cluster of the invention;
Fig. 3 is the flowchart of the YARN-based resource management and scheduling method for GPGPU clusters of the invention;
Fig. 4 is the gpuClient communication flowchart.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted in the following, to avoid unnecessarily obscuring the concept of the invention.
Embodiment:
As shown in Fig. 2, the GPU cluster of the invention is based on an improvement of the native YARN framework: a GPU cluster management method that makes GPU resources visible, manageable and schedulable at the cluster level.
The YARN system of the invention supports CPU (x86, ARM, etc.), memory and GPU (this embodiment takes NVIDIA GPUs and the corresponding CUDA programming framework as an example, but the invention is not restricted to NVIDIA GPUs).
CPU and memory are configured respectively by "yarn.nodemanager.resource.memory-mb" and "yarn.nodemanager.resource.cpu-vcores", consistent with native YARN, but a representation of GPU resources must be added.
The representation of GPU type and quantity is added in YARN: for example, "yarn.nodemanager.resource.gpu-type" represents the GPU kind, and "yarn.nodemanager.resource.gpu-ngpgpu" represents the number of general-purpose GPUs physically available on the node, with a default of 0.
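A minimal sketch of how a node's resource configuration could be merged over defaults, assuming the property names proposed above (the concrete default values for memory and vcores here are only illustrative, and the helper function is an assumption, not YARN code):

```python
# Default node resources; the two GPU properties are the ones the
# invention proposes, with gpu-ngpgpu defaulting to 0 as the text states.
DEFAULTS = {
    "yarn.nodemanager.resource.memory-mb": 8192,
    "yarn.nodemanager.resource.cpu-vcores": 8,
    "yarn.nodemanager.resource.gpu-type": "none",
    "yarn.nodemanager.resource.gpu-ngpgpu": 0,
}

def node_resources(conf: dict) -> dict:
    """Merge a node's explicit configuration over the defaults."""
    merged = dict(DEFAULTS)
    merged.update(conf)
    return merged

# A GPU node overrides the two GPU entries; a CPU-only node
# simply inherits gpu-ngpgpu = 0 and is scheduled as before.
gpu_node = node_resources({
    "yarn.nodemanager.resource.gpu-type": "NVIDIA",
    "yarn.nodemanager.resource.gpu-ngpgpu": 2,
})
```

The zero default is what lets CPU-only nodes coexist unchanged in the same cluster: they simply report no general-purpose GPUs.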
Because GPUs differ from CPUs (NVIDIA and CUDA are used as the example throughout), a GPU can only run one CUDA program at a time. A GPU status list is therefore designed that can represent complex GPU environments, such as a single machine with a single card or with multiple cards, and that represents the dynamic-binding information of the GPU device when a CUDA task starts.
Finally, resources are represented by tuples such as <x CPU, y GB, z GPU> or <x, y, z>, denoting x CPUs, y GB of memory and z GPU resources. The conventional two-dimensional tuple of CPU and memory is extended to a triple in order to support GPUs.
Resource scheduling model
a) The two-level scheduling framework
The Resource Manager allocates resources to the Application Master / gpuApplication Master of each (GPU) application program; the Application Master / gpuApplication Master then further distributes the resources to its individual tasks.
b) Scheduling based on resource reservation
As shown in Fig. 3, the scheduling process is as follows:
1. The Node Manager reports node information to the Resource Manager through periodic heartbeats, including GPU information such as GPU card kind and quantity;
2. The Resource Manager responds to the Node Manager;
3. The Resource Manager triggers the scheduler's NODE_UPDATE event;
4. The scheduler allocates containers (Container) on the node according to the scheduling strategy (described in detail below);
5. The scheduler reports the allocated containers to the Resource Manager, which adds them to the resource allocation list;
6. The gpuApplication Master of a GPU application program sends a heartbeat to the Resource Manager;
7. The Resource Manager receives the heartbeat, updates the resource application list, and returns the containers belonging to the GPU application programs of step 6 to the gpuApplication Master;
8. The gpuApplication Master obtains the containers and performs the second level of the two-level scheduling described above.
Steps 6', 7' and 8' are similar to steps 6, 7 and 8 above, but for the allocation process of resources that do not involve GPUs.
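The heartbeat-driven flow above can be sketched as a toy simulation. All class and method names here are illustrative assumptions, not real YARN APIs; resources are the (cpu, gb, gpu) triples used throughout the text:

```python
class ResourceManagerSketch:
    """Toy model of the resource-reservation scheduling flow (steps 1-8)."""

    def __init__(self):
        self.nodes = {}       # node_id -> free (cpu, gb, gpu)
        self.allocated = []   # containers granted, awaiting an AM heartbeat

    def heartbeat_from_node(self, node_id, free, demand):
        # Steps 1-3: the NM heartbeat reports node info, the RM responds
        # and triggers the scheduler's NODE_UPDATE event.
        self.nodes[node_id] = free
        self._schedule(node_id, demand)

    def _schedule(self, node_id, demand):
        # Steps 4-5: allocate a container on the node if it fits,
        # and record it in the resource allocation list.
        cpu, gb, gpu = self.nodes[node_id]
        d_cpu, d_gb, d_gpu = demand
        if d_cpu <= cpu and d_gb <= gb and d_gpu <= gpu:
            self.nodes[node_id] = (cpu - d_cpu, gb - d_gb, gpu - d_gpu)
            self.allocated.append((node_id, demand))

    def heartbeat_from_am(self):
        # Steps 6-8: the (gpu)ApplicationMaster heartbeats and receives
        # the granted containers for its own second-level scheduling.
        granted, self.allocated = self.allocated, []
        return granted

rm = ResourceManagerSketch()
rm.heartbeat_from_node("node1", (12, 24, 3), demand=(2, 2, 1))
granted = rm.heartbeat_from_am()
```

Note that containers are handed out only on the application master's own heartbeat, which is the "reservation" aspect: grants sit in the allocation list until the second-level scheduler picks them up.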
YARN's original open-source resource scheduling is not suitable for GPU scheduling, since the original implementation allows the resource list to be over-committed. The present invention ensures that GPU resources are never over-committed, while allocation of CPU and memory continues even when the remaining GPU resource is zero, as in the following table:
| Resource upper limit | Resource request | Remaining resources | Can continue? |
|---|---|---|---|
| <12, 24, 3> | <3, 12, 3> | <9, 12, 0> | Yes |
| <9, 12, 0> | <1, 4, 1> | <8, 8, -1> | No |
| <8, 8, -1> | <4, 4, 0> | <4, 4, -1> | No |
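The rule behind the table — never over-commit GPUs, but let zero-GPU requests continue on CPU and memory alone — can be sketched as a single admission check (illustrative Python; the function name is an assumption):

```python
def can_allocate(remaining, demand):
    """Decide whether a (cpu, gb, gpu) demand may be allocated.

    GPUs are never over-committed: the GPU component of the demand must
    fit in what remains. A zero-GPU demand still passes this check when
    GPU resources are exhausted, so CPU/memory allocation continues.
    """
    r_cpu, r_gb, r_gpu = remaining
    d_cpu, d_gb, d_gpu = demand
    if d_gpu > r_gpu:                  # would over-commit the GPU
        return False
    return d_cpu <= r_cpu and d_gb <= r_gb

# The table's rows: the first request fits; the second would drive the
# GPU count negative and is rejected; once the bookkeeping is already
# negative, nothing further is admitted.
```

Under this check, a request of <4, 4, 0> against a remainder of <9, 12, 0> is still granted, which is exactly the "zero GPU resource continues CPU and memory allocation" behavior the text requires.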
(1) The resource scheduling of the invention can schedule CPU, memory and GPU together. Native YARN ships with three schedulers, the FIFO scheduler, the Capacity Scheduler and the Fair Scheduler, but none of them supports GPU scheduling. The invention improves all three: when GPU resources are insufficient, no GPU over-commitment is performed; when the remaining GPU resource is zero, allocation of CPU and memory resources continues unaffected. The improved schedulers can therefore support GPU scheduling.
Taking the Fair Scheduler's Dominant Resource Fairness (DRF) as an example, the DRF scheduling process is illustrated below; other scheduling strategies need similar treatment of the GPU.
Assume the total system resources are <12 CPU, 24 GB, 3 GPU> and there are two users, A and B, whose per-task resource demands are <1 CPU, 3 GB, 0 GPU> and <2 CPU, 2 GB, 1 GPU> respectively. Then A's dominant resource is memory and B's dominant resource is the GPU, and DRF repeatedly allocates a task to whichever user currently has the smaller dominant share.
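The dominant resources in this example can be verified with a short sketch (illustrative, not the scheduler's actual code; the helper name is an assumption):

```python
def dominant_share(demand, total):
    """Return (share, resource_name) for the user's dominant resource.

    The dominant resource is the one whose fraction of the cluster
    total is largest for this user's per-task demand.
    """
    names = ("cpu", "mem", "gpu")
    shares = [(d / t if t else 0.0, n)
              for d, t, n in zip(demand, total, names)]
    return max(shares)

total = (12, 24, 3)                     # cluster: 12 CPU, 24 GB, 3 GPU
a = dominant_share((1, 3, 0), total)    # user A's per-task demand
b = dominant_share((2, 2, 1), total)    # user B's per-task demand
# A's dominant resource is memory (3/24 = 12.5%);
# B's dominant resource is GPU (1/3 ≈ 33.3%).
```

The GPU simply enters the share computation as a third dimension, which is why the existing DRF machinery carries over once the resource triple is in place.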
The container-execution process of step 4 of the resource scheduling model is described below.
Native YARN containers do not support GPUs. In order to support GPUs, the invention must overcome two technical problems: (1) the YARN resource model cannot distinguish between distinct devices of the same resource on the same node; (2) a GPU can only run one task at a time. The invention adopts a dynamic-binding method, as follows:
1) The node manager maintains the running-state information of all devices on the node. The implementation can use the following hash table, but is not restricted to this method.
2) gpu_info = Map<gpu_type, gpu_device_id, container_id>; this hash table describes the correspondence between GPU devices and containers at the current time.
3) When a new container starts, an idle GPU is chosen from gpu_info and dynamically bound to the container, achieving cluster-level manageability of GPU resources.
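The gpu_info table and the binding step can be sketched as follows (illustrative Python; a plain dict stands in for the hash table, and the device type and container names are assumptions):

```python
def bind_container(gpu_info, gpu_type, container_id):
    """Bind a newly started container to an idle GPU of the given type.

    gpu_info maps (gpu_type, gpu_device_id) -> container_id or None,
    mirroring the gpu_info hash table maintained by the node manager.
    Returns the device id that was bound, or None if no device is idle.
    """
    for (g_type, device_id), bound in gpu_info.items():
        if g_type == gpu_type and bound is None:
            gpu_info[(g_type, device_id)] = container_id
            return device_id
    return None

# A node with two devices of one type: the first two containers each
# get a device; a third request finds no idle device and must wait.
gpu_info = {("NVIDIA", 0): None, ("NVIDIA", 1): None}
```

Because each device maps to at most one container, the binding also enforces the one-task-per-GPU constraint identified as technical problem (2).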
gpuClient
In order to run GPU programs on our YARN cluster, a GPU running environment is built on the cluster, consisting mainly of gpuClient and gpuApplication Master. Non-GPU applications still use the native Client and Application Master.
GpuClient provides the interface for the user to interact with YARN: submitting applications and querying application state, attributes, etc. The communication protocol is an RPC protocol between gpuClient and the Resource Manager. The flow is shown in Fig. 4, with the following steps:
1. Initialize and obtain the GPU application (gpuApplication) ID;
2. Set the gpuApplication Master's running environment, including environment variables, commands, parameters, resources, etc.;
3. Submit the gpuApplication Master to the Resource Manager;
4. Periodically obtain the gpuApplication Master's state, perform exception handling on errors, and exit the program on completion.
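The four client steps can be sketched as a driver loop against a minimal Resource Manager stub. Every name here is a hypothetical stand-in for illustration, not the real YARN RPC interface:

```python
class RMStub:
    """Minimal stand-in for the Resource Manager side of the RPC protocol."""

    def __init__(self):
        self._next_id = 1
        self._apps = {}

    def new_application(self):
        app_id, self._next_id = self._next_id, self._next_id + 1
        return app_id

    def submit(self, app_id, context):
        self._apps[app_id] = "RUNNING"

    def status(self, app_id):
        # The stub reports RUNNING once, then FINISHED, so the polling
        # loop below terminates; a real AM runs until its tasks complete.
        state = self._apps[app_id]
        self._apps[app_id] = "FINISHED"
        return state

def run_gpu_client(rm):
    app_id = rm.new_application()                    # step 1: obtain app ID
    context = {"command": "./gpu_task",              # step 2: AM run context
               "env": {}, "resources": (2, 4, 1)}
    rm.submit(app_id, context)                       # step 3: submit the AM
    while True:                                      # step 4: poll the state
        state = rm.status(app_id)
        if state == "FINISHED":
            return app_id, state
```

A real gpuClient would also branch on error states in step 4 (exception handling), which the stub omits for brevity.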
The gpuApplication Master on the one hand communicates with the Resource Manager to apply for resources, especially GPU resources; on the other hand it interacts with the Node Manager to start and monitor tasks; in addition, it performs the secondary allocation of resources, especially GPU resources. Non-GPU applications still use the native Application Master.
The gpuApplication Master is the program controller for compute-intensive GPU applications; its main processes are container startup and recovery, corresponding to process 5 of Fig. 3.
(1) After the gpuApplication Master receives a new container, it requests the Node Manager to start the container and sends the information needed for startup, including the start command, environment variables, parameters and the various resources, especially GPU resources; the node manager is responsible for executing the command the container specifies. Meanwhile, the gpuApplication continues to apply to the Resource Manager for the resources of subsequent tasks; the two processes proceed continuously until all tasks are assigned.
(2) After a container starts, the Node Manager notifies the gpuApplication Master; the gpuApplication detects and maintains the state of the started containers and collects information to judge whether the program is complete.
It should be appreciated that the above embodiments of the invention are only used to exemplarily illustrate or explain the principle of the invention, and are not to be construed as limiting the invention. Therefore, any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the invention shall be included within the protection scope of the invention. In addition, the appended claims of the invention are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.