A YARN-based resource management and scheduling method for GPGPU clusters

Technical field
The invention belongs to the field of computer-cluster resource management techniques, and more particularly relates to a YARN-based resource management and scheduling method for GPGPU clusters.
Background technology
GPU stands for Graphics Processing Unit. The GPU is a concept relative to the CPU: because graphics processing has become increasingly important in modern computers (particularly in home systems and for game enthusiasts), a dedicated graphics-processing core is needed. The GPU is the "heart" of the video card, playing a role equivalent to that of the CPU in the computer, and it largely determines the class and performance of the card. Most video cards currently on the market use graphics chips from NVIDIA or ATI.
Today, the GPU is no longer limited to 3D graphics processing. General-purpose GPU computing has attracted wide attention in industry, and the facts show that for workloads such as floating-point arithmetic and parallel computation, a GPU can deliver tens or even hundreds of times the performance of a CPU; this is so-called GPGPU.
GPGPU stands for General-Purpose GPU, i.e. a graphics processor used for general-purpose computing. The first "GP" stands for General Purpose and the second for Graphics Processing; together the two mean "general-purpose graphics processing", and adding the "U" (Unit) yields a complete general-purpose processor. People have long searched for ways to accelerate image processing, but were limited by the floating-point capability of the CPU: image-processing operations that require high-density computation were traditionally implemented on the CPU, with no great progress in performance or efficiency. With the rapid growth in the performance of programmable graphics processing units (GPUs), using the GPU to accelerate image processing has increasingly become a research hotspot.
At the same time, with the maturation and development of GPU general-purpose computing standards such as OpenCL and CUDA, GPU general-purpose computing has developed greatly. However, because GPU workloads typically involve huge, high-density amounts of computation that are difficult to complete within an acceptable time on a single device, GPU clusters arose as the times demanded.
There has been much work on Hadoop-based GPU parallelization and clustering, such as Mars, MapCG and SkePU, but most of it parallelizes MapReduce over one GPU, or several GPUs on a single node, realizing parallel computation on multi-core GPU and mixed CPU/GPU platforms to some extent. Essentially, these systems implement the MapReduce process in the GPU's multi-threaded programming model by exploiting the GPU's multi-threaded computing characteristics; they do not truly combine GPUs with Hadoop clusters.
Moreover, most of this work is based on first-generation Hadoop rather than on YARN, the second-generation Hadoop, making it difficult to build a truly elastic GPU computing cluster.
YARN is the new Hadoop resource manager. Compared with Hadoop 1.0, it separates resource management from the computation framework and becomes a general resource management system that provides unified resource management and scheduling for upper-layer applications. Its introduction brings great advantages to the cluster in utilization, unified resource management and data sharing.
The architecture of native Hadoop YARN, which is responsible for Hadoop resource management, is shown in Fig. 1. It consists of the Resource Manager (RM), the Node Manager (NM), the Application Master (AM), containers (Container) and other components. Native YARN lacks resource management for GPUs.
In a certain sense, YARN should be considered a cloud operating system responsible for the cluster's resource management. All kinds of application programs can be developed on this operating system, such as batch MapReduce, streaming jobs (e.g. Storm) and real-time services. These applications can simultaneously use the computing power and rich data-storage models of the Hadoop cluster, sharing the same Hadoop cluster and the data residing on it. In addition, new frameworks can use YARN's resource manager and provide their own application-master implementations.
The current YARN system supports the management and allocation of common resources such as CPU and memory, but cannot support the management and allocation of GPU resources.
The content of the invention
In view of the above technical problems, the present invention aims to provide a YARN-based resource management and scheduling method for GPGPU clusters. It improves the YARN resource model and introduces a GPU application master, on which basis unified cluster-wide scheduling and management of GPUs is realized, achieving an elastic GPU cluster.
To achieve the above purpose, the technical scheme of the invention is as follows:
A YARN-based resource management and scheduling method for GPGPU clusters, characterized in that it comprises the following steps:
S01: the node manager reports node information to the resource manager through periodic heartbeats;
S02: the resource manager responds to the node manager and triggers the scheduler's NODE_UPDATE event;
S03: the scheduler allocates containers on the node according to the scheduling strategy, reports the allocated containers to the resource manager, and adds them to the resource allocation list; the scheduling strategy is that GPU resources are never over-committed, while allocation of CPU and memory continues even when the remaining GPU resource is zero;
S04: the GPU application master of a GPU application program sends a heartbeat to the resource manager;
S05: the resource manager receives the heartbeat, updates the resource application list, and returns the containers belonging to GPU application programs in the list to the GPU application master;
S06: the GPU application master obtains the containers and performs the second level of the two-level scheduling;
S07: the application masters of applications using resources other than GPUs send heartbeats to the resource manager;
S08: the resource manager receives the heartbeats, updates the resource application list, and returns the containers for non-GPU resources to the corresponding application masters.
Preferably, resources are represented by tuples, <x CPU, y GB, z GPU> or <x, y, z>, denoting x CPUs, y GB of memory and z GPU resources.
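As an illustrative sketch (not part of the claims; the class and field names are assumptions), the resource triple above can be modelled as a small value type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    """A resource triple <x CPU, y GB, z GPU> as described in the text."""
    cpu: int
    mem_gb: int
    gpu: int

    def __sub__(self, other: "Resource") -> "Resource":
        # Component-wise subtraction: deducting a request from a node's capacity.
        return Resource(self.cpu - other.cpu,
                        self.mem_gb - other.mem_gb,
                        self.gpu - other.gpu)

# Example from the text: a node with 12 CPUs, 24 GB and 3 GPUs,
# minus a request for <3 CPU, 12 GB, 3 GPU>.
node = Resource(12, 24, 3)
request = Resource(3, 12, 3)
remaining = node - request
```

The triple simply extends the conventional CPU/memory pair with a third GPU dimension, so existing two-dimensional bookkeeping carries over component-wise.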
Preferably, GPU resource identification and binding in step S03 comprise the following steps:
S11: the node manager maintains the running-state information of all devices on the node;
S12: a hash table represents the correspondence between GPU devices and containers at the current time; the hash table is gpu_info = Map<gpu_type, gpu_device_id, container_id>;
S13: when a new container starts, an idle device is selected from the hash table and bound to it.
Preferably, GpuClient provides the interface for the user to interact with YARN: submitting applications and querying application state and attributes. The communication steps are as follows:
S21: initialize and obtain the GPU application program ID;
S22: set the GPU application master's running environment, including environment variables, commands, parameters and resources;
S23: submit the gpuApplication Master to the resource manager;
S24: periodically obtain the state of the GPU application master, perform exception handling on errors, and exit the program on completion.
Preferably, the GPU application master communicates with the resource manager to apply for resources, interacts with the node manager to start and monitor tasks, and performs secondary allocation of GPU resources.
Preferably, the GPU application master is used to realize the startup and recovery of containers, comprising the following steps:
S31: after the GPU application master receives a new container, it requests the node manager to start the container and sends the information needed for startup, including the start command, environment variables, parameters and the various resources, especially GPU resources; the node manager is responsible for executing the command the container specifies. Meanwhile, the GPU application program continues to apply to the resource manager for the resources of subsequent tasks, and the two processes proceed continuously until all tasks are assigned;
S32: after a container starts, the node manager notifies the GPU application master; the GPU application program detects and maintains the state of the started containers and collects information to determine whether the program is complete.
Compared with the prior art, the beneficial effects of the invention are:
The invention improves the YARN resource model and introduces a GPU application master, on which basis unified cluster-wide scheduling and management of GPUs is realized, achieving an elastic GPU cluster.
Brief description of the drawings
Fig. 1 is the architecture diagram of existing YARN;
Fig. 2 is the architecture diagram of the resource management of the YARN-based GPU cluster of the invention;
Fig. 3 is the flowchart of the YARN-based resource management and scheduling method for GPGPU clusters of the invention;
Fig. 4 is the gpuClient communication flowchart.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted in the following, to avoid unnecessarily obscuring the concept of the invention.
Embodiment:
As shown in Fig. 2, the GPU cluster of the invention is based on an improvement of the native YARN framework: a GPU cluster management method that makes GPU resources visible, manageable and schedulable at the cluster level.
The YARN system of the invention supports CPU (x86, ARM, etc.), memory and GPU (this embodiment takes NVIDIA GPUs and the corresponding CUDA programming framework as an example, but the invention is not restricted to NVIDIA GPUs).
CPU and memory are configured respectively by "yarn.nodemanager.resource.memory-mb" and "yarn.nodemanager.resource.cpu-vcores", consistent with native YARN, but a representation of GPU resources must be added.
The representation of GPU type and quantity is added in YARN: for example, "yarn.nodemanager.resource.gpu-type" represents the GPU kind, and "yarn.nodemanager.resource.gpu-ngpgpu" represents the number of general-purpose GPUs physically available on the node, with a default of 0.
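A minimal sketch of how a node's resource configuration could be merged over defaults, assuming the property names proposed above (the concrete default values for memory and vcores here are only illustrative, and the helper function is an assumption, not YARN code):

```python
# Default node resources; the two GPU properties are the ones the
# invention proposes, with gpu-ngpgpu defaulting to 0 as the text states.
DEFAULTS = {
    "yarn.nodemanager.resource.memory-mb": 8192,
    "yarn.nodemanager.resource.cpu-vcores": 8,
    "yarn.nodemanager.resource.gpu-type": "none",
    "yarn.nodemanager.resource.gpu-ngpgpu": 0,
}

def node_resources(conf: dict) -> dict:
    """Merge a node's explicit configuration over the defaults."""
    merged = dict(DEFAULTS)
    merged.update(conf)
    return merged

# A GPU node overrides the two GPU entries; a CPU-only node
# simply inherits gpu-ngpgpu = 0 and is scheduled as before.
gpu_node = node_resources({
    "yarn.nodemanager.resource.gpu-type": "NVIDIA",
    "yarn.nodemanager.resource.gpu-ngpgpu": 2,
})
```

The zero default is what lets CPU-only nodes coexist unchanged in the same cluster: they simply report no general-purpose GPUs.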
Because GPUs differ from CPUs (NVIDIA and CUDA are used as the example throughout), a GPU can only run one CUDA program at a time. A GPU status list is therefore designed that can represent complex GPU environments, such as a single machine with a single card or with multiple cards, and that represents the dynamic-binding information of the GPU device when a CUDA task starts.
Finally, resources are represented by tuples such as <x CPU, y GB, z GPU> or <x, y, z>, denoting x CPUs, y GB of memory and z GPU resources. The conventional two-dimensional tuple of CPU and memory is extended to a triple in order to support GPUs.
Resource scheduling model
a) The two-level scheduling framework
The Resource Manager allocates resources to the Application Master / gpuApplication Master of each (GPU) application program; the Application Master / gpuApplication Master then further distributes the resources to its individual tasks.
b) Scheduling based on resource reservation
As shown in Fig. 3, the scheduling process is as follows:
1. The Node Manager reports node information to the Resource Manager through periodic heartbeats, including GPU information such as GPU card kind and quantity;
2. The Resource Manager responds to the Node Manager;
3. The Resource Manager triggers the scheduler's NODE_UPDATE event;
4. The scheduler allocates containers (Container) on the node according to the scheduling strategy (described in detail below);
5. The scheduler reports the allocated containers to the Resource Manager, which adds them to the resource allocation list;
6. The gpuApplication Master of a GPU application program sends a heartbeat to the Resource Manager;
7. The Resource Manager receives the heartbeat, updates the resource application list, and returns the containers belonging to the GPU application programs of step 6 to the gpuApplication Master;
8. The gpuApplication Master obtains the containers and performs the second level of the two-level scheduling described above.
Steps 6', 7' and 8' are similar to steps 6, 7 and 8 above, but for the allocation process of resources that do not involve GPUs.
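The heartbeat-driven flow above can be sketched as a toy simulation. All class and method names here are illustrative assumptions, not real YARN APIs; resources are the (cpu, gb, gpu) triples used throughout the text:

```python
class ResourceManagerSketch:
    """Toy model of the resource-reservation scheduling flow (steps 1-8)."""

    def __init__(self):
        self.nodes = {}       # node_id -> free (cpu, gb, gpu)
        self.allocated = []   # containers granted, awaiting an AM heartbeat

    def heartbeat_from_node(self, node_id, free, demand):
        # Steps 1-3: the NM heartbeat reports node info, the RM responds
        # and triggers the scheduler's NODE_UPDATE event.
        self.nodes[node_id] = free
        self._schedule(node_id, demand)

    def _schedule(self, node_id, demand):
        # Steps 4-5: allocate a container on the node if it fits,
        # and record it in the resource allocation list.
        cpu, gb, gpu = self.nodes[node_id]
        d_cpu, d_gb, d_gpu = demand
        if d_cpu <= cpu and d_gb <= gb and d_gpu <= gpu:
            self.nodes[node_id] = (cpu - d_cpu, gb - d_gb, gpu - d_gpu)
            self.allocated.append((node_id, demand))

    def heartbeat_from_am(self):
        # Steps 6-8: the (gpu)ApplicationMaster heartbeats and receives
        # the granted containers for its own second-level scheduling.
        granted, self.allocated = self.allocated, []
        return granted

rm = ResourceManagerSketch()
rm.heartbeat_from_node("node1", (12, 24, 3), demand=(2, 2, 1))
granted = rm.heartbeat_from_am()
```

Note that containers are handed out only on the application master's own heartbeat, which is the "reservation" aspect: grants sit in the allocation list until the second-level scheduler picks them up.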
YARN's original open-source resource scheduling is not suitable for GPU scheduling, since the original implementation allows the resource list to be over-committed. The present invention ensures that GPU resources are never over-committed, while allocation of CPU and memory continues even when the remaining GPU resource is zero, as in the following table:
| Resource upper limit | Resource request | Remaining resources | Can continue? |
|---|---|---|---|
| <12, 24, 3> | <3, 12, 3> | <9, 12, 0> | Yes |
| <9, 12, 0> | <1, 4, 1> | <8, 8, -1> | No |
| <8, 8, -1> | <4, 4, 0> | <4, 4, -1> | No |
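The rule behind the table — never over-commit GPUs, but let zero-GPU requests continue on CPU and memory alone — can be sketched as a single admission check (illustrative Python; the function name is an assumption):

```python
def can_allocate(remaining, demand):
    """Decide whether a (cpu, gb, gpu) demand may be allocated.

    GPUs are never over-committed: the GPU component of the demand must
    fit in what remains. A zero-GPU demand still passes this check when
    GPU resources are exhausted, so CPU/memory allocation continues.
    """
    r_cpu, r_gb, r_gpu = remaining
    d_cpu, d_gb, d_gpu = demand
    if d_gpu > r_gpu:                  # would over-commit the GPU
        return False
    return d_cpu <= r_cpu and d_gb <= r_gb

# The table's rows: the first request fits; the second would drive the
# GPU count negative and is rejected; once the bookkeeping is already
# negative, nothing further is admitted.
```

Under this check, a request of <4, 4, 0> against a remainder of <9, 12, 0> is still granted, which is exactly the "zero GPU resource continues CPU and memory allocation" behavior the text requires.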
(1) The resource scheduling of the invention can schedule CPU, memory and GPU together. Native YARN ships with three schedulers, the FIFO scheduler, the Capacity Scheduler and the Fair Scheduler, but none of them supports GPU scheduling. The invention improves all three: when GPU resources are insufficient, no GPU over-commitment is performed; when the remaining GPU resource is zero, allocation of CPU and memory resources continues unaffected. The improved schedulers can therefore support GPU scheduling.
Taking the Fair Scheduler's Dominant Resource Fairness (DRF) as an example, the DRF scheduling process is illustrated below; other scheduling strategies need similar treatment of the GPU.
Assume the total system resources are <12 CPU, 24 GB, 3 GPU> and there are two users, A and B, whose per-task resource demands are <1 CPU, 3 GB, 0 GPU> and <2 CPU, 2 GB, 1 GPU> respectively. Then A's dominant resource is memory and B's dominant resource is the GPU, and DRF repeatedly allocates a task to whichever user currently has the smaller dominant share.
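The dominant resources in this example can be verified with a short sketch (illustrative, not the scheduler's actual code; the helper name is an assumption):

```python
def dominant_share(demand, total):
    """Return (share, resource_name) for the user's dominant resource.

    The dominant resource is the one whose fraction of the cluster
    total is largest for this user's per-task demand.
    """
    names = ("cpu", "mem", "gpu")
    shares = [(d / t if t else 0.0, n)
              for d, t, n in zip(demand, total, names)]
    return max(shares)

total = (12, 24, 3)                     # cluster: 12 CPU, 24 GB, 3 GPU
a = dominant_share((1, 3, 0), total)    # user A's per-task demand
b = dominant_share((2, 2, 1), total)    # user B's per-task demand
# A's dominant resource is memory (3/24 = 12.5%);
# B's dominant resource is GPU (1/3 ≈ 33.3%).
```

The GPU simply enters the share computation as a third dimension, which is why the existing DRF machinery carries over once the resource triple is in place.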
The container-execution process of step 4 of the resource scheduling model is described below.
Native YARN containers do not support GPUs. In order to support GPUs, the invention must overcome two technical problems: (1) the YARN resource model cannot distinguish between distinct devices of the same resource on the same node; (2) a GPU can only run one task at a time. The invention adopts a dynamic-binding method, as follows:
1) The node manager maintains the running-state information of all devices on the node. The implementation can use the following hash table, but is not restricted to this method.
2) gpu_info = Map<gpu_type, gpu_device_id, container_id>; this hash table describes the correspondence between GPU devices and containers at the current time.
3) When a new container starts, an idle GPU is chosen from gpu_info and dynamically bound to the container, achieving cluster-level manageability of GPU resources.
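The gpu_info table and the binding step can be sketched as follows (illustrative Python; a plain dict stands in for the hash table, and the device type and container names are assumptions):

```python
def bind_container(gpu_info, gpu_type, container_id):
    """Bind a newly started container to an idle GPU of the given type.

    gpu_info maps (gpu_type, gpu_device_id) -> container_id or None,
    mirroring the gpu_info hash table maintained by the node manager.
    Returns the device id that was bound, or None if no device is idle.
    """
    for (g_type, device_id), bound in gpu_info.items():
        if g_type == gpu_type and bound is None:
            gpu_info[(g_type, device_id)] = container_id
            return device_id
    return None

# A node with two devices of one type: the first two containers each
# get a device; a third request finds no idle device and must wait.
gpu_info = {("NVIDIA", 0): None, ("NVIDIA", 1): None}
```

Because each device maps to at most one container, the binding also enforces the one-task-per-GPU constraint identified as technical problem (2).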
gpuClient
In order to run GPU programs on our YARN cluster, a GPU running environment is built on the cluster, consisting mainly of gpuClient and gpuApplication Master. Non-GPU applications still use the native Client and Application Master.
GpuClient provides the interface for the user to interact with YARN: submitting applications and querying application state, attributes, etc. The communication protocol is an RPC protocol between gpuClient and the Resource Manager. The flow is shown in Fig. 4, with the following steps:
1. Initialize and obtain the GPU application (gpuApplication) ID;
2. Set the gpuApplication Master's running environment, including environment variables, commands, parameters, resources, etc.;
3. Submit the gpuApplication Master to the Resource Manager;
4. Periodically obtain the gpuApplication Master's state, perform exception handling on errors, and exit the program on completion.
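The four client steps can be sketched as a driver loop against a minimal Resource Manager stub. Every name here is a hypothetical stand-in for illustration, not the real YARN RPC interface:

```python
class RMStub:
    """Minimal stand-in for the Resource Manager side of the RPC protocol."""

    def __init__(self):
        self._next_id = 1
        self._apps = {}

    def new_application(self):
        app_id, self._next_id = self._next_id, self._next_id + 1
        return app_id

    def submit(self, app_id, context):
        self._apps[app_id] = "RUNNING"

    def status(self, app_id):
        # The stub reports RUNNING once, then FINISHED, so the polling
        # loop below terminates; a real AM runs until its tasks complete.
        state = self._apps[app_id]
        self._apps[app_id] = "FINISHED"
        return state

def run_gpu_client(rm):
    app_id = rm.new_application()                    # step 1: obtain app ID
    context = {"command": "./gpu_task",              # step 2: AM run context
               "env": {}, "resources": (2, 4, 1)}
    rm.submit(app_id, context)                       # step 3: submit the AM
    while True:                                      # step 4: poll the state
        state = rm.status(app_id)
        if state == "FINISHED":
            return app_id, state
```

A real gpuClient would also branch on error states in step 4 (exception handling), which the stub omits for brevity.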
The gpuApplication Master on the one hand communicates with the Resource Manager to apply for resources, especially GPU resources; on the other hand it interacts with the Node Manager to start and monitor tasks; in addition, it performs the secondary allocation of resources, especially GPU resources. Non-GPU applications still use the native Application Master.
The gpuApplication Master is the program controller for compute-intensive GPU applications; its main processes are container startup and recovery, corresponding to process 5 of Fig. 3.
(1) After the gpuApplication Master receives a new container, it requests the Node Manager to start the container and sends the information needed for startup, including the start command, environment variables, parameters and the various resources, especially GPU resources; the node manager is responsible for executing the command the container specifies. Meanwhile, the gpuApplication continues to apply to the Resource Manager for the resources of subsequent tasks; the two processes proceed continuously until all tasks are assigned.
(2) After a container starts, the Node Manager notifies the gpuApplication Master; the gpuApplication detects and maintains the state of the started containers and collects information to judge whether the program is complete.
It should be appreciated that the above embodiments of the invention are only used to exemplarily illustrate or explain the principle of the invention, and are not to be construed as limiting the invention. Therefore, any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the invention shall be included within the protection scope of the invention. In addition, the appended claims of the invention are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.