Technical Field
The present application relates to the field of cloud computing, and in particular to a cluster resource adjustment method, a cluster resource adjustment apparatus, and a cloud platform.
Background
Platform as a Service (PaaS) is a cloud computing technology that provides users with the running and development environment of an application program as a service. The platform that provides this environment is called a cloud platform. A cloud platform usually includes a scheduler and a cluster composed of a large number of virtual machines (VMs). Based on user requirements and preset scheduling rules, the scheduler deploys an application program submitted by a user on one or more virtual machines, thereby scheduling the application program.
In the related art, multiple schedulers may be set up in the cloud platform to improve scheduling efficiency. These schedulers share the resources of the cluster: each scheduler can obtain the resource information of every virtual machine in the cluster in real time and schedule application programs based on that information. The resources of the cluster are the CPU, memory, disk, and other resources occupied by the virtual machines in the cluster.
However, suppose the cluster is heavily loaded and few resources remain. If multiple schedulers have scheduling tasks to execute at the same moment, and all of them schedule application programs onto the same virtual machine with few remaining resources, a scheduling conflict may occur and cause scheduling to fail.
Summary
The present application provides a cluster resource adjustment method, a cluster resource adjustment apparatus, and a cloud platform. They solve the problem in the related art of scheduling failures caused by scheduling conflicts. The technical solutions are as follows.
According to one aspect, a cluster resource adjustment method is provided. The method may be applied to a master node of a cloud platform. The cluster includes multiple resource partitions. Each resource partition includes at least one virtual machine (VM) and corresponds to one scheduler. The method may include the following. The master node obtains VM information of each VM in the cluster. It adjusts the VMs included in at least one resource partition based on the obtained VM information. It then updates the partition information of the cluster based on the adjustment result. The partition information indicates the VMs included in each resource partition, and each scheduler executes scheduling tasks within its corresponding resource partition based on the partition information.
Each scheduler independently executes scheduling tasks within its own resource partition, so scheduling failures caused by conflicts between schedulers are avoided. In addition, the resources of each resource partition are adjusted dynamically based on VM information. Cluster resources can therefore be distributed evenly and the resource usage of the resource partitions balanced, which improves the utilization of cluster resources.
Optionally, the VM information includes resource information. The master node may then adjust the VMs included in at least one resource partition, based on the obtained VM information, as follows:
determining the remaining resource amount of each VM based on the resource information of each VM in the cluster, and determining the total remaining resources of the cluster; and adjusting the membership of the VMs included in at least one resource partition, based on the remaining resource amount of each VM and the total remaining resources of the cluster, so that the remaining resources occupied by the resource partitions satisfy a preset resource ratio.
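The sketch below computes these two quantities. It assumes that each VM's resource information has been collapsed to one scalar capacity and one scalar used amount. The class `VMResource` and its fields are hypothetical, and a real implementation would track CPU, memory, and storage separately:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VMResource:
    # Hypothetical shape of one VM's resource information, collapsed to a
    # single scalar unit (for example vCPUs) for illustration only.
    vm_id: str
    total: float
    used: float


def remaining_per_vm(vms: List[VMResource]) -> Dict[str, float]:
    """Remaining resource amount of each VM: total minus used, never negative."""
    return {vm.vm_id: max(vm.total - vm.used, 0.0) for vm in vms}


def cluster_remaining_total(vms: List[VMResource]) -> float:
    """Total remaining resources of the cluster: the sum over all VMs."""
    return sum(remaining_per_vm(vms).values())


vms = [VMResource("VM1", 16.0, 10.0), VMResource("VM2", 8.0, 2.0), VMResource("VM3", 8.0, 8.0)]
per_vm = remaining_per_vm(vms)
total = cluster_remaining_total(vms)
```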
The preset resource ratio may be an equal ratio, or it may be determined based on the historical scheduling volume of each scheduler. Adjusting the amount of resources in each resource partition according to this ratio allocates cluster resources reasonably and improves resource utilization.
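The sketch below shows one way to derive the preset ratio. It assumes that each scheduler's historical scheduling volume is available as a single number. The `history` input and the function name are hypothetical:

```python
from typing import Dict


def resource_ratio(history: Dict[str, float], equal: bool = False) -> Dict[str, float]:
    """Return one fraction per scheduler; the fractions sum to 1.

    history maps a scheduler id to its historical scheduling volume (a
    hypothetical metric, e.g. applications scheduled in the last day).
    With equal=True, or when there is no history yet, each scheduler gets 1/N.
    """
    n = len(history)
    volume = sum(history.values())
    if equal or volume == 0:
        return {s: 1.0 / n for s in history}
    # Proportional split: busier schedulers get a larger share of free resources.
    return {s: v / volume for s, v in history.items()}


ratio = resource_ratio({"S0": 10, "S1": 30, "S2": 60})
```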
Optionally, the master node may adjust the VMs included in at least one resource partition, based on the remaining resource amount of each VM and the total remaining resources, as follows:
dividing the remaining resources of the cluster into N shares according to the preset resource ratio, where each share is provided by at least one VM, each share corresponds to one resource partition, and N is the number of resource partitions in the cluster; and
assigning the at least one VM that provides each share to the corresponding resource partition.
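The division into N shares can be approximated with a simple greedy heuristic. This is a sketch of one possible strategy, not the method specified here, and the function name and input shapes are assumptions:

```python
from typing import Dict, List


def divide_by_ratio(remaining: Dict[str, float], ratio: Dict[str, float]) -> Dict[str, List[str]]:
    """Split VMs into one share per resource partition.

    remaining: VM id -> remaining resource amount.
    ratio: partition id -> target fraction of the cluster's remaining total.
    Greedy heuristic (an assumption): take VMs from largest to smallest
    remaining amount and give each to the partition furthest below its target.
    """
    total = sum(remaining.values())
    target = {p: r * total for p, r in ratio.items()}
    got = {p: 0.0 for p in ratio}
    shares: Dict[str, List[str]] = {p: [] for p in ratio}
    for vm_id in sorted(remaining, key=lambda v: (-remaining[v], v)):
        p = max(ratio, key=lambda q: target[q] - got[q])
        shares[p].append(vm_id)
        got[p] += remaining[vm_id]
    return shares


shares = divide_by_ratio({"VM1": 8, "VM2": 6, "VM3": 4, "VM4": 2}, {"S00": 0.5, "S10": 0.5})
```

Because the split is greedy, the shares only approximate the ratio. A partition can also end up empty when there are fewer VMs than partitions.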
Further, the VM information may also include VM type information. The total remaining resources of the cluster may then be determined as follows:
dividing the VMs in the cluster into at least two resource groups based on the type information of each VM, where all VMs in a resource group are of the same type; and
separately determining the total remaining resources of the VMs included in each resource group.
Correspondingly, the remaining resources of the cluster may be divided into N shares according to the preset resource ratio as follows:
dividing the remaining resources of each resource group into N sub-shares according to the preset resource ratio, where each sub-share is provided by at least one VM and corresponds to one resource partition; and
combining the at least two sub-shares that correspond to the same resource partition into one share.
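The per-type variant can be sketched the same way: group VMs by type, split each group by the ratio, then merge the sub-shares of each partition. The sketch includes its own copy of a simple greedy split so that it runs standalone. The type labels and function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List


def _greedy_split(remaining: Dict[str, float], ratio: Dict[str, float]) -> Dict[str, List[str]]:
    # Largest VM first, assigned to the partition furthest below its target.
    total = sum(remaining.values())
    got = {p: 0.0 for p in ratio}
    out: Dict[str, List[str]] = {p: [] for p in ratio}
    for v in sorted(remaining, key=lambda k: (-remaining[k], k)):
        p = max(ratio, key=lambda q: ratio[q] * total - got[q])
        out[p].append(v)
        got[p] += remaining[v]
    return out


def divide_by_type(remaining: Dict[str, float], vm_type: Dict[str, str],
                   ratio: Dict[str, float]) -> Dict[str, List[str]]:
    # Step 1: one resource group per VM type (for example "x86" or "arm").
    groups: Dict[str, Dict[str, float]] = defaultdict(dict)
    for v, amount in remaining.items():
        groups[vm_type[v]][v] = amount
    # Step 2: split each group into N sub-shares using the same ratio.
    # Step 3: merge the sub-shares that belong to the same partition.
    shares: Dict[str, List[str]] = {p: [] for p in ratio}
    for t in sorted(groups):
        for p, members in _greedy_split(groups[t], ratio).items():
            shares[p].extend(members)
    return shares


remaining = {"A1": 4, "A2": 4, "B1": 6, "B2": 6}
vm_type = {"A1": "x86", "A2": "x86", "B1": "arm", "B2": "arm"}
shares = divide_by_type(remaining, vm_type, {"S00": 0.5, "S10": 0.5})
```

Splitting within each type is what gives every partition a mix of VM types, rather than one partition absorbing all VMs of a single architecture.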
Adjusting cluster resources based on VM type allocates resources of different types evenly across the cluster, which further improves the balance of resource allocation.
Optionally, before the VMs included in at least one resource partition are adjusted, the method may further include:
determining the physical location at which each VM is deployed.
Correspondingly, the VMs included in at least one resource partition may be adjusted, based on the remaining resource amount of each VM and the total remaining resources, as follows:
adjusting the VMs included in at least one resource partition based on the remaining resource amount of each VM, the total remaining resources, and the physical location at which each VM is deployed.
Here, consider any first VM and second VM that have equal remaining resource amounts and are assigned to different resource partitions, and let the first resource partition be the one the first VM belongs to. The average physical distance between the first VM and the VMs in the first resource partition is smaller than the average physical distance between the second VM and the VMs in the first resource partition.
In this way, the method assigns physically close VMs to the same resource partition where possible. This reduces the communication latency between VMs within a partition and improves communication efficiency.
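The sketch below is a heuristic for locality-aware assignment. It assumes that each VM's location is a (region, AZ, DC, room, rack) tuple. It also assumes that "physical distance" is measured by the highest level at which two locations differ. Both the tuple layout and the distance metric are assumptions. The greedy rule favors the average-distance property stated above but does not by itself guarantee it:

```python
from typing import Dict, List, Sequence, Tuple

Location = Tuple[str, ...]  # hypothetical (region, az, dc, room, rack)


def location_distance(a: Location, b: Location) -> int:
    """0 for the same rack; larger the higher up the hierarchy the locations differ."""
    for level, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return len(a) - level
    return 0


def avg_distance(vm: str, members: Sequence[str], loc: Dict[str, Location]) -> float:
    if not members:
        return float("inf")  # an empty partition attracts no VM by locality
    return sum(location_distance(loc[vm], loc[m]) for m in members) / len(members)


def assign_with_locality(remaining: Dict[str, float], loc: Dict[str, Location],
                         ratio: Dict[str, float]) -> Dict[str, List[str]]:
    total = sum(remaining.values())
    target = {p: r * total for p, r in ratio.items()}
    got = {p: 0.0 for p in ratio}
    shares: Dict[str, List[str]] = {p: [] for p in ratio}
    for v in sorted(remaining, key=lambda k: (-remaining[k], k)):
        # Partitions that can still take v without exceeding their target.
        fits = [p for p in ratio if got[p] + remaining[v] <= target[p]] or list(ratio)
        # Prefer the physically closest; break ties by the largest deficit.
        p = min(fits, key=lambda q: (avg_distance(v, shares[q], loc), got[q] - target[q]))
        shares[p].append(v)
        got[p] += remaining[v]
    return shares


rack1 = ("r", "az1", "dc1", "room1", "rack1")
rack2 = ("r", "az1", "dc1", "room1", "rack2")
loc = {"a1": rack1, "a2": rack1, "b1": rack2, "b2": rack2}
shares = assign_with_locality({"a1": 5, "a2": 5, "b1": 5, "b2": 5}, loc, {"S00": 0.5, "S10": 0.5})
```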
Optionally, the remaining resource amount of each VM and the total remaining resources of the cluster may be determined, based on the resource information of each VM in the cluster, as follows:
determining the remaining resource amount of each VM based on the resource information of each VM in the cluster;
determining at least one target VM based on the remaining resource amount of each VM, where the remaining resource amount of each target VM is greater than a preset threshold; and
determining the sum of the remaining resource amounts of the at least one target VM as the total remaining resources of the cluster.
Correspondingly, the VMs included in at least one resource partition may be adjusted, based on the remaining resource amount of each VM and the total remaining resources, as follows:
adjusting the target VMs included in at least one resource partition based on the remaining resource amount of each target VM and the total remaining resources.
With this method, only the partition membership of the target VMs is adjusted. VMs whose remaining resource amount is at or below the preset threshold keep their current partitions. This minimizes changes to the resource partitions and makes the adjustment more efficient.
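The sketch below selects the target VMs and keeps every other VM in its current partition. The threshold value and helper names are hypothetical:

```python
from typing import Dict, List, Tuple


def select_targets(remaining: Dict[str, float], threshold: float) -> Tuple[Dict[str, float], float]:
    """Return the target VMs (remaining amount strictly above threshold) and
    their summed remaining amount, which serves as the cluster total."""
    targets = {v: r for v, r in remaining.items() if r > threshold}
    return targets, sum(targets.values())


def merge_assignment(old_partition_of: Dict[str, str],
                     new_shares: Dict[str, List[str]]) -> Dict[str, str]:
    """Non-target VMs keep their old partition; target VMs take the new one."""
    result = dict(old_partition_of)
    for p, vms in new_shares.items():
        for v in vms:
            result[v] = p
    return result


targets, total = select_targets({"VM1": 6.0, "VM2": 0.5, "VM3": 3.0}, threshold=1.0)
```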
Optionally, the VM information may include resource information. Before the VMs included in at least one resource partition are adjusted, the method may further include:
obtaining the partition information of the cluster; and detecting, based on the resource information of each VM in the cluster and the partition information, whether the cluster satisfies a partition adjustment condition.
Correspondingly, the partition information of the cluster may be adjusted, based on the obtained VM information, as follows:
when it is detected that the cluster satisfies the partition adjustment condition, adjusting the VMs included in each resource partition based on the obtained VM information.
Detecting whether the cluster satisfies the partition adjustment condition may include:
determining the resource usage of each resource partition based on the resource information of each VM in the cluster and the partition information, where the resource usage of a resource partition is the ratio of the amount of resources it has used to the total amount of resources it occupies;
when the number of resource partitions whose resource usage is greater than a usage threshold is greater than a count threshold, determining that the cluster satisfies the partition adjustment condition; and
when the number of resource partitions whose resource usage is greater than the usage threshold is not greater than the count threshold, determining that the cluster does not satisfy the partition adjustment condition.
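The sketch below implements this check. It assumes that per-VM used and total amounts are available as scalars. It also assumes that the partition information is a mapping from partition id to VM ids. All names are hypothetical:

```python
from typing import Dict, List


def partition_usage(partitions: Dict[str, List[str]], used: Dict[str, float],
                    total: Dict[str, float]) -> Dict[str, float]:
    """Resource usage of each partition: used amount / total amount of its VMs."""
    usage = {}
    for p, vms in partitions.items():
        capacity = sum(total[v] for v in vms)
        usage[p] = sum(used[v] for v in vms) / capacity if capacity else 0.0
    return usage


def needs_adjustment(partitions: Dict[str, List[str]], used: Dict[str, float],
                     total: Dict[str, float], usage_threshold: float,
                     count_threshold: int) -> bool:
    """True when more than count_threshold partitions exceed usage_threshold."""
    usage = partition_usage(partitions, used, total)
    hot = sum(1 for u in usage.values() if u > usage_threshold)
    return hot > count_threshold


partitions = {"S00": ["VM1", "VM2"], "S10": ["VM3"], "S20": ["VM4"]}
total = {"VM1": 10.0, "VM2": 10.0, "VM3": 10.0, "VM4": 10.0}
used = {"VM1": 9.0, "VM2": 9.0, "VM3": 9.5, "VM4": 1.0}
```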
Cluster resources are re-adjusted as soon as the number of resource partitions whose resource usage exceeds the usage threshold exceeds the count threshold. This keeps the adjustment timely and effectively prevents scheduling failures at the schedulers of heavily used resource partitions.
Optionally, the resource information may include at least one of processor resource information, memory resource information, and storage resource information. "The resource usage is greater than the usage threshold" may then mean either of the following:
the average usage of the resources corresponding to the individual kinds of information is greater than the usage threshold; or, among the at least one kind of information, the number of kinds whose corresponding resource usage is greater than the usage threshold is greater than a quantity threshold.
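The two interpretations of "usage greater than the threshold" can be expressed as one predicate. The `mode` switch and the parameter names are illustrative assumptions:

```python
from typing import Dict


def usage_exceeds(usages: Dict[str, float], usage_threshold: float,
                  mode: str = "mean", quantity_threshold: int = 0) -> bool:
    """Decide whether a partition's usage counts as greater than the threshold.

    usages maps a resource kind (e.g. "cpu", "memory", "storage") to its usage
    rate. mode="mean" compares the average usage with the threshold;
    mode="count" checks whether more than quantity_threshold kinds exceed it.
    """
    if mode == "mean":
        return sum(usages.values()) / len(usages) > usage_threshold
    if mode == "count":
        over = sum(1 for u in usages.values() if u > usage_threshold)
        return over > quantity_threshold
    raise ValueError(f"unknown mode: {mode}")


usages = {"cpu": 0.9, "memory": 0.7, "storage": 0.5}
```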
Optionally, the VM information of each VM in the cluster may be obtained in either of the following ways:
periodically, according to a preset adjustment period;
or, when a change in the number of schedulers configured in the cloud platform is detected.
With this method, the master node can adjust cluster resources periodically according to the preset adjustment period. It can also adjust the resource partitions of the cluster promptly when the number of schedulers changes, which makes the resource adjustment method flexible.
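The two triggers can be combined in a small helper. The sketch uses an injectable clock so that it can be tested. The class name is hypothetical, and a period of 1800 s would correspond to a 30-minute adjustment cycle:

```python
import time
from typing import Callable, Optional


class CollectionTrigger:
    """Decides when the master node should collect VM information: either the
    adjustment period has elapsed, or the number of schedulers has changed."""

    def __init__(self, period_s: float, clock: Callable[[], float] = time.monotonic):
        self.period_s = period_s
        self.clock = clock
        self.last_run = clock()
        self.last_count: Optional[int] = None

    def should_collect(self, scheduler_count: int) -> bool:
        now = self.clock()
        # A change is only detectable once a previous count has been seen.
        changed = self.last_count is not None and scheduler_count != self.last_count
        self.last_count = scheduler_count
        if changed or now - self.last_run >= self.period_s:
            self.last_run = now
            return True
        return False
```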
According to another aspect, a cluster resource adjustment apparatus is provided. The cluster includes multiple resource partitions. Each resource partition includes at least one VM and corresponds to one scheduler. The apparatus may include at least one module configured to implement the cluster resource adjustment method provided in the foregoing aspect.
According to yet another aspect, a cloud platform is provided, including a cluster, multiple schedulers, and the cluster resource adjustment apparatus provided in the foregoing aspect.
According to still another aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the cluster resource adjustment method provided in the foregoing aspect.
According to still another aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, it causes the computer to execute the cluster resource adjustment method provided in the foregoing aspect.
In summary, the present application provides a cluster resource adjustment method, a cluster resource adjustment apparatus, and a cloud platform. For a cluster that includes multiple resource partitions, the method obtains the VM information of each VM in the cluster. Based on that information, it adjusts the VMs included in at least one resource partition. It then updates the partition information of the cluster based on the adjustment result, so that each scheduler executes scheduling tasks within its corresponding resource partition according to the adjusted partition information. Each scheduler independently executes scheduling tasks within its own resource partition, so scheduling failures caused by scheduling conflicts are effectively avoided. Because cluster resources can be adjusted dynamically, they remain evenly allocated across the resource partitions. This balances the resource usage of the partitions and improves the utilization of cluster resources.
Brief Description of Drawings
FIG. 1A is an architecture diagram of a cloud platform involved in a cluster resource adjustment method according to an embodiment of the present invention;
FIG. 1B is a schematic diagram of the resource division of a cluster according to an embodiment of the present invention;
FIG. 1C is an architecture diagram of a cloud platform involved in another cluster resource adjustment method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a cluster resource adjustment method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for detecting whether a cluster satisfies a partition adjustment condition according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for adjusting the VMs included in at least one resource partition according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the resource division of another cluster according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the resource division of yet another cluster according to an embodiment of the present invention;
FIG. 7 is a flowchart of another cluster resource adjustment method according to an embodiment of the present invention;
FIG. 8 is a flowchart of yet another cluster resource adjustment method according to an embodiment of the present invention;
FIG. 9 is a flowchart of still another cluster resource adjustment method according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a cluster resource adjustment apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an adjustment module according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another cluster resource adjustment apparatus according to an embodiment of the present invention; and
FIG. 13 is a schematic structural diagram of still another cluster resource adjustment apparatus according to an embodiment of the present invention.
Detailed Description of Embodiments
In the related art, a cluster may also be divided into multiple resource partitions according to computing framework, to improve scheduling efficiency. Each resource partition includes multiple VMs that support one computing framework. A scheduler may be set up for each resource partition, and each scheduler performs task scheduling within its corresponding resource partition. That is, after receiving an application program submitted by a user, a scheduler selects a suitable VM from the VMs in its corresponding resource partition to deploy the application program. The installation package or image file of the application program is then started and run on that virtual machine. The multiple schedulers work in parallel, which effectively improves scheduling efficiency.
However, as the cloud platform runs for longer, some resource partitions in the cluster may run short of resources while others have idle resources. Resource usage then becomes unbalanced across the partitions.
Refer to FIG. 1A, which shows the architecture of a cloud platform involved in the cluster resource adjustment method provided in the embodiments of the present invention. The cluster resource adjustment method can be applied to the master node 00 of the cluster management system in the cloud platform. Referring to FIG. 1A, the cloud platform further includes a cluster composed of multiple VMs, multiple schedulers, and a database 10; for example, FIG. 1A shows three schedulers, S0, S1, and S2. The VMs in the cluster may be divided into multiple resource partitions, each including at least one VM. Each scheduler may correspond to one resource partition. After receiving an application program submitted by a user, a scheduler selects a suitable VM from the at least one VM in its corresponding resource partition to deploy the application program. This avoids the scheduling conflicts that may occur when multiple schedulers schedule in parallel. For example, referring to FIG. 1B, the cluster may include three resource partitions, S00, S10, and S20, each including multiple VMs. Resource partition S00 corresponds to scheduler S0, resource partition S10 corresponds to scheduler S1, and resource partition S20 corresponds to scheduler S2. After receiving an application program submitted by a user, scheduler S0 selects a suitable VM from the at least one VM in its corresponding resource partition S00 to deploy the application program. The database 10 may store the partition information of each resource partition in the cluster, which indicates the VMs included in each resource partition. The database 10 may also store the VM information of each VM (for example, VM type information and location information) for the management module 01 and the policy module 03 to read.
Referring to FIG. 1A, the master node 00 has a communication connection with each scheduler and each VM. The master node 00 can receive the VM information sent by each VM. Based on the received VM information, it adjusts the VMs included in at least one resource partition, so that each scheduler schedules application programs according to the adjusted result. Cluster resources are thereby adjusted dynamically and resource utilization is improved.
As shown in FIG. 1A, the master node 00 may include a management module 01, a collection module 02, a policy module 03, and multiple caches corresponding to the multiple schedulers. Each cache stores the partition information of the resource partition corresponding to one scheduler; for example, cache 0 may store the partition information of scheduler S0. The collection module 02 obtains the VM information of each VM in the cluster (for example, the VM identifier and resource information) and sends it to the policy module 03. The policy module 03 adjusts the VMs included in at least one resource partition based on the VM information of each VM. It then updates the partition information stored in the database according to the adjustment result and sends the updated partition information to the management module 01. The management module 01 updates the partition information stored in each cache based on that partition information. Besides the identifiers of the VMs in the resource partition, the partition information stored in each cache may include the resource information of each of those VMs. Each scheduler schedules application programs based on the partition information stored in its corresponding cache.
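The data flow among the three modules can be modeled in a few lines of Python. This is a toy model, not the platform's implementation. The database and caches are plain dictionaries, and the re-partitioning policy is passed in as a hypothetical callable:

```python
from typing import Callable, Dict, List

VMInfoMap = Dict[str, dict]        # VM id -> reported VM information
Partitions = Dict[str, List[str]]  # partition id -> VM ids


class MasterNode:
    """Toy model of master node 00: collection (02) -> policy (03) -> management (01)."""

    def __init__(self, policy: Callable[[VMInfoMap, Partitions], Partitions],
                 scheduler_of: Dict[str, str]):
        self.policy = policy              # hypothetical re-partitioning function
        self.scheduler_of = scheduler_of  # partition id -> scheduler id
        self.database = {"vm_info": {}, "partitions": {}}  # stands in for database 10
        self.caches: Dict[str, VMInfoMap] = {}             # one cache per scheduler

    def run_cycle(self, reports: VMInfoMap) -> None:
        # Collection module 02: record the latest VM information.
        self.database["vm_info"].update(reports)
        # Policy module 03: compute new partitions and persist them.
        new_parts = self.policy(self.database["vm_info"], self.database["partitions"])
        self.database["partitions"] = new_parts
        # Management module 01: refresh each scheduler's cache with the VM ids
        # of its partition and their resource information.
        for part, vm_ids in new_parts.items():
            sched = self.scheduler_of[part]
            self.caches[sched] = {v: self.database["vm_info"][v] for v in vm_ids}


node = MasterNode(lambda info, old: {"S00": ["VM1"], "S10": ["VM2"]}, {"S00": "S0", "S10": "S1"})
node.run_cycle({"VM1": {"free_cpu": 4}, "VM2": {"free_cpu": 2}})
```

Each scheduler reads only its own cache, which is what confines it to its partition.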
It should be noted that, in the embodiments of the present invention, the VMs in the cluster of the cloud platform may be divided into two groups: management-plane VMs and data-plane VMs. Management-plane VMs are used to deploy the components of the cluster management system, such as the master node 00, the schedulers, and the database 10. Data-plane VMs are used to deploy application programs submitted by users. The cluster resources adjusted by the method provided in the embodiments of the present invention are therefore the resources occupied by the data-plane VMs.
It should also be noted that, referring to FIG. 1C, in the embodiments of the present invention the cloud platform may support multiple different computing frameworks. For example, FIG. 1C shows three computing frameworks: computing framework 0, computing framework 1, and computing framework 2. Each scheduler in the cloud platform may belong to one computing framework and schedule the application programs within that framework (that is, application programs developed with that computing framework). For example, scheduler S0 corresponds to computing framework 0 and schedules the application programs in computing framework 0. As an example, a Mesos framework (an open-source distributed resource management framework) may be deployed in the cloud platform. Multiple independently developed computing frameworks, such as Hadoop, MPI, and Kubernetes, can be connected on top of Mesos. Through a common resource sharing layer, Mesos lets these computing frameworks share the resources of one cluster.
FIG. 1C also shows that each VM may include multiple executors (Executor), through which the VM deploys tasks (that is, application programs).
FIG. 2 is a flowchart of a cluster resource adjustment method according to an embodiment of the present invention. The method may be applied to the master node 00 shown in FIG. 1A or FIG. 1C. In the cloud platform shown in FIG. 1A or FIG. 1C, the cluster may include multiple resource partitions. Each resource partition includes at least one virtual machine (VM) and corresponds to one scheduler. Referring to FIG. 2, the cluster resource adjustment method may include the following steps.
Step 101: Obtain the VM information of each VM in the cluster.
In this embodiment of the present invention, the master node 00 may obtain the VM information of each VM in the cluster on demand or periodically. For example, the master node 00 may use the collection module 02 to obtain the VM information of each VM in the cluster every 30 minutes. It then updates the VM information of each VM stored in the database 10 based on the obtained information. The VM information of each VM includes at least the VM identifier and the VM resource information. It may further include at least one of status information, type information, location information, and information about the partition to which the VM belongs.
The VM identifier may be a character string that uniquely identifies the VM and may be randomly generated by the cloud platform. The resource information indicates the amount of resources the VM currently uses and the amount remaining. For example, it may include the total amount of resources of the VM and the amount already used, where the resources may be CPU, memory, storage, and so on. The status information indicates the current working state of the VM, which may be normal or down. The type information indicates the heterogeneous type (also called the architecture type) of the VM; VMs of different types are VMs whose processors or memory use different architectures. The location information indicates the physical location at which the VM is deployed. For example, it may include at least one of the rack, equipment room, data center (DC), availability zone (AZ), and region in which the VM is deployed. The partition information of the VM indicates the resource partition to which the VM currently belongs.
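The VM information fields described above could be held in a record like the one below. The field names, resource keys, and location tuple layout are illustrative assumptions rather than a defined schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class VMState(Enum):
    NORMAL = "normal"
    DOWN = "down"


@dataclass
class VMInfo:
    vm_id: str                                  # unique string generated by the platform
    total: Dict[str, float]                     # e.g. {"cpu": 16, "memory_gb": 64}
    used: Dict[str, float]                      # amount already used, same keys
    state: VMState = VMState.NORMAL             # normal or down
    vm_type: Optional[str] = None               # architecture type, e.g. "x86" or "arm"
    location: Optional[Tuple[str, ...]] = None  # (region, az, dc, room, rack)
    partition: Optional[str] = None             # resource partition it belongs to

    def remaining(self) -> Dict[str, float]:
        """Remaining amount per resource kind (total minus used)."""
        return {k: self.total[k] - self.used.get(k, 0.0) for k in self.total}
```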
Step 102: Acquire the partition information of the cluster.
The master node 00 may acquire the partition information from the database 10. For example, after receiving the VM information of each VM sent by the collection module 02, the policy module 03 in the master node 00 may acquire the partition information from the database 10. The partition information indicates which VMs each resource partition includes; for example, it may record the identifier of each resource partition and the identifiers of the VMs included in that partition.
For example, as shown in FIG. 1B, assume that three schedulers, S0, S1, and S2, are deployed in the cloud platform, where the resource partition corresponding to scheduler S0 is S00, the one corresponding to scheduler S1 is S10, and the one corresponding to scheduler S2 is S20. As FIG. 1B shows, resource partition S20 includes relatively many VMs and resource partition S00 relatively few. Correspondingly, the partition information acquired by the master node 00 may be as shown in Table 1: resource partition S00 includes 10 VMs, identified VM1 to VM10; resource partition S10 includes 12 VMs, identified VM11 to VM22; and resource partition S20 includes 26 VMs, identified VM23 to VM48.
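Table 1
| Resource partition | Number of VMs | VM identifiers |
|---|---|---|
| S00 | 10 | VM1 to VM10 |
| S10 | 12 | VM11 to VM22 |
| S20 | 26 | VM23 to VM48 |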
Step 103: Based on the resource information of each VM in the cluster and the partition information, detect whether the cluster satisfies the partition adjustment condition.
When the master node detects that the cluster satisfies the partition adjustment condition, it may adjust the resource partitions, that is, perform step 104. When it detects that the cluster does not satisfy the condition, it may return to step 101, that is, continue acquiring the VM information of each VM in the cluster.
In this embodiment of the present invention, as shown in FIG. 3, the process in which the master node detects whether the cluster satisfies the partition adjustment condition may include the following steps.
Step 1031: Determine the resource utilization of each resource partition based on the resource information of each VM in the cluster and the partition information.
The resource utilization of a resource partition may be the ratio of the amount of resources already used in that partition to the total amount of resources the partition occupies. Assume the cluster includes N resource partitions (N is an integer greater than 1) and the n-th resource partition includes S_n VMs. The utilization r_n of the n-th resource partition may then satisfy:
$$r_n = \frac{\sum_{i=1}^{S_n} U_i}{\sum_{i=1}^{S_n} T_i}$$
where U_i is the amount of resources currently used by the i-th VM, T_i is the total amount of resources of the i-th VM, n is a positive integer not greater than N, and i is a positive integer not greater than S_n.
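As a minimal sketch of the utilization formula, assuming each VM is described by a (used, total) pair expressed in a single resource unit, the utilization of one partition is the sum of the used amounts divided by the sum of the totals. The function name `partition_utilization` is illustrative and not part of the described system.

```python
from typing import Sequence, Tuple


def partition_utilization(vms: Sequence[Tuple[float, float]]) -> float:
    """Return r_n = sum(U_i) / sum(T_i) over the S_n VMs of one partition.

    Each element of `vms` is a (U_i, T_i) pair: the amount the VM currently
    uses and its total amount, both in the same unit (e.g. vCPUs).
    """
    total = sum(t for _, t in vms)
    if total <= 0:
        raise ValueError("partition has no resources")
    used = sum(u for u, _ in vms)
    return used / total


# Hypothetical partition of three VMs: (6 + 3 + 7) / (8 + 4 + 8) = 16 / 20
print(partition_utilization([(6, 8), (3, 4), (7, 8)]))  # 0.8
```

Summing before dividing weights each VM by its size; averaging the per-VM ratios instead would give a small VM the same weight as a large one.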
Step 1032: When the number of resource partitions whose resource utilization exceeds the utilization threshold is greater than the count threshold, determine that the cluster satisfies the partition adjustment condition.
In this embodiment of the present invention, the utilization threshold and the count threshold may be set manually by the operation and maintenance personnel of the cloud platform. Alternatively, the master node may derive the utilization threshold from historical statistics: for example, it may analyze the performance of each virtual machine at different resource utilization levels and take the utilization at which virtual machine performance starts to drop sharply as the threshold. The master node may likewise derive the count threshold from the current number of resource partitions, for example 10% or 30% of that number. When the count threshold is computed from the number of resource partitions, the result should be an integer.
For example, assume the utilization threshold is 80% and the count threshold is 1. The master node 00 may then determine that the cluster satisfies the partition adjustment condition when it detects that the resource utilization of any of the three resource partitions S00, S10, and S20 exceeds 80%. Alternatively, if the cluster currently has 10 resource partitions and the count threshold is 30% of that number, that is, 3, the master node 00 may determine that the cluster satisfies the condition when it detects that more than 3 resource partitions have a resource utilization above 80%.
Step 1033: When the number of resource partitions whose resource utilization exceeds the utilization threshold is not greater than the count threshold, determine that the cluster does not satisfy the partition adjustment condition.
For example, when the master node 00 detects that the resource utilization of every resource partition is at most 80%, it may determine that the cluster does not satisfy the partition adjustment condition.
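Steps 1032 and 1033 can be sketched as a single check. This assumes the count threshold is given as an integer percentage of the current number of partitions and rounds down; the text only requires the threshold to be an integer, so the rounding rule and the function names are assumptions.

```python
from typing import Sequence


def count_threshold(num_partitions: int, percent: int) -> int:
    """Count threshold as an integer percentage of the partition count.

    Rounding down is an assumption; the text only requires an integer.
    """
    return num_partitions * percent // 100


def satisfies_adjustment_condition(utilizations: Sequence[float],
                                   usage_threshold: float,
                                   count_thr: int) -> bool:
    """Step 1032 (True) / step 1033 (False): adjust when MORE than
    `count_thr` partitions exceed `usage_threshold`."""
    over = sum(1 for r in utilizations if r > usage_threshold)
    return over > count_thr


# Ten partitions, count threshold = 30% of 10 = 3, utilization threshold 80%
rates = [0.85, 0.9, 0.82, 0.95, 0.5, 0.6, 0.7, 0.4, 0.3, 0.2]
print(count_threshold(len(rates), 30))                # 3
print(satisfies_adjustment_condition(rates, 0.8, 3))  # True: 4 partitions > 3
```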
Note that because the resources of each VM may include at least one of CPU, memory, and storage resources, the resource information of each VM may likewise include at least one of CPU resource information, memory resource information, and storage resource information. Correspondingly, in step 1031 the master node may compute the utilization separately for the resource corresponding to each type of information, for example the CPU utilization, memory utilization, and storage utilization of each resource partition.
Further, in steps 1032 and 1033, "the resource utilization exceeds the utilization threshold" may mean either that the average of the utilizations of the individual resources exceeds the threshold, or that the number of resources whose utilization exceeds the threshold is greater than a quantity threshold. The quantity threshold may be a preset fixed value, or the master node may derive it from the number of types of information included in the resource information, for example one third or two thirds of that number; the quantity threshold should be an integer.
In addition, each type of resource may have its own utilization threshold, and the thresholds of different resources may differ. Correspondingly, in steps 1032 and 1033 the utilization of each resource may be compared with its own threshold.
For example, assume the utilization threshold is 80%, and "the resource utilization exceeds the utilization threshold" means that the utilization of any one of the resources in the resource information exceeds the threshold (that is, the quantity threshold is 1). If the resource information of each VM includes CPU, memory, and storage resource information, and the master node computes a CPU utilization of 85%, a memory utilization of 75%, and a storage utilization of 50% for resource partition S00, then because the CPU utilization exceeds 80%, the master node 00 may determine that the resource utilization of S00 exceeds the utilization threshold.
Alternatively, assume the utilization thresholds are 80% for CPU, 85% for memory, and 90% for storage, and "the resource utilization exceeds the utilization threshold" means that the utilization of every resource exceeds its own threshold (that is, the quantity threshold is 3). If the master node computes a CPU utilization of 85%, a memory utilization of 88%, and a storage utilization of 92% for resource partition S00, then because every resource exceeds its own threshold, the master node 00 may determine that the resource utilization of S00 exceeds the utilization threshold.
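The two per-resource rules can be sketched as follows, assuming per-partition utilizations are already computed for each resource type. The worked examples treat a quantity threshold of 1 as "any one resource" and 3 as "all three resources", so this sketch compares the count with `>=`; the function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Union


def over_by_count(usage: Mapping[str, float],
                  thresholds: Union[float, Mapping[str, float]],
                  quantity_threshold: int) -> bool:
    """Count resources whose utilization exceeds its threshold.

    `thresholds` is one value for all resources or a per-resource mapping.
    The count is compared with >=, matching the worked examples.
    """
    over = 0
    for resource, rate in usage.items():
        limit = thresholds[resource] if isinstance(thresholds, Mapping) else thresholds
        if rate > limit:
            over += 1
    return over >= quantity_threshold


def over_by_average(usage: Mapping[str, float], threshold: float) -> bool:
    """Alternative rule: the mean utilization across resources exceeds the threshold."""
    return sum(usage.values()) / len(usage) > threshold


s00_a = {"cpu": 0.85, "memory": 0.75, "storage": 0.50}
print(over_by_count(s00_a, 0.80, 1))   # True: CPU alone exceeds 80%
print(over_by_average(s00_a, 0.80))    # False: mean is about 70%
s00_b = {"cpu": 0.85, "memory": 0.88, "storage": 0.92}
per_resource = {"cpu": 0.80, "memory": 0.85, "storage": 0.90}
print(over_by_count(s00_b, per_resource, 3))  # True: all three exceed
```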
It should also be noted that, in this embodiment of the present invention, besides checking whether the resource utilization of each resource partition exceeds the utilization threshold, the master node 00 may decide whether the cluster satisfies the partition adjustment condition by checking how balanced the resource utilization is across the partitions. For example, the master node may compute the variance of the resource utilizations of all resource partitions. If the variance exceeds a preset variance threshold, the utilization is unbalanced and the cluster satisfies the partition adjustment condition; if the variance does not exceed the threshold, the utilization is fairly balanced, the resource partitions need no adjustment, and the cluster does not satisfy the condition.
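A minimal sketch of the balance check, assuming population variance (the text does not say which variance is meant) and a hypothetical variance threshold of 0.01:

```python
from statistics import pvariance
from typing import Sequence


def utilization_unbalanced(utilizations: Sequence[float],
                           variance_threshold: float) -> bool:
    """True when the variance of partition utilizations exceeds the threshold.

    Population variance is an assumption; the text does not specify it.
    """
    return pvariance(utilizations) > variance_threshold


print(utilization_unbalanced([0.9, 0.5, 0.4], 0.01))    # True  (variance ~0.047)
print(utilization_unbalanced([0.6, 0.62, 0.58], 0.01))  # False (variance ~0.00027)
```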
Readjusting the cluster resources when the number of resource partitions whose utilization exceeds the utilization threshold is greater than the count threshold keeps the adjustment timely. This helps prevent scheduling failures in the schedulers corresponding to highly utilized resource partitions and improves the schedulers' scheduling performance.
Step 104: Determine the remaining resources of each VM from the resource information of each VM in the cluster, and determine the total remaining resources of the cluster.
Once the master node determines that the cluster satisfies the partition adjustment condition, it may start readjusting the cluster resources to balance the resource utilization of the resource partitions and thereby raise the utilization of cluster resources. Before adjusting, the master node may first determine the current total remaining resources of the cluster.
Because the resource information of each VM may include the total resources of the VM and the amount already used, the master node 00 may compute each VM's remaining resources from these two values and then sum the remaining resources of all VMs to obtain the total remaining resources of the cluster.
Alternatively, the resource information each VM reports to the master node 00 may itself be the VM's remaining resources, in which case the master node 00 may compute the total remaining resources of the cluster directly from the reported values.
Alternatively, the resource information each VM reports to the master node 00 may be only the amount of resources the VM currently uses; the master node 00 may then fetch each VM's total resources from the database 10 and compute each VM's remaining resources and the total remaining resources of the cluster.
Note that because the resources of each VM may include at least one of CPU, memory, and storage resources, the master node may compute the total remaining amount of each resource type separately when computing the total remaining resources of the cluster, for example the total remaining CPU, memory, and storage resources of all VMs in the cluster.
For example, if the cluster includes 48 VMs as shown in FIG. 1B, the master node may separately compute the total remaining CPU, memory, and storage resources of those 48 VMs.
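The first variant of step 104 (total minus used per VM, then a per-type sum over the cluster) can be sketched as below. The `VMResources` record and the resource-type keys are hypothetical stand-ins for the VM information held by the master node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VMResources:
    vm_id: str
    total: Dict[str, int]                               # resource type -> total amount
    used: Dict[str, int] = field(default_factory=dict)  # resource type -> used amount


def vm_remaining(vm: VMResources) -> Dict[str, int]:
    """Remaining amount per resource type: total minus used."""
    return {res: amount - vm.used.get(res, 0) for res, amount in vm.total.items()}


def cluster_remaining(vms: List[VMResources]) -> Dict[str, int]:
    """Sum each resource type's remaining amount over all VMs in the cluster."""
    totals: Dict[str, int] = {}
    for vm in vms:
        for res, amount in vm_remaining(vm).items():
            totals[res] = totals.get(res, 0) + amount
    return totals


vms = [
    VMResources("VM1", {"cpu": 8, "memory_gib": 32}, {"cpu": 6, "memory_gib": 20}),
    VMResources("VM2", {"cpu": 4, "memory_gib": 16}, {"cpu": 1, "memory_gib": 4}),
]
print(cluster_remaining(vms))  # {'cpu': 5, 'memory_gib': 24}
```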
Step 105: Determine the physical location where each VM is deployed.
In this embodiment of the present invention, the VM information the master node receives for each VM may include the VM's location information, so the master node may determine each VM's physical location from the acquired VM information. Alternatively, the master node 00 may fetch each VM's location information directly from the database and determine each VM's physical location from it.
Step 106: Adjust the VMs included in at least one resource partition based on the remaining resources of each VM, the total remaining resources of the cluster, and the physical location of each VM.
Further, following the principle of balanced resource allocation, the master node may adjust the VMs included in at least one of the resource partitions so that the remaining resources held by the partitions satisfy a preset resource ratio, keeping cluster resources evenly allocated. During this adjustment the master node may also take each VM's physical location into account, so that for any first VM and second VM that have equal remaining resources and are assigned to different resource partitions, the average physical distance between the first VM and the VMs of the first resource partition to which it belongs is smaller than the average physical distance between the second VM and the VMs of that first resource partition. In other words, physically close VMs are placed in the same resource partition wherever possible, which lowers the communication latency between VMs in a partition and therefore the communication latency of applications or application components.
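The text does not define "physical distance". As a sketch only, assume a hypothetical location tuple (region, availability zone, data center, equipment room, rack) and measure distance as the number of hierarchy levels below the deepest level two VMs share. The average distance from a candidate VM to a partition's VMs then decides which of two equally sized VMs belongs in that partition.

```python
from typing import Sequence, Tuple

# Hypothetical location model: (region, availability zone, data center, room, rack)
Location = Tuple[str, str, str, str, str]


def physical_distance(a: Location, b: Location) -> int:
    """5 minus the number of leading hierarchy levels a and b share.

    0 = same rack, 1 = same room but different rack, ..., 5 = different region.
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) - common


def average_distance(candidate: Location, members: Sequence[Location]) -> float:
    """Average physical distance from a candidate VM to a partition's VMs."""
    if not members:
        return 0.0
    return sum(physical_distance(candidate, m) for m in members) / len(members)


partition = [("r1", "az1", "dc1", "room1", "rack1"), ("r1", "az1", "dc1", "room1", "rack2")]
vm_a = ("r1", "az1", "dc1", "room1", "rack3")  # same room as the partition
vm_b = ("r1", "az1", "dc2", "room9", "rack1")  # different data center
print(average_distance(vm_a, partition), average_distance(vm_b, partition))  # 1.0 3.0
```

With equal remaining resources, `vm_a` would be placed in this partition and `vm_b` in another, which satisfies the distance condition stated above.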
The preset resource ratio may be an equal ratio, in which case the master node 00 adjusts the VMs included in at least one resource partition so that all partitions hold equal remaining resources. Alternatively, the ratio may be determined from the historical scheduling volume of each scheduler: for example, once every preset period, the master node may count the scheduling volume of each scheduler in that period and derive from it the resource ratio of the partitions corresponding to the schedulers. The ratio may be positively correlated with the ratio of the schedulers' historical scheduling volumes; that is, the partition of a scheduler with a higher historical scheduling volume may receive a larger share of the total remaining resources, which keeps the allocation of cluster resources reasonable and improves resource utilization.
For example, assume the cloud platform has three schedulers, S0, S1, and S2, and the master node 00 counts each scheduler's historical scheduling volume once a week. If the most recent count gives a ratio of 1:2:3 among the three schedulers' scheduling volumes, the master node 00 may set the resource ratio of the three corresponding resource partitions to 1:2:3.
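One positively correlated choice is to make each partition's share directly proportional to its scheduler's historical scheduling volume. The sketch below assumes that choice, uses exact fractions to avoid rounding drift, and falls back to an equal ratio when no scheduling was recorded; the function name and counts are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def ratio_from_history(volumes: Dict[str, int]) -> Dict[str, Fraction]:
    """Share of the remaining resources for each scheduler's partition.

    Shares are directly proportional to historical scheduling volume; an
    all-zero history falls back to the equal ratio.
    """
    total = sum(volumes.values())
    if total == 0:
        return {s: Fraction(1, len(volumes)) for s in volumes}
    return {s: Fraction(v, total) for s, v in volumes.items()}


# Hypothetical weekly scheduling counts in the ratio 1:2:3
print(ratio_from_history({"S0": 120, "S1": 240, "S2": 360}))
# {'S0': Fraction(1, 6), 'S1': Fraction(1, 3), 'S2': Fraction(1, 2)}
```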
In one optional implementation of this embodiment of the present invention, the master node may first determine, from the current total remaining resources of the cluster and the preset resource ratio, the remaining resources each resource partition should hold. It may then compute each partition's resource difference from the remaining resources the partition actually holds, and adjust the VMs of the partitions based on that difference, each VM's remaining resources, and each VM's physical location, so that the partitions' resource amounts satisfy the preset ratio (that is, so that every partition's resource difference becomes 0). A resource partition whose difference is already 0 needs no adjustment.
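The target and difference computation can be sketched as follows, assuming amounts are integers in one unit (for example, vCPUs) so that exact fractions can be used. A positive difference means the partition should gain VMs, a negative one means it should give VMs up, and zero means it is left unchanged.

```python
from fractions import Fraction
from typing import Dict


def partition_differences(cluster_remaining: int,
                          ratio: Dict[str, int],
                          current: Dict[str, int]) -> Dict[str, Fraction]:
    """Target minus actual remaining resources for each partition.

    Target for partition p = cluster_remaining * w_p / sum(w).
    """
    weight_sum = sum(ratio.values())
    return {
        p: Fraction(cluster_remaining * w, weight_sum) - current.get(p, 0)
        for p, w in ratio.items()
    }


diff = partition_differences(60, {"S00": 1, "S10": 2, "S20": 3},
                             {"S00": 30, "S10": 20, "S20": 10})
print({p: int(d) for p, d in diff.items()})  # {'S00': -20, 'S10': 0, 'S20': 20}
```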
In another optional implementation of this embodiment of the present invention, referring to FIG. 4, adjusting the VMs included in at least one resource partition based on the remaining resources of each VM, the total remaining resources of the cluster, and the physical location of each VM may include the following steps.
Step 1061: Divide the remaining resources of the cluster into N shares according to the preset resource ratio.
Here N is the number of resource partitions in the cluster, and each share corresponds to one resource partition, that is, each share is allocated to its corresponding partition. In this embodiment, the master node may first determine the size of each share from the current total remaining resources of the cluster and the preset ratio. Then, for each share, the master node may use the remaining resources of each VM in the cluster to select at least one group of VMs whose remaining resources sum to the size of that share (or differ from it by less than a preset difference threshold); each group includes at least one VM. Finally, among these candidate groups, the master node may choose the group whose VMs have the shortest average physical distance from one another as the VMs that provide the share.
For example, the master node 00 may divide the current remaining resources of the cluster into three shares in the ratio 1:2:3. If the first share, corresponding to resource partition S00, has resource amount P0, the second share, corresponding to S10, has P1, and the third share, corresponding to S20, has P2, then P0:P1:P2 = 1:2:3. Suppose that among the 48 VMs of the cluster there are 6 first VMs, each with remaining resources P0/6, and 40 second VMs, each with remaining resources P0/8. The master node may then select the 6 first VMs to provide the first share, 16 second VMs to provide the second share, and 24 second VMs to provide the third share. It could equally select 8 second VMs for the first share, the 6 first VMs plus 8 second VMs for the second share, and 24 second VMs for the third share.
In addition, during this selection the master node may have physically close VMs provide the same share wherever possible. For example, if 16 of the 40 second VMs are deployed in one equipment room and the remaining 24 in another, the master node may select the 16 VMs in the first room to provide the second share and the 24 VMs in the other room to provide the third share.
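Step 1061 can be sketched as an exhaustive search over VM groups: keep the groups whose remaining resources sum to the share (within a tolerance), then keep the group with the smallest mean pairwise physical distance. This sketch assumes the same hypothetical hierarchical distance as a location tuple, fills the shares one after another from the VMs not yet used (a greedy order the text does not prescribe, which can fail even when a valid division exists), and is exponential in the number of VMs, so it only suits small illustrative inputs.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Location = Tuple[str, ...]  # hypothetical, e.g. (region, az, dc, room, rack)


def physical_distance(a: Location, b: Location) -> int:
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) - common


def mean_pairwise_distance(group: Tuple[str, ...], locations: Dict[str, Location]) -> float:
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    return sum(physical_distance(locations[a], locations[b]) for a, b in pairs) / len(pairs)


def pick_group(remaining: Dict[str, int], locations: Dict[str, Location],
               target: int, tolerance: int = 0) -> Optional[Tuple[str, ...]]:
    """Closest-together VM group whose remaining resources match `target`."""
    best, best_score = None, None
    vm_ids = sorted(remaining)
    for size in range(1, len(vm_ids) + 1):
        for group in combinations(vm_ids, size):
            if abs(sum(remaining[v] for v in group) - target) > tolerance:
                continue
            score = mean_pairwise_distance(group, locations)
            if best_score is None or score < best_score:
                best, best_score = group, score
    return best


def divide_shares(remaining: Dict[str, int], locations: Dict[str, Location],
                  targets: Dict[str, int], tolerance: int = 0) -> Dict[str, Tuple[str, ...]]:
    """Fill each partition's share in turn from the VMs not yet used."""
    pool = dict(remaining)
    result: Dict[str, Tuple[str, ...]] = {}
    for partition, target in targets.items():
        group = pick_group(pool, locations, target, tolerance)
        if group is None:
            raise ValueError(f"no VM group matches the share for {partition}")
        result[partition] = group
        for vm in group:
            del pool[vm]
    return result


# Scaled-down example: shares 4:8:12 (1:2:3); each VM has 2 units remaining,
# and the VMs of each prefix sit in their own equipment room.
rem, locs = {}, {}
for prefix, room, count in (("f", "room1", 2), ("g", "room2", 4), ("h", "room3", 6)):
    for i in range(1, count + 1):
        rem[f"{prefix}{i}"] = 2
        locs[f"{prefix}{i}"] = ("r1", "az1", "dc1", room, "rack1")
result = divide_shares(rem, locs, {"S00": 4, "S10": 8, "S20": 12})
print(result)
```

In the example, each share ends up served by the VMs of a single equipment room, mirroring the room-based selection described above.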
Step 1062: Assign the at least one VM that provides each share to the corresponding resource partition.
Further, based on how the cluster's remaining resources were divided, the master node 00 may assign the at least one VM that provides each share to the corresponding resource partition, thereby adjusting the VMs included in at least one of the resource partitions.
For example, the master node 00 may assign the 6 first VMs that provide the first share to resource partition S00, the 16 second VMs that provide the second share to resource partition S10, and the 24 second VMs that provide the third share to resource partition S20.
Note that, in this embodiment of the present invention, because the VM information the master node acquires for each VM may also include status information, the master node may, before adjusting resources, check from the acquired status information whether each VM is in the normal state. It may then adjust the resource partition only of VMs in the normal state and leave VMs in the down state unadjusted. That is, the VMs referred to in steps 103 to 106 may all be VMs in the normal state.
It should also be noted that, because in step 104 the master node may compute the total remaining amount of each of the at least one resource type in the cluster, one possible approach in step 106 is for the master node to adjust the cluster resources using the total remaining amount of a designated resource as the reference. The designated resource may be any one of the at least one resource type, for example CPU. Alternatively, the master node may measure, for each resource type, how evenly it is distributed across the resource partitions, and take the least evenly distributed type as the designated resource; for example, it may compute, for each resource type, the variance of the remaining amount across the resource partitions and take the type with the largest variance as the designated resource.
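Choosing the designated resource by largest variance can be sketched as follows, assuming per-partition remaining amounts are already aggregated by resource type (population variance and the function name are assumptions). Raw variances depend on units, so comparing, say, vCPUs with GiB may favor the larger-unit resource; normalizing each type by its cluster total first is one way to make them comparable.

```python
from statistics import pvariance
from typing import Dict


def designated_resource(partition_remaining: Dict[str, Dict[str, float]]) -> str:
    """Resource type whose remaining amount varies most across partitions.

    `partition_remaining` maps partition -> resource type -> remaining amount.
    """
    resources = next(iter(partition_remaining.values())).keys()
    return max(resources,
               key=lambda r: pvariance([amounts[r] for amounts in partition_remaining.values()]))


print(designated_resource({
    "S00": {"cpu": 10, "memory_gib": 40},
    "S10": {"cpu": 12, "memory_gib": 40},
    "S20": {"cpu": 8, "memory_gib": 40},
}))  # cpu
```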
As another possible approach, the master node may first compute the average of the total remaining amounts of the at least one resource type, as well as, for each VM, the average of its remaining amounts across the at least one resource type, and then adjust the cluster resources using the average total remaining amount as the reference.
Step 107: Update the partition information of the cluster based on the adjustment result.
Further, the master node 00 may update the partition information of the cluster according to the result of the partition adjustment, so that each scheduler can execute scheduling tasks within its corresponding resource partition according to the updated partition information. As shown in FIG. 1A and FIG. 1C, after readjusting the cluster resources, the policy module 03 may update the partition information stored in the database 10 and send the updated partition information to the management module 01. After receiving it, the management module 01 may fetch the VM information of each VM from the database 10 and use the updated partition information together with that VM information to update the partition information stored in each cache. The partition information in each cache may include the identifiers of the VMs in the resource partition corresponding to that cache, and may also include the VM information of each of those VMs, such as their resource information and status information. Each scheduler may then execute scheduling tasks within its corresponding resource partition according to the updated partition information in the cache.
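The cache rebuild performed by the management module can be sketched as a join of the updated partition information with the VM information. Plain dictionaries stand in for the database 10 and the per-scheduler caches here, and skipping VMs with no stored information is an assumption.

```python
from typing import Dict, List


def rebuild_caches(partition_info: Dict[str, List[str]],
                   vm_info: Dict[str, dict]) -> Dict[str, Dict[str, dict]]:
    """Per-partition cache contents: VM identifier -> that VM's VM information.

    VMs with no entry in `vm_info` are skipped (an assumption).
    """
    return {
        partition: {vm: vm_info[vm] for vm in vm_ids if vm in vm_info}
        for partition, vm_ids in partition_info.items()
    }


caches = rebuild_caches(
    {"S00": ["VM1", "VM2"], "S10": ["VM3"]},
    {"VM1": {"status": "normal"}, "VM2": {"status": "normal"}, "VM3": {"status": "down"}},
)
print(sorted(caches["S00"]))  # ['VM1', 'VM2']
```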
For example, as shown in FIG. 5, assume that after the cluster resources are readjusted, resource partition S00 of scheduler S0 includes 16 VMs, resource partition S10 of scheduler S1 includes 17 VMs, and resource partition S20 of scheduler S2 includes 15 VMs. Each scheduler can then execute scheduling tasks within its own resource partition.
With the method provided in this embodiment of the present invention, each scheduler executes scheduling tasks independently within its corresponding resource partition, which avoids scheduling failures caused by scheduling conflicts. In addition, because the master node dynamically adjusts the cluster resources based on the acquired VM information, cluster resources stay evenly allocated, resource utilization improves, and the schedulers' scheduling performance improves accordingly.
Optionally, in one implementation, the VM information of each VM obtained by the master node 00 may further include type information of the VM. In this case, in step 104 above, the process by which the master node determines the total amount of remaining resources of the cluster may include:
Step 1041a: Divide the multiple VMs in the cluster into at least two resource groups according to the type information of each VM.
All VMs in a resource group are of the same type. Assuming that the cluster includes K types of VMs (K is an integer greater than 1), the master node can put the VMs of the same type into one resource group, thereby obtaining K resource groups.
Step 1042a: Determine, for each resource group, the total amount of remaining resources of the VMs in that group.
Further, when determining the total amount of remaining resources of the cluster, the master node 00 may separately calculate the total amount of remaining resources of each of the K resource groups.
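Steps 1041a and 1042a can be illustrated with the following sketch. It assumes, for simplicity, that each VM's remaining resources are reduced to a single number (for example, free vCPUs); the `VM` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    vm_id: str
    vm_type: str      # type information of the VM (hypothetical field)
    remaining: float  # remaining resource amount, one dimension only (assumption)


def group_by_type(vms: List[VM]) -> Dict[str, List[VM]]:
    """Step 1041a: put VMs of the same type into the same resource group."""
    groups: Dict[str, List[VM]] = defaultdict(list)
    for vm in vms:
        groups[vm.vm_type].append(vm)
    return dict(groups)


def remaining_per_group(groups: Dict[str, List[VM]]) -> Dict[str, float]:
    """Step 1042a: total remaining resources of each resource group."""
    return {t: sum(vm.remaining for vm in members) for t, members in groups.items()}


vms = [VM("a", "general", 4), VM("b", "general", 2), VM("c", "gpu", 8)]
groups = group_by_type(vms)
totals = remaining_per_group(groups)  # {'general': 6, 'gpu': 8}
```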
Correspondingly, in step 1061 above, the process by which the master node adjusts resources may include:
Step 1061a: Divide the remaining resources of each resource group into N sub-resources according to the preset resource ratio.
Each sub-resource may be provided by at least one VM, and each sub-resource corresponds to one resource partition.
Step 1061b: Combine the at least two sub-resources corresponding to the same resource partition into one resource share.
If the VMs in the cluster are divided into K resource groups, then after the remaining resources of each resource group are divided into N sub-resources, each resource partition is allocated K sub-resources, one from each group, and these K sub-resources together form the resource share allocated to that resource partition. The resource amount $L_n$ of the resource share allocated to the n-th resource partition may satisfy:
$$L_n = \sum_{k=1}^{K} L_n^{(k)}$$
where $L_n^{(k)}$ is the resource amount of the sub-resource that the master node allocates to the n-th resource partition in the k-th resource group, k is a positive integer not greater than K, and n is a positive integer not greater than N.
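One way to realize steps 1061a and 1061b, and the per-partition resource amount L_n as a sum over the K groups, is sketched below. The source does not specify how VMs are assigned to the N sub-resources of a group, so the greedy heuristic used here (give the next-largest VM to the partition that is furthest below its target share of the group) is an assumption, as are the function names and the single-number resource model.

```python
from typing import Dict, List, Tuple


def split_group(vms: List[Tuple[str, float]], ratio: List[float]) -> List[List[str]]:
    """Step 1061a (sketch): divide one resource group into N sub-resources.

    vms:   (vm_id, remaining amount) pairs of one resource group.
    ratio: preset resource ratio of the N resource partitions.
    Greedy heuristic (assumption): each VM, largest first, goes to the
    partition whose assigned amount is furthest below its target.
    """
    total = sum(amount for _, amount in vms)
    weight = sum(ratio)
    targets = [total * r / weight for r in ratio]
    assigned = [0.0] * len(ratio)
    parts: List[List[str]] = [[] for _ in ratio]
    for vm_id, amount in sorted(vms, key=lambda v: v[1], reverse=True):
        n = max(range(len(ratio)), key=lambda i: targets[i] - assigned[i])
        parts[n].append(vm_id)
        assigned[n] += amount
    return parts


def partition_amounts(
    groups: Dict[str, List[Tuple[str, float]]], ratio: List[float]
) -> List[float]:
    """Step 1061b: L_n is the sum over all K groups of sub-resource n."""
    amounts = [0.0] * len(ratio)
    for vms in groups.values():
        size = dict(vms)
        for n, part in enumerate(split_group(vms, ratio)):
            amounts[n] += sum(size[v] for v in part)
    return amounts


groups = {
    "general": [("a", 4.0), ("b", 2.0), ("c", 2.0)],
    "gpu": [("d", 8.0), ("e", 8.0)],
}
amounts = partition_amounts(groups, [1, 1])  # two partitions, ratio 1:1
```

A greedy assignment cannot always hit the ratio exactly, because each VM is indivisible; it only approximates the preset ratio.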
In this embodiment of the present invention, adjusting cluster resources based on the type of each VM ensures balanced allocation of heterogeneous resources of different types in the cluster, which further improves the balance of resource allocation in the cluster.
Optionally, in another implementation, step 104 above may include:
Step 1041b: Determine the remaining resource amount of each VM according to the resource information of each VM in the cluster.
Step 1042b: Determine at least one target VM based on the remaining resource amount of each VM.
The remaining resource amount of each target VM is greater than a preset threshold. The preset threshold may be a fixed value preset in the master node; or it may be determined by the master node according to the total resources of each VM, for example 10% of the VM's total resources; or it may be adjusted manually by the operation and maintenance personnel of the cloud platform.
For example, if the preset threshold is 0, the master node 00 may determine every VM in the cluster that has remaining resources as a target VM.
Step 1043b: Determine the sum of the remaining resource amounts of the at least one target VM as the total amount of remaining resources of the cluster.
Further, the master node may calculate the sum of the remaining resource amounts of the at least one target VM and use this sum as the total amount of remaining resources of the cluster.
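Steps 1041b to 1043b can be sketched as follows, assuming each VM's total and used resources are single numbers. Both threshold options mentioned in the text are shown: a fixed value, or a fraction (10% in the example) of each VM's total resources. The function and parameter names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def target_vms(
    capacity: Dict[str, float],
    used: Dict[str, float],
    threshold: Optional[float] = None,
    fraction: float = 0.10,
) -> Tuple[List[str], float]:
    """Select target VMs and compute the cluster's remaining total.

    If `threshold` is given, it is used as a fixed preset threshold;
    otherwise each VM's threshold is `fraction` of its total resources.
    """
    targets: List[str] = []
    total = 0.0
    for vm_id, cap in capacity.items():
        remaining = cap - used[vm_id]  # step 1041b: remaining amount per VM
        limit = threshold if threshold is not None else fraction * cap
        if remaining > limit:          # step 1042b: strictly above threshold
            targets.append(vm_id)
            total += remaining         # step 1043b: sum over target VMs only
    return targets, total


capacity = {"a": 16.0, "b": 16.0, "c": 8.0}
used = {"a": 15.0, "b": 8.0, "c": 0.0}
targets, total = target_vms(capacity, used)  # VM "a" has only 1.0 left (< 1.6)
```

With a fixed threshold of 0, every VM that has any remaining resources becomes a target VM, matching the example in the text.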
Correspondingly, in step 105 above, the master node only needs to determine the physical location of each target VM; and in step 106 above, the process by which the master node adjusts resources may include:
Adjusting the target VMs included in at least one resource partition based on the remaining resource amount of each target VM, the total amount of remaining resources of the cluster, and the physical location of each target VM.
In addition, the method shown in steps 1041b to 1043b may also be executed before step 1041a. Correspondingly, in step 1041a, the master node may divide the multiple target VMs in the cluster into at least two resource groups according to the type information of each target VM; and in step 1042a, the master node may determine the total amount of remaining resources of the target VMs in each resource group.
In this embodiment of the present invention, the master node may re-assign only the target VMs among the resource partitions; a VM whose remaining resource amount does not exceed the preset threshold can stay in its current partition. This minimizes changes to the resource partitions and improves the efficiency of adjusting them.
It should be noted that, in this embodiment of the present invention, in addition to triggering adjustment of the cluster resources based on the resource usage rate of each resource partition, the master node may also trigger the adjustment in the following ways:
One optional trigger mode: The master node may periodically adjust the cluster resources based on a preset adjustment period. Correspondingly, in step 101 above, the master node may periodically obtain the VM information of each VM in the cluster according to the preset adjustment period, and then execute steps 102 to 107 in sequence to adjust the cluster resources.
The adjustment period may be a preset fixed value or may be set by the operation and maintenance personnel of the cloud platform; for example, it may be 12 hours or one week. If the adjustment period is one week, the master node adjusts the cluster resources once a week using the method shown in steps 101 to 107 above. After the master node 00 performs one such adjustment starting from the resource division shown in FIG. 5, the resource division of the cluster may be as shown in FIG. 6.
Another optional trigger mode: The master node may also adjust the cluster resources when it detects that the number of schedulers deployed in the cloud platform has changed. Correspondingly, before step 101 above, the master node may monitor the number of schedulers deployed in the cloud platform in real time; in step 101, the master node obtains the VM information of each VM in the cluster when it detects that this number has changed, and then executes steps 102 to 107 in sequence to adjust the cluster resources.
It should be noted that, after detecting an increase in the number of schedulers, the master node may also create a corresponding cache for each newly added scheduler; correspondingly, after detecting a decrease in the number of schedulers, the master node may delete the caches corresponding to the removed schedulers.
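Keeping the per-scheduler caches in step with the deployed schedulers might look like the sketch below. It compares sets of scheduler identifiers rather than a bare count, which is an assumption made here so that replacing one scheduler with another is also detected; `sync_caches` is a hypothetical name.

```python
from typing import Dict, Iterable


def sync_caches(caches: Dict[str, dict], schedulers: Iterable[str]) -> bool:
    """Create caches for added schedulers and delete caches of removed ones.

    Returns True if the set of schedulers changed, i.e. the condition
    under which the master node would trigger a resource adjustment.
    """
    current = set(schedulers)
    known = set(caches)  # snapshot taken before the dict is modified
    for scheduler_id in current - known:
        caches[scheduler_id] = {}   # new, empty cache for an added scheduler
    for scheduler_id in known - current:
        del caches[scheduler_id]    # drop the cache of a removed scheduler
    return current != known


caches = {"S0": {}, "S1": {}}
changed = sync_caches(caches, ["S0", "S1", "S2"])  # S2 was added
```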
For both of the above trigger modes, step 103 in the above embodiment may be omitted; that is, after obtaining the VM information and the partition information, the master node may adjust the cluster resources directly using the method shown in steps 104 to 107.
Of course, the master node may also use several of the above trigger modes at the same time; that is, whenever the master node detects that the cloud platform satisfies any one of the above trigger conditions, it can trigger adjustment of the cluster resources. In this case, upon entering each new adjustment period, the master node may first check whether an adjustment of the cluster resources was already triggered by other means (for example, by the resource usage rate or by a change in the number of schedulers) during the previous adjustment period. If no resource adjustment triggered by other means was performed during the previous adjustment period, the master node adjusts the cluster resources using the method shown in steps 101 to 107 above (step 103 may be omitted). If at least one resource adjustment triggered by other means was performed during the previous adjustment period, the master node may skip the current resource adjustment and wait for the next adjustment period.
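The skip rule for the periodic trigger can be expressed as a small predicate. In this sketch, times are plain numbers of seconds, the previous adjustment period is taken to be the half-open window [now - period, now), and the list of times of adjustments triggered by other means is assumed to be recorded by the master node; none of these details are fixed by the source.

```python
from typing import List


def should_run_periodic(now: float, period: float, other_trigger_times: List[float]) -> bool:
    """Return True if the periodic adjustment at time `now` should run.

    other_trigger_times: times of adjustments triggered by other means
    (e.g. resource usage rate or a change in the number of schedulers).
    The periodic run is skipped if at least one of them fell inside the
    previous adjustment period.
    """
    window_start = now - period
    return not any(window_start <= t < now for t in other_trigger_times)


WEEK = 7 * 24 * 3600  # example adjustment period of one week, in seconds
```

For example, `should_run_periodic(2 * WEEK, WEEK, [WEEK + 3600])` skips the run, because an usage-triggered adjustment happened one hour into the previous period.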
Further, taking the architecture shown in FIG. 1A and FIG. 1C as an example, the cluster resource adjustment method provided by the embodiments of the present invention is described below. Referring to FIG. 7, when the master node determines whether to trigger resource adjustment according to the resource usage rate of each resource partition in the cluster, the method may include:
Step 201: The collection module obtains the VM information of each VM in the cluster.
Step 202: The collection module sends the VM information to the policy module.
Step 203: The collection module sends the VM information to the database.
The collection module may also send the obtained VM information to the database, so that the database updates the VM information it stores for each VM.
Step 204: The policy module obtains the current partition information of the cluster from the database.
Step 205: The policy module detects whether the cluster satisfies the partition adjustment condition.
When the policy module detects that the cluster satisfies the partition adjustment condition, step 206 may be executed; otherwise, the policy module may take no action, or may send the management module an instruction indicating that the resource partitions are not to be adjusted.
Step 206: The policy module adjusts the VMs included in at least one resource partition according to the obtained VM information.
Step 207: The policy module updates the partition information stored in the database.
Step 208: The policy module sends the adjusted partition information to the management module.
Step 209: The management module obtains the VM information of each VM from the database.
Step 210: The management module updates the partition information stored in at least one cache.
For the implementation of steps 201 to 210 above, reference may be made to the corresponding steps in the embodiments shown in FIG. 2 to FIG. 4; details are not repeated here.
Referring to FIG. 8, when the master node triggers resource adjustment according to the preset adjustment period, the method may include:
Step 301: The timer in the policy module keeps time.
In this embodiment of the present invention, the timer may be a countdown timer whose countdown duration is the preset adjustment period. When the timer expires (that is, the countdown reaches 0), step 302 may be executed.
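A countdown timer of this kind is sketched below. To keep it deterministic and testable, it is advanced by explicit `tick` calls instead of a wall clock, which is an assumption of this sketch; on expiry the caller would proceed to step 302. The class name is hypothetical.

```python
class CountdownTimer:
    """Countdown timer whose duration is the preset adjustment period."""

    def __init__(self, period: float) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.remaining = period

    def tick(self, elapsed: float) -> bool:
        """Advance by `elapsed` seconds; return True when the countdown hits 0."""
        self.remaining -= elapsed
        if self.remaining <= 0:
            # Timer expired: restart the countdown for the next adjustment period.
            self.remaining = self.period
            return True
        return False


timer = CountdownTimer(12 * 3600)  # e.g. a 12-hour adjustment period
```

Restarting from the full period on expiry discards any overshoot; carrying the overshoot into the next period would be an equally valid design.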
Step 302: The policy module sends an adjustment instruction to the collection module.
Step 303: The collection module obtains the VM information of each VM in the cluster according to the adjustment instruction.
Step 304: The collection module sends the VM information to the policy module.
Step 305: The collection module sends the VM information to the database.
The database may update the VM information it stores for each VM according to the received VM information.
Step 306: The policy module obtains the current partition information of the cluster from the database.
Step 307: The policy module adjusts the VMs included in at least one resource partition according to the obtained VM information.
Step 308: The policy module updates the partition information stored in the database.
Step 309: The policy module sends the adjusted partition information to the management module.
Step 310: The management module obtains the VM information of each VM from the database.
Step 311: The management module updates the partition information stored in at least one cache.
For the implementation of steps 301 to 311 above, reference may be made to the corresponding steps in the embodiments shown in FIG. 2 to FIG. 4; details are not repeated here.
Referring to FIG. 9, when the master node triggers resource adjustment according to a change in the number of schedulers, the method may include:
Step 401: The management module detects whether the number of schedulers in the cloud platform has changed.
When a change in the number of schedulers is detected, step 402 may be executed; otherwise, the management module may continue to monitor the number of schedulers, that is, continue to execute step 401. In addition, when the number of schedulers increases, the management module may create a corresponding cache for each newly added scheduler; when the number decreases, the management module may delete the caches corresponding to the removed schedulers.
Step 402: The management module sends an adjustment instruction to the policy module.
Step 403: The policy module sends an adjustment instruction to the collection module.
Step 404: The collection module obtains the VM information of each VM in the cluster according to the adjustment instruction.
Step 405: The collection module sends the VM information to the policy module.
Step 406: The collection module sends the VM information to the database.
The database may update the VM information it stores for each VM according to the received VM information.
Step 407: The policy module obtains the current partition information of the cluster from the database.
Step 408: The policy module adjusts the VMs included in at least one resource partition according to the obtained VM information.
Step 409: The policy module updates the partition information stored in the database.
Step 410: The policy module sends the adjusted partition information to the management module.
Step 411: The management module obtains the VM information of each VM from the database.
Step 412: The management module updates the partition information stored in at least one cache.
For the implementation of steps 401 to 412 above, reference may be made to the corresponding steps in the embodiments shown in FIG. 2 to FIG. 4; details are not repeated here.
It should be noted that the order of the steps of the cluster resource adjustment method provided by the embodiments of the present invention may be adjusted as appropriate, and steps may be added or removed as the situation requires. For example, step 102 may be omitted, that is, the master node may adjust resources without considering the current partition information and directly adjust the VMs included in at least one resource partition according to the VM information of each VM; or step 103 may be omitted, that is, the master node may adjust the cluster resources directly after obtaining the VM information and the partition information; or step 105 may be omitted, that is, in step 106 above, the master node may adjust the VMs included in at least one resource partition based only on the remaining resource amount of each VM and the total amount of remaining resources of the cluster. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application, and is therefore not described further.
In summary, the embodiments of the present invention provide a cluster resource adjustment method. For a cluster including multiple resource partitions, the method obtains the VM information of each VM in the cluster, adjusts the VMs included in at least one resource partition according to the obtained VM information, and updates the partition information of the cluster according to the adjustment result, so that each scheduler can execute scheduling tasks in its corresponding resource partition according to the adjusted partition information. Because each scheduler can independently execute scheduling tasks in its corresponding resource partition, scheduling failures caused by scheduling conflicts can be effectively avoided. And because the cluster resources can be dynamically adjusted, balanced allocation of cluster resources among the resource partitions can be ensured, which effectively balances the resource usage rates of the resource partitions and improves the utilization of cluster resources.
FIG. 10 is a schematic structural diagram of a cluster resource adjustment device according to an embodiment of the present invention. The device may be configured in the master node 00 of the cloud platform shown in FIG. 1A or FIG. 1C. The cluster includes multiple resource partitions, each resource partition includes at least one virtual machine (VM), and each resource partition corresponds to one scheduler. Referring to FIG. 10, the device may include:
A first acquisition module 501, configured to implement the method of step 101 in the embodiment shown in FIG. 2.
An adjustment module 502, configured to adjust the VMs included in at least one resource partition according to the obtained VM information.
An update module 503, configured to implement the method of step 107 in the embodiment shown in FIG. 2.
Optionally, the VM information may include resource information. FIG. 11 is a schematic structural diagram of an adjustment module 502 according to an embodiment of the present invention. Referring to FIG. 11, the adjustment module 502 may include:
A first determining submodule 5021, configured to implement the method of step 104 in the embodiment shown in FIG. 2.
An adjustment submodule 5022, configured to adjust the VMs included in at least one resource partition based on the remaining resource amount of each VM and the total amount of remaining resources, so that the remaining resources occupied by the resource partitions satisfy the preset resource ratio.
Optionally, the adjustment submodule 5022 may be configured to implement the method of steps 1061 to 1062 in the embodiment shown in FIG. 4.
Optionally, the VM information may further include type information of the VM;
the first determining submodule 5021 is configured to:
divide the multiple VMs in the cluster into at least two resource groups according to the type information of each VM, where all VMs in a resource group are of the same type; and
determine, for each resource group, the total amount of remaining resources of the VMs in that group;
correspondingly, the adjustment submodule 5022 may be configured to:
divide the remaining resources of each resource group into N sub-resources according to the preset resource ratio, where each sub-resource is provided by at least one VM and corresponds to one resource partition; and
combine the at least two sub-resources corresponding to the same resource partition into one resource share.
Optionally, as shown in FIG. 11, the adjustment module 502 may further include:
A second determining submodule 5023, configured to implement the method of step 105 in the embodiment shown in FIG. 2.
Correspondingly, the adjustment submodule 5022 may be configured to implement the method of step 106 in the embodiment shown in FIG. 2.
Optionally, the first determining submodule 5021 may be configured to:
determine the remaining resource amount of each VM according to the resource information of each VM in the cluster;
determine at least one target VM based on the remaining resource amount of each VM, where the remaining resource amount of each target VM is greater than a preset threshold; and
determine the sum of the remaining resource amounts of the at least one target VM as the total amount of remaining resources of the cluster;
correspondingly, the adjustment submodule 5022 may be configured to:
adjust the target VMs included in at least one resource partition based on the remaining resource amount of each target VM and the total amount of remaining resources.
Optionally, the VM information includes resource information. Referring to FIG. 12, the device may further include:
A second acquisition module 504, configured to implement the method of step 102 in the embodiment shown in FIG. 2.
A detection module 505, configured to implement the method of step 103 in the embodiment shown in FIG. 2.
Correspondingly, the adjustment module 502 may be configured to: when it is detected that the cluster satisfies the partition adjustment condition, adjust the VMs included in each resource partition according to the obtained VM information.
Optionally, the detection module 505 may be configured to implement the method of steps 1031 to 1033 in the embodiment shown in FIG. 3.
Optionally, the resource information includes at least one of processor resource information, memory resource information, and storage resource information. A resource usage rate greater than the usage rate threshold means either that the average of the usage rates of the resources corresponding to the respective kinds of information is greater than the usage rate threshold, or that, among the at least one kind of information, the number of kinds whose corresponding resource usage rate is greater than the usage rate threshold is greater than a quantity threshold.
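The two alternative meanings of "resource usage rate greater than the usage rate threshold" can be sketched as a single check. The `mode` switch, the parameter names, and the representation of usage rates as fractions in [0, 1] are assumptions for illustration.

```python
from typing import Dict


def usage_exceeds(
    usage: Dict[str, float],
    usage_threshold: float,
    mode: str = "average",
    count_threshold: int = 0,
) -> bool:
    """Check whether a partition's resource usage rate exceeds the threshold.

    usage maps each kind of resource information present (e.g. "cpu",
    "memory", "storage") to its usage rate in [0, 1].
    mode="average": the mean usage rate is greater than usage_threshold.
    mode="count":   the number of kinds whose usage rate is greater than
                    usage_threshold is greater than count_threshold.
    """
    if not usage:
        return False
    if mode == "average":
        return sum(usage.values()) / len(usage) > usage_threshold
    if mode == "count":
        over = sum(1 for rate in usage.values() if rate > usage_threshold)
        return over > count_threshold
    raise ValueError(f"unknown mode: {mode}")


u = {"cpu": 0.9, "memory": 0.7, "storage": 0.5}
```

With these rates, the average (about 0.7) does not exceed a threshold of 0.8, but in count mode one kind (CPU) does, which exceeds a quantity threshold of 0.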
Optionally, the first acquisition module 501 may be configured to:
periodically obtain the VM information of each VM in the cluster according to a preset adjustment period;
or obtain the VM information of each VM in the cluster when it is detected that the number of schedulers deployed in the cloud platform has changed.
It should be noted that the function of the first acquisition module 501 in the above device embodiment may be the same as that of the collection module 02 in the master node 00 shown in FIG. 1A or FIG. 1C, and the functions of the adjustment module 502, the update module 503, the second acquisition module 504, and the detection module 505 may be the same as those of the policy module 03 in the master node 00 shown in FIG. 1A or FIG. 1C.
In summary, the embodiments of the present invention provide a cluster resource adjustment device. For a cluster including multiple resource partitions, the device obtains the VM information of each VM in the cluster, adjusts the VMs included in at least one resource partition according to the obtained VM information, and updates the partition information of the cluster according to the adjustment result, so that each scheduler can execute scheduling tasks in its corresponding resource partition according to the adjusted partition information. Because each scheduler can independently execute scheduling tasks in its corresponding resource partition, scheduling failures caused by scheduling conflicts can be effectively avoided. And because the cluster resources can be dynamically adjusted, balanced allocation of cluster resources among the resource partitions can be ensured, which effectively balances the resource usage rates of the resource partitions and thereby improves the utilization of cluster resources.
With regard to the device in the above embodiments, the specific manner in which each module performs operations has been described in detail in the method embodiments and is not elaborated here.
请参考图13,其示出了本申请实施例提供的一种集群的资源调整装置600的结构示意图,参见图13,该集群的资源调整装置600可以包括:处理器610、通信接口620和存储器630,通信接口620和存储器630分别与处理器610相连,示例地,如图13所示,通信接口620和存储器630通过总线640与处理器610相连。Please refer to FIG. 13, which shows a schematic structural diagram of a cluster resource adjustment device 600 provided by an embodiment of the present application. Referring to FIG. 13, the cluster resource adjustment device 600 may include: a processor 610, a communication interface 620, and a memory 630 , the communication interface 620 and the memory 630 are respectively connected to the processor 610 , for example, as shown in FIG. 13 , the communication interface 620 and the memory 630 are connected to the processor 610 through a bus 640 .
The processor 610 may be a central processing unit (CPU) and includes one or more processing cores. The processor 610 runs software programs to perform various functional applications and data processing.
There may be multiple communication interfaces 620. The communication interface 620 is used by the cluster resource adjustment apparatus 600 to communicate with external devices, such as a display or a third-party device (for example, a storage device or a mobile terminal).
The memory 630 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, and optical memory. The memory 630 stores information; for example, it is used to store software programs.
Optionally, the cluster resource adjustment apparatus 600 may further include an input/output (I/O) interface (not shown in FIG. 13). The I/O interface is connected to the processor 610, the communication interface 620, and the memory 630, and may be, for example, a Universal Serial Bus (USB) interface.
In this embodiment of the present application, the processor 610 is configured to execute the instructions stored in the memory 630, and by executing these instructions the processor 610 implements the cluster resource adjustment method provided in the foregoing method embodiments.
An embodiment of the present invention provides a cloud platform. As shown in FIG. 1A and FIG. 1C, the cloud platform may include a cluster, multiple schedulers, and the cluster resource adjustment apparatus shown in FIG. 10, FIG. 12, or FIG. 13, where the cluster resource adjustment apparatus may be deployed on the master node 00.
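To make the composition of the cloud platform concrete, and to show the scheduling conflict described in the background that the resource adjustment apparatus is meant to address, the following is a minimal Python sketch under stated assumptions. All class and method names (`VirtualMachine`, `Cluster`, `Scheduler`, `CloudPlatform`, `snapshot`, `commit`, `concurrent_round`) and the first-fit placement policy are hypothetical illustrations, not the patent's implementation; the resource adjustment method itself is not modelled, and the master node "00" is only recorded as a field.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass
class VirtualMachine:
    """One VM in the cluster and its remaining free resources (hypothetical model)."""
    name: str
    free_cpu: float       # free CPU cores
    free_mem_gb: float    # free memory, GB
    free_disk_gb: float   # free disk, GB

    def fits(self, cpu: float, mem_gb: float, disk_gb: float) -> bool:
        return (self.free_cpu >= cpu and self.free_mem_gb >= mem_gb
                and self.free_disk_gb >= disk_gb)


@dataclass
class Cluster:
    """Resource state shared by all schedulers."""
    vms: Dict[str, VirtualMachine] = field(default_factory=dict)

    def snapshot(self) -> Dict[str, VirtualMachine]:
        # A scheduler reads a copy of the resource information; the copy
        # becomes stale as soon as another scheduler commits a placement.
        return {name: replace(vm) for name, vm in self.vms.items()}

    def commit(self, vm_name: str, cpu: float, mem_gb: float, disk_gb: float) -> bool:
        # Re-check against the live state: False means a scheduling conflict,
        # i.e. the chosen VM no longer has room for the application.
        vm = self.vms[vm_name]
        if not vm.fits(cpu, mem_gb, disk_gb):
            return False
        vm.free_cpu -= cpu
        vm.free_mem_gb -= mem_gb
        vm.free_disk_gb -= disk_gb
        return True


@dataclass
class Scheduler:
    name: str

    def choose_vm(self, view: Dict[str, VirtualMachine],
                  cpu: float, mem_gb: float, disk_gb: float) -> Optional[str]:
        # Assumed first-fit policy over this scheduler's own view.
        for vm in view.values():
            if vm.fits(cpu, mem_gb, disk_gb):
                return vm.name
        return None


@dataclass
class CloudPlatform:
    cluster: Cluster
    schedulers: List[Scheduler]
    master_node: str = "00"  # where the resource adjustment apparatus would run (not modelled)


def concurrent_round(platform: CloudPlatform, cpu: float, mem_gb: float,
                     disk_gb: float) -> List[bool]:
    """All schedulers read the cluster at the same moment, then commit in turn."""
    views = [platform.cluster.snapshot() for _ in platform.schedulers]
    results = []
    for sched, view in zip(platform.schedulers, views):
        target = sched.choose_vm(view, cpu, mem_gb, disk_gb)
        results.append(target is not None
                       and platform.cluster.commit(target, cpu, mem_gb, disk_gb))
    return results


platform = CloudPlatform(
    cluster=Cluster({"vm1": VirtualMachine("vm1", 2, 4, 20),
                     "vm2": VirtualMachine("vm2", 8, 16, 100)}),
    schedulers=[Scheduler("s1"), Scheduler("s2")],
)
print(concurrent_round(platform, cpu=2, mem_gb=4, disk_gb=10))  # [True, False]
```

In this run both schedulers see `vm1` as having just enough room and both pick it; the first commit succeeds and the second fails, even though `vm2` could have hosted the application. This is the kind of conflict the background attributes to multiple schedulers sharing cluster resources under heavy load.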
An embodiment of the present invention provides a computer-readable storage medium storing instructions. When the instructions are run on a computer, the computer is caused to perform the cluster resource adjustment method provided in the foregoing method embodiments.
An embodiment of the present invention further provides a computer program product containing instructions. When the computer program product is run on a computer, the computer is caused to perform the cluster resource adjustment method provided in the foregoing method embodiments.
| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN201810119092.3A | CN108427604B (en) | 2018-02-06 | 2018-02-06 | Cluster resource adjustment method and device and cloud platform |
| PCT/CN2018/100552 | WO2019153697A1 (en) | 2018-02-06 | 2018-08-15 | Cluster resource adjustment method and device, and cloud platform |
| Publication Number | Publication Date |
|---|---|
| CN108427604A | 2018-08-21 |
| CN108427604B | 2020-06-26 |
| Publication number | Publication date |
|---|---|
| CN108427604B (en) | 2020-06-26 |
| WO2019153697A1 (en) | 2019-08-15 |
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 20220211. Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province. Patentee after: Huawei Cloud Computing Technologies Co.,Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen. Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd. |
| TR01 | Transfer of patent right | Effective date of registration: 20221205. Address after: 518129 Huawei Headquarters Office Building 101, Wankecheng Community, Bantian Street, Longgang District, Shenzhen, Guangdong. Patentee after: Shenzhen Huawei Cloud Computing Technology Co.,Ltd. Address before: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province. Patentee before: Huawei Cloud Computing Technologies Co.,Ltd. |