CN107453929A

Info

Publication number: CN107453929A
Application number: CN201710867262.1A
Authority: CN
Inventors: 张勋; 张呈宇; 魏进武
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2017-09-22
Filing date: 2017-09-22
Publication date: 2017-12-08
Anticipated expiration: 2037-09-22
Also published as: CN107453929B

Abstract

The present invention provides a cluster system self-construction method and device, and a cluster system. When the service nodes determine that the first management node has failed, the service nodes start communicating with one another. Each service node determines several candidate nodes from among the service nodes, and a second management node is determined from those candidate nodes. The second management node replaces the first management node, and the candidate nodes other than the second management node execute the subtasks originally assigned to the second management node. A new management node is thus elected quickly after the first management node fails, without suspending the cluster system and without provisioning a separate standby management node. This improves the reliability of the cluster system and the efficient use of system resources, so that the cluster system can run safely and stably.

Description

Cluster system self-construction method, device and cluster system

Technical field

The present invention relates to the field of communication technologies, and in particular to a cluster system self-construction method and device, and a cluster system.

Background

With the continuous development of information technology, cluster technology is used in a growing number of fields; common examples are server clusters and database clusters. A cluster usually designates management nodes and service nodes according to some policy, and each works on the tasks assigned to it. The management node is additionally responsible for monitoring the status and operation of every service node. If a service node fails, the management node must ensure that the tasks of the failed node are switched to other healthy service nodes, which strengthens the reliability of the cluster.

When the management node itself fails, however, task scheduling in the cluster breaks down. In the prior art, the cluster is usually suspended after such a failure while the cause is identified and the fault is cleared manually; alternatively, a standby management node is provisioned, and after the management node fails the standby node takes over its tasks.

Suspending the cluster for manual troubleshooting reduces the working efficiency of the cluster system. A standby management node, for its part, usually sits idle while the management node is working normally, which wastes system resources.

Summary of the invention

The present invention provides a cluster system self-construction method and device, and a cluster system, so that when the management node fails a new management node can be quickly elected from the service nodes to take over the work of the original management node, improving the reliability of the cluster system and the efficient use of system resources.

One aspect of the present invention provides a cluster system self-construction method. The cluster system includes a first management node and a plurality of service nodes, and the first management node is used to divide a task into multiple subtasks and distribute them to the service nodes. The method includes:

when the service nodes determine that the first management node has failed, starting mutual communication among the service nodes;

each service node of the plurality of service nodes determining several candidate nodes from the plurality of service nodes, and determining a second management node from the several candidate nodes, where the second management node is used to replace the first management node, and the candidate nodes other than the second management node are used to execute the original subtasks of the second management node.

Another aspect of the present invention provides a cluster system self-construction device. The cluster system includes a first management node and a plurality of service nodes, and the first management node is used to divide a task into multiple subtasks and distribute them to the service nodes. The device is deployed on the first management node and on the service nodes, and includes:

a communication module, used for the service nodes to communicate with the first management node and, when the service nodes determine that the first management node has failed, for starting mutual communication among the service nodes;

a candidate node acquisition module, used for each service node of the plurality of service nodes to determine several candidate nodes from the plurality of service nodes;

an election module, used to determine a second management node from the candidate nodes;

a configuration module, used to configure the second management node to replace the first management node, and to configure the candidate nodes other than the second management node to execute the original subtasks of the second management node.

Another aspect of the present invention provides a cluster system including a first management node and a plurality of service nodes, where the first management node and the service nodes each include a memory and a processor;

the processor of the first management node is configured to divide a task into multiple subtasks and distribute them to the service nodes;

the processor of each service node is configured to execute the subtasks and, when the service nodes determine that the first management node has failed, to start mutual communication among the service nodes. Each service node of the plurality of service nodes determines several candidate nodes from the plurality of service nodes and determines a second management node from the several candidate nodes. The second management node is used to replace the first management node, and the candidate nodes other than the second management node are used to execute the original subtasks of the second management node.

With the cluster system self-construction method, device and cluster system provided by the present invention, when the service nodes determine that the first management node has failed, the service nodes start communicating with one another. Each service node determines several candidate nodes from the plurality of service nodes, and a second management node is determined from those candidate nodes. The second management node replaces the first management node, and the other candidate nodes execute the original subtasks of the second management node. A new management node is thus elected quickly after the first management node fails, without suspending the cluster system and without provisioning a separate standby management node, which improves the reliability of the cluster system and the efficient use of system resources, so that the cluster system can run safely and stably.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of the cluster system self-construction method provided by Embodiment 1 of the present invention;

FIG. 2 is a flow chart of the cluster system self-construction method provided by Embodiment 2 of the present invention;

FIG. 3 is a structural diagram of the cluster system self-construction device provided by Embodiment 3 of the present invention;

FIG. 4 is a structural diagram of the cluster system provided by Embodiment 4 of the present invention;

FIG. 5 is a hardware architecture diagram of the management node and the service nodes in the cluster system provided by Embodiment 4 of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1

FIG. 1 is a flow chart of the cluster system self-construction method provided by Embodiment 1 of the present invention. As shown in FIG. 1, this embodiment provides a cluster system self-construction method. The cluster system includes a first management node and a plurality of service nodes, and the first management node is used to divide a task into multiple subtasks and distribute them to the service nodes.

The cluster system may be a database cluster, a server cluster, or another kind of cluster, and the management node and service nodes may be servers, computers, or similar devices.

This embodiment uses a database cluster as an example. According to a preset configuration, the first management node decides how a data exchange task is executed, for example the read source, the write target, and the degree of concurrency. It splits the task in order into several subtasks and distributes them to multiple service nodes, which execute the subtasks concurrently using multiple threads. The first management node is also responsible for aggregating the task results. A database cluster makes it possible to manage massive amounts of data and benefits data transaction processing as well as data analysis and mining. The configuration may be expressed in XML (Extensible Markup Language).
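The splitting and concurrent execution described above can be illustrated with a minimal sketch. It assumes that a task is a contiguous range of record indices and that a subtask is a sub-range of it; the names `split_in_order` and `run_concurrently` and the `concurrency` parameter are hypothetical and do not appear in the patent, which does not specify how the split is performed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end) of record indices


def split_in_order(total: int, parts: int) -> List[Range]:
    """Split [0, total) into `parts` contiguous ranges, preserving order."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(total, parts)
    ranges: List[Range] = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional record each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def run_concurrently(
    ranges: List[Range],
    worker: Callable[[Range], int],
    concurrency: int,
) -> List[int]:
    """Run one worker call per subtask on a thread pool.

    Results are returned in subtask order so that they can be aggregated.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(worker, ranges))
```

In this sketch the thread pool stands in for the service nodes; in a real cluster each range would be sent to a different node over the network.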

The specific steps of the cluster system self-construction method provided by this embodiment are as follows:

S101. When the service nodes determine that the first management node has failed, the service nodes start mutual communication.

In this embodiment, when the first management node fails it can no longer distribute tasks, and the cluster system cannot work normally. The service nodes may detect the failure with a heartbeat protocol: the first management node and the service nodes communicate at regular intervals, and each side judges from the other's replies whether the communication link between them has been broken. Other methods may also be used. For example, when a service node finishes its current subtask it submits a report to the first management node; if the first management node gives no feedback, the service node concludes that the first management node has failed and broadcasts this to all service nodes.

In this embodiment all nodes are interconnected. While the first management node works normally, each service node communicates only with the first management node, and the service nodes usually do not communicate with one another. When the first management node fails, the service nodes start mutual communication in order to carry out the subsequent election.

S102. Each service node of the plurality of service nodes determines several candidate nodes from the plurality of service nodes and determines a second management node from the several candidate nodes. The second management node is used to replace the first management node, and the candidate nodes other than the second management node are used to execute the original subtasks of the second management node.

In this embodiment, after the first management node fails, a second management node must be elected from the service nodes. Not every service node is suitable for or capable of acting as a management node; examples are service nodes with poor health, a heavy current task load, or weak communication capability. The service nodes therefore first select candidate nodes that are suitable to act as a management node. A candidate node may be a service node with good health, a light current task load, or strong communication capability; other indicators may also be used. Selecting several candidates first excludes unsuitable service nodes, which reduces the amount of processing and improves election efficiency.

After the candidate nodes have been obtained, the second management node is elected from among them. The candidates may be ranked further by indicators such as health, task load, and communication capability. Last-place elimination may also be used: several election rounds are held, one candidate node is eliminated in each round, and the most suitable candidate that remains becomes the second management node.

The second management node takes over the work of the first management node, and the remaining candidate nodes share the original subtasks of the second management node. Because candidate nodes have better health, a lighter current task load, or stronger communication capability than ordinary service nodes, handing the second management node's original subtasks to them does not overburden them, and those subtasks can be completed more effectively.

Because the service nodes are leaderless once the first management node has failed, every service node determines the candidate nodes itself, based on information synchronized through mutual communication among the service nodes. The second management node may be determined from the candidates either by every service node or by the candidate nodes alone.
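The hand-over of the new management node's subtasks to the remaining candidates can be sketched as follows. The patent does not say how the subtasks are divided among the remaining candidates; round-robin assignment is an assumption made here for illustration, and the function name `reassign_subtasks` is hypothetical.

```python
from typing import Dict, List


def reassign_subtasks(
    subtasks: List[str],
    candidates: List[str],
    new_manager: str,
) -> Dict[str, List[str]]:
    """Spread the new manager's former subtasks over the other candidates.

    Assumption: round-robin assignment in candidate order. The source only
    states that the remaining candidates execute these subtasks.
    """
    helpers = [c for c in candidates if c != new_manager]
    if not helpers:
        raise ValueError("at least one candidate besides the new manager is required")
    plan: Dict[str, List[str]] = {h: [] for h in helpers}
    for i, task in enumerate(subtasks):
        plan[helpers[i % len(helpers)]].append(task)
    return plan
```

For example, with candidates A, B and C and A elected, three subtasks t1, t2, t3 would go to B, C and B respectively under this assumption.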

With the cluster system self-construction method provided by this embodiment, when the service nodes determine that the first management node has failed, the service nodes start communicating with one another. Each service node determines several candidate nodes from the plurality of service nodes, and a second management node is determined from those candidates. The second management node replaces the first management node, and the other candidate nodes execute the original subtasks of the second management node. A new management node is thus elected quickly after the first management node fails, without suspending the cluster system and without provisioning a separate standby management node, which improves the reliability of the cluster system and the efficient use of system resources, so that the cluster system can run safely and stably.

Embodiment 2

FIG. 2 is a flow chart of the cluster system self-construction method provided by Embodiment 2 of the present invention. As shown in FIG. 2, on the basis of the above embodiment, before the service nodes determine in S101 that the first management node has failed, the method further includes:

S201. The service nodes receive a heartbeat packet broadcast by the first management node. The heartbeat packet is generated by the first management node from self-monitoring of its operating status and health.

In this embodiment, the first management node monitors its operating status and health in real time or checks them at predetermined intervals, and uses a heartbeat protocol to send them to the service nodes at a predetermined time interval. The service nodes thereby learn the operating status and health of the first management node, can detect a failure of the first management node as soon as it occurs, and can then start the subsequent procedure. A service node may likewise send its own operating status and health to the first management node in a heartbeat packet, or report them when responding to a heartbeat packet from the first management node. The first management node thereby learns the operating status and health of the service nodes and, when a service node fails, can assign that node's current subtasks to other service nodes.

The present invention is not limited to having the first management node broadcast heartbeat packets so that the service nodes can monitor it for failure. Other methods may be used; they were described in the above embodiment and are not repeated here.

Further, the service nodes determining in S101 that the first management node has failed specifically includes:

S202. When a service node does not receive the heartbeat packet, it determines that the first management node has failed.

In this embodiment, the first management node sends a heartbeat packet to the service nodes at every predetermined time interval. If a service node receives no heartbeat packet within such an interval, it determines that the first management node has failed, that is, that the management node is unresponsive.
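The timeout rule in S202 can be illustrated with a small sketch. The class name `HeartbeatMonitor` and the `timeout_s` parameter are hypothetical; the patent only states that a missing heartbeat within the predetermined interval is treated as a failure, and does not specify the interval or any tolerance.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Tracks the most recent heartbeat from the management node.

    timeout_s is an assumed setting: the predetermined heartbeat interval,
    optionally with some slack for network jitter.
    """

    timeout_s: float
    last_seen: float = field(default_factory=time.monotonic)

    def on_heartbeat(self, now: Optional[float] = None) -> None:
        # Record the arrival time of a heartbeat packet.
        self.last_seen = time.monotonic() if now is None else now

    def manager_failed(self, now: Optional[float] = None) -> bool:
        # The manager counts as failed once no packet arrived within timeout_s.
        current = time.monotonic() if now is None else now
        return current - self.last_seen > self.timeout_s
```

A service node would call `on_heartbeat()` whenever a packet arrives and poll `manager_failed()` periodically; the explicit `now` argument exists only to make the behavior easy to check with fixed timestamps.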

After S202, S203 is executed:

S203. The service nodes start mutual communication.

Further, each service node of the plurality of service nodes determining several candidate nodes from the plurality of service nodes in S102 specifically includes:

S204. Each service node monitors its own operating status and health, and broadcasts its health and task load to the other service nodes;

S205. Each service node ranks the service nodes by health and task load, and takes the top N service nodes with higher health and lower task load as the candidate nodes.

In this embodiment, every service node broadcasts its health and task load, so every service node learns the health and task load of all service nodes. The service nodes are then ranked by health and task load. Specifically, a weight may be preset for health and for task load, so that each service node is evaluated on a combination of the two. Once the ranking is available, the top N service nodes with higher health and lower task load become the candidate nodes, where N is a preset positive integer greater than or equal to 2. Selecting several candidates first excludes unsuitable service nodes, which reduces the amount of processing and improves election efficiency.
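Steps S204 and S205 can be sketched as a weighted ranking. The patent says only that health and task load may each be given a preset weight; the linear formula, the default weights, the normalization of both values to the range 0 to 1, and the tie-break on node id are assumptions made here, and `select_candidates` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def select_candidates(
    reports: Dict[str, Tuple[float, float]],
    n: int,
    w_health: float = 0.6,
    w_load: float = 0.4,
) -> List[str]:
    """Return the top-N node ids by combined health and task load.

    reports maps node id -> (health, load), both assumed to lie in [0, 1].
    Higher health and lower load give a higher score.
    """
    if n < 2:
        raise ValueError("N must be greater than or equal to 2")

    def score(node_id: str) -> float:
        health, load = reports[node_id]
        return w_health * health + w_load * (1.0 - load)

    # Ties are broken by node id so that every node, working from the same
    # broadcast data, derives exactly the same candidate list.
    ranked = sorted(reports, key=lambda node_id: (-score(node_id), node_id))
    return ranked[:n]
```

A deterministic tie-break matters in this setting because the nodes are leaderless: each one computes the ranking locally, and the results must agree.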

Further, determining the second management node from the several candidate nodes in S102 specifically includes:

S206. Each candidate node communicates with the other candidate nodes and obtains a score based on the other candidates' evaluation of its communication delay and on its own health.

In this embodiment, the election is carried out by the candidate nodes, which vote to select exactly one of themselves as the second management node. Specifically, the candidate nodes evaluate one another's communication delay, and each candidate node broadcasts its own health; the delay evaluations and the health are then combined into a score. For example, for candidate node A, the other candidate nodes B and C each evaluate A's communication delay based on their communication with A, and the result is combined with A's health to give A's score. Weights may be set for the delay evaluation and for health according to their importance to the score.

S207. According to the scores of the candidate nodes, the candidate node with the lowest score is eliminated.

S208. Scoring and elimination of the lowest-scoring candidate node are repeated until one candidate node remains, which becomes the second management node.

This embodiment uses last-place elimination over multiple voting rounds. After the scores of the candidate nodes are obtained they can be ranked, and in each round only the single lowest-scoring candidate is eliminated. Because the candidate nodes have not suspended their original subtasks, their communication delay and health change in real time, so determining the final second management node from a single election carries some risk. Multiple voting rounds better reflect the average level of each candidate's communication delay and health, and the second management node elected in this way is more reliable.
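Steps S206 to S208 can be sketched as a loop that rescores the remaining candidates in every round and drops the lowest scorer. The names `combined_score` and `elect_manager`, the equal default weights, the averaging of peer ratings, and the tie-break on node id are assumptions for illustration; the patent does not fix a scoring formula.

```python
from typing import Callable, Dict, List


def combined_score(
    peer_delay_ratings: List[float],
    health: float,
    w_delay: float = 0.5,
    w_health: float = 0.5,
) -> float:
    """Combine the peers' delay ratings (higher is better) with health."""
    avg_rating = sum(peer_delay_ratings) / len(peer_delay_ratings)
    return w_delay * avg_rating + w_health * health


def elect_manager(
    candidates: List[str],
    score_round: Callable[[List[str]], Dict[str, float]],
) -> str:
    """Last-place elimination: drop the lowest scorer until one node remains.

    score_round is called once per round with the remaining candidates and
    must return a fresh score for each of them, since delay and health
    change while the candidates keep running their subtasks.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    remaining = sorted(candidates)
    while len(remaining) > 1:
        scores = score_round(remaining)
        # Ties are broken by node id so that all nodes agree on the loser.
        loser = min(remaining, key=lambda node_id: (scores[node_id], node_id))
        remaining.remove(loser)
    return remaining[0]
```

With N candidates this runs N - 1 rounds, so each surviving candidate is measured several times rather than once.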

It should be noted that because the service nodes are leaderless, the election runs on every service node: the service nodes publish and synchronize their information and follow the same election rules. A node that does not become a candidate withdraws automatically and takes no further part. A candidate whose score is the lowest in the current round also withdraws automatically and takes no further part. The last remaining candidate becomes the second management node and takes over the work of the first management node.

In addition, all operations of the election are performed in the memory of each service node, without disk reads or writes, which increases running speed and election efficiency. The nodes in this embodiment also obtain high-speed data exchange through cloud networking and a cloud synchronization mechanism.

Further, during a voting round, when each candidate node communicates with the other candidate nodes, a candidate node that gives no communication feedback is regarded as abstaining and is eliminated, and the candidate node eliminated in the previous round rejoins the current round. A candidate node that gives no feedback has evidently failed. If it were simply eliminated while the round still eliminated the lowest-scoring candidate as well, the total number of voting rounds would be reduced by one, which carries some risk when there are few candidates. Re-admitting the candidate eliminated in the previous round therefore further improves reliability.
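The abstention rule can be sketched as a single-round function. The name `run_round` and its parameters are hypothetical, and the sketch assumes that the re-admitted node is reachable and that `scores` already contains a score for it; the patent does not describe how the re-admitted node is rescored.

```python
from typing import Dict, List, Optional, Set, Tuple


def run_round(
    remaining: List[str],
    previously_eliminated: Optional[str],
    unresponsive: Set[str],
    scores: Dict[str, float],
) -> Tuple[List[str], Optional[str]]:
    """Run one voting round; return (survivors, node eliminated by score)."""
    # Candidates that gave no feedback are treated as abstaining and dropped.
    active = [n for n in remaining if n not in unresponsive]
    abstained = len(active) < len(remaining)

    # Re-admit the previous round's loser so the round count is preserved.
    if abstained and previously_eliminated is not None:
        if previously_eliminated not in active:
            active.append(previously_eliminated)

    if len(active) <= 1:
        return active, None  # nothing left to eliminate by score

    loser = min(active, key=lambda n: (scores[n], n))
    return [n for n in active if n != loser], loser
```

In a round without abstentions the function reduces to plain last-place elimination.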

It should be noted that when the abstaining candidate node resumes normal operation, it may still consider itself to be in the voting process even though the election may already have ended. The candidate node therefore collects its information and sends it to the second management node, writes a record to disk, and enters the normal task execution state.

Further, after the second management node has taken over the work of the first management node, if the first management node is repaired successfully it is added back to the cluster system as a service node. Before it is added back, indicators such as its health, task execution speed, and volume of backlogged files must be verified over the network. Once it passes this check it acts as a service node and does not resume the role of management node.

With the cluster system self-construction method provided by this embodiment, when the service nodes determine that the first management node has failed, the service nodes start communicating with one another. Each service node determines several candidate nodes from the plurality of service nodes, and a second management node is determined from those candidates. The second management node replaces the first management node, and the other candidate nodes execute the original subtasks of the second management node. A new management node is thus elected quickly after the first management node fails, without suspending the cluster system and without provisioning a separate standby management node, which improves the reliability of the cluster system and the efficient use of system resources, so that the cluster system can run safely and stably. In this embodiment, several candidate nodes are first selected according to the health and task load of each service node, which excludes unsuitable service nodes, reduces the amount of processing, and improves election efficiency. Candidate scores are then obtained from communication delay and health, and the second management node is elected from the candidates through multiple rounds of last-place elimination. This better reflects the average level of each candidate's communication delay and health, so the elected second management node is more reliable.

Embodiment 3

FIG. 3 is a structural diagram of the cluster system self-construction device provided by Embodiment 3 of the present invention. The device can execute the processing flows provided by Embodiment 1 and Embodiment 2. The cluster system includes a first management node and a plurality of service nodes; the first management node is configured to divide a task into a plurality of subtasks and distribute them to the service nodes. The device is deployed on the first management node and on the service nodes.

It should be noted that, because the first management node is itself elected from among the service nodes, the cluster system self-construction device deployed on the first management node is identical to the one deployed on the service nodes.

As shown in FIG. 3, the cluster system self-construction device includes:

a communication module 31, configured for the service nodes to communicate with the first management node and, when a service node determines that the first management node has failed, for the service nodes to start communicating with one another;

a candidate node acquisition module 32, configured for each of the plurality of service nodes to determine several candidate nodes from among the plurality of service nodes;

an election module 33, configured to determine a second management node from the candidate nodes; and

a configuration module 34, configured to have the second management node replace the first management node and to have the candidate nodes other than the second management node execute the subtasks originally assigned to the second management node.
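The handover performed by the configuration module can be sketched as below. The specification only says that the other candidate nodes take over the new management node's original subtasks; the least-loaded assignment policy and the function name `reassign_subtasks` are assumptions made for illustration.

```python
def reassign_subtasks(assignments, candidates, new_manager):
    """Move the new manager's subtasks onto the other candidate nodes.

    assignments: dict mapping node name -> list of subtasks it currently holds
    candidates:  list of candidate node names (includes new_manager)
    Returns a new dict; the input is not modified.
    Policy (assumed): each orphaned subtask goes to the candidate that
    currently holds the fewest subtasks.
    """
    others = [c for c in candidates if c != new_manager]
    if not others:
        raise ValueError("no other candidate available to take over subtasks")

    # Copy so the caller's bookkeeping is left untouched.
    result = {node: list(tasks) for node, tasks in assignments.items()}
    orphaned = result.pop(new_manager, [])

    for task in orphaned:
        target = min(others, key=lambda n: len(result.setdefault(n, [])))
        result[target].append(task)
    return result
```

Because the candidates were chosen for high health and low task load, handing the extra subtasks to them is less likely to overload a node than spreading the work over all service nodes.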

Further, the cluster system self-construction device of this embodiment also includes:

a test module 35, configured for each service node to self-monitor its operating status and health.

The communication module 31 is further configured for each service node to broadcast its health and task load to the other service nodes.

The candidate node acquisition module 32 is specifically configured so that:

each service node ranks the service nodes by health and task load, and takes the top N service nodes with higher health and lower task load as the candidate nodes.
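The ranking step above might look like the following sketch. The specification does not say how health and task load are combined into one ordering, so this example assumes health is the primary key (descending) and task load the tie-breaker (ascending); the node name is used as a final tie-breaker so that every service node computes the same list from the same broadcast data.

```python
def select_candidates(nodes, n):
    """Pick the top-n service nodes as election candidates.

    nodes: dict mapping node name -> (health, task_load), as broadcast
           by each service node
    n:     number of candidates to keep
    Ordering (assumed): higher health first, then lower task load,
    then node name for a deterministic result.
    """
    ranked = sorted(
        nodes,
        key=lambda name: (-nodes[name][0], nodes[name][1], name),
    )
    return ranked[:n]
```

For example, with nodes `{"a": (0.9, 5), "b": (0.9, 2), "c": (0.4, 0), "d": (0.95, 9)}` and `n=2`, this ordering keeps `d` and `b`.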

Further, the election module 33 is specifically configured so that:

each candidate node communicates with the remaining candidate nodes through the communication module and obtains a score based on the remaining candidate nodes' evaluation of its communication delay and on its own health;

the candidate node with the lowest score is eliminated according to the scores of the candidate nodes; and

the steps of obtaining scores and eliminating the lowest-scoring candidate node are repeated until a single candidate node remains, which serves as the second management node.
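The multi-round, last-place-elimination election can be sketched as follows. The scoring formula is an assumption: the specification states only that the score depends on the peers' evaluation of communication delay and on the candidate's health, so this example uses health minus a weighted mean of the delays measured by the remaining peers. The names `score`, `elect_manager`, and `delay_weight` are hypothetical.

```python
from statistics import mean


def score(node, remaining, health, delay_ms, delay_weight=0.01):
    """Score one candidate from its health and its peers' delay measurements.

    delay_ms maps (rater, target) -> delay in milliseconds as observed
    by `rater` when talking to `target`. Formula is an assumption.
    """
    peers = [p for p in remaining if p != node]
    avg_delay = mean(delay_ms[(p, node)] for p in peers)
    return health[node] - delay_weight * avg_delay


def elect_manager(candidates, health, delay_ms):
    """Eliminate the lowest-scoring candidate each round until one remains."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    remaining = list(candidates)
    while len(remaining) > 1:
        # Scores are recomputed every round from the peers still in the race.
        scores = {n: score(n, remaining, health, delay_ms) for n in remaining}
        # Node name breaks ties so every node reaches the same result.
        loser = min(remaining, key=lambda n: (scores[n], n))
        remaining.remove(loser)
    return remaining[0]
```

Recomputing the scores in each round means a candidate is judged only by peers that are still candidates, which is one way to read the claim that the result reflects the average delay and health across the candidate set.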

Further, the election module 33 is also configured to:

if a candidate node gives no feedback during communication, treat that candidate node as having abstained and eliminate it; and

re-add the candidate node eliminated in the previous round of the election to the current round.
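One possible reading of the abstention rule is sketched below: a silent candidate is dropped, and the candidate eliminated in the previous round is brought back to fill its place. Both the coupling of the two clauses and the requirement that the returning candidate is itself responsive are assumptions; the specification does not spell out either point. The function name `apply_abstentions` is hypothetical.

```python
def apply_abstentions(remaining, last_eliminated, responded):
    """Build the candidate list for the current round.

    remaining:       candidates still in the election
    last_eliminated: candidate removed in the previous round, or None
    responded:       set of node names that gave communication feedback
    Returns (current_candidates, abstained).
    """
    abstained = [n for n in remaining if n not in responded]
    current = [n for n in remaining if n in responded]

    # Assumed interpretation: only when someone abstained is the previously
    # eliminated candidate re-added, and only if it is reachable.
    if abstained and last_eliminated is not None and last_eliminated in responded:
        current.append(last_eliminated)
    return current, abstained
```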

Further, the communication module 31 is also configured so that:

the service node receives a heartbeat packet broadcast by the first management node, the heartbeat packet being generated by the first management node as it self-monitors its operating status and health through the test module; and

when the service node does not receive the heartbeat packet, it determines that the first management node has failed.
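Heartbeat-based failure detection on a service node can be sketched as below. The specification does not state how long a service node waits before declaring a failure, so the timeout value and the `HeartbeatMonitor` class are assumptions; the clock is injectable so the logic can be exercised without real waiting.

```python
import time


class HeartbeatMonitor:
    """Tracks heartbeats from the management node on one service node."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed value, not from the patent
        self.clock = clock              # monotonic clock avoids wall-clock jumps
        self.last_seen = clock()

    def on_heartbeat(self):
        """Call whenever a heartbeat packet from the manager arrives."""
        self.last_seen = self.clock()

    def manager_failed(self):
        """True if no heartbeat has arrived within the timeout window."""
        return self.clock() - self.last_seen > self.timeout_s
```

When `manager_failed()` returns true, the service node would start the peer-to-peer communication that leads to candidate selection and election.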

Further, the configuration module 34 is also configured to, if the first management node is successfully repaired, configure it as a service node and re-add it to the cluster system.

In addition, the cluster system self-construction device of this embodiment provides read and write interfaces to facilitate development.

The device provided by this embodiment of the present invention can be used to execute the method embodiments shown in FIG. 1 and FIG. 2 above; its specific functions are not repeated here.

With the cluster system self-construction device provided by this embodiment, when a service node determines that the first management node has failed, the service nodes start communicating with one another; each service node determines several candidate nodes from among the plurality of service nodes, and a second management node is determined from the candidate nodes. The second management node replaces the first management node, and the candidate nodes other than the second management node execute the subtasks originally assigned to the second management node. A new management node is thus elected quickly after the first management node fails, without suspending operation of the cluster system and without provisioning a separate standby management node, which improves the reliability of the cluster system and the efficient use of system resources so that the cluster system can run safely and stably.

Embodiment 4

FIG. 4 is a structural diagram of the cluster system provided by Embodiment 4 of the present invention, and FIG. 5 is a hardware architecture diagram of the first management node and the service nodes in that cluster system. As shown in FIG. 4 and FIG. 5, the cluster system provided by this embodiment includes a first management node 41 and a plurality of service nodes 42, and the first management node and each service node include a memory 51 and a processor 52;

the processor 52 of the first management node is configured to divide a task into a plurality of subtasks and distribute them to the service nodes;

the processor 52 of each service node is configured to execute the subtasks and, when a service node determines that the first management node has failed, to have the service nodes start communicating with one another, whereupon each of the plurality of service nodes determines several candidate nodes from among the plurality of service nodes and a second management node is determined from the candidate nodes, the second management node replacing the first management node and the candidate nodes other than the second management node executing the subtasks originally assigned to the second management node.
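The management node's role described above, dividing a task into subtasks and distributing them to the service nodes, can be sketched as follows. The specification does not describe how a task is split or how subtasks are assigned; fixed-size chunks and round-robin assignment are assumptions chosen for simplicity, and the function names are hypothetical.

```python
def split_task(items, chunk_size):
    """Divide a task (a list of work items) into fixed-size subtasks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def distribute(subtasks, service_nodes):
    """Assign subtasks to service nodes in round-robin order (assumed policy)."""
    if not service_nodes:
        raise ValueError("at least one service node is required")
    plan = {node: [] for node in service_nodes}
    for i, subtask in enumerate(subtasks):
        plan[service_nodes[i % len(service_nodes)]].append(subtask)
    return plan
```

The resulting plan is the kind of per-node assignment record that a newly elected management node would need to inherit in order to continue distributing work without pausing the cluster.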

The processors 52 of the first management node and the service nodes in this embodiment can execute the processing flows provided by Embodiment 1 and Embodiment 2; their specific functions are not repeated here.

Of course, as shown in FIG. 5, the first management node and the service nodes may also include other components, such as a receiver 53 and a transmitter 54, whose specific functions are not repeated here.

In the cluster system provided by this embodiment, when a service node determines that the first management node has failed, the service nodes start communicating with one another; each service node determines several candidate nodes from among the plurality of service nodes, and a second management node is determined from the candidate nodes. The second management node replaces the first management node, and the candidate nodes other than the second management node execute the subtasks originally assigned to the second management node. A new management node is thus elected quickly after the first management node fails, without suspending operation of the cluster system and without provisioning a separate standby management node, which improves the reliability of the cluster system and the efficient use of system resources so that the cluster system can run safely and stably.

In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.