Summary of the invention
The invention provides a cluster split-brain handling method and device, which solve the problem that existing split-brain handling relies on a single approach and thereby reduces the operating efficiency of the cluster.
A cluster split-brain handling method comprises:
each node in the cluster detecting the heartbeat lines between itself and the other nodes in the cluster;
when a node in the cluster cannot detect any heartbeat line, the node suspending the services running on it.
Preferably, after the step in which a node that cannot detect any heartbeat line suspends the services running on it, the method further comprises:
after the node detects that the heartbeat on the heartbeat lines between it and the other nodes in the cluster has recovered, resuming the services on the node.
Preferably, the step in which a node that cannot detect any heartbeat line suspends the services running on it is specifically:
when the node cannot detect any heartbeat line within a preset detection period, the node suspends the services running on it.
Preferably, the above cluster split-brain handling method further comprises:
when a node in the cluster can detect the heartbeat lines to only some of the other nodes in the cluster, determining that the heartbeat lines it cannot detect have failed.
The invention also provides a cluster split-brain handling device, comprising:
a heartbeat management module, configured to detect the heartbeat lines between a node in the cluster and the other nodes in the cluster;
a cluster management module, configured to suspend the services on the node when no heartbeat line between the node and the other nodes in the cluster can be detected.
Preferably, the cluster management module is further configured to resume the services on the node after detecting that the heartbeat on the heartbeat lines between the node and the other nodes in the cluster has recovered.
Preferably, the heartbeat management module is further configured to determine, when the heartbeat lines to only some of the other nodes in the cluster can be detected, that the undetected heartbeat lines have failed.
The invention provides a cluster split-brain handling method and device. Each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster, and a node that cannot detect any heartbeat line suspends the services running on it. Suspending services replaces the prior-art approach of directly restarting the system, which saves recovery time, improves the accuracy with which split-brain is handled, and preserves system efficiency.
Embodiment
In many cases, for example when a network cable is disconnected, there is little need to restart the computer system directly. Moreover, after the computer system starts up it must reinitialize its information as required, which is a time-consuming process that reduces efficiency.
To address the above problem, embodiments of the invention provide a cluster split-brain handling method and device that quickly detect and respond to split-brain by stopping the shared resources on the affected node and stopping the services that node provides, thereby protecting the shared resources. After the node's heartbeat recovers, the node's services can be restored directly, quickly, and efficiently. This not only keeps the resources safe but also speeds up cluster recovery and improves the performance of the high-availability system.
Embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another in any manner.
First, Embodiment 1 of the invention is described with reference to the accompanying drawings.
This embodiment of the invention provides a cluster split-brain handling method and device. In a high-availability cluster, a node that finds its heartbeat disconnected does not shut down the operating system directly; it merely stops the shared resources on the node and stops the services the node provides. After the node's heartbeat recovers, the node's services can be restored directly, quickly, and efficiently. This method not only keeps the resources safe but also speeds up cluster recovery and improves the performance of the high-availability system. The cluster split-brain handling device provided by this embodiment comprises a heartbeat management module, a cluster management module, and a local resource management module.
Using the above cluster split-brain handling device and the cluster split-brain handling method provided by this embodiment, a node on which split-brain occurs is handled as follows:
1) In the heartbeat management module, the heartbeat module periodically checks the information on every heartbeat line of all nodes in the cluster. If no information is detected on a heartbeat line for a period preset by the system, that heartbeat line is judged to have failed. If all heartbeat lines of a node have failed, the node is judged to be disconnected from the other nodes in the cluster.
2) In the cluster management module, on receiving a heartbeat-disconnected command from the heartbeat module, the module evaluates a series of items of node information and then determines how the node is to be handled. If the local node is the one that has become disconnected from the cluster, it does not shut down the operating system directly; instead it starts the local resource management module (3), which stops the shared resources on the node and stops the services the node provides. The other, normal nodes in the cluster take over the services of the disconnected node and continue to provide service externally.
3) After a heartbeat failure, the heartbeat management module continues to check the information on every heartbeat line of each node. Once it again detects heartbeat information on a failed heartbeat line, it sends a heartbeat-recovered command to the cluster management module.
4) On receiving the heartbeat-recovered command, the cluster management module acts according to the current state of the cluster. If the cluster is normal, the node's services are restored directly, quickly, and efficiently; if the cluster is already in a split-brain state, the services of the whole cluster are quickly restored.
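The detection and recovery steps 1) to 4) above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `HeartbeatMonitor`, its method names, the line labels, and the injectable clock are all assumptions made for illustration. The monitor records when each heartbeat line was last heard from, judges a line failed once the preset period elapses without a heartbeat, and reports a "disconnected" or "recovered" command when the node loses or regains contact.

```python
import time


class HeartbeatMonitor:
    """Hypothetical monitor for one node's heartbeat lines (illustrative only)."""

    def __init__(self, lines, timeout, clock=time.monotonic):
        self.timeout = timeout      # preset detection period, in seconds
        self.clock = clock          # injectable so the sketch can be tested
        now = clock()
        self.last_seen = {line: now for line in lines}
        self.isolated = False       # True once every line has failed

    def record_heartbeat(self, line):
        """Call whenever a heartbeat arrives on a line."""
        self.last_seen[line] = self.clock()

    def failed_lines(self):
        """Lines with no heartbeat for longer than the preset period."""
        now = self.clock()
        return {l for l, seen in self.last_seen.items()
                if now - seen > self.timeout}

    def poll(self):
        """Return 'disconnected', 'recovered', or None for this round."""
        all_failed = (bool(self.last_seen)
                      and len(self.failed_lines()) == len(self.last_seen))
        if all_failed and not self.isolated:
            self.isolated = True
            return "disconnected"   # steps 1-2: tell the cluster management module
        if not all_failed and self.isolated:
            self.isolated = False
            return "recovered"      # step 3: a failed line is heard from again
        return None
```

In this sketch the cluster management module would call `poll()` once per detection round and stop or restore services only on a state change, so a partially failed set of lines never triggers a suspension.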
After a node becomes disconnected from the cluster, it does not shut down the operating system directly; it merely stops the shared resources on the node and stops the services the node provides, which protects the shared resources. The invention also adds a heartbeat-recovery detection mechanism: after the node's heartbeat recovers, the node's services can be restored directly, quickly, and efficiently, which speeds up cluster recovery and improves the performance of the high-availability system.
The invention is described in more detail below with reference to the accompanying drawings:
The master server for cluster management is itself a node in the cluster. This node actively allocates the cluster's resources, assigning the cluster's various services to different servers, which provide them externally. The master server also deals directly with users: it dispatches each user operation on the cluster to the designated node.
Fig. 1 is a flow chart of the split-brain response according to this embodiment. When the heartbeat management module detects that a node's heartbeat has been lost, that is, the node has become disconnected from the cluster, it sends a node-dead command to the cluster management module. The cluster management module first deletes the node from, and updates, the cluster node information list, computes whether the node is the master node, and then judges whether the disconnected node is the local node.
If the local node is the one disconnected from the cluster, the local resource management module stops the shared resources on the local node, stops the services the node provides, and waits for the heartbeat to recover.
If the disconnected node is not the local node, the initial number of nodes in the cluster is computed and it is judged whether the cluster is a two-node, 1+1 high-availability cluster. In a two-node high-availability cluster, the local node actively pings a third-party IP address to judge whether the local node has also been cut off from the network. If it has, the local resource management module stops the shared resources on the local node, stops the services the node provides, and waits for the heartbeat to recover; otherwise, the local node takes over as the master server for cluster management.
In a multi-node cluster, the current number of nodes is compared with half of the initial number of nodes. If the current number is less than half, the local resource management module stops the shared resources on the local node, stops the services the node provides, and waits until the number of nodes whose heartbeat has recovered exceeds one half. If the current number equals one half, it is judged whether a master server exists among the current nodes. If the current number is greater than one half, it is judged whether the disconnected node is the master server. If the disconnected node is the master server, the local node evaluates its own information and decides whether to take over as master server. If the disconnected node is not the master server, it is judged whether the local node is the master server; if it is, it migrates the services of the disconnected node to other active nodes.
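The branching in the Fig. 1 flow can be summarized as a single decision function. The following Python sketch is only an illustration under stated assumptions: the `Action` names, the function name, and its parameters are hypothetical; the two-node outcome follows the reading that a node still reachable from the third-party address takes over as master; and the "exactly half" case returns only the check the text names, since the text does not say what follows it.

```python
from enum import Enum, auto


class Action(Enum):
    STOP_AND_WAIT = auto()         # stop shared resources and services, await heartbeat
    TAKE_OVER_MASTER = auto()      # two-node case: local node becomes master
    CHECK_MASTER_PRESENT = auto()  # exactly half remain: check whether a master exists
    DECIDE_TAKEOVER = auto()       # lost node was master: decide whether to take over
    MIGRATE_SERVICES = auto()      # local master moves the lost node's services
    NO_ACTION = auto()             # nothing for this node to do


def decide_on_node_lost(*, lost_is_self, initial_nodes, alive_nodes,
                        lost_is_master, self_is_master,
                        third_party_reachable=True):
    """Map one node-dead event to the local node's action (hypothetical API)."""
    if lost_is_self:
        return Action.STOP_AND_WAIT
    if initial_nodes == 2:
        # 1+1 cluster: the ping to a third-party address tells us whether
        # the local node is itself cut off from the network.
        if third_party_reachable:
            return Action.TAKE_OVER_MASTER
        return Action.STOP_AND_WAIT
    if 2 * alive_nodes < initial_nodes:
        return Action.STOP_AND_WAIT
    if 2 * alive_nodes == initial_nodes:
        return Action.CHECK_MASTER_PRESENT
    if lost_is_master:
        return Action.DECIDE_TAKEOVER
    if self_is_master:
        return Action.MIGRATE_SERVICES
    return Action.NO_ACTION
```

Comparing `2 * alive_nodes` with `initial_nodes` keeps the majority test in integers, which avoids rounding questions when the initial node count is odd.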
Fig. 2 is a flow chart of the heartbeat-recovery response. When the heartbeat management module detects that a node's heartbeat has recovered, it sends a node-recovered command to the cluster management module, which first sends several join-request messages to all nodes in the cluster. Each node in the cluster, on receiving a join request, adds the node's information to the node list held on that node, so that every node in the cluster becomes aware of the recovered node. Each node then judges whether it is the master server; if it is, it replies to the recovered node to announce that a master server exists. The recovered node, after sending its join requests, waits for a certain time for a reply from the master server. If it receives a reply from the master server, the node joins the cluster and can start services within it. If it receives no reply, the master server does not exist; the recovered node then sends a master-election command to all nodes in the cluster. On receiving this command, each node evaluates its node information, a new master server is elected, and the cluster's services are restarted.
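The rejoin flow of Fig. 2 can be sketched in a few lines of Python. This is a simplified, in-memory illustration and not the described implementation: the `Node` class and `rejoin` function are hypothetical, a single synchronous call stands in for the repeated join requests and the timed wait for a reply, and the election rule (lowest node id wins) is an assumption, because the text only says that each node evaluates its node information.

```python
class Node:
    """Hypothetical cluster member holding its own view of the node list."""

    def __init__(self, node_id, is_master=False):
        self.node_id = node_id
        self.is_master = is_master
        self.known_nodes = {node_id}

    def on_join_request(self, joiner_id):
        """Record the joiner; only the master server sends a reply."""
        self.known_nodes.add(joiner_id)
        return self.node_id if self.is_master else None


def rejoin(recovered, cluster):
    """Return the id of the master server after the recovered node rejoins."""
    # Send the join request to every node and collect any replies.
    replies = [node.on_join_request(recovered.node_id) for node in cluster]
    masters = [reply for reply in replies if reply is not None]
    if masters:
        return masters[0]  # a master answered: join and start services

    # No reply: no master exists, so ask all nodes to elect one.
    candidates = list(cluster) + [recovered]
    new_master = min(candidates, key=lambda n: n.node_id)  # assumed rule
    for node in candidates:
        node.is_master = node is new_master
    return new_master.node_id
```

A real implementation would send the requests over the network and bound the wait with a timeout; the sketch only shows the two outcomes, joining under an existing master or triggering an election.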
The cluster split-brain handling method and device provided by this embodiment respond quickly to a heartbeat-disconnected command by stopping local services and shared resources, while the master server reassigns the services of the disconnected node to other normal nodes. This protects the resources and at the same time preserves service continuity. In addition, when the node's heartbeat recovers, the node's services can be restored directly, quickly, and efficiently, which speeds up cluster recovery and improves the performance of the high-availability system.
Embodiment 2 of the invention is described below with reference to the accompanying drawings.
This embodiment of the invention provides a cluster split-brain handling method. The flow for handling a split-brain node in the cluster with this method is shown in Fig. 3 and comprises:
Step 301: each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster;
Step 302: when a node in the cluster cannot detect any heartbeat line, the node suspends the services running on it;
Specifically, when the node cannot detect any heartbeat line within a preset detection period, the node suspends the services running on it.
Step 303: after the node detects that the heartbeat on the heartbeat lines between it and the other nodes in the cluster has recovered, the services on the node are resumed.
Step 304: when a node in the cluster can detect the heartbeat lines to only some of the other nodes in the cluster, it determines that the heartbeat lines it cannot detect have failed;
After step 301, if the node can detect one or more heartbeat lines but not all of them, split-brain has not occurred on this node; in this case, the undetected heartbeat lines can be judged to have failed.
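The distinction drawn in steps 302 and 304 between total and partial loss of heartbeat can be expressed as a small classification function. The sketch below is illustrative only; the `Verdict` names and the `assess` function are assumptions, and the input is taken to be a mapping from heartbeat line name to whether a heartbeat was seen within the preset detection period.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "all heartbeat lines detected"
    PARTIAL_FAILURE = "some lines failed, no split-brain"
    ISOLATED = "no line detected, suspend services"


def assess(line_detected):
    """Classify one detection round.

    line_detected maps a heartbeat line name to True if a heartbeat was
    seen on it within the preset detection period.
    Returns (verdict, sorted list of failed line names).
    """
    if not line_detected:
        raise ValueError("at least one heartbeat line is required")
    failed = sorted(line for line, ok in line_detected.items() if not ok)
    if len(failed) == len(line_detected):
        return Verdict.ISOLATED, failed          # step 302
    if failed:
        return Verdict.PARTIAL_FAILURE, failed   # step 304
    return Verdict.HEALTHY, []
```

Only the `ISOLATED` verdict leads to suspending services; a `PARTIAL_FAILURE` merely marks the listed lines as failed while the node keeps running.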
This embodiment of the invention also provides a cluster split-brain handling device, whose structure is shown in Fig. 4 and which comprises:
a heartbeat management module 401, configured to detect the heartbeat lines between a node in the cluster and the other nodes in the cluster;
a cluster management module 402, configured to suspend the services on the node when no heartbeat line between the node and the other nodes in the cluster can be detected.
Preferably, the cluster management module 402 is further configured to resume the services on the node after detecting that the heartbeat on the heartbeat lines between the node and the other nodes in the cluster has recovered.
Preferably, the heartbeat management module 401 is further configured to determine, when the heartbeat lines to only some of the other nodes in the cluster can be detected, that the undetected heartbeat lines have failed.
The above cluster split-brain handling device can be integrated into each node in the cluster to monitor each node and handle split-brain on it.
The cluster split-brain handling device provided by this embodiment can be combined with the cluster split-brain handling method provided by the embodiments of the invention. Each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster, and a node that cannot detect any heartbeat line suspends the services running on it. Suspending services replaces the prior-art approach of directly restarting the system, which saves recovery time, improves the accuracy with which split-brain is handled, and preserves system efficiency.
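How the two modules of the device might cooperate on a single node can be sketched as follows. The class names mirror modules 401 and 402, but everything else is assumed for illustration: the `probe` callable, the `services_running` flag, and the `SplitBrainHandler` wrapper are hypothetical and stand in for whatever detection and service-control mechanisms a real node would use.

```python
class HeartbeatManagementModule:
    """Stands in for module 401; `probe` returns {line_name: detected?}."""

    def __init__(self, probe):
        self.probe = probe

    def any_line_alive(self):
        return any(self.probe().values())


class ClusterManagementModule:
    """Stands in for module 402; tracks whether local services are running."""

    def __init__(self):
        self.services_running = True

    def suspend_services(self):
        self.services_running = False

    def resume_services(self):
        self.services_running = True


class SplitBrainHandler:
    """Hypothetical per-node wrapper that wires the two modules together."""

    def __init__(self, heartbeat, cluster):
        self.heartbeat = heartbeat
        self.cluster = cluster

    def run_detection_round(self):
        """Suspend on total heartbeat loss, resume on recovery."""
        alive = self.heartbeat.any_line_alive()
        if not alive and self.cluster.services_running:
            self.cluster.suspend_services()
        elif alive and not self.cluster.services_running:
            self.cluster.resume_services()
        return self.cluster.services_running
```

Note that the node is never restarted in this sketch: services are only suspended and resumed, which is the behavior the device is intended to substitute for a system restart.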
A person of ordinary skill in the art will appreciate that all or some of the steps of the above embodiments can be implemented as a computer program flow. The computer program can be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, apparatus, or device); when executed, it performs one of the steps of the method embodiments or a combination of them.
Alternatively, all or some of the steps of the above embodiments can be implemented with integrated circuits: the steps can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is therefore not limited to any specific combination of hardware and software.
Each device, functional module, or functional unit in the above embodiments can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed across a network formed by multiple computing devices.
When a device, functional module, or functional unit in the above embodiments is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.