Invention content
The technical problems to be solved by the invention are to provide a kind of implementation method of expansible big data High Availabitity, solveUser develops the problem of node High Availabitity under different business scene and various communications protocols.
The technical solution that the present invention solves above-mentioned technical problem is as follows:A kind of realization side of expansible big data High AvailabitityMethod includes the following steps,
Build the first data directory structure in Zookeeper, and two controllers be configured for each node, respectively based onController and preparation controller;
When initial, setting master controller is activation process, and the master that master controller stores in the first data directory structureUnder the control for controlling information, and according to the service of node serve information administration corresponding node stored in the first data directory structureProcess;
When master controller delays machine, then the transient node that master controller creates in the first data directory structure is deleted by time-outIt removes;
After preparation controller receives the notice that transient node is deleted by time-out, re-create and face in the first data directory structureShi Jiedian, and activation process is taken over as, and the control of standby control information that preparation controller stores in the first data directory structureUnder system, and according to the service processes of node serve information take over corresponding node stored in the first data directory structure;
While preparation controller takes over as activation process, the process of pull-up master controller again.
The beneficial effects of the invention are as follows:A kind of implementation method of expansible big data High Availabitity of the present invention utilizesThe characteristic of Zookeeper, node have carried out distributed treatment, and the information of each node is shared and monitored by Zookeeper;Due to saving the details of each node in Zookeeper, so master controller and preparation controller at runtime can be mutualMutually monitor the situation of other side, the master controller for being in state of activation is delayed suddenly after machine, and spare preparation controller can be taken in timeTask simultaneously attempts pull-up and has delayed the master controller of machine, and ensure that process services normal operation, accomplishes that each process service height canWith.
Based on the above technical solution, the present invention can also be improved as follows.
Further, the first data directory structure includes root, second-level directory and three-level catalogue, wherein, rootProject name is named as, second-level directory is the host name of each node, and being created in three-level catalogue has node control message file and sectionPoint service information file, node control message file include active information file, main control message file and standby control information textPart;
The node serve information is stored in node serve message file, and master controller or preparation controller create interimNode is stored in active information file, and the main control information is stored in main control message file, the standby control informationIt is stored in standby control message file.
Further, master controller and preparation controller check in real time according to the transient node recorded in the active information fileThe state of activation of itself;
When master controller or preparation controller, which check, lays oneself open to state of activation, then master controller or preparation controller continueIt checks the state for oneself possessing service processes, and the service processes stopped being handled according to preset algorithm.
Further, before master controller delays machine, the information on services for the service processes being currently running is stored in by master controllerOn the node serve message file;
After master controller delays machine, when preparation controller takes over as activation process, preparation controller is according to the node serveThe information on services recorded on message file whether there is to detect the service processes that master controller is delayed before machine.
Further, when preparation controller detects that the service processes that master controller is delayed before machine exist and in health statusWhen, then the service processes that preparation controller delays to master controller before machine take over;
Before preparation controller delays machine, the information on services for the service processes being currently running is stored in the node by preparation controllerIn service information file.
Further, the service of controller administration corresponding node being active, the load balancing including control node,The step of load balancing of controller control node being active, includes,
The second data directory structure is built in Zookeeper, and node is started to the configuration information storage of service processesIn the second data directory structure;
The controller being active according to the multiple mutually independent service types of the node serve information monitoring, andEach service processes are individually dispatched respectively according to the number of nodes under each service type and configuration information.
Further, the second data directory structure includes root, second-level directory, three-level catalogue and level Four catalogue,In, root is named as project name, and second-level directory is named as load balancing master catalogue, and three-level catalogue is the service under each nodeClassification, creating in level Four catalogue has configuration information file and nodal information file, and nodal information file designation is the master of each nodeMachine title;The configuration information is stored in the configuration information file.
Further, the controller being active is distinguished according to the number of nodes under each service type and configuration informationThe process individually dispatched includes automatic load balancing, and the detailed process of automatic load balancing is,
When the controller being active starts, calculating needs service processes quantity to be started on transient node,And the service processes quantity oneself administered is adjusted according to the variation of service processes sum, while the service managed in needsTransient node is created in nodal information file under category list;
When the service processes failure of the controller on a node, then under the corresponding service type catalogue for needing to manageNodal information file in transient node be automatically left out, the controller being active on other nodes is according to monitoringSituation is to needing service processes quantity to be started to recalculate, and be adjusted correspondingly.
Further, the calculating process of service processes to be started is needed on node specifically, according to all number of nodes andThe need service processes quantity to be started being written in configuration information file, calculating needs service processes to be started on present nodeQuantity.
Further, the controller being active is distinguished according to the number of nodes under each service type and configuration informationThe process individually dispatched further includes sluggard's load balancing, and the principle of sluggard's load balancing is, when new node adds in cluster,Original service processes on present node will not actively be stopped;When configuration information file changes or has a service processes appearanceDuring mistake, load and start new service processes less than the controller corresponding to the node of preset configuration quantity.
Advantageous effect using above-mentioned further scheme is:Since the information of each node carries out in ZookeeperStorage, so controller can dynamically be increased and decreased according to load-balancing algorithm according to the quantity of service processes, when increasing newly orWhen losing node, carry out load balance can also be recalculated.The present invention carries out load balancing on the basis of the service of bottom process,Since certain service access request quantity can be configured in each independent process service, when access request quantity increasesWhen, it is only necessary to calculating needs process quantity of service to be started, shields access request and is initiated with which kind of communication protocol, it is only necessary to be pressedIt is encapsulated in process service according to the communication protocol of demand, load-balancing algorithm carries out load balancing according to service processes quantity.InstituteThe problem of load balancing of various communications protocols is adapted to the present invention, user can carry out load balancing with ready-made process service,It can also independently be extended under the load balancing frame of the present invention according to special scenes demand.And the present invention is according to processService time to live is divided into two kinds of mechanism and carries out load balancing, and sluggard load balancing side is used for the process service of longtime runningFormula need to only go wrong in process and carry out recalculating distribution, and automatic load balancing is used for the process service of short-term operationMode can carry out load balancing distribution in real time.
Specific embodiment
The principle and features of the present invention will be described below with reference to the accompanying drawings, and the given examples are served only to explain the present invention, andIt is non-to be used to limit the scope of the present invention.
A kind of implementation method combination Zookeeper distributed type assemblies management of expansible big data High Availabitity of the present invention is specialProperty, design it is a kind of there is distributed, expansible node High Availabitity and load balancing scheme, solve user not of the same trade or businessThe problem of exploitation node High Availabitity and load balancing are needed using multiple technologies stack under business scene and various communications protocols is born simultaneouslyIt carries equalization scheme and supports that automatic and sluggard's two ways, user can more preferably be selected according to different business scene, it can alsoUsing load balancing interface exploitation is extended according to special scenes demand.
Zookeeper is distributed, open source code a distributed application program coordination service, is GoogleMono- realization increased income of Chubby is the significant components of Hadoop and Hbase;It is one and provides consistency for Distributed ApplicationThe software of service, the function of providing include:Configuring maintenance, domain name service, distributed synchronization, group service etc..
Two kinds of technologies of dependence Zookeeper are needed in the present invention:Transient node and monitor.
There are two types of nodes in Zookeeper, respectively transient node and persistent node.Ephemeral Node areThe transient node of Zookeeper, the life cycle of such node, which depends on, creates their session, once conversation end, temporarilyNode will be automatically left out;Persistent node, English name Persistent Node refer to after node creates, just deposit alwaysThis node is actively being removed until there is delete operation, will not disappeared because of the client session failure for creating the node.
Zookeeper clients can set monitor (Watch) on node, when node state changes, thanSuch as the increase, deletion, change of data, it will the operation corresponding to triggering monitor.When monitor is triggered, ZookeeperIt will be sent to client and only send a notice, because monitor can only be triggered once.
The High Availabitity (HA) of node, i.e.,:High Available, high availability cluster are to ensure having for business continuanceImitate solution, it is general there are two or more than two nodes, and be divided into active node (Active) and standby node(Standby);Usually the active node that is known as the business that is carrying out, and a backup as active node is then referred to as standbyUse node;When active node goes wrong, when causing the business being currently running (task) to be not normally functioning, standby node is at this timeIt will detect, and the active node that immediately continues performs business;So as to fulfill business do not interrupt or short interruption.
As shown in figure 3, in order to realize the High Availabitity of node serve, need to establish specific catalogue knot in ZookeeperStructure (i.e. the first data directory structure), that is, the NameSpace of Zookeeper, root can be named as project name(ZKNameSpace), second-level directory is the host name (Nost of each node:Including Nost_A, Nost_B, Nost_C etc., HostFor the host of erection schedule service, also referred to as node), three-level catalogue is divided into two classes:One kind is node control information, including activationInformation (Active) and active and standby control information (Controller_A, Controller_B), another kind of is node serve information(Agents), classification storage can be carried out according to service type below node serve information.
As shown in figure 3, under each node (Host, below by taking Nost_A as an example), all there are two controller (Controller_A, Controller_B, Controller_A is state of activation herein, and Controller_B is spare), each controllerThere is the function of monitoring another controller, and delay in another controller that (machine of delaying, referring to operating system can not be from a serious system for machineRecover or system hardware level goes wrong in system mistake, so that system is for a long time without response, and restarting meter of having toThe phenomenon that calculation machine.Can also refer to some process occur serious error needs restart) when take over its subordinate node serve process (toolBody, master controller is under the control of main control information according to the service of node serve information realization corresponding node, preparation controllerAccording to the service of node serve information realization corresponding node under the control of standby control information), and restart the control for machine of delayingDevice.
The present invention monitors the state of a process of two controllers, and a controller wherein by ZookeeperProcess when the error occurs, realize automatically switch.When initial, the master controller Controller_A of each node is (hereinafter referred to asCA it is) activation (Active) process, when CA delays machine, (transient node stores the transient node created in ZookeeperCatalogue is /ZKNamespace/Host_A/Active) it will be deleted by time-out, while preparation controller Controller_B is (followingAbbreviation CB) notice that transient node will be received is deleted;Then, CB will re-create transient node in Zookeeper, and connectPipe becomes the function of activation process.It, will pull-up CA processes again while CB takes over as activation process.
In order to normally take over existing node serve process, it is initially in the master controller of state of activationBy the information on services corresponding to the service processes being currently running, (information on services is specifically as follows Controller_A for meeting before the machine of delayingPort and PID) it is stored on Zookeeper (catalogue of storage is /ZKNamespace/Host/Agents/Topic/ID).OftenWhen secondary preparation controller Controller_B is switched to state of activation, it can attempt using the process service letter recorded on ZookeeperCease whether the service processes delayed before machine to detect master controller Controller_A also exist.If master controllerThe service processes that Controller_A delays before machine exist and in health status, then Controller_B pairs of preparation controllerIt takes over, and the function being managed to existing service process is realized with this.And the controller being active can be examinedLook into the state for oneself possessing service processes, and according to preset algorithm, the service processes stopped being restarted orOther processing operations.
Each controller can check the current state of oneself in real time, if laying oneself open to state of activation, then will continue toThe operating status for oneself possessing service processes is checked, if the service processes oneself possessed mistake occur or exception is moved backGo out, then the predetermined operation according to preset algorithm is to there is mistake or the service processes that exit extremely are handled or againIt opens.
The service processes of controller administration corresponding node being active include the load balancing of control node.
As shown in figure 4, in order to realize node load balancing function, need to establish specific catalogue knot in ZookeeperStructure (i.e. the second data directory structure), that is, the NameSpace of Zookeeper, root can be named as project name(ZKNameSpace), second-level directory is load balancing master catalogue (Balancer), and three-level catalogue is the service class under each nodeNot (Topic:Including Topic_A, Topic_B, Topic_C etc., Topic is process service type, according to different methods of service orClassifying content, same service type can have multiple processes under its command and provide service), level Four catalogue is configuration information(Configuration, the configuration information of recording controller) and nodal information (Nost), wherein nodal information are each nodeHostname (including Nost_A, Nost_B, Nost_C etc.).
The controller Controller being active on each node can using Zookeeper come carry out node itBetween service processes distribution, it is therefore desirable to the configuration information storage of service processes will be started on Zookeeper.For eachThe controller Controller being active can monitor multiple service types, and each service type is mutual indepedent,The controller Controller being active can be right respectively according to the number of nodes under each service type and configuration informationEach service processes are individually dispatched.Each corresponding configuration information of service type deposit in Zookeeper on (catalogue is/ZKNamespace/Balancer/Topic/Configuration)。
The service type managed can be needed when the controller Controller being active starts in ZookeeperCorresponding transient node is created under catalogue, and (transient node creaties directory as/ZKNamespace/Balancer/Topic/Hosts/Host_A), need to start a certain number of service processes on each node, the controller being activeController can be calculated when starting needs service processes quantity to be started on transient node.All controls being activeDevice can monitor the quantity of node, and calculate the service processes that initiate by its own is needed in corresponding node, also can according to service intoThe variation of journey sum is adjusted the service processes quantity oneself administered.Likewise, when the clothes of the controller on a nodeWhen business process fails, (specific catalogue is /ZKNamespace/Balancer/Topic/Hosts/ under corresponding Hosts cataloguesHost_A transient node) can disappear automatically, other nodes are according to monitoring situation to service processes quantity to be started is needed to carry outIt recalculates, is timely adjusted correspondingly.Here the method for adjustment service processes quantity follows the principle that last in, first out, i.e.,Newest service processes are preferentially stopped.
Each node needs service processes algorithm to be started as follows:According to the quantity of all transient nodes withThe need service processes quantity to be started being written in Configuration catalogues, can calculate needs to start on present nodeNumber of processes.Current service processes Distribution Algorithm process includes, and lists host name all under Hosts catalogues, and makeIt is ranked up with lexcographical order, using serial number of the host name of present node in entire sequence as initial Index, then usedFollowing code carries out the calculating of number of processes.
Int number=0;(it is integer variable to define service processes quantity)
For (int i=index;i<total;I+=hostNum)
number++;(using serial number of the host name of present node in entire sequence as initial Index, when currentWhen serial number of the host name of node in entire sequence is less than total sequence quantity, according to total node number amount, present node serial numberAnd process sum to be started is needed, which to carry out present node, needs service processes quantity to be started to calculate)
}
return number;(return node needs service processes quantity to be started)
In order to adapt to different business scenarios, it is proposed that two class load-balancing methods, first, above-mentioned automatic load balancing:Load balancing can be carried out automatically at once when newly-increased or while losing node, compare the service processes for adapting to short-term operation, becauseFor for the process of longtime running, according to above-mentioned automatic equalization mode, whenever a newly-increased node or one is lostDuring node, just have process and be moved to end, and restart on new node, and the cost for starting stopping Long Running Tasks is veryHigh, the process newly started is with original state of a process it is difficult to ensure that consistent.Therefore, occur number of nodes variation when at once intoThe balanced strategy of row, the task for longtime running is not a kind of suitable mode;In order to avoid disadvantages mentioned above, the present invention also carriesSluggard's load balancing has been supplied, sluggard's load balancing can't actively delete original service processes when new node adds in cluster,And when configuration information file changes or has service processes when the error occurs, it just has and loads relatively low controller to startNew service processes, in this way can be to avoid unknown problem caused by the original service processes possibility of stopping.
The computational methods that number of nodes to be started is needed to be described with automatic load balancing on each node of sluggard's load balancingIt is identical.Difference is:
1) the service processes quantity being currently running can be recorded on Zookeeper that (physical record is in node by each nodeIn message file), other nodes can be understood and there remains how many a service processes at present and can start;
2) each node will not actively stop service processes, but can update when occurring service processes failure on nodeData in Zookeeper are determined to start the node of service processes by preset algorithm;
3) each node can (Leader be elected, and is generally opened in cluster by the Leader Selection on ZookeeperDynamic or Leader delays and performs after machine) mode, try to become Leader (collection group decision and leader node, a Zookeeper clusterOnly there are one leader, similar Master/Slave patterns after client submits request, are first sent to Leader, LeaderAs recipient, it is broadcast to each Server), and only current Leader can start service processes;
4) each node load can add in election queue less than the quantity of configuration, and when reaching destination service number of processesElection can be exited.
A kind of key point of the implementation method of expansible big data High Availabitity is invented to be:(1) distributed type assemblies design,When part of nodes delays machine, ensure that service is normal and provide;(2) load-balanced server High Availabitity is carried out using Zookeeper to setMeter ensures the load-balancing mechanism normal operation of cluster;(3) using automatic load balancing and sluggard's load balancing both of which weighing apparatus sideMethod is suitble to different application scenarios;(4) Controller of load balancing can independently be expanded according to special scenes demandExhibition;(5) load balancing accomplishes that freely (elastic telescopic refers to according to business demand and strategy, its elasticity of adjust automatically elastic telescopicThe management service of computing resource reaches the service ability of optimization combination of resources, increases computing capability when portfolio rises, work as industryBusiness amount decline when reduce computing capability, the stability and high availability of operation system are ensured with this, at the same save computing resource intoThis), it dynamically calculates and carries out load distribution.
It illustrates how to realize by the present invention for solving Flume load balancing below.
Flume is the High Availabitity that Cloudera is provided, highly reliable, distributed massive logs acquisition, polymerizationWith the system of transmission, Flume supports to customize Various types of data sender in log system, for collecting data;Meanwhile FlumeIt provides and simple process is carried out to data, and write the ability of various data receivings (customizable).
Flume is usually used in acquiring daily record data, and the Agent of oneself is placed in the server for needing gathered data, ifThe daily record data of acquisition is more dispersed, type is more, it is necessary to which a reliable load-balancing mechanism solves load assignment problem.The structure of Flume load balancing application is as shown in figure 5, there are three node, storage Flume for deployment in ZookeeperThe nodal information and configuration information of Controller, process service (in particular to Flume Agent) difference portion of gathered dataThree Flume Agent have given tacit consent to during beginning on three hosts on each host in administration.
Firstly the need of the Essential Environment for erecting operation, including JDK, Zookeeper, Flume and will need externally to carryFlume Agent configurations for service finish.
Then Flume Controller required informations are configured, using the monitoring method realized or pass through monitoringInterface realizes the monitoring method needed for oneself, and monitoring content includes Flume Controller processes and delays machine, Flume AgentHealth status.
Finally start all process services, itself can be registered in Zookeeper by Flume Controller, and establishment is facedWhen nodal directory preserve information.
High Availabitity is realized:
When the Active processes of Flume Controller break down, the corresponding transient node in ZookeeperIt can remove automatically, Flume Controller standby nodes can monitor Active process transient nodes and be eliminated, and automatically switchFor Active states, transient node, and the new Flume Controller processes of pull-up one again are registered in ZookeeperAs backup.
Flume Controller can be on the service processes synchronizing information to Zookeeper possessed, as spare FlumeWhen Controller starts, service processes information can be obtained, and attempts to take over.If process service is not present, FlumeController will start new process service, when starting new process service every time, by the information of process service (hereNo. PID of fingering journey and port numbers) it is stored on Zookeeper so that subsequent processes service recovery uses.
Load balancing is realized:
Flume Controller can periodically call load-balancing method, attempt to recalculate number of processes.ForCan obtain needs the node for sharing process, can list the file under all Hosts here, and to the title of these nodesIt is ranked up, the location of acquisition node itself, it is to be started according to total node number amount, present node serial number and need laterService processes sum, which carries out present node, needs number of processes to be started to calculate.
When newly increasing a node every time, catalogue can be established on Zookeeper and preserves information, represent that there are one newNode shares process service, and load-balancing method can stop process service, and equilibrium assignment again.When losing a node every timeWhen, the transient node on Zookeeper can be removed, and at this moment the process service lost can be recalculated distribution by load-balancing methodTo on remaining node.
Sluggard's load-balancing method can also be used, unlike before, sluggard's load-balancing method can't be attemptedService processes are stopped, but are just retried when the error occurs checking service processes.But the if number of processAmount can enter the election queue of launching process less than desired startup number, Balancer, attempt to start new process.
The present invention utilizes the characteristic of Zookeeper, and node has carried out distributed treatment, and the information of each node passes throughZookeeper is shared and is monitored.Due to saving the details of each node in Zookeeper, so active and standbyController can monitor mutually the situation of other side at runtime, be in after the machine of delaying suddenly of Active states, spareIt timely taking over tasks and pull-up can be attempted has delayed the Controller of machine, and ensure that process services normal operation, accomplish eachProcess services High Availabitity.Since the information of each node is stored in Zookeeper, so Controller can be byAccording to load-balancing algorithm, dynamically increased and decreased according to the quantity of service processes, it, can also be again when increasing newly or losing nodeIt calculates and carries out load balance.
The present invention carries out load balancing on the basis of the service of bottom process, since each independent process service can be configuredCertain service access request quantity, therefore when access request quantity increase, it is only necessary to calculating needs process service to be startedQuantity is shielded access request and is initiated with which kind of communication protocol, it is only necessary to which communication protocol as desired is encapsulated in process serviceIn, load-balancing algorithm carries out load balancing according to service processes quantity.So the present invention adapts to the load of various communications protocolsEqualization problem, user can carry out load balancing with ready-made process service, can also be according to special scenes demand in the present inventionLoad balancing frame under independently extended.And the present invention is divided into two kinds of mechanism according to process service time to live and is bornEquilibrium is carried, sluggard's load balancing mode is used for the process service of longtime running, only need to go wrong progress again in processDistribution is calculated, automatic load balancing mode is used for the process service of short-term operation, load balancing point can be carried out in real timeMatch.
The foregoing is merely presently preferred embodiments of the present invention, is not intended to limit the invention, it is all the present invention spirit andWithin principle, any modification, equivalent replacement, improvement and so on should all be included in the protection scope of the present invention.