CN108228393A

Movatterモバイル変換

Info

Publication number: CN108228393A
Application number: CN201711339686.7A
Authority: CN
Inventors: 万宝月; 杨进展
Original assignee: Zhejiang Aerospace Heng Jia Data Technology Co Ltd
Current assignee: Zhejiang Aerospace Heng Jia Data Technology Co Ltd
Priority date: 2017-12-14
Filing date: 2017-12-14
Publication date: 2018-06-29

Abstract

The present invention relates to a kind of implementation methods of expansible big data High Availabitity, include the following steps, the first data directory structure are built in Zookeeper, and two controllers, respectively master controller and preparation controller are configured for each node；When initial, setting master controller is activation process；When master controller delays machine, then the transient node that master controller creates in the first data directory structure is deleted by time-out；After preparation controller receives the notice that transient node is deleted by time-out, transient node is re-created in the first data directory structure, and take over as activation process；While preparation controller takes over as activation process, the process of pull-up master controller again.The present invention utilizes the characteristic of Zookeeper, and node has carried out distributed treatment, and the information of each node is shared and monitored by Zookeeper, accomplishes each process service High Availabitity.

Description

A kind of implementation method of expansible big data High Availabitity

Technical field

The present invention relates to big data fields, and in particular to a kind of implementation method of expansible big data High Availabitity.

Background technology

A kind of implementation method of Nginx server load balancings involved in the prior art, by Nginx servers using tactileHairdo mechanism dynamically updates the resource information of back-end server cluster, then according to the resource information of back-end server cluster andDifferent types of service, load value of the computational back-end server under various businesses type select the industry belonging to Http requestsThe back-end server of load value minimum handles Http requests in service type.Meanwhile Nginx server load balancing systems canReasonably user's request is allocated, makes state of the load in relative equilibrium of back-end server cluster, is closed so as to playReason utilizes the purpose of server resource.Fig. 1 is the network topological diagram of load-balancing method in the prior art, and user is connect by terminalEnter Internet and access required service, load balancing relies on Nginx servers and shunts service of the access request to server clusterDevice.Fig. 2 is the flow chart of load-balancing method in the prior art, and idiographic flow is：S1, Nginx server pass through trigger-typeThe resource information of new mechanism back-end server cluster, resource information include cpu busy percentage, memory usage, the magnetic of serverDisk I/O utilization rates and network bandwidth utilization factor；It is equal using load after S2, Nginx server receive the Http requests of clientThe algorithms selection back-end server processing Http that weighs is asked, and server process result is returned to client；S3, load balancing are calculatedMethod is according to different types of service and the resource information of back-end server cluster, and computational back-end server is under various businesses typeLoad value, select the Http request belonging to type of service in load value minimum back-end server handle the Http ask.

The method and system of above-mentioned existing Nginx server load balancings has the drawback that：

1. the load balancing of Http requests can only be solved, and do not have scalability：Above-mentioned load-balancing method is to utilizeAfter Nginx servers receive Http request progress URL parsings, load distribution is carried out according to dispatching algorithm, load balancing is solved and asksTopic.Do not refer to how the problem of load balancing in the case where non-Http is asked solves, such as RMI, JMS, RPC in method.SideBe only the load balance process flow for indicating Http requests in method, for non-Http ask no expansible interface orService, therefore the program does not also have autgmentability.

2. without High Availabitity function and elastic telescopic mechanism, and it is not distributed type assemblies：Above-mentioned load-balancing method solvesWhen a large number of users is accessed by Http requests in load balancing to back-end services cluster group, this is built upon NginxUnder the premise of server works normally.If Nginx servers are delayed because of a variety of causes after machine, Http requests cannot be on time normalBe sent to back-end server cluster, lead to user's access exception.

Due to relying on the load balancing of Nginx servers, need to determine back-end server number of clusters in advance, be takenDevice resource information of being engaged in is collected, and load distribution is carried out according to algorithm.Therefore back-end server cluster does not have dynamic expansion function, ifA back-end server is further added by, modification Nginx servers is needed to be configured and restart, cause user that cannot access institute for a period of timeIt needs to service.In the period of user's visit capacity is less, back-end server cluster must be kept all in operating status, waste of resource, noIt can accomplish that dynamic reduces the equally loaded again of back-end server quantity.

For rear end cluster and Nginx servers not using distributed thought, lead to the load balancing of rear end cluster notIt can freely stretch, Nginx servers do not have High Availabitity function.

Invention content

The technical problems to be solved by the invention are to provide a kind of implementation method of expansible big data High Availabitity, solveUser develops the problem of node High Availabitity under different business scene and various communications protocols.

The technical solution that the present invention solves above-mentioned technical problem is as follows：A kind of realization side of expansible big data High AvailabitityMethod includes the following steps,

Build the first data directory structure in Zookeeper, and two controllers be configured for each node, respectively based onController and preparation controller；

When initial, setting master controller is activation process, and the master that master controller stores in the first data directory structureUnder the control for controlling information, and according to the service of node serve information administration corresponding node stored in the first data directory structureProcess；

When master controller delays machine, then the transient node that master controller creates in the first data directory structure is deleted by time-outIt removes；

After preparation controller receives the notice that transient node is deleted by time-out, re-create and face in the first data directory structureShi Jiedian, and activation process is taken over as, and the control of standby control information that preparation controller stores in the first data directory structureUnder system, and according to the service processes of node serve information take over corresponding node stored in the first data directory structure；

While preparation controller takes over as activation process, the process of pull-up master controller again.

The beneficial effects of the invention are as follows：A kind of implementation method of expansible big data High Availabitity of the present invention utilizesThe characteristic of Zookeeper, node have carried out distributed treatment, and the information of each node is shared and monitored by Zookeeper；Due to saving the details of each node in Zookeeper, so master controller and preparation controller at runtime can be mutualMutually monitor the situation of other side, the master controller for being in state of activation is delayed suddenly after machine, and spare preparation controller can be taken in timeTask simultaneously attempts pull-up and has delayed the master controller of machine, and ensure that process services normal operation, accomplishes that each process service height canWith.

Based on the above technical solution, the present invention can also be improved as follows.

Further, the first data directory structure includes root, second-level directory and three-level catalogue, wherein, rootProject name is named as, second-level directory is the host name of each node, and being created in three-level catalogue has node control message file and sectionPoint service information file, node control message file include active information file, main control message file and standby control information textPart；

The node serve information is stored in node serve message file, and master controller or preparation controller create interimNode is stored in active information file, and the main control information is stored in main control message file, the standby control informationIt is stored in standby control message file.

Further, master controller and preparation controller check in real time according to the transient node recorded in the active information fileThe state of activation of itself；

When master controller or preparation controller, which check, lays oneself open to state of activation, then master controller or preparation controller continueIt checks the state for oneself possessing service processes, and the service processes stopped being handled according to preset algorithm.

Further, before master controller delays machine, the information on services for the service processes being currently running is stored in by master controllerOn the node serve message file；

After master controller delays machine, when preparation controller takes over as activation process, preparation controller is according to the node serveThe information on services recorded on message file whether there is to detect the service processes that master controller is delayed before machine.

Further, when preparation controller detects that the service processes that master controller is delayed before machine exist and in health statusWhen, then the service processes that preparation controller delays to master controller before machine take over；

Before preparation controller delays machine, the information on services for the service processes being currently running is stored in the node by preparation controllerIn service information file.

Further, the service of controller administration corresponding node being active, the load balancing including control node,The step of load balancing of controller control node being active, includes,

The second data directory structure is built in Zookeeper, and node is started to the configuration information storage of service processesIn the second data directory structure；

The controller being active according to the multiple mutually independent service types of the node serve information monitoring, andEach service processes are individually dispatched respectively according to the number of nodes under each service type and configuration information.

Further, the second data directory structure includes root, second-level directory, three-level catalogue and level Four catalogue,In, root is named as project name, and second-level directory is named as load balancing master catalogue, and three-level catalogue is the service under each nodeClassification, creating in level Four catalogue has configuration information file and nodal information file, and nodal information file designation is the master of each nodeMachine title；The configuration information is stored in the configuration information file.

Further, the controller being active is distinguished according to the number of nodes under each service type and configuration informationThe process individually dispatched includes automatic load balancing, and the detailed process of automatic load balancing is,

When the controller being active starts, calculating needs service processes quantity to be started on transient node,And the service processes quantity oneself administered is adjusted according to the variation of service processes sum, while the service managed in needsTransient node is created in nodal information file under category list；

When the service processes failure of the controller on a node, then under the corresponding service type catalogue for needing to manageNodal information file in transient node be automatically left out, the controller being active on other nodes is according to monitoringSituation is to needing service processes quantity to be started to recalculate, and be adjusted correspondingly.

Further, the calculating process of service processes to be started is needed on node specifically, according to all number of nodes andThe need service processes quantity to be started being written in configuration information file, calculating needs service processes to be started on present nodeQuantity.

Further, the controller being active is distinguished according to the number of nodes under each service type and configuration informationThe process individually dispatched further includes sluggard's load balancing, and the principle of sluggard's load balancing is, when new node adds in cluster,Original service processes on present node will not actively be stopped；When configuration information file changes or has a service processes appearanceDuring mistake, load and start new service processes less than the controller corresponding to the node of preset configuration quantity.

Advantageous effect using above-mentioned further scheme is：Since the information of each node carries out in ZookeeperStorage, so controller can dynamically be increased and decreased according to load-balancing algorithm according to the quantity of service processes, when increasing newly orWhen losing node, carry out load balance can also be recalculated.The present invention carries out load balancing on the basis of the service of bottom process,Since certain service access request quantity can be configured in each independent process service, when access request quantity increasesWhen, it is only necessary to calculating needs process quantity of service to be started, shields access request and is initiated with which kind of communication protocol, it is only necessary to be pressedIt is encapsulated in process service according to the communication protocol of demand, load-balancing algorithm carries out load balancing according to service processes quantity.InstituteThe problem of load balancing of various communications protocols is adapted to the present invention, user can carry out load balancing with ready-made process service,It can also independently be extended under the load balancing frame of the present invention according to special scenes demand.And the present invention is according to processService time to live is divided into two kinds of mechanism and carries out load balancing, and sluggard load balancing side is used for the process service of longtime runningFormula need to only go wrong in process and carry out recalculating distribution, and automatic load balancing is used for the process service of short-term operationMode can carry out load balancing distribution in real time.

Description of the drawings

Fig. 1 is the network topological diagram of Nginx server load balancings method in the prior art；

Fig. 2 is the flow chart of Nginx server load balancings method in the prior art；

Fig. 3 is with the reality of the first data knot catalogue structure in a kind of implementation method of expansible big data High Availabitity of the present inventionThe structure chart of existing High Availabitity；

Fig. 4 is realizes with the second data directory structure in a kind of implementation method of expansible big data High Availabitity of the present inventionThe structure chart of load balancing；

Fig. 5 is loaded for Flume in a kind of specific embodiment of the implementation method of expansible big data High Availabitity of the present inventionThe structure chart of equilibrium application.

Specific embodiment

The principle and features of the present invention will be described below with reference to the accompanying drawings, and the given examples are served only to explain the present invention, andIt is non-to be used to limit the scope of the present invention.

A kind of implementation method combination Zookeeper distributed type assemblies management of expansible big data High Availabitity of the present invention is specialProperty, design it is a kind of there is distributed, expansible node High Availabitity and load balancing scheme, solve user not of the same trade or businessThe problem of exploitation node High Availabitity and load balancing are needed using multiple technologies stack under business scene and various communications protocols is born simultaneouslyIt carries equalization scheme and supports that automatic and sluggard's two ways, user can more preferably be selected according to different business scene, it can alsoUsing load balancing interface exploitation is extended according to special scenes demand.

Zookeeper is distributed, open source code a distributed application program coordination service, is GoogleMono- realization increased income of Chubby is the significant components of Hadoop and Hbase；It is one and provides consistency for Distributed ApplicationThe software of service, the function of providing include：Configuring maintenance, domain name service, distributed synchronization, group service etc..

Two kinds of technologies of dependence Zookeeper are needed in the present invention：Transient node and monitor.

There are two types of nodes in Zookeeper, respectively transient node and persistent node.Ephemeral Node areThe transient node of Zookeeper, the life cycle of such node, which depends on, creates their session, once conversation end, temporarilyNode will be automatically left out；Persistent node, English name Persistent Node refer to after node creates, just deposit alwaysThis node is actively being removed until there is delete operation, will not disappeared because of the client session failure for creating the node.

Zookeeper clients can set monitor (Watch) on node, when node state changes, thanSuch as the increase, deletion, change of data, it will the operation corresponding to triggering monitor.When monitor is triggered, ZookeeperIt will be sent to client and only send a notice, because monitor can only be triggered once.

The High Availabitity (HA) of node, i.e.,：High Available, high availability cluster are to ensure having for business continuanceImitate solution, it is general there are two or more than two nodes, and be divided into active node (Active) and standby node(Standby)；Usually the active node that is known as the business that is carrying out, and a backup as active node is then referred to as standbyUse node；When active node goes wrong, when causing the business being currently running (task) to be not normally functioning, standby node is at this timeIt will detect, and the active node that immediately continues performs business；So as to fulfill business do not interrupt or short interruption.

As shown in figure 3, in order to realize the High Availabitity of node serve, need to establish specific catalogue knot in ZookeeperStructure (i.e. the first data directory structure), that is, the NameSpace of Zookeeper, root can be named as project name(ZKNameSpace), second-level directory is the host name (Nost of each node：Including Nost_A, Nost_B, Nost_C etc., HostFor the host of erection schedule service, also referred to as node), three-level catalogue is divided into two classes：One kind is node control information, including activationInformation (Active) and active and standby control information (Controller_A, Controller_B), another kind of is node serve information(Agents), classification storage can be carried out according to service type below node serve information.

As shown in figure 3, under each node (Host, below by taking Nost_A as an example), all there are two controller (Controller_A, Controller_B, Controller_A is state of activation herein, and Controller_B is spare), each controllerThere is the function of monitoring another controller, and delay in another controller that (machine of delaying, referring to operating system can not be from a serious system for machineRecover or system hardware level goes wrong in system mistake, so that system is for a long time without response, and restarting meter of having toThe phenomenon that calculation machine.Can also refer to some process occur serious error needs restart) when take over its subordinate node serve process (toolBody, master controller is under the control of main control information according to the service of node serve information realization corresponding node, preparation controllerAccording to the service of node serve information realization corresponding node under the control of standby control information), and restart the control for machine of delayingDevice.

The present invention monitors the state of a process of two controllers, and a controller wherein by ZookeeperProcess when the error occurs, realize automatically switch.When initial, the master controller Controller_A of each node is (hereinafter referred to asCA it is) activation (Active) process, when CA delays machine, (transient node stores the transient node created in ZookeeperCatalogue is /ZKNamespace/Host_A/Active) it will be deleted by time-out, while preparation controller Controller_B is (followingAbbreviation CB) notice that transient node will be received is deleted；Then, CB will re-create transient node in Zookeeper, and connectPipe becomes the function of activation process.It, will pull-up CA processes again while CB takes over as activation process.

In order to normally take over existing node serve process, it is initially in the master controller of state of activationBy the information on services corresponding to the service processes being currently running, (information on services is specifically as follows Controller_A for meeting before the machine of delayingPort and PID) it is stored on Zookeeper (catalogue of storage is /ZKNamespace/Host/Agents/Topic/ID).OftenWhen secondary preparation controller Controller_B is switched to state of activation, it can attempt using the process service letter recorded on ZookeeperCease whether the service processes delayed before machine to detect master controller Controller_A also exist.If master controllerThe service processes that Controller_A delays before machine exist and in health status, then Controller_B pairs of preparation controllerIt takes over, and the function being managed to existing service process is realized with this.And the controller being active can be examinedLook into the state for oneself possessing service processes, and according to preset algorithm, the service processes stopped being restarted orOther processing operations.

Each controller can check the current state of oneself in real time, if laying oneself open to state of activation, then will continue toThe operating status for oneself possessing service processes is checked, if the service processes oneself possessed mistake occur or exception is moved backGo out, then the predetermined operation according to preset algorithm is to there is mistake or the service processes that exit extremely are handled or againIt opens.

The service processes of controller administration corresponding node being active include the load balancing of control node.

As shown in figure 4, in order to realize node load balancing function, need to establish specific catalogue knot in ZookeeperStructure (i.e. the second data directory structure), that is, the NameSpace of Zookeeper, root can be named as project name(ZKNameSpace), second-level directory is load balancing master catalogue (Balancer), and three-level catalogue is the service class under each nodeNot (Topic：Including Topic_A, Topic_B, Topic_C etc., Topic is process service type, according to different methods of service orClassifying content, same service type can have multiple processes under its command and provide service), level Four catalogue is configuration information(Configuration, the configuration information of recording controller) and nodal information (Nost), wherein nodal information are each nodeHostname (including Nost_A, Nost_B, Nost_C etc.).

The controller Controller being active on each node can using Zookeeper come carry out node itBetween service processes distribution, it is therefore desirable to the configuration information storage of service processes will be started on Zookeeper.For eachThe controller Controller being active can monitor multiple service types, and each service type is mutual indepedent,The controller Controller being active can be right respectively according to the number of nodes under each service type and configuration informationEach service processes are individually dispatched.Each corresponding configuration information of service type deposit in Zookeeper on (catalogue is/ZKNamespace/Balancer/Topic/Configuration)。

The service type managed can be needed when the controller Controller being active starts in ZookeeperCorresponding transient node is created under catalogue, and (transient node creaties directory as/ZKNamespace/Balancer/Topic/Hosts/Host_A), need to start a certain number of service processes on each node, the controller being activeController can be calculated when starting needs service processes quantity to be started on transient node.All controls being activeDevice can monitor the quantity of node, and calculate the service processes that initiate by its own is needed in corresponding node, also can according to service intoThe variation of journey sum is adjusted the service processes quantity oneself administered.Likewise, when the clothes of the controller on a nodeWhen business process fails, (specific catalogue is /ZKNamespace/Balancer/Topic/Hosts/ under corresponding Hosts cataloguesHost_A transient node) can disappear automatically, other nodes are according to monitoring situation to service processes quantity to be started is needed to carry outIt recalculates, is timely adjusted correspondingly.Here the method for adjustment service processes quantity follows the principle that last in, first out, i.e.,Newest service processes are preferentially stopped.

Each node needs service processes algorithm to be started as follows：According to the quantity of all transient nodes withThe need service processes quantity to be started being written in Configuration catalogues, can calculate needs to start on present nodeNumber of processes.Current service processes Distribution Algorithm process includes, and lists host name all under Hosts catalogues, and makeIt is ranked up with lexcographical order, using serial number of the host name of present node in entire sequence as initial Index, then usedFollowing code carries out the calculating of number of processes.

Int number=0；(it is integer variable to define service processes quantity)

For (int i=index；i<total；I+=hostNum)

number++；(using serial number of the host name of present node in entire sequence as initial Index, when currentWhen serial number of the host name of node in entire sequence is less than total sequence quantity, according to total node number amount, present node serial numberAnd process sum to be started is needed, which to carry out present node, needs service processes quantity to be started to calculate)

}

return number；(return node needs service processes quantity to be started)

In order to adapt to different business scenarios, it is proposed that two class load-balancing methods, first, above-mentioned automatic load balancing：Load balancing can be carried out automatically at once when newly-increased or while losing node, compare the service processes for adapting to short-term operation, becauseFor for the process of longtime running, according to above-mentioned automatic equalization mode, whenever a newly-increased node or one is lostDuring node, just have process and be moved to end, and restart on new node, and the cost for starting stopping Long Running Tasks is veryHigh, the process newly started is with original state of a process it is difficult to ensure that consistent.Therefore, occur number of nodes variation when at once intoThe balanced strategy of row, the task for longtime running is not a kind of suitable mode；In order to avoid disadvantages mentioned above, the present invention also carriesSluggard's load balancing has been supplied, sluggard's load balancing can't actively delete original service processes when new node adds in cluster,And when configuration information file changes or has service processes when the error occurs, it just has and loads relatively low controller to startNew service processes, in this way can be to avoid unknown problem caused by the original service processes possibility of stopping.

The computational methods that number of nodes to be started is needed to be described with automatic load balancing on each node of sluggard's load balancingIt is identical.Difference is：

1) the service processes quantity being currently running can be recorded on Zookeeper that (physical record is in node by each nodeIn message file), other nodes can be understood and there remains how many a service processes at present and can start；

2) each node will not actively stop service processes, but can update when occurring service processes failure on nodeData in Zookeeper are determined to start the node of service processes by preset algorithm；

3) each node can (Leader be elected, and is generally opened in cluster by the Leader Selection on ZookeeperDynamic or Leader delays and performs after machine) mode, try to become Leader (collection group decision and leader node, a Zookeeper clusterOnly there are one leader, similar Master/Slave patterns after client submits request, are first sent to Leader, LeaderAs recipient, it is broadcast to each Server), and only current Leader can start service processes；

4) each node load can add in election queue less than the quantity of configuration, and when reaching destination service number of processesElection can be exited.

A kind of key point of the implementation method of expansible big data High Availabitity is invented to be：(1) distributed type assemblies design,When part of nodes delays machine, ensure that service is normal and provide；(2) load-balanced server High Availabitity is carried out using Zookeeper to setMeter ensures the load-balancing mechanism normal operation of cluster；(3) using automatic load balancing and sluggard's load balancing both of which weighing apparatus sideMethod is suitble to different application scenarios；(4) Controller of load balancing can independently be expanded according to special scenes demandExhibition；(5) load balancing accomplishes that freely (elastic telescopic refers to according to business demand and strategy, its elasticity of adjust automatically elastic telescopicThe management service of computing resource reaches the service ability of optimization combination of resources, increases computing capability when portfolio rises, work as industryBusiness amount decline when reduce computing capability, the stability and high availability of operation system are ensured with this, at the same save computing resource intoThis), it dynamically calculates and carries out load distribution.

It illustrates how to realize by the present invention for solving Flume load balancing below.

Flume is the High Availabitity that Cloudera is provided, highly reliable, distributed massive logs acquisition, polymerizationWith the system of transmission, Flume supports to customize Various types of data sender in log system, for collecting data；Meanwhile FlumeIt provides and simple process is carried out to data, and write the ability of various data receivings (customizable).

Flume is usually used in acquiring daily record data, and the Agent of oneself is placed in the server for needing gathered data, ifThe daily record data of acquisition is more dispersed, type is more, it is necessary to which a reliable load-balancing mechanism solves load assignment problem.The structure of Flume load balancing application is as shown in figure 5, there are three node, storage Flume for deployment in ZookeeperThe nodal information and configuration information of Controller, process service (in particular to Flume Agent) difference portion of gathered dataThree Flume Agent have given tacit consent to during beginning on three hosts on each host in administration.

Firstly the need of the Essential Environment for erecting operation, including JDK, Zookeeper, Flume and will need externally to carryFlume Agent configurations for service finish.

Then Flume Controller required informations are configured, using the monitoring method realized or pass through monitoringInterface realizes the monitoring method needed for oneself, and monitoring content includes Flume Controller processes and delays machine, Flume AgentHealth status.

Finally start all process services, itself can be registered in Zookeeper by Flume Controller, and establishment is facedWhen nodal directory preserve information.

High Availabitity is realized：

When the Active processes of Flume Controller break down, the corresponding transient node in ZookeeperIt can remove automatically, Flume Controller standby nodes can monitor Active process transient nodes and be eliminated, and automatically switchFor Active states, transient node, and the new Flume Controller processes of pull-up one again are registered in ZookeeperAs backup.

Flume Controller can be on the service processes synchronizing information to Zookeeper possessed, as spare FlumeWhen Controller starts, service processes information can be obtained, and attempts to take over.If process service is not present, FlumeController will start new process service, when starting new process service every time, by the information of process service (hereNo. PID of fingering journey and port numbers) it is stored on Zookeeper so that subsequent processes service recovery uses.

Load balancing is realized：

Flume Controller can periodically call load-balancing method, attempt to recalculate number of processes.ForCan obtain needs the node for sharing process, can list the file under all Hosts here, and to the title of these nodesIt is ranked up, the location of acquisition node itself, it is to be started according to total node number amount, present node serial number and need laterService processes sum, which carries out present node, needs number of processes to be started to calculate.

When newly increasing a node every time, catalogue can be established on Zookeeper and preserves information, represent that there are one newNode shares process service, and load-balancing method can stop process service, and equilibrium assignment again.When losing a node every timeWhen, the transient node on Zookeeper can be removed, and at this moment the process service lost can be recalculated distribution by load-balancing methodTo on remaining node.

Sluggard's load-balancing method can also be used, unlike before, sluggard's load-balancing method can't be attemptedService processes are stopped, but are just retried when the error occurs checking service processes.But the if number of processAmount can enter the election queue of launching process less than desired startup number, Balancer, attempt to start new process.

The present invention utilizes the characteristic of Zookeeper, and node has carried out distributed treatment, and the information of each node passes throughZookeeper is shared and is monitored.Due to saving the details of each node in Zookeeper, so active and standbyController can monitor mutually the situation of other side at runtime, be in after the machine of delaying suddenly of Active states, spareIt timely taking over tasks and pull-up can be attempted has delayed the Controller of machine, and ensure that process services normal operation, accomplish eachProcess services High Availabitity.Since the information of each node is stored in Zookeeper, so Controller can be byAccording to load-balancing algorithm, dynamically increased and decreased according to the quantity of service processes, it, can also be again when increasing newly or losing nodeIt calculates and carries out load balance.

The present invention carries out load balancing on the basis of the service of bottom process, since each independent process service can be configuredCertain service access request quantity, therefore when access request quantity increase, it is only necessary to calculating needs process service to be startedQuantity is shielded access request and is initiated with which kind of communication protocol, it is only necessary to which communication protocol as desired is encapsulated in process serviceIn, load-balancing algorithm carries out load balancing according to service processes quantity.So the present invention adapts to the load of various communications protocolsEqualization problem, user can carry out load balancing with ready-made process service, can also be according to special scenes demand in the present inventionLoad balancing frame under independently extended.And the present invention is divided into two kinds of mechanism according to process service time to live and is bornEquilibrium is carried, sluggard's load balancing mode is used for the process service of longtime running, only need to go wrong progress again in processDistribution is calculated, automatic load balancing mode is used for the process service of short-term operation, load balancing point can be carried out in real timeMatch.

The foregoing is merely presently preferred embodiments of the present invention, is not intended to limit the invention, it is all the present invention spirit andWithin principle, any modification, equivalent replacement, improvement and so on should all be included in the protection scope of the present invention.

Claims

1. a kind of implementation method of expansible big data High Availabitity, it is characterised in that：Include the following steps,

The first data directory structure is built in Zookeeper, and two controllers, respectively main control is configured for each nodeDevice and preparation controller；

When initial, setting master controller is activation process, and the main control that master controller stores in the first data directory structureUnder the control of information, and according to stored in the first data directory structure node serve information administration corresponding node service intoJourney；

When master controller delays machine, then the transient node that master controller creates in the first data directory structure is deleted by time-out；

After preparation controller receives the notice that transient node is deleted by time-out, interim section is re-created in the first data directory structurePoint, and take over as activation process, and under the control of standby control information that is stored in the first data directory structure of preparation controller,And according to the service processes of node serve information take over corresponding node stored in the first data directory structure；

2. a kind of implementation method of expansible big data High Availabitity according to claim 1, it is characterised in that：DescribedOne data directory structure includes root, second-level directory and three-level catalogue, wherein, root is named as project name, two level meshThe host name for each node is recorded, being created in three-level catalogue has node control message file and node serve message file, node controlMessage file processed includes active information file, main control message file and standby control message file；

The node serve information is stored in node serve message file, the transient node that master controller or preparation controller createIt is stored in active information file, the main control information is stored in main control message file, the standby control information storageIn standby control message file.

3. a kind of implementation method of expansible big data High Availabitity according to claim 2, it is characterised in that：Main controlDevice and preparation controller check the state of activation of itself in real time according to the transient node recorded in the active information file；

When master controller or preparation controller, which check, lays oneself open to state of activation, then master controller or preparation controller continue checking forOneself possess the state of service processes, and the service processes stopped being handled according to preset algorithm.

4. a kind of implementation method of expansible big data High Availabitity according to claim 3, it is characterised in that：In master controlDevice processed is delayed before machine, and the information on services for the service processes being currently running is stored in the node serve message file by master controllerOn；

After master controller delays machine, when preparation controller takes over as activation process, preparation controller is according to the node serve informationThe information on services recorded on file whether there is to detect the service processes that master controller is delayed before machine.

5. a kind of implementation method of expansible big data High Availabitity according to claim 4, it is characterised in that：When standby controlDevice processed detects that the service processes that master controller is delayed before machine exist and during in health status, then preparation controller is to main controlThe service processes that device is delayed before machine take over；

Before preparation controller delays machine, the information on services for the service processes being currently running is stored in the node serve by preparation controllerOn message file.

6. a kind of implementation method of expansible big data High Availabitity according to any one of claims 1 to 5, feature existIn：The service processes of controller administration corresponding node being active, the load balancing including control node, in activationThe step of load balancing of the controller control node of state, includes,

Build the second data directory structure in Zookeeper, and the configuration information that node is started to service processes is stored in theIn two data directory structures；

The controller being active according to the multiple mutually independent service types of the node serve information monitoring, and according toNumber of nodes and configuration information under each service type respectively individually dispatch each service processes.

7. a kind of implementation method of expansible big data High Availabitity according to claim 6, it is characterised in that：DescribedTwo data directory structures include root, second-level directory, three-level catalogue and level Four catalogue, wherein, root is named as entry nameClaim, second-level directory is named as load balancing master catalogue, and three-level catalogue is the service type under each node, and being created in level Four catalogue hasConfiguration information file and nodal information file, nodal information file designation are the Hostname of each node；The configuration information is depositedStorage is in the configuration information file.

8. a kind of implementation method of expansible big data High Availabitity according to claim 7, it is characterised in that：In sharpThe process packet that the controller of state living is individually dispatched respectively according to the number of nodes under each service type and configuration informationAutomatic load balancing is included, the detailed process of automatic load balancing is,

When the controller being active starts, calculating needs service processes quantity to be started in corresponding node, and rootThe service processes quantity oneself administered is adjusted according to the variation of service processes sum, while the service type managed in needsTransient node is created in nodal information file under catalogue；

When the service processes failure of the controller on a node, then the section under the service type catalogue that corresponding needs manageTransient node in point message file is automatically left out, and the controller being active on other nodes is according to monitoring situationTo service processes quantity to be started is needed to recalculate, and be adjusted correspondingly.

9. a kind of implementation method of expansible big data High Availabitity according to claim 8, it is characterised in that：On nodeThe calculating process of service processes to be started is needed specifically, according to the need being written in all number of nodes and configuration information fileService processes quantity to be started, calculating needs service processes quantity to be started on present node.

10. a kind of implementation method of expansible big data High Availabitity according to claim 7, it is characterised in that：It is inThe process that the controller of state of activation is individually dispatched respectively according to the number of nodes under each service type and configuration informationSluggard's load balancing is further included, the principle of sluggard's load balancing is that, when new node adds in cluster, will not actively stop working as prosthomereOriginal service processes on point；When configuration information file changes or has service processes when the error occurs, load is less than pre-Establishing puts the controller corresponding to the node of quantity to start new service processes.