CN1175353C - A Realization Method of Dual Computer Backup - Google Patents

A Realization Method of Dual Computer Backup
Info

Publication number
CN1175353C
Authority
CN
China
Prior art keywords
machine
registry
standby
application
host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB01100844XA
Other languages
Chinese (zh)
Other versions
CN1366242A (en)
Inventor
强 郭
郭强
张刚
吴俊�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2001-01-19
Filing date
2001-01-19
Publication date
2004-11-10
Application filed by Huawei Technologies Co Ltd
Priority to CNB01100844XA
Publication of CN1366242A
Application granted
Publication of CN1175353C
Anticipated expiration
Expired - Fee Related

Abstract

The present invention discloses a method for implementing dual-machine backup, in which synchronized operation of the host and the standby machine is achieved through communication at the application layer and support from the system layer of the two machines. The method is characterized in that a management layer is abstracted between the application layer and the system layer of the dual-machine system and is used for service management between the two machines. The service management comprises at least switching control of the floating IP, mirroring of files and the registry, and management of the applications on the host and the standby machine. With this method, the host and the standby machine can run synchronously in real time under a variety of dual-machine software environments, and the IP address can be switched dynamically when the host and the standby machine switch over. The method also eliminates the interface differences among different dual-machine software.

Description

Translated from Chinese
A Realization Method of Dual Computer Backup

Technical Field

The invention relates to dual-machine backup technology, and in particular to a method for implementing dual-machine backup that ensures real-time synchronous operation of the two machines and provides a unified software interface.

Background Art

Networks of all kinds are developing rapidly and growing ever larger, so the central server that controls and stores all data in the network management center, and other functional servers, are becoming increasingly important, as are their security and reliability. Important data servers must therefore use host/standby dual-machine backup to keep the whole network running safely. Dual-machine backup means that, while the system is running, two machines are loaded with exactly the same system and data and are kept synchronized over a communication cable. The machine currently doing the work is the active machine (the host) and the other is the standby machine. The standby machine continuously checks the state of the host, and when it detects a physical fault in the host or receives an explicit switchover command, the two machines switch roles. For dual-machine backup, real-time synchronization of operation and consistency of data are extremely important. Under Windows NT, software that supports dual-machine operation is collectively called cluster (CLUSTER) software. Existing CLUSTER software can provide disk mirroring to synchronize some important files on the two machines, but none of it supports real-time mirroring of the registry, so applications that keep configuration information in the registry cannot run correctly after a switchover.

In addition, although most existing CLUSTER software provides a floating IP, a shared machine name, and a shared disk array, and supports hot backup of databases, the products differ from vendor to vendor and each has its own specific interface. Furthermore, in actual operation, differences in system scale and user requirements lead to large differences in system configuration, which in turn cause interface differences. There is no intermediate isolation layer between the dual-machine software and the applications that could shield system-layer differences and satisfy different networking and configuration requirements, so the interface differences cannot be eliminated.

The dual-machine CLUSTER products in common use today are Microsoft's NT CLUSTER and NCR's NCR CLUSTER, and both have problems to varying degrees. NT CLUSTER does not support configurations without a shared disk array, so dual-machine backup cannot be achieved with two stand-alone servers. NCR CLUSTER does not have that problem, but it cannot detect network faults correctly: if a network card fails, the system may enter a dual-ACTIVE (dual-host) state, and when the network is restored, IP address conflicts cause system resource errors.

Summary of the Invention

To solve the above problems, the main object of the present invention is to provide a method for implementing dual-machine backup that keeps the two machines running synchronously in real time under any dual-machine software environment, while improving the reliability and security of the backup.

To achieve the above object, the technical solution of the present invention is implemented as follows:

A method for implementing dual-machine backup, in which a management layer for service management between the two machines is provided between the application layer and the system layer of the dual-machine system. In the management layer, the system monitors the running state of all applications and checks whether network communication between the host and the standby machine is normal. If it is normal, checking continues, and the host and the standby machine each check and update the files and registry they store. Otherwise, the system determines whether a floating IP switch can be performed. If it can, the host and the standby machine switch over: the standby machine takes over the floating IP owned by the host and provides service externally, and the corresponding applications in the application layer are notified to switch their running state. If it cannot, the current switching procedure ends.

Here, determining whether a floating IP switch can be performed means determining whether the floating IP exists and checking whether the network is operating normally.

For the standby machine, checking and updating its own files and registry means: the standby machine checks its local backup data and determines whether there is new backup data; if there is, it performs a local restore and updates the backed-up data; otherwise it continues checking. The local restore specifically comprises: the standby machine takes the data file that was generated from the updated registry information and copied from the host, restores it into registry information on the local machine, and updates the standby machine's registry.

For the host, checking and updating its own files and registry means: the host checks whether the path or content of any registered file or registry entry on the local machine has changed; if so, it mirrors the changed file or registry content and updates the information on the standby machine; otherwise it continues checking.

When registry content changes, mirroring the changed registry content means: determine whether the standby machine allows the host to access its registry; if it does, back up the updated registry information directly to the standby machine; otherwise, the host backs up the changed registry information as a data file and then copies that data file to the standby machine.

The method further comprises providing a main application module, a group management module, and an application management module in the management layer. When the host and the standby machine switch over, the main application module sends a switching message to the application management module. The switching messages include an active-to-standby switching message and a standby-to-active switching message.

The method further comprises: the application management module checks the current running state of the applications it manages in real time and, on finding an application fault, sends an application exception message to the main application module.

The method further comprises providing, in the management layer, a module that provides a data channel.

In the above solution, the management layer uses blocking operations.

As the above technical solution shows, the method of the present invention adopts a three-layer structure: a management layer is abstracted between the application layer and the system layer to manage the two machines. For the system layer, the invention provides virtual dynamic IP technology, that is, dynamic IP switching when the host and the standby machine switch over, and it provides a unified interface to the system layer so that differences in system configuration do not lead to interface differences. The system layer comprises NT CLUSTER, NCR CLUSTER, or other CLUSTER software. For the application layer, the invention provides a simple, unified communication interface, monitors the running of the applications, and maintains the ordering logic of the application layer during a switchover. It implements two kinds of control logic, switchover and restart, which guarantees uninterrupted operation of the system physically and uninterrupted system service at the level of high-level application logic.

The floating IP is an IP address that the system assigns to a machine dynamically when the address is needed, instead of being set in the system settings. It does not appear in the system's IP address configuration table; it is an address agreed with the clients and bound dynamically to the host. It is a service resource that the system offers externally, and when the system switches over, the IP address is switched over as well.

The dual-machine system keeps the state of the two machines consistent through handshakes over the heartbeat line. In the present invention, the heartbeat line is a heartbeat network: a dedicated network card used only for communication between the two machines, generally over a directly connected network cable. When the heartbeat handshake becomes abnormal, additional checks decide whether to switch the IP. These checks consist of detecting whether the floating IP exists and detecting whether network communication is normal, which ensures that the two machines never hold the floating IP at the same time.

For a switchover, the execution environment that the high-level applications depend on includes configuration files or registry information, and for the switchover to succeed the host and the standby machine must keep these configuration files or registry entries synchronized. The system of the present invention continuously watches the specified paths or registry subkeys for changes. When a file or registry entry that the host must mirror changes, the file mirroring software learns the specific change from a system notification message and applies the same modification to the standby machine, keeping the configuration of the high-level applications synchronized. Because all change notifications are delivered through system events, at intervals on the order of milliseconds, synchronization is effectively real time.
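
The change detection described above can be illustrated with a small sketch. It is only an approximation under stated assumptions: it polls a directory and compares snapshots, whereas the text relies on event notifications from the operating system, and the function names `snapshot` and `diff` are hypothetical.

```python
import os


def snapshot(root):
    """Map every file under root to (modification time in ns, size)."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snap[path] = (st.st_mtime_ns, st.st_size)
    return snap


def diff(old, new):
    """Return (changed_or_added, removed) paths between two snapshots."""
    # A path counts as changed when it is new or its signature differs.
    changed = sorted(p for p, sig in new.items() if old.get(p) != sig)
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

A mirroring loop would take a snapshot on each pass, pass the previous and current snapshots to `diff`, and copy the changed paths to the standby machine. Polling trades the millisecond latency of system events for portability.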

Dual-machine management mainly monitors the state of the application software that needs to be backed up. Critical applications use an internal protocol for handshake communication and report their state. For non-critical applications, or other application software whose code is not available, the management layer monitors whether the application is still running. At the same time, it obtains the system state and switching actions through the development interface provided by the dual-machine software and controls the application-layer software accordingly.

The method for implementing dual-machine backup provided by the present invention therefore has the following features:

1. The floating IP function completely prevents the host and the standby machine from holding the floating IP at the same time, so a dual-active state cannot occur, and it fully implements dynamic switching of the network card's IP. In addition to detecting host faults, it can also detect network faults.

2. File and registry mirroring synchronizes the registries of the host and the standby machine in real time, which ensures that the system runs normally after a switchover.

3. Dual-machine management gives applications a more convenient and flexible interface and supports different CLUSTER environments, eliminating the interface differences between different dual-machine software.

Brief Description of the Drawings

Fig. 1 is a flowchart of an embodiment of the floating IP implementation of the present invention;

Fig. 2 is a schematic diagram of the system structure of the present invention;

Fig. 3 is an object diagram of the dual-machine management implementation in the method of the present invention;

Fig. 4 is a state diagram of the group management module in the dual-machine management of the method of the present invention;

Fig. 5 is a state diagram of the application management module in the dual-machine management of the method of the present invention;

Fig. 6 is a flowchart of the file and registry mirroring implementation in the method of the present invention.

Detailed Description

The present invention and its technical content are described in detail below with reference to the accompanying drawings:

The core of the floating IP service, which uses the application programming interface (API) provided by Windows NT, is the detection of network state. The present invention detects the system's network state through heartbeat handshakes and maintains communication between the host and the standby machine, ensuring uninterrupted operation of the system's network resource services. In the dual-machine system, when the host fails (it goes down because of a power loss, a network fault, or a hardware fault), the heartbeat handshake becomes abnormal, and the standby machine decides whether to switch the IP by checking whether the floating IP exists and whether the network is operating normally. If the floating IP exists and the network is normal, the standby system takes over the floating IP owned by the host and provides service externally. Clients do not perceive the switchover as a system failure; they only see a pause of a few seconds. Moreover, because the IP is switched only after the checks have confirmed it, the two machines never hold the floating IP at the same time.
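
The decision made by the standby machine can be summarized in a short sketch. This is a minimal illustration that follows the conditions exactly as the paragraph above words them; the `Probe` type, its field names, and the returned strings are hypothetical, and how each probe is actually measured is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    heartbeat_ok: bool         # handshake over the dedicated heartbeat network
    floating_ip_present: bool  # the "floating IP exists" check named in the text
    network_ok: bool           # the "network is operating normally" check


def decide(probe: Probe) -> str:
    """Return the standby machine's next action for one set of probe results."""
    if probe.heartbeat_ok:
        # Normal case: keep monitoring and keep mirroring files and registry.
        return "continue_monitoring"
    if probe.floating_ip_present and probe.network_ok:
        # Both additional checks passed, so the standby may take over the IP.
        return "take_over_floating_ip"
    # A check failed: end the current switching procedure without switching.
    return "abort_switch"
```

For example, `decide(Probe(False, True, True))` selects the takeover branch, while a failed network check ends the attempt without touching the floating IP.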

When two independent servers join the network at the same time, both are in the standby state and must compete to become active. Because existing software does not control the switchover process, a dual-active state is likely. The check-then-switch approach of the present invention solves this problem. The procedure is shown in Fig. 1: both machines check the network state at the same time, and each determines whether it is currently in the standby state. If it is not, it returns to checking the network state. If it is, it determines whether the peer is in standby, whether the local machine has priority to become active, and whether the local network is normal. If any of these conditions is not met, it returns to checking the network state. If all are met, the local machine switches to active and then checks whether to finish; if not, it returns to checking the network state.
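
One pass of the Fig. 1 loop can be sketched as follows. The `Node` type and the `step` function are hypothetical names, and the sketch assumes that the priority to become active is a per-machine configuration flag given to only one of the two machines.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    state: str = "standby"      # "standby" or "active"
    has_priority: bool = False  # assumed: configured on exactly one machine
    network_ok: bool = True


def step(local: Node, peer_state: str) -> bool:
    """Run one pass of the check; return True if local was promoted."""
    if local.state != "standby":
        return False  # already active: go back to checking the network
    if peer_state == "standby" and local.has_priority and local.network_ok:
        local.state = "active"
        return True
    return False      # some condition failed: go back to checking the network
```

If both machines evaluate `step` against a snapshot in which the peer is still in standby, only the machine holding the priority flag is promoted, which is how the check-then-switch order avoids a dual-active state.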

As shown in Fig. 2, the system of the present invention has a three-layer structure: the application layer 21, containing the applications that are monitored and backed up; the management layer 22, for dual-machine management; and the system layer 23, which comprises the NT CLUSTER dual-machine system layer, the NCR CLUSTER dual-machine system layer, or another dual-machine system. Software in the application layer 21 needs to consider only its interface with the management layer 22, not which system the system layer 23 is. However much the system layer 23 changes, only the corresponding part of the management layer 22 needs a small modification, and the applications in the application layer 21 are unaffected.
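
The isolation provided by the management layer resembles an adapter interface. The sketch below is an illustration only: `ClusterBackend`, `InMemoryBackend`, and `ManagementLayer` are hypothetical names, and the backend is an in-memory stand-in that does not call any real NT CLUSTER or NCR CLUSTER API.

```python
from abc import ABC, abstractmethod


class ClusterBackend(ABC):
    """What the management layer needs from any system-layer product."""

    @abstractmethod
    def local_role(self) -> str:
        """Return "active" or "standby"."""

    @abstractmethod
    def request_switch(self) -> None:
        """Ask the cluster software to swap roles."""


class InMemoryBackend(ClusterBackend):
    """Stand-in for a real product; it only tracks a role in memory."""

    def __init__(self, product: str, role: str = "standby"):
        self.product = product
        self._role = role

    def local_role(self) -> str:
        return self._role

    def request_switch(self) -> None:
        self._role = "standby" if self._role == "active" else "active"


class ManagementLayer:
    """The only interface that application-layer code sees."""

    def __init__(self, backend: ClusterBackend):
        self._backend = backend

    def is_active(self) -> bool:
        return self._backend.local_role() == "active"

    def switch_over(self) -> None:
        self._backend.request_switch()
```

Application code written against `ManagementLayer` stays the same when the backend is replaced, which mirrors the statement that only the management layer needs modification when the system layer changes.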

As shown in Fig. 3, dual-machine management mainly involves three objects: the main application module, the group management module, and the application management module. Each module is a class, and they pass messages to one another to carry out switchovers between the host and the standby machine and to handle error states. The CClusterMng-App class 31 represents the main application module and handles initialization and shutdown of the management program. The CGroup class 32 represents the group management module. It mainly manages the running of the group's applications, monitors their state, and restarts or switches over the application group; it also handles interaction with the system layer, chiefly notification events from the CLUSTER system and manual switchover. The CApplication class 33 represents the application management module, which manages the running and state changes of each application. When an application fault occurs, it is responsible for notifying CGroup 32.

Dual-machine management may also involve a CLinkMng class 34, which provides a data channel for the CLUSTERMNG program; its communication code uses the shared communication resources. Because this class is of minor importance, its use is optional.

Fig. 4 is the state diagram of CGroup. Because blocking operations are used, a state transition cannot be interrupted, so the diagram has only two states, active and standby, and no others. The messages used in Fig. 4 are as follows:

Msg1 indicates that an application managed by CClusterMng has raised an exception. When CApplication detects an application fault, it sends a message to CClusterMng; according to the configuration, CClusterMng should switch over, and the processing has been completed.

Msg2 is the standby-to-active switching message (ONLINE): the system CLUSTER software notifies CClusterMng, and the processing has been completed.

Msg3 is the active-to-standby switching message (OFFLINE): the system CLUSTER software notifies CClusterMng, and the processing has been completed.
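
The two-state behaviour of CGroup can be written as a transition table. This is a sketch rather than the patented implementation: Fig. 4 itself is not reproduced in the text, so the source state of the Msg1 transition (active to standby on an application exception) is an assumption, and all Python names are hypothetical.

```python
from enum import Enum


class GroupState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class GroupMsg(Enum):
    APP_EXCEPTION = 1  # Msg1
    ONLINE = 2         # Msg2, standby to active
    OFFLINE = 3        # Msg3, active to standby


_TRANSITIONS = {
    (GroupState.STANDBY, GroupMsg.ONLINE): GroupState.ACTIVE,
    (GroupState.ACTIVE, GroupMsg.OFFLINE): GroupState.STANDBY,
    # Assumption: an exception on the active side hands over to the peer.
    (GroupState.ACTIVE, GroupMsg.APP_EXCEPTION): GroupState.STANDBY,
}


def next_group_state(state: GroupState, msg: GroupMsg) -> GroupState:
    """Apply one message; pairs not in the table leave the state unchanged."""
    return _TRANSITIONS.get((state, msg), state)
```

Because each message is handled to completion before the next one is read, which is the blocking operation the text describes, no intermediate "switching" state is needed in the table.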

Fig. 5 is the state diagram of CApplication. Because blocking operations are used, a state transition cannot be interrupted, so the diagram has only the active state, the standby state, and one error state, and no others. The messages involved are Msg1 to Msg6, described as follows:

Msg1 is the standby-to-active switching message (ONLINE): CClusterMng sends the switching message to tell CApplication to switch the applications it manages from standby to active, and the processing has been completed.

Msg2 is the active-to-standby switching message (OFFLINE): CClusterMng sends the switching message to tell CApplication to switch the applications it manages from active to standby, and the processing has been completed.

Msg3 indicates that the application is abnormal and restarting it failed, so the error state is entered.

Msg4 is the handshake timeout message: in the standby or active state, an application handshake fault is detected and the restart fails, so the error state is entered.

Msg5 indicates that the process on the active machine has entered the normal state.

Msg6 indicates that the process on the standby machine has entered the normal state.
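
The three-state behaviour of CApplication can be sketched in the same spirit. Fig. 5 is not reproduced in the text, so two points are assumptions: that Msg3 and Msg4 lead to the error state from either working state, and that Msg5 and Msg6 are the way out of the error state. All Python names are hypothetical.

```python
from enum import Enum


class AppState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    ERROR = "error"


class AppMsg(Enum):
    ONLINE = 1             # Msg1, standby to active
    OFFLINE = 2            # Msg2, active to standby
    APP_FAILED = 3         # Msg3, abnormal and restart failed
    HANDSHAKE_TIMEOUT = 4  # Msg4, handshake fault and restart failed
    ACTIVE_OK = 5          # Msg5, active-side process is normal
    STANDBY_OK = 6         # Msg6, standby-side process is normal


def next_app_state(state: AppState, msg: AppMsg) -> AppState:
    """Apply one message to the application manager's state."""
    if msg in (AppMsg.APP_FAILED, AppMsg.HANDSHAKE_TIMEOUT):
        return AppState.ERROR
    if state is AppState.STANDBY and msg is AppMsg.ONLINE:
        return AppState.ACTIVE
    if state is AppState.ACTIVE and msg is AppMsg.OFFLINE:
        return AppState.STANDBY
    # Assumption: recovery messages only matter while in the error state.
    if state is AppState.ERROR and msg is AppMsg.ACTIVE_OK:
        return AppState.ACTIVE
    if state is AppState.ERROR and msg is AppMsg.STANDBY_OK:
        return AppState.STANDBY
    return state
```

Modeling a failed restart as a single message keeps the restart attempt itself outside the state machine, consistent with the messages being described as already processed.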

As shown in Fig. 6, file and registry mirroring proceeds as follows:

1) First determine whether the system wants to exit this thread. If so, exit the thread; otherwise, determine whether the local machine is in the standby state.

2) If it is in the standby state, check the local backup data; if there is new backup data, perform a local restore, then return to step 1). Local restore applies to the registry: if the host's registry information has changed but the standby machine does not allow other machines to access its registry, the host backs up the changed registry as a data file and copies that file to the standby machine. When the standby machine detects the data file, it restores the file into registry information locally and updates its own registry, keeping it consistent with the host's data.

3) If the local machine is not in the standby state, check whether the path or content of any registered file or registry entry on the local machine has changed. If so, mirror the changed file or registry content, that is, back up the data to the standby machine and update its information. Specifically, files are copied directly to the standby machine. For the registry, first check whether the standby machine allows the host to access its registry: if it does, back up the updated registry information directly to the standby machine; otherwise, the host backs up the changed registry information as a data file and copies the data file directly to the standby machine. Then return to step 1). If nothing has changed, return directly to step 1).
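
Steps 2) and 3) can be modeled with plain dictionaries standing in for the file system and the registry. The sketch is a simulation under stated assumptions, not real registry access: the `Machine` type, the `inbox` list of copied dump files, and the key name used in the usage note are all hypothetical, and the thread-exit check of step 1) is left to the calling loop.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    standby: bool
    allows_remote_registry: bool = True
    files: dict = field(default_factory=dict)     # path -> content
    registry: dict = field(default_factory=dict)  # key -> value
    inbox: list = field(default_factory=list)     # registry dumps copied from the peer


def mirror_step(local: Machine, peer: Machine,
                changed_files: dict, changed_keys: dict) -> None:
    """Run one pass of the mirroring loop on the local machine."""
    if local.standby:
        # Step 2: local restore of any registry dump the host has copied over.
        while local.inbox:
            local.registry.update(local.inbox.pop(0))
        return
    # Step 3: the host copies changed files straight to the standby machine.
    peer.files.update(changed_files)
    if changed_keys:
        if peer.allows_remote_registry:
            peer.registry.update(changed_keys)    # direct registry update
        else:
            peer.inbox.append(dict(changed_keys)) # dump to a file and copy it
```

With a standby machine that refuses remote registry access, one call on the host side leaves the change in the standby machine's `inbox`, and the next call on the standby side applies it, for example for a key such as `"Software\\Demo\\Port"`.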

This method keeps not only the files but also the registry consistent between the host and the standby machine, so that applications do not fail after a switchover because of differing registry information.

In the method of the present invention, a copy of the application that performs the above functions is installed on both the host and the standby machine, to ensure that the two machines run normally and in sync.

Claims (12)

Translated fromChinese
1. A method for implementing dual-machine backup, characterized in that a management layer for service management between the two machines is provided between the application layer and the system layer of the dual-machine system; in the management layer, the system monitors the running state of all application programs and checks whether network communication between the host and the standby machine is normal; if it is normal, checking continues, and the host and the standby machine each check and update the files and registry they store; otherwise, it is determined whether floating-IP switching can be performed; if it can, the host and the standby machine switch over, the standby machine takes over the floating IP owned by the host and provides service externally, and the corresponding application programs in the application layer are notified to switch their running state; if it cannot, the current switching procedure ends.
2. The method according to claim 1, characterized in that determining whether floating-IP switching can be performed comprises: determining whether the floating IP exists and checking whether the network is operating normally.
3. The method according to claim 1, characterized in that, for the standby machine, checking and updating its own files and registry comprises: the standby machine checks its local backup data and determines whether there is new backup data; if so, it performs local recovery and updates the backed-up data; otherwise, it continues checking.
4. The method according to claim 3, characterized in that the local recovery specifically comprises: the standby machine takes the data file that was generated from the updated registry information and copied from the host, restores it to registry information on the local machine, and updates the standby machine's registry.
5. The method according to claim 1, characterized in that, for the host, checking and updating its own files and registry comprises: the host checks whether any registered file, registry path, or registry content on the local machine has changed; if so, it mirrors the changed file or registry content and updates the standby machine's information; otherwise, it continues checking.
6. The method according to claim 5, characterized in that, when registry content changes, mirroring the changed registry content comprises: determining whether the standby machine allows the host to access its registry; if so, the updated registry information is backed up directly to the standby machine; otherwise, the host backs up the changed registry information as a data file and then copies that data file to the standby machine.
7. The method according to claim 1, characterized in that the method further comprises: providing a main application module, a group management module, and an application management module in the management layer.
8. The method according to claim 7, characterized in that, when the host and the standby machine switch over, the main application module sends a switching message to the application management module.
9. The method according to claim 8, characterized in that the switching message includes an active-to-standby switching message and a standby-to-active switching message.
10. The method according to claim 7, characterized in that the method further comprises: the application management module detects in real time the current running state of the application programs it manages and, on finding that an application program is abnormal, sends an application-exception message to the main application module.
11. The method according to claim 7, characterized in that the method further comprises: providing, in the management layer, a module that provides a data channel.
12. The method according to claim 7, characterized in that the management layer uses blocking operations.
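The decision flow in claims 1, 2 and 9, and the registry mirroring path in claims 3, 4 and 6, can be illustrated with a small in-memory sketch. This is not the patented implementation. The names `Node`, `management_step`, `mirror_registry_change` and `restore_from_inbox` are hypothetical, the registry is modelled as a plain dictionary, the "data file" is modelled as a JSON string in a list, and "the floating IP exists" is assumed to mean that the address is currently bound on the host.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """One machine of the pair (hypothetical in-memory model)."""
    name: str
    role: str  # "active" or "standby"
    ips: Set[str] = field(default_factory=set)


def can_switch_floating_ip(floating_ip: str, host: Node, network_ok: bool) -> bool:
    """Claim 2: the floating IP must exist and the network must be working.

    Assumption: "exists" means the address is bound on the host in this model.
    """
    return floating_ip in host.ips and network_ok


def management_step(host: Node, standby: Node, floating_ip: str,
                    link_ok: bool, network_ok: bool,
                    notify: Callable[[str, str], None]) -> str:
    """One pass of the claim 1 decision flow; returns a label for the outcome."""
    if link_ok:
        # Host-standby communication is normal: keep checking, and let each
        # side check and update its own files and registry.
        return "sync"
    if not can_switch_floating_ip(floating_ip, host, network_ok):
        return "abort"  # cannot switch: end the current switching procedure
    # The standby takes over the floating IP and starts serving.
    host.ips.discard(floating_ip)
    standby.ips.add(floating_ip)
    host.role, standby.role = "standby", "active"
    # Claim 9: one message for each direction of the role change.
    notify(host.name, "active_to_standby")
    notify(standby.name, "standby_to_active")
    return "switched"


def mirror_registry_change(change: Dict[str, str], standby_allows_access: bool,
                           standby_registry: Dict[str, str],
                           standby_inbox: List[str]) -> str:
    """Claim 6: write directly if allowed, otherwise ship a data file."""
    if standby_allows_access:
        standby_registry.update(change)
        return "direct"
    # Stands in for "back up as a data file, then copy it to the standby".
    standby_inbox.append(json.dumps(change))
    return "file"


def restore_from_inbox(standby_registry: Dict[str, str],
                       standby_inbox: List[str]) -> int:
    """Claims 3-4: the standby restores newly copied data files locally."""
    restored = 0
    while standby_inbox:
        standby_registry.update(json.loads(standby_inbox.pop(0)))
        restored += 1
    return restored
```

A caller would run `management_step` periodically with the result of its own link check, and call `restore_from_inbox` on the standby whenever new backup data has arrived. Returning a label rather than raising keeps the "cannot switch" case an ordinary outcome, which matches the claim's wording that the switching procedure simply ends.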
CNB01100844XA | 2001-01-19 | 2001-01-19 | A Realization Method of Dual Computer Backup | Expired - Fee Related | CN1175353C (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CNB01100844XA CN1175353C (en) | 2001-01-19 | 2001-01-19 | A Realization Method of Dual Computer Backup

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CNB01100844XA CN1175353C (en) | 2001-01-19 | 2001-01-19 | A Realization Method of Dual Computer Backup

Publications (2)

Publication Number | Publication Date
CN1366242A (en) | 2002-08-28
CN1175353C (en) | 2004-11-10

Family

ID=4651935

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CNB01100844XA, Expired - Fee Related, CN1175353C (en) | A Realization Method of Dual Computer Backup | 2001-01-19 | 2001-01-19

Country Status (1)

Country | Link
CN (1) | CN1175353C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN100362484C (en) * | 2005-05-11 | 2008-01-16 | 华为技术有限公司 | Multi-machine backup method
CN103580926B (en) * | 2013-11-13 | 2017-12-05 | 国家电网公司 | A kind of light-weight hot standby system synchronization method

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN100416522C (en) * | 2002-09-25 | 2008-09-03 | 华为技术有限公司 | The method of adopting dual-computer hot backup and its terminal equipment access system
CN1310481C (en) * | 2003-05-21 | 2007-04-11 | 华为技术有限公司 | Method for realizing application characteristic dual processor backup
CN1299203C (en) * | 2004-09-14 | 2007-02-07 | 中国人民解放军上海警备区司令部指挥自动化工作站 | Data disaster tolerance backup control system
CN100461802C (en) * | 2004-09-16 | 2009-02-11 | 中兴通讯股份有限公司 | Monitoring method for double machine application system based on surrogate process
JP4289293B2 (en) * | 2004-12-20 | 2009-07-01 | 日本電気株式会社 | Start control method, duplex platform system, and information processing apparatus
CN100461697C (en) * | 2006-04-18 | 2009-02-11 | 华为技术有限公司 | Business takeover method and backup machine based on equipment disaster recovery
CN1921369B (en) * | 2006-08-08 | 2011-02-09 | 华为技术有限公司 | Adapting method for network connection
CN101216843B (en) * | 2008-01-17 | 2010-09-29 | 四川格瑞特科技有限公司 | Multi-point multi-hop data real time backup method
CN101741601B (en) * | 2008-11-06 | 2012-02-15 | 上海市医疗保险信息中心 | Structured disaster backup system and backup method
CN101599858B (en) * | 2009-06-25 | 2013-01-16 | 中兴通讯股份有限公司 | Method for managing host computer and standby computer and server
CN102497288A (en) * | 2011-12-13 | 2012-06-13 | 华为技术有限公司 | Dual-server backup method and dual system implementation device
CN103077242B (en) * | 2013-01-11 | 2016-03-09 | 北京佳讯飞鸿电气股份有限公司 | The method of a kind of fulfillment database server two-node cluster hot backup
CN104035833A (en) * | 2013-03-07 | 2014-09-10 | 联发科技股份有限公司 | Method and system for verifying integrity of machine-readable code
US9614932B2 (en) * | 2013-03-14 | 2017-04-04 | Microsoft Technology Licensing, Llc | Managing and implementing web application data snapshots
CN104111881B (en) * | 2014-07-25 | 2016-03-30 | 中国航天科工集团第二研究院七〇六所 | A kind of arbitration device for dual-computer redundancy Hot Spare computing machine
CN104390674B (en) * | 2014-11-25 | 2017-12-19 | 苏州赛智达智能科技有限公司 | A kind of liquid level induction system
CN104484632A (en) * | 2014-12-08 | 2015-04-01 | 张君 | Method for carrying out parallel transferring on RFID (Radio Frequency Identification) signal aiming at bottled liquid food production
CN104537460A (en) * | 2014-12-08 | 2015-04-22 | 张君 | RFID signal parallel transmitting system for bottled liquid food production
CN104679604A (en) * | 2015-02-12 | 2015-06-03 | 大唐移动通信设备有限公司 | Method and device for switching between master node and standby node
CN105391574A (en) * | 2015-10-28 | 2016-03-09 | 曙光云计算技术有限公司 | Server address setting method and device
CN105553701A (en) * | 2015-12-11 | 2016-05-04 | 国网青海省电力公司 | Distribution network adjustment and control system and control method thereof
CN106354589A (en) * | 2016-08-24 | 2017-01-25 | 天津天大求实电力新技术股份有限公司 | Double-unit hot standby method of micro-grid energy management system service programs
CN110535645A (en) * | 2018-05-24 | 2019-12-03 | 上海赢亥信息科技有限公司 | A kind of standby system and method for digital asset management device
CN109507866A (en) * | 2018-12-07 | 2019-03-22 | 天津津航计算技术研究所 | A kind of double-machine redundancy system and method based on network address drift technology
CN110297867B (en) * | 2019-06-28 | 2021-08-17 | 浪潮软件集团有限公司 | Database cluster operation method and system based on domestic CPU and distributed container cluster

Also Published As

Publication number | Publication date
CN1366242A (en) | 2002-08-28

Similar Documents

Publication | Publication Date | Title
CN1175353C (en) | A Realization Method of Dual Computer Backup
US6477663B1 (en) | Method and apparatus for providing process pair protection for complex applications
US7689862B1 (en) | Application failover in a cluster environment
US8464092B1 (en) | System and method for monitoring an application or service group within a cluster as a resource of another cluster
EP1024428B1 (en) | Managing a clustered computer system
US7392421B1 (en) | Framework for managing clustering and replication
US8713362B2 (en) | Obviation of recovery of data store consistency for application I/O errors
CN1190733C (en) | Method and system for failure recovery for data management and application program
US8239518B2 (en) | Method for detecting and resolving a partition condition in a cluster
US8230256B1 (en) | Method and apparatus for achieving high availability for an application in a computer cluster
CA2290289C (en) | Cluster node distress signal
US20050108593A1 (en) | Cluster failover from physical node to virtual node
US20070226359A1 (en) | System and method for providing java based high availability clustering framework
US20040153749A1 (en) | Redundant multi-processor and logical processor configuration for a file server
US20030074426A1 (en) | Dynamic cluster versioning for a group
US7444335B1 (en) | System and method for providing cooperative resource groups for high availability applications
CN110912991A (en) | Super-fusion-based high-availability implementation method for double nodes
EP1428149A1 (en) | A system and method for a multi-node environment with shared storage
CN101179432A (en) | A Method of Realizing System High Availability in Multi-machine Environment
JP2004246892A (en) | Remotely accessible resource management method in multi-node distributed data processing system
JP2008052407A (en) | Cluster system
CN114827148B (en) | Cloud security computing method and device based on cloud fault-tolerant technology and storage medium
JP2012173996A (en) | Cluster system, cluster management method and cluster management program
CN113849136A (en) | Automatic FC block storage processing method and system based on domestic platform
Glider et al. | The software architecture of a san storage control system

Legal Events

Date | Code | Title | Description
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C06 | Publication
PB01 | Publication
C14 | Grant of patent or utility model
GR01 | Patent grant
C17 | Cessation of patent right
CF01 | Termination of patent right due to non-payment of annual fee

Granted publication date: 20041110

Termination date: 20140119

