CN116781488A

Info

Publication number: CN116781488A
Application number: CN202210225644.5A
Authority: CN
Inventors: 王鑫; 李奇书; 田国良
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Jiangsu Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Jiangsu Co Ltd
Priority date: 2022-03-09
Filing date: 2022-03-09
Publication date: 2023-09-19

Abstract

The application provides a method, an apparatus, a database architecture, a device and a product for implementing database high availability. The method is applied to a database architecture comprising a master node cluster and a slave node cluster. The master node cluster comprises a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster comprises a plurality of slave nodes, a second primary proxy node and a second backup proxy node. The method comprises: acquiring status information of each node in the database architecture; performing anomaly analysis on the status information to obtain an anomaly analysis result; and, based on the anomaly analysis result, performing node switching on the database architecture, wherein the node switching flow comprises at least one of: active/standby switchover of the master node, master-slave switchover of the master node, switchover within the slave node cluster, active/standby switchover of a proxy node, and no node switchover. The application can thereby implement database high availability with high accuracy and high fault tolerance.

Description

Translated from Chinese

Method, apparatus, database architecture, device and product for implementing database high availability

Technical field

This application relates to the field of database technology, and in particular to a method, apparatus, database architecture, device and product for implementing database high availability.

Background

A typical MySQL high-availability architecture is deployed as keepalived + MySQL + MHA. In this architecture, keepalived runs on the MySQL hosts to provide the VIP (virtual IP) function. keepalived is also configured with a script that checks the MySQL service status: when MySQL goes down abnormally, keepalived moves the VIP to a MySQL slave and MHA performs the MySQL master-slave switchover, which makes the MySQL database highly available.

However, CPU anomalies, memory anomalies, disk read/write anomalies and similar conditions on the MySQL host do not trigger a master-slave switchover; a switchover is triggered only when the MySQL host goes down or the process becomes unavailable. In addition, the prior-art architecture depends on keepalived's automatic detection mechanism: if the heartbeat link between keepalived instances fails, the MySQL switchover script cannot be executed, and if keepalived itself fails while the MySQL database is healthy, VIP drift and a database switchover are still triggered. In summary, existing methods for implementing database high availability have low accuracy and low fault tolerance.

Summary of the invention

This application provides a method, apparatus, electronic device and computer program product for implementing database high availability, so as to achieve database high availability with high accuracy and high fault tolerance.

This application provides a method for implementing database high availability, applied to a database architecture. The database architecture includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second primary proxy node and a second backup proxy node. The method includes:

obtaining status information of each node in the database architecture;

performing anomaly analysis on the status information to obtain an anomaly analysis result;

performing node switching on the database architecture based on the anomaly analysis result, where the node switching flow includes at least one of: active/standby switchover of the master node, master-slave switchover of the master node, switchover within the slave node cluster, active/standby switchover of a proxy node, and no node switchover.

According to the method for implementing database high availability provided by this application, if the anomaly analysis result is that the master node is abnormal, performing node switching on the database architecture includes:

obtaining the data synchronization status of the master node and the backup master node;

performing node switching on the master node based on the data synchronization status.

According to the method for implementing database high availability provided by this application, if the data synchronization status is that the data is fully synchronized, performing node switching on the master node includes:

switching the master node to the backup master node, and regenerating the master-slave relationship from the backup master node to the slave node cluster;

pointing the load of the first primary proxy node to the backup master node, and removing the master node from the master node cluster.

If the data synchronization status is that the data is out of sync, performing node switching on the master node includes:

stopping the master-slave relationship between the master node and the backup master node, and performing consistency processing on the backup master node and the master node so that the data of the backup master node is synchronized with that of the master node.

If the data synchronization status is that data synchronization has a problem, performing node switching on the master node includes:

determining, in the slave node cluster, the target slave node whose data synchronization status is closest to that of the master node;

performing consistency processing on the target slave node and the master node so that the data of the target slave node is synchronized with that of the master node;

switching the master node to the target slave node, and regenerating the master-slave relationship from the target slave node to the slave node cluster;

pointing the load of the first primary proxy node to the target slave node, and removing the master node and the backup master node from the master node cluster.
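By way of non-limiting illustration, the three branches above can be sketched as a function that returns the ordered actions for each data synchronization status. The names `SyncState`, `plan_master_switch` and `slave_lag_seconds` are hypothetical, and using the smallest replication lag as the measure of "closest to the master node" is an assumption made for this sketch only; the application does not prescribe that measure.

```python
from enum import Enum


class SyncState(Enum):
    FULLY_SYNCED = "fully_synced"  # data fully synchronized
    OUT_OF_SYNC = "out_of_sync"    # data out of sync
    SYNC_PROBLEM = "sync_problem"  # data synchronization has a problem


def plan_master_switch(state, slave_lag_seconds=None):
    """Return the ordered actions to take when the master node is abnormal.

    slave_lag_seconds is a hypothetical mapping of slave name to replication
    lag; the smallest lag is ASSUMED to mean "closest to the master node".
    """
    if state is SyncState.FULLY_SYNCED:
        return [
            "switch master to backup master",
            "rebuild replication from backup master to slave cluster",
            "point first primary proxy load at backup master",
            "remove master from master cluster",
        ]
    if state is SyncState.OUT_OF_SYNC:
        return [
            "stop replication between master and backup master",
            "reconcile backup master with master",
        ]
    if state is SyncState.SYNC_PROBLEM:
        if not slave_lag_seconds:
            raise ValueError("no candidate slave nodes")
        # Pick the slave with the smallest lag as the target slave node.
        target = min(slave_lag_seconds, key=slave_lag_seconds.get)
        return [
            f"reconcile {target} with master",
            f"switch master to {target}; rebuild replication from {target}",
            f"point first primary proxy load at {target}",
            "remove master and backup master from master cluster",
        ]
    raise ValueError(f"unknown synchronization state: {state!r}")
```

The function only produces a plan; executing each action against a real cluster is outside the scope of this sketch.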

According to the method for implementing database high availability provided by this application, if the anomaly analysis result is that a slave node is abnormal, performing node switching on the database architecture includes:

removing the abnormal slave node from the slave node cluster, and deleting the abnormal slave node from the load configuration of the second primary proxy node and the second backup proxy node.
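As a minimal sketch of this step, the abnormal slave node can be dropped from the cluster membership and from the load configuration of both proxy nodes in one pass. The function name `evict_slave` and the list and dictionary shapes are assumptions chosen for illustration; a real deployment would rewrite the proxy server's own configuration.

```python
def evict_slave(cluster, proxy_upstreams, failed):
    """Return new membership and new proxy load configuration without `failed`.

    cluster:         list of slave node names (hypothetical representation)
    proxy_upstreams: mapping of proxy name -> list of slave names it balances
    failed:          name of the abnormal slave node
    """
    new_cluster = [node for node in cluster if node != failed]
    new_upstreams = {
        proxy: [node for node in nodes if node != failed]
        for proxy, nodes in proxy_upstreams.items()
    }
    return new_cluster, new_upstreams
```

Returning new objects instead of mutating the inputs keeps the previous configuration available if the change has to be rolled back.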

According to the method for implementing database high availability provided by this application, if the anomaly analysis result is that a primary proxy node is abnormal, performing node switching on the database architecture includes:

stopping the abnormal primary proxy node, and clearing the virtual IP bound to the abnormal primary proxy node;

configuring the virtual IP on the backup proxy node corresponding to the abnormal primary proxy node.
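The proxy switchover above can be illustrated by a function that lists, in order, the commands to run on each host. This is a sketch under stated assumptions: the proxy is assumed to be nginx (stopped with `nginx -s stop`), the virtual IP is assumed to be bound as an `ifconfig` interface alias and cleared with `ifconfig <alias> down`, and the name `plan_proxy_failover` is hypothetical. The sketch only builds the command lists and does not execute anything.

```python
def plan_proxy_failover(failed_host, backup_host, vip, netmask, alias="eth0:0"):
    """Return (host, argv) pairs in the order they should be executed."""
    return [
        # 1. Stop the abnormal primary proxy (assumes an nginx proxy).
        (failed_host, ["nginx", "-s", "stop"]),
        # 2. Clear the virtual IP bound to the abnormal primary proxy.
        (failed_host, ["ifconfig", alias, "down"]),
        # 3. Configure the same virtual IP on the backup proxy node.
        (backup_host, ["ifconfig", alias, vip, "netmask", netmask]),
    ]


steps = plan_proxy_failover("nginx1", "nginx2", "192.168.1.56", "255.255.252.0")
```

Clearing the virtual IP on the failed host before binding it on the backup host avoids having the same address configured on two hosts at once.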

According to the method for implementing database high availability provided by this application, an agent module is deployed on each node in the database architecture;

obtaining the status information of each node in the database architecture includes:

collecting the status information of each node in the database architecture through the agent modules.

This application also provides an apparatus for implementing database high availability, deployed in a database architecture. The database architecture includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second primary proxy node and a second backup proxy node. The apparatus includes:

an acquisition module, configured to obtain status information of each node in the database architecture;

an analysis module, configured to perform anomaly analysis on the status information to obtain an anomaly analysis result;

a switching module, configured to perform node switching on the database architecture based on the anomaly analysis result, where the node switching flow includes at least one of: active/standby switchover of the master node, master-slave switchover of the master node, switchover within the slave node cluster, active/standby switchover of a proxy node, and no node switchover.

This application also provides a database architecture. The database architecture includes a master node cluster, a slave node cluster and an apparatus for implementing database high availability. The master node cluster includes a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second primary proxy node and a second backup proxy node;

the apparatus for implementing database high availability includes:

This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements any of the methods for implementing database high availability described above.

This application also provides a computer program product, including a computer program that, when executed by a processor, implements any of the methods for implementing database high availability described above.

In the method, apparatus, database architecture, device and product for implementing database high availability provided by this application, the method is applied to a database architecture that includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second primary proxy node and a second backup proxy node. On this basis, a backup master node is introduced into the database architecture and proxy nodes are introduced into both the master node cluster and the slave node cluster, so that the VIP function can still be provided after keepalived is dropped. This separates keepalived from the database, removes the coupling between the database and the VIP, and ultimately improves the accuracy of the high-availability method. The proxy nodes in each cluster are deployed in primary/backup mode, which improves the fault tolerance of the method. By obtaining the status information of each node in the database architecture and performing anomaly analysis on it, the whole database architecture is analyzed for anomalies; compared with analyzing only the service status of the master node, this further improves accuracy and fault tolerance. In addition, the node switching flow includes at least one of active/standby switchover of the master node, master-slave switchover of the master node, switchover within the slave node cluster, active/standby switchover of a proxy node, and no node switchover; compared with performing only a master-slave switchover of the master node, this further improves accuracy and fault tolerance. In summary, this application can implement database high availability with high accuracy and high fault tolerance.

Description of drawings

To explain the technical solutions of this application or the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is the first schematic flowchart of the method for implementing database high availability provided by this application;

Figure 2 is the second schematic flowchart of the method for implementing database high availability provided by this application;

Figure 3 is the third schematic flowchart of the method for implementing database high availability provided by this application;

Figure 4 is the fourth schematic flowchart of the method for implementing database high availability provided by this application;

Figure 5 is a schematic structural diagram of the apparatus for implementing database high availability provided by this application;

Figure 6 is a schematic structural diagram of the database architecture provided by this application;

Figure 7 is a schematic structural diagram of the electronic device provided by this application.

Detailed description

To make the purpose, technical solutions and advantages of this application clearer, the technical solutions in this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.

In the prior art, a typical MySQL high-availability architecture is deployed as keepalived + MySQL + MHA. In this architecture, keepalived runs on the MySQL hosts to provide the VIP function. keepalived is also configured with a script that checks the MySQL service status: when MySQL goes down abnormally, keepalived moves the VIP to a MySQL slave and MHA performs the MySQL master-slave switchover, which makes the MySQL database highly available.

In summary, how to improve the accuracy and fault tolerance of methods for implementing database high availability is a problem that urgently needs to be solved.

To address the above problem, this application provides a method for implementing database high availability. The method is applied to a database architecture that includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second primary proxy node and a second backup proxy node.

Here, the master node is the master host node in the database cluster, that is, the master database of the cluster; the backup master node is the backup master host node in the database cluster, that is, the backup master database of the cluster.

Here, a proxy node is a proxy server in the database cluster that can provide the VIP (virtual IP) function. Specifically, multiple IP addresses are bound to the eth0 network interface on the proxy server directly by command (for example, ifconfig eth0:0 192.168.1.56 netmask 255.255.252.0) to provide the VIP function.

In one embodiment, the proxy node may be an nginx proxy server. Other proxy servers may also be used; the embodiments of this application do not specifically limit this.

It can be understood that introducing the backup master node and the proxy nodes into the database architecture provides the VIP function, so keepalived can be dropped. This avoids two failure modes: a failed heartbeat link between keepalived instances, which would prevent the database switchover from being executed, and a keepalived failure while the database is healthy, which would still trigger VIP drift and a database switchover. Separating keepalived from the database therefore removes the coupling between the database and the VIP and improves the accuracy of the high-availability method.

Here, the first backup proxy node is the backup node of the first primary proxy node, and the second backup proxy node is the backup node of the second primary proxy node.

Here, a slave node is a slave host node in the database cluster, that is, a slave database of the cluster. The number of slave nodes can be set according to actual needs; the embodiments of this application do not limit this.

For ease of understanding, in this embodiment and the following embodiments the database architecture is illustrated with a MySQL architecture: the master node is mysql master, the backup master node is mysql master backup, the first primary proxy node is nginx1, the first backup proxy node is nginx2, each slave node is mysql slave, the second primary proxy node is slave nginx1, and the second backup proxy node is slave nginx2.

Figure 1 is the first schematic flowchart of the method for implementing database high availability provided by this application. As shown in Figure 1, the method includes:

Step 110: obtain status information of each node in the database architecture.

Here, the nodes include the master node, the backup master node, the first primary proxy node, the first backup proxy node, the plurality of slave nodes, the second primary proxy node and the second backup proxy node. In a specific embodiment, it is sufficient for the nodes to include the master node, the first primary proxy node, the plurality of slave nodes and the second primary proxy node.

Here, the status information includes the various items of information about a node. In a specific embodiment, the status information includes service status information and host status information.

Specifically, the status information may include host status information, such as CPU utilization, memory usage, disk information, disk utilization, process name, CPU used by a process, memory used by a process, disk read/write speed and CPU iowait time. The status information may also include service status information, such as the database's number of failed executions, number of failed accesses, number of access timeouts, version information, connection information, slow query rate, read/write rate, update rate, number of cached threads, number of running threads, execution rate, number of open connections, size of data received from and sent to clients, memory/CPU usage, database locks and deadlocks. The status information may also include the network interface IP, used to check whether the IP is available. The status information also includes the process status of the proxy nodes, used for anomaly detection on the proxy nodes. The status information also includes the state between database master and slaves, including the master-slave relationship graph, the master-slave replication status, the master-slave replication delay and master-slave data consistency information. The status information also includes the database's binary log, so that the executed statements recorded in the binary log and their Position values can all be saved and backed up to prevent data loss.
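For illustration only, a subset of the status information listed above could be carried in a record such as the following. The class and field names are hypothetical and chosen for this sketch; the application does not define a data format, and only a few of the listed metrics are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostStatus:
    cpu_utilization_pct: float   # CPU utilization, in percent
    memory_used_bytes: int       # memory usage
    disk_utilization_pct: float  # disk utilization, in percent
    cpu_iowait_seconds: float    # CPU iowait time


@dataclass
class ServiceStatus:
    failed_executions: int       # number of failed executions
    access_timeouts: int         # number of access timeouts
    open_connections: int        # number of open connections
    deadlocks: int               # number of deadlocks observed


@dataclass
class ReplicationStatus:
    replication_running: bool    # master-slave replication status
    delay_seconds: float         # master-slave replication delay


@dataclass
class NodeStatus:
    node: str                    # e.g. "mysql master" or "nginx1"
    nic_ip: str                  # network interface IP, checked for availability
    host: HostStatus
    service: Optional[ServiceStatus] = None          # absent if not collected
    replication: Optional[ReplicationStatus] = None  # absent on proxy nodes


sample = NodeStatus(
    node="mysql master",
    nic_ip="192.168.1.56",
    host=HostStatus(42.0, 8_000_000_000, 61.5, 0.3),
    service=ServiceStatus(0, 2, 130, 0),
)
```

Keeping host, service and replication data in separate groups mirrors the way the text distinguishes host status information, service status information and master-slave state.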

In addition, the database's binary log is monitored in real time: the database statements recorded in the binary log are reconstructed, the Position value, execution time, log file and other information for each statement are recorded, and all of this information is transmitted to a single location for storage.

In addition, before step 110, the load of the first primary proxy node points to the master node and the first primary proxy node is temporarily bound to a VIP; the load of the second primary proxy node points to each slave node in the slave node cluster and the second primary proxy node is temporarily bound to a VIP.

Step 120: perform anomaly analysis on the status information to obtain an anomaly analysis result.

Specifically, anomaly analysis is performed on the status information of each node to determine whether any node is abnormal. If an anomaly has occurred, the abnormal node is identified and the anomaly analysis result is determined from the abnormal node; if no anomaly has occurred, the anomaly analysis result is that no anomaly has occurred.

More specifically, anomaly analysis is performed on the service status information in the status information, and if the service status of a node is abnormal, the anomaly analysis result is that this node is abnormal. Alternatively, anomaly analysis is performed on the host status information in the status information: if the host of a node is abnormal, an alarm is generated from the abnormal host status, and if the alarm does not converge, the anomaly analysis result is that this node is abnormal.
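The two rules above can be sketched as follows. This is a simplified illustration under stated assumptions: only CPU utilization stands in for the host status information, and "the alarm does not converge" is interpreted as the metric staying above a threshold for a fixed number of consecutive samples. The name `analyze_node`, the threshold of 90 percent and the window of 3 samples are hypothetical.

```python
def analyze_node(service_ok, cpu_samples, cpu_threshold=90.0, persist=3):
    """Return "abnormal" or "normal" for one node.

    service_ok:  True if the node's service status is normal
    cpu_samples: CPU utilization readings in percent, oldest first
    persist:     ASSUMED number of consecutive alarming samples after which
                 the alarm is treated as not converging
    """
    # Rule 1: an abnormal service status marks the node abnormal directly.
    if not service_ok:
        return "abnormal"
    # Rule 2: a host alarm counts only if it persists (does not converge).
    recent = cpu_samples[-persist:]
    if len(recent) == persist and all(s > cpu_threshold for s in recent):
        return "abnormal"
    return "normal"
```

Requiring the alarm to persist keeps a brief spike in host load from triggering a switchover.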

More specifically, based on the collected status information and on operations big data, correlation analysis is performed across multi-dimensional metrics such as network, host and database metrics, and real-time anomaly detection is performed through multi-dimensional, real-time, dynamic intelligent analysis of a large volume of metric and log data.

In a specific embodiment, the status information is input into an anomaly detection model to obtain the anomaly analysis result output by the anomaly detection model.

Time-series algorithms such as ARIMA, Holt-Winters and LSTM may be used to analyze features of a metric series such as periodicity and trend, so as to fit and forecast the time-series curve. Decision-tree algorithms such as GBDT, XGBoost and LightGBM may be used to train a model of the probability that metric degradation leads to each type of fault, so that fault events are predicted from the forecast degraded metrics.
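As a hedged illustration of the forecasting idea, the sketch below uses Holt's linear (double exponential) smoothing, which is a simpler relative of Holt-Winters without the seasonal component, and then compares the forecast with a threshold. It is a stand-in chosen for brevity, not the model of the application; the trained decision-tree fault model is not reproduced here, and the names `holt_forecast` and `will_degrade` and the smoothing constants are hypothetical.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        # Update the smoothed level, then the smoothed trend.
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


def will_degrade(series, threshold, horizon=1):
    """Flag a metric whose forecast value exceeds `threshold`."""
    return holt_forecast(series, horizon=horizon) > threshold
```

For the strictly linear series 10, 20, 30, 40 the one-step forecast is 50, so `will_degrade([10, 20, 30, 40], 45)` is true while the same series with a threshold of 60 is not flagged.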

Further, for different types of monitoring points, different algorithms can be matched to the data characteristics of the historical performance data and used to mine data fluctuation trends. The anomaly detection model is trained and built offline and then detects data anomalies at the monitoring points in real time, helping to find anomalies in the database architecture quickly.

Further, combining threshold settings with this ability to identify anomalies accurately, the health status of each node in the database architecture is predicted from metrics such as system availability and performance, which yields the anomaly analysis result.

Step 130: based on the anomaly analysis result, perform node switching on the database architecture, where the node switching flow includes at least one of: active/standby switchover of the master node, master-slave switchover of the master node, switchover within the slave node cluster, active/standby switchover of a proxy node, and no node switchover.

Specifically, if the anomaly analysis result is that the master node is abnormal, an active/standby switchover or a master-slave switchover is performed on the master node; if the anomaly analysis result is that a slave node is abnormal, the abnormal slave node is switched within the slave node cluster; if the anomaly analysis result is that a primary proxy node is abnormal, an active/standby switchover is performed on the primary proxy node; if the anomaly analysis result is that no anomaly has occurred, no node switchover is performed.

Here, active/standby switchover of the master node means switching between the master node and the backup master node in the master node cluster. Master-slave switchover of the master node means switching between the master node in the master node cluster and a slave node in the slave node cluster. Switchover within the slave node cluster means removing the abnormal slave node from the slave node cluster. Active/standby switchover of a proxy node means switching the primary proxy node to its corresponding backup proxy node. No node switchover means that no node is abnormal and no action is needed.
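The mapping from anomaly analysis result to switching flow described above can be sketched as a small dispatch table. The enum `SwitchFlow`, the role strings and the function `select_switch_flow` are hypothetical names introduced for illustration; the choice between active/standby and master-slave switchover for the master node is left to a later step and is not decided here.

```python
from enum import Enum


class SwitchFlow(Enum):
    MASTER_SWITCH = "master node: active/standby or master-slave switchover"
    SLAVE_CLUSTER_SWITCH = "switchover within the slave node cluster"
    PROXY_ACTIVE_STANDBY = "proxy node: active/standby switchover"
    NO_SWITCH = "no node switchover"


# Role of the abnormal node -> switching flow to run.
_FLOW_BY_ROLE = {
    "master": SwitchFlow.MASTER_SWITCH,
    "slave": SwitchFlow.SLAVE_CLUSTER_SWITCH,
    "primary_proxy": SwitchFlow.PROXY_ACTIVE_STANDBY,
}


def select_switch_flow(abnormal_role):
    """Pick the switching flow; `abnormal_role` is None when nothing is abnormal."""
    if abnormal_role is None:
        return SwitchFlow.NO_SWITCH
    try:
        return _FLOW_BY_ROLE[abnormal_role]
    except KeyError:
        raise ValueError(f"unknown node role: {abnormal_role!r}") from None
```

Rejecting unknown roles explicitly, instead of defaulting to no switchover, avoids silently ignoring an anomaly on a node type the table does not cover.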

The method for implementing database high availability provided by the embodiments of this application is applied to a database architecture that includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first primary proxy node and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second primary proxy node and a second backup proxy node. On this basis, a backup master node is introduced into the database architecture and proxy nodes are introduced into both the master node cluster and the slave node cluster, so that the VIP function can still be provided after keepalived is dropped. This separates keepalived from the database, removes the coupling between the database and the VIP, and ultimately improves the accuracy of the high-availability method. The proxy nodes in each cluster are deployed in primary/backup mode, which improves the fault tolerance of the method. By obtaining the status information of each node in the database architecture and performing anomaly analysis on it, the whole database architecture is analyzed for anomalies; compared with analyzing only the service status of the master node, the embodiments of this application further improve accuracy and fault tolerance. In addition, the node switching flow includes at least one of active/standby switchover of the master node, master-slave switchover of the master node, switchover within the slave node cluster, active/standby switchover of a proxy node, and no node switchover; compared with performing only a master-slave switchover of the master node, the embodiments of this application further improve accuracy and fault tolerance. In summary, the embodiments of this application can implement database high availability with high accuracy and high fault tolerance.

Based on the above embodiment, Figure 2 is the second schematic flowchart of the database high availability implementation method provided by this application. As shown in Figure 2, if the abnormality analysis result is that the master node is abnormal, performing node switching on the database architecture in step 130 includes:

Step 131: Obtain the data synchronization state of the master node and the backup master node.

Here, if the abnormality analysis result is that the master node is abnormal, the abnormal state of the master node may include: host state information of the master node, such as CPU, memory or disk read/write speed, is abnormal, alert information is generated from the host state information, and the alert persistently fails to converge; and/or the service state of the master node is abnormal and it cannot provide service normally; and/or the master node has a large number of deadlocks and cannot handle business normally. Other abnormal states are also possible and are not listed one by one here.

For example, the master node's CPU, memory, disk I/O, network state information, database lock information and so on are compared with the preset performance indicators and, together with the configured thresholds, used to make a comprehensive judgment of the master node's health. If a performance indicator exceeds its threshold, the master node is considered to have a problem and an alert is generated. If the alert does not converge, for example CPU utilization stays above 90% for a long time, memory is exhausted, disk reads and writes are abnormal, instantaneous database traffic is too high, or the master node has a large number of deadlocks, the master node can be determined to be abnormal.
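By way of non-limiting illustration, the threshold comparison and the "alert does not converge" condition described above might be sketched as follows. Only the 90% CPU figure comes from the text; the other thresholds, the window length `WINDOW`, and all names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    cpu_pct: float       # CPU utilization, in percent
    mem_free_pct: float  # free memory, in percent
    deadlocks: int       # deadlocks seen during the sampling interval


CPU_LIMIT = 90.0      # from the example in the text
MEM_FREE_MIN = 5.0    # assumed threshold
DEADLOCK_LIMIT = 10   # assumed threshold
WINDOW = 3            # assumed: consecutive breaching samples = alert not converging


def breaches(sample: Sample) -> bool:
    """True if any indicator of this sample exceeds its threshold."""
    return (
        sample.cpu_pct > CPU_LIMIT
        or sample.mem_free_pct < MEM_FREE_MIN
        or sample.deadlocks > DEADLOCK_LIMIT
    )


def master_is_abnormal(history: Sequence[Sample]) -> bool:
    """True only if the last WINDOW samples all breach a threshold."""
    recent = list(history)[-WINDOW:]
    return len(recent) == WINDOW and all(breaches(s) for s in recent)
```

Requiring several consecutive breaching samples is one simple way to avoid switching on a short spike; other convergence criteria are equally possible.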

Here, the data synchronization state may include, but is not limited to: data fully synchronized, data out of sync, and data synchronization problem.

Data fully synchronized means that the master node can be switched to the backup master node immediately. Data out of sync means that the backup master node can be brought into sync with the master node within a short time. Data synchronization problem means that the backup master node cannot be synchronized with the master node at all, or cannot be synchronized within a short time.

Step 132: Perform node switching on the master node based on the data synchronization state.

Specifically, the node switching steps for the master node differ according to the data synchronization state.

The method provided by the embodiments of this application obtains the data synchronization state of the master node and the backup master node and determines the node switching steps for the master node from that state. Compared with always switching in a single fixed manner once the master node becomes abnormal, this node switching approach is more accurate, which further improves the accuracy and fault tolerance of the database high availability implementation method.

Based on any of the above embodiments, if the data synchronization state is data fully synchronized, performing node switching on the master node in step 132 includes:

Pointing the load of the first master agent node to the backup master node, and removing the master node from the master node cluster.

It should be noted that when the master node is switched to the backup master node, the backup master node becomes the master node of the database. At this point, a master-slave relationship from the backup master node to the slave node cluster should be established, that is, the backup master node serves as the database master for every slave node in the slave node cluster. The former master node is then no longer the database master for those slave nodes.

It should also be noted that pointing the load of the first master agent node to the backup master node implements the VIP switching function. Because the master node is abnormal, it is removed from the master node cluster.

Further, after the master node has been removed from the master node cluster, alert information is generated and output to notify the operations staff, so that they can carry out maintenance on the master node.

Further, once the master node has been maintained and is healthy again, it can rejoin the master node cluster.

Based on any of the above embodiments, if the data synchronization state is data out of sync, performing node switching on the master node in step 132 includes:

It should be noted that stopping the master-slave relationship between the master node and the backup master node prevents two nodes whose data is out of sync from continuing to replicate.

Here, the specific steps of consistency processing for the backup master node and the master node are: compare the data synchronization status of the master node and the backup master node, query the recorded database statement execution information based on that status, and append the missing data until the two nodes are consistent.
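As a hedged illustration of the consistency processing described above, the sketch below selects, from the recorded statement execution information, the statements that the lagging node has not yet applied. It assumes a single log file so that Position values are directly comparable; the `LoggedStatement` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoggedStatement:
    position: int  # Position value recorded for the statement
    sql: str       # the recorded database statement


def statements_to_append(
    log: List[LoggedStatement], applied_position: int
) -> List[LoggedStatement]:
    """Return the recorded statements beyond applied_position, in log order."""
    missing = [s for s in log if s.position > applied_position]
    return sorted(missing, key=lambda s: s.position)
```

Replaying the returned statements in order on the lagging node would bring it up to the last recorded position. With several log files, the comparison key would need to include the file as well.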

It should also be noted that when the master node is switched to the backup master node, the backup master node becomes the master node of the database. At this point, a master-slave relationship from the backup master node to the slave node cluster should be established, that is, the backup master node serves as the database master for every slave node in the slave node cluster. The former master node is then no longer the database master for those slave nodes.

Based on any of the above embodiments, if the data synchronization state is data synchronization problem, performing node switching on the master node in step 132 includes:

Here, data synchronization problem means that the backup master node cannot be synchronized with the master node, or cannot be synchronized within a short time. For example, synchronization has not been achieved after more than 30 minutes, or too many statements remain to be appended, such as more than 50 statements.
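The three synchronization states can be told apart with a simple classifier, sketched below under stated assumptions. The 30-minute and 50-statement limits are the examples given in the text; treating "no pending statements" as fully synchronized, and the names `SyncState` and `classify_sync`, are assumptions of this sketch.

```python
from enum import Enum


class SyncState(Enum):
    FULLY_SYNCED = "fully_synced"
    OUT_OF_SYNC = "out_of_sync"
    SYNC_PROBLEM = "sync_problem"


MAX_LAG_SECONDS = 30 * 60     # "more than 30 minutes" in the text
MAX_PENDING_STATEMENTS = 50   # "more than 50 statements" in the text


def classify_sync(lag_seconds: float, pending_statements: int) -> SyncState:
    """Classify how far the backup master node is behind the master node."""
    if pending_statements == 0:
        # Assumed: nothing left to append means the nodes are in sync.
        return SyncState.FULLY_SYNCED
    if lag_seconds > MAX_LAG_SECONDS or pending_statements > MAX_PENDING_STATEMENTS:
        return SyncState.SYNC_PROBLEM
    return SyncState.OUT_OF_SYNC
```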

It should be noted that selecting, from the slave node cluster, the target slave node whose data synchronization state is closest to that of the master node reduces the time needed for the subsequent consistency processing.
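One possible way to choose the target slave node is to compare replication positions, as in the minimal sketch below. The text does not say how "closest" is measured; using the gap between log positions, and the function name `pick_target_slave`, are assumptions.

```python
from typing import Dict


def pick_target_slave(master_position: int, slave_positions: Dict[str, int]) -> str:
    """Return the slave whose applied position is nearest the master's."""
    if not slave_positions:
        raise ValueError("the slave node cluster has no candidate slave nodes")
    return min(
        slave_positions,
        key=lambda name: abs(master_position - slave_positions[name]),
    )
```

The smaller the gap, the fewer statements have to be appended during consistency processing.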

Here, the specific steps of consistency processing for the target slave node and the master node are: compare the data synchronization status of the master node and the target slave node, query the recorded database statement execution information based on that status, and append the missing data until the two nodes are consistent.

It should also be noted that when the master node is switched to the target slave node, the target slave node becomes the master node of the database. At this point, a master-slave relationship from the target slave node to the slave node cluster should be established, that is, the target slave node serves as the database master for every slave node in the slave node cluster. The former master node is then no longer the database master for those slave nodes.

It should also be noted that pointing the load of the first master agent node to the target slave node implements the VIP switching function. Because the master node is abnormal, it is removed from the master node cluster; and because the backup master node has a data synchronization problem, it is removed from the master node cluster as well.

Further, after the master node and the backup master node have been removed from the master node cluster, alert information is generated and output to notify the operations staff, so that they can carry out maintenance on both nodes.

Further, once the master node has been maintained and is healthy again, it can rejoin the master node cluster. Likewise, once the backup master node has been maintained and is healthy again, it can rejoin the master node cluster.

The method provided by the embodiments of this application uses a different node switching approach for each data synchronization state. The node switching is therefore more accurate, which further improves the accuracy and fault tolerance of the database high availability implementation method. In addition, every switching approach moves the master role to a healthy node, which ensures high availability of the database.
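The three branches can be summarized as a small planning function. This is a sketch only: the state names are hypothetical string labels, the actions are descriptive strings rather than real operations, and the action order for the out-of-sync branch is inferred from the explanatory notes, since the step list for that branch is not reproduced in the text.

```python
from typing import List, Optional


def plan_master_failover(
    sync_state: str, target_slave: Optional[str] = None
) -> List[str]:
    """Return the ordered actions for an abnormal master node."""
    if sync_state == "fully_synced":
        return [
            "point first master agent node at backup master node",
            "remove master node from master node cluster",
        ]
    if sync_state == "out_of_sync":
        # Assumed order, inferred from the notes on this branch.
        return [
            "stop replication between master node and backup master node",
            "append missing statements to backup master node",
            "switch master role to backup master node",
        ]
    if sync_state == "sync_problem":
        if target_slave is None:
            raise ValueError("a target slave node is required for this branch")
        return [
            f"append missing statements to {target_slave}",
            f"point first master agent node at {target_slave}",
            "remove master node and backup master node from master node cluster",
        ]
    raise ValueError(f"unknown sync state: {sync_state!r}")
```

In each branch the master role ends up on a node that is healthy and consistent with the last recorded data.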

Based on any of the above embodiments, Figure 3 is the third schematic flowchart of the database high availability implementation method provided by this application. As shown in Figure 3, if the abnormality analysis result is that a slave node is abnormal, performing node switching on the database architecture in step 130 includes:

Step 133: Remove the abnormal slave node from the slave node cluster, and delete the abnormal slave node from the load configuration of the second master agent node and the second backup agent node.

Here, if the abnormality analysis result is that a slave node is abnormal, the abnormal state of the slave node may include: host state information of the slave node, such as CPU, memory or disk read/write speed, is abnormal, alert information is generated from the host state information, and the alert persistently fails to converge; and/or the slave node shows severe master-slave replication delay or an abnormal synchronization state; and/or the slave node is down; and/or its service is unavailable. Other abnormal states are also possible and are not listed one by one here.

It should be noted that because the slave node is abnormal, it is removed from the slave node cluster.

It should also be noted that the abnormal slave node is deleted from the load configuration of the second master agent node and the second backup agent node so that business traffic is no longer distributed to it.

Further, after the abnormal slave node has been removed from the slave node cluster, alert information is generated and output to notify the operations staff, so that they can carry out maintenance on that slave node.

Further, once the abnormal slave node has been maintained and is healthy again, it can rejoin the slave node cluster. Specifically, after the slave node has recovered and passed its checks, it is added back to the load configuration of the second master agent node and the second backup agent node.
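The removal and rejoin of a slave node can be sketched as edits to the load configuration held by the two agent nodes of the slave node cluster. The `AgentNode` class, which models the load configuration as a set of backend names, and the `is_healthy` callback are hypothetical and used only for illustration.

```python
from typing import Callable, Iterable, List, Set


class AgentNode:
    """Holds the load configuration (backend slave nodes) of one agent node."""

    def __init__(self, name: str, backends: Iterable[str]) -> None:
        self.name = name
        self.backends: Set[str] = set(backends)


def remove_slave(slave: str, agents: List[AgentNode]) -> None:
    """Delete the abnormal slave from every agent's load configuration."""
    for agent in agents:
        agent.backends.discard(slave)


def rejoin_slave(
    slave: str, agents: List[AgentNode], is_healthy: Callable[[str], bool]
) -> bool:
    """Add the slave back only after it passes the health check."""
    if not is_healthy(slave):
        return False
    for agent in agents:
        agent.backends.add(slave)
    return True
```

Both the master agent node and the backup agent node are updated together, so that a later agent switchover does not send traffic to a slave node that was removed.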

Further, after the slave node has been added back to the load configuration of the second master agent node and the second backup agent node, alert information is generated and output to notify the operations staff that the slave node has rejoined.

It can be understood that when a slave node becomes abnormal, the master-slave relationship does not need to change; switching is performed only within the slave node cluster.

With the method provided by the embodiments of this application, a slave node can be switched out after it becomes abnormal. Compared with switching only the master node, this further improves the accuracy and fault tolerance of the database high availability implementation method. At the same time, when a slave node becomes abnormal the master-slave relationship does not need to change; the abnormal slave node only has to be deleted from the load configuration of the agent nodes, which improves switching efficiency.

Based on any of the above embodiments, Figure 4 is the fourth schematic flowchart of the database high availability implementation method provided by this application. As shown in Figure 4, if the abnormality analysis result is that a master agent node is abnormal, performing node switching on the database architecture in step 130 includes:

Step 134: Stop the abnormal master agent node, and clear the virtual IP bound to the abnormal master agent node;

Step 135: Configure the virtual IP on the backup agent node corresponding to the abnormal master agent node.

Here, if the abnormality analysis result is that a master agent node is abnormal, the abnormal state of the master agent node may include: host state information of the agent node is abnormal, such as CPU, memory, disk I/O or network traffic, excessive latency in processing user requests, a large number of error codes within one minute, or excessive instantaneous traffic; alert information is generated from the host state information, and the alert persistently fails to converge; and/or an abnormality occurs at any of the IP, the VIP or the port. Other abnormal states are also possible and are not listed one by one here.

For example, the health of the agent node and its trend are assessed. If problems occur such as continuously rising CPU utilization, persistently high memory usage, abnormal disk I/O, abnormal network traffic, excessive latency in processing user requests, a large number of error codes within one minute, or excessive instantaneous traffic, a comprehensive judgment is made together with the configured thresholds. If, for instance, CPU utilization exceeds 90%, or the agent node returns a large number of 5XX error codes within one minute, the agent node can be provisionally judged abnormal and an alert is generated. If the alert does not converge, the agent node can be determined to be abnormal.

It should be noted that because the master agent node is abnormal, it is stopped. Here, the abnormal master agent node may be the first master agent node or the second master agent node.

It should also be noted that before step 110, the master agent node is bound to a virtual IP (VIP). Therefore, when a master agent node becomes abnormal, the virtual IP bound to it must be cleared, for example by clearing the VIP temporarily bound on eth0.

It should also be noted that the virtual IP is configured on the backup agent node corresponding to the abnormal master agent node so that the backup agent node takes over the work of the master agent node. At this point, if the abnormal master agent node is the first master agent node, the load of the first backup agent node points to the master node.
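Steps 134 and 135 might be expressed as three commands, sketched below. The function only builds the command lists and does not run them. The `ip addr del` and `ip addr add` forms are the standard iproute2 commands and `systemctl stop` is the standard systemd command; the host names, the service unit name and the address used in the usage example are placeholders, and the text does not state which tools the embodiment actually uses.

```python
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (host on which to run, argv)


def vip_failover_commands(
    vip_cidr: str, iface: str, service: str, failed_host: str, backup_host: str
) -> List[Command]:
    """Build the commands that move a VIP from a failed agent node to its backup."""
    return [
        # Step 134: stop the abnormal master agent node ...
        (failed_host, ["systemctl", "stop", service]),
        # ... and clear the virtual IP bound to it.
        (failed_host, ["ip", "addr", "del", vip_cidr, "dev", iface]),
        # Step 135: configure the virtual IP on the backup agent node.
        (backup_host, ["ip", "addr", "add", vip_cidr, "dev", iface]),
    ]
```

For example, `vip_failover_commands("10.0.0.100/24", "eth0", "proxy.service", "agent-a", "agent-b")` yields one stop command and one delete command for `agent-a`, followed by one add command for `agent-b`. In practice neighbouring hosts may also need to learn that the address has moved (for instance through a gratuitous ARP), which the text does not cover.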

With the method provided by the embodiments of this application, a master agent node can be switched after it becomes abnormal. Compared with switching only the master node, this further improves the accuracy and fault tolerance of the database high availability implementation method.

Based on any of the above embodiments, an agent module is deployed on every node in the database architecture; step 110 includes:

Here, the agent module collects the various information of a node. Specifically, the agent module collects the following: host state information, such as CPU utilization, memory usage, disk information, disk utilization, process names, per-process CPU usage, per-process memory usage, disk read/write speed and CPU iowait time; service state information, such as the database's number of failed executions, number of failed accesses, number of access timeouts, version information, connection information, slow query rate, read/write rate, update rate, number of cached threads, number of running threads, execution rate, number of open connections, size of data received from and sent to clients, memory/CPU usage, database locks and deadlocks; the network card IP, used to check whether the IP is available; the process state of the agent node, used for abnormality detection on the agent node; the state between database master and slaves, including the master-slave relationship map, master-slave replication state, master-slave replication delay and master-slave data consistency information; and the database's binary log, so that the executed statements recorded in the binary log, together with the Position value of each statement, are all saved and backed up to prevent data loss.

In addition, the agent module can monitor the database's binary log in real time, reconstruct the database statements recorded in it, record each statement's Position value, execution time, log file and other information, and transmit all of this information to a unified location for storage.
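A record for one reconstructed statement might look like the sketch below, which carries the three attributes named above (Position value, execution time, log file) alongside the statement itself. The `BinlogRecord` type and the choice of one JSON object per record as the transfer format are assumptions; the text does not specify how the information is encoded.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BinlogRecord:
    log_file: str     # binary log file the statement was read from
    position: int     # Position value of the statement
    executed_at: str  # execution time, here as an ISO 8601 string
    statement: str    # the reconstructed database statement


def encode(record: BinlogRecord) -> str:
    """Serialize a record for transfer to the unified storage location."""
    return json.dumps(asdict(record), ensure_ascii=False)


def decode(line: str) -> BinlogRecord:
    """Rebuild a record from its serialized form."""
    return BinlogRecord(**json.loads(line))
```

Keeping the log file and Position with every statement is what later allows consistency processing to find exactly the statements a lagging node has not applied.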

The method provided by the embodiments of this application obtains the state information of every node in the database architecture through the agent module. This allows the information of each node to be collected more comprehensively, which further improves the accuracy and fault tolerance of the database high availability implementation method.

The database high availability implementation apparatus provided by this application is described below. The apparatus described below and the method described above may be read with reference to each other.

The apparatus is deployed in a database architecture that includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first master agent node and a first backup agent node; the slave node cluster includes multiple slave nodes, a second master agent node and a second backup agent node. Figure 5 is a schematic structural diagram of the database high availability implementation apparatus provided by this application. As shown in Figure 5, the apparatus includes:

An acquisition module 510, configured to obtain the state information of each node in the database architecture;

An analysis module 520, configured to perform abnormality analysis on the state information to obtain an abnormality analysis result;

A switching module 530, configured to perform node switching on the database architecture based on the abnormality analysis result, where the node switching process includes at least one of: active/standby switchover of the master node, master-slave switchover of the master node, switching within the slave node cluster, active/standby switchover of an agent node, and no-node switching.
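The division of work between the analysis module and the switching module can be illustrated with the following minimal sketch. The dictionary layout of the state information, the role labels, and the order in which roles are checked are all assumptions made for this illustration.

```python
from typing import Dict

# Order in which node roles are checked; this priority is an assumption.
ROLE_PRIORITY = ("master", "slave", "master_agent")

SWITCH_FLOWS = {
    "master": "master node switchover, chosen by data synchronization state",
    "slave": "switching within the slave node cluster",
    "master_agent": "active/standby switchover of the agent node",
    "none": "no-node switching",
}


def analyze(states: Dict[str, dict]) -> str:
    """Analysis module: return the role of an abnormal node, or 'none'."""
    for role in ROLE_PRIORITY:
        if any(s["role"] == role and not s["healthy"] for s in states.values()):
            return role
    return "none"


def switch(result: str) -> str:
    """Switching module: map the analysis result to a switching flow."""
    return SWITCH_FLOWS[result]
```

Here `states` stands for the output of the acquisition module, keyed by node name.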

Based on any of the above embodiments, if the abnormality analysis result is that the master node is abnormal, the switching module 530 is further configured to:

Based on any of the above embodiments, if the data synchronization state is data fully synchronized, the switching module 530 is further configured to:

If the data synchronization state is data out of sync, the switching module 530 is further configured to:

If the data synchronization state is data synchronization problem, the switching module 530 is further configured to:

Based on any of the above embodiments, if the abnormality analysis result is that a slave node is abnormal, the switching module 530 is further configured to:

Based on any of the above embodiments, if the abnormality analysis result is that a master agent node is abnormal, the switching module 530 is further configured to:

Based on any of the above embodiments, an agent module is deployed on every node in the database architecture;

The acquisition module 510 is further configured to:

The database architecture provided by this application is described below. The database architecture described below and the database high availability implementation method described above may be read with reference to each other.

Figure 6 is a schematic structural diagram of the database architecture provided by this application. As shown in Figure 6, the database architecture includes: a master node cluster, a slave node cluster and a database high availability implementation apparatus. The master node cluster includes a master node, a backup master node, a first master agent node and a first backup agent node; the slave node cluster includes multiple slave nodes, a second master agent node and a second backup agent node;

Here, the database high availability implementation apparatus corresponds to the apparatus described above, and its details are not repeated.

Figure 7 illustrates the physical structure of an electronic device. As shown in Figure 7, the electronic device may include a processor 710, a communications interface 720, a memory 730 and a communication bus 740, where the processor 710, the communications interface 720 and the memory 730 communicate with one another over the communication bus 740. The processor 710 can call logic instructions in the memory 730 to execute the database high availability implementation method. The method is applied to a database architecture that includes a master node cluster and a slave node cluster; the master node cluster includes a master node, a backup master node, a first master agent node and a first backup agent node, and the slave node cluster includes multiple slave nodes, a second master agent node and a second backup agent node. The method includes: obtaining the state information of each node in the database architecture; performing abnormality analysis on the state information to obtain an abnormality analysis result; and performing node switching on the database architecture based on the abnormality analysis result, where the node switching process includes at least one of: active/standby switchover of the master node, master-slave switchover of the master node, switching within the slave node cluster, active/standby switchover of an agent node, and no-node switching.

In addition, when the logic instructions in the memory 730 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

另一方面，本申请还提供一种计算机程序产品，所述计算机程序产品包括计算机程序，计算机程序可存储在非暂态计算机可读存储介质上，所述计算机程序被处理器执行时，计算机能够执行上述各方法所提供的数据库高可用实现方法，该方法应用于数据库架构，所述数据库架构包括主节点集群和从节点集群，所述主节点集群包括主节点、备份主节点、第一主代理节点和第一备份代理节点，所述从节点集群包括多个从节点、第二主代理节点和第二备份代理节点，该方法包括：接收搜索请求，并基于所述搜索请求搜索数据得到搜索结果；基于当前分页列表的条目数量，从所述搜索结果中筛选出分页结果；从其他端中获取所述分页结果对应的第一实时状态数据；基于所述第一实时状态数据，确定所述当前分页列表的分页数据，并将所述分页数据进行缓存。On the other hand, the present application also provides a computer program product. The computer program product includes a computer program. The computer program can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can Execute the database high availability implementation method provided by the above methods. The method is applied to the database architecture. The database architecture includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, and a first master agent. node and a first backup proxy node, the slave node cluster includes a plurality of slave nodes, a second master proxy node and a second backup proxy node. The method includes: receiving a search request, and searching data based on the search request to obtain search results. ; Based on the number of entries in the current paging list, filter out the paging results from the search results; obtain the first real-time status data corresponding to the paging results from other terminals; based on the first real-time status data, determine the current The paging data of the paging list and caches the paging data.

In yet another aspect, the present application further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the database high availability implementation method provided by the above methods. The method is applied to a database architecture that includes a master node cluster and a slave node cluster. The master node cluster includes a master node, a backup master node, a first master proxy node, and a first backup proxy node; the slave node cluster includes a plurality of slave nodes, a second master proxy node, and a second backup proxy node. The method includes: acquiring state information of each node in the database architecture; performing anomaly analysis on the state information to obtain an anomaly analysis result; and performing node switching on the database architecture based on the anomaly analysis result.
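By way of non-limiting illustration, the three steps summarized in the abstract (acquiring node state, analyzing it for anomalies, and switching nodes) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the names `Role`, `NodeStatus`, `analyze`, and `plan_switches`, the boolean `healthy` flag, and the switching rules are all hypothetical. State acquisition is assumed to have already happened, so the statuses are passed in as plain data.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    BACKUP_MASTER = "backup_master"
    SLAVE = "slave"
    PRIMARY_PROXY = "primary_proxy"
    BACKUP_PROXY = "backup_proxy"


@dataclass(frozen=True)
class NodeStatus:
    name: str
    role: Role
    cluster: str   # "master" or "slave" node cluster
    healthy: bool  # hypothetical outcome of a health probe (step 1)


def analyze(statuses):
    """Step 2 (assumed rule): a node is abnormal if its probe failed."""
    return [s for s in statuses if not s.healthy]


def plan_switches(statuses):
    """Step 3 (assumed rules): map each abnormal node to a switching action."""
    # Index healthy nodes by (cluster, role) so a standby can be looked up.
    healthy = {}
    for s in statuses:
        if s.healthy:
            healthy.setdefault((s.cluster, s.role), []).append(s.name)

    def standby(cluster, role):
        candidates = healthy.get((cluster, role), [])
        return candidates[0] if candidates else None

    actions = []
    for bad in analyze(statuses):
        if bad.role is Role.MASTER:
            target = standby(bad.cluster, Role.BACKUP_MASTER)
            actions.append(("promote_backup_master", bad.name, target))
        elif bad.role is Role.PRIMARY_PROXY:
            target = standby(bad.cluster, Role.BACKUP_PROXY)
            actions.append(("promote_backup_proxy", bad.name, target))
        elif bad.role is Role.SLAVE:
            # Switching inside the slave node cluster: stop routing to this node.
            actions.append(("remove_slave_from_rotation", bad.name, None))
        else:
            # A failed standby does not require switching the serving node.
            actions.append(("no_switch", bad.name, None))
    # With no abnormal nodes, the outcome is the "no switching" flow.
    return actions or [("no_switch", None, None)]
```

In this sketch each action is a tuple of (action name, abnormal node, replacement node). A real deployment would replace the single `healthy` flag with richer state information and would verify that the chosen standby is itself synchronized before promoting it.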

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software together with a necessary general-purpose hardware platform, and of course may also be implemented by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.