CN117194556A


Info

Publication number: CN117194556A
Application number: CN202210602267.2A
Authority: CN
Inventors: 李海涛; 唐霏
Original assignee: China Mobile Communications Group Co Ltd; China Mobile IoT Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile IoT Co Ltd
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2023-12-08

Abstract

Description


A time series data synchronization device and method, and a storage medium

Technical Field

This application relates to the technical field of the Internet of Things, and in particular to a time series data synchronization device and method, and a storage medium.

Background

With the rapid development of 5th Generation Mobile Communication Technology (5G) and Internet of Things (IoT) technology, data volumes are growing exponentially. IoT data is the most typical application of time series data: the sampled data continuously generated by massive numbers of devices poses great challenges for highly concurrent write throughput, efficient time series query and analysis, and low-cost time series storage.

At present, time series data is often stored and processed in relational databases, but given the characteristics of time series data, relational databases cannot store and process it effectively. The time series database InfluxDB (Influx DataBase) has therefore stood out and become the most popular time series database. However, only the single-node version of InfluxDB is open source, and existing open-source InfluxDB cluster solutions are prone to inconsistent or lost data during storage, so the accuracy of data stored in an InfluxDB cluster is low.

Summary of the Invention

In view of this, embodiments of the present application aim to provide a time series data synchronization device and method, and a storage medium, which can ensure the consistency and integrity of the data stored in an InfluxDB cluster and improve the accuracy of InfluxDB cluster data storage.

To achieve the above purpose, the technical solution of the present application is implemented as follows:

In a first aspect, embodiments of the present application provide a time series data synchronization device. The device includes: a message queue, a data collector, a load balancer of a time series database, time series database instances, a load balancer of a synchronization time series database, and a synchronization time series database instance.

The data collector is configured to consume first data in the message queue and push the first data to the load balancer of the time series database; the message queue stores the received write data.

The load balancer of the time series database is configured to forward the first data to a first time series database instance.

The first time series database instance is configured to write the first data into a pre-created default data retention policy, and to push the first data to the load balancer of the synchronization time series database.

The load balancer of the synchronization time series database is configured to forward the first data to the synchronization time series database instance.

The synchronization time series database instance is configured to write the first data, in turn, into the pre-created non-default data retention policy of each second time series database instance; a second time series database instance is any time series database instance other than the first time series database instance.

In a second aspect, embodiments of the present application provide a time series data synchronization method applied to a time series data synchronization device. The method includes:

consuming, by the data collector, first data in the message queue, so as to forward the first data to a first time series database instance;

writing, by the first time series database instance, the first data into a pre-created default data retention policy, and forwarding the first data to the synchronization time series database instance;

writing, by the synchronization time series database instance, the first data, in turn, into the pre-created non-default data retention policy of each second time series database instance; a second time series database instance is any time series database instance other than the first time series database instance.

In a third aspect, embodiments of the present application provide a storage medium on which a computer program is stored. When executed by a processor, the computer program implements the time series data synchronization method described above.

Embodiments of the present application provide a time series data synchronization device and method, and a storage medium. The device includes: a message queue, a data collector, a load balancer of a time series database, time series database instances, a load balancer of a synchronization time series database, and a synchronization time series database instance. The data collector is configured to consume first data in the message queue and push the first data to the load balancer of the time series database; the message queue stores the received write data. The load balancer of the time series database is configured to forward the first data to a first time series database instance. The first time series database instance is configured to write the first data into a pre-created default data retention policy, and to push the first data to the load balancer of the synchronization time series database. The load balancer of the synchronization time series database is configured to forward the first data to the synchronization time series database instance. The synchronization time series database instance is configured to write the first data, in turn, into the pre-created non-default data retention policy of each second time series database instance; a second time series database instance is any time series database instance other than the first time series database instance.

With this implementation, during time series data synchronization, after the data collector writes the time series data from the message queue into a time series database instance, that instance writes the data into its pre-created default permanent data retention policy and, at the same time, pushes the data to the load balancer of the synchronization time series database. When the synchronization time series database instance receives the data pushed through that load balancer, it identifies the time series database instance that sent the data and stores the data in the pre-created non-default data retention policy of every other time series database instance. The written data is thus stored not only in the default permanent data retention policy of the sending instance but also, synchronously, in the non-default data retention policies of the other instances, which ensures the consistency and integrity of the data stored in the InfluxDB cluster and improves the accuracy of InfluxDB cluster data storage.

Brief Description of the Drawings

Figure 1 is a first schematic diagram of a time series data synchronization device provided by an embodiment of the present application;

Figure 2 is a second schematic diagram of a time series data synchronization device provided by an embodiment of the present application;

Figure 3 is a third schematic diagram of a time series data synchronization device provided by an embodiment of the present application;

Figure 4 is a flow chart of a time series data synchronization method provided by an embodiment of the present application.

Detailed Description

To allow the features and technical content of the embodiments of the present application to be understood in more detail, the technical solutions of the present application are further described below with reference to the accompanying drawings and specific embodiments. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present application.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.

The following description refers to "some embodiments", which describe a subset of all possible embodiments. It is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where no conflict arises. It should also be noted that the terms "first/second/third" in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering of those objects. Where permitted, "first/second/third" may be interchanged in order or sequence, so that the embodiments described herein can be implemented in an order other than that illustrated or described.

At present, as IoT data grows exponentially, the prior art mainly relies on two existing open-source technologies for time series data processing: the time series database relay mechanism (Influx DataBase relay, InfluxDB-relay) and the time series database cluster proxy service (Influx DataBase proxy, InfluxDB-proxy). These two cluster solutions have the following technical problems:

(1) The InfluxDB-relay cluster solution supports writing data to InfluxDB over the Hypertext Transfer Protocol (HTTP) or the User Datagram Protocol (UDP). It only solves the problem of how to back up data and does not solve the poor read and write performance of InfluxDB. InfluxDB-relay applies no retry mechanism to data that fails to be written, so a failed write causes data inconsistency and data loss across multiple InfluxDB instances. In addition, InfluxDB-relay has not been updated for a long time.

(2) The InfluxDB-proxy cluster solution uses many components, which raises the learning cost for users and makes later maintenance difficult. Moreover, when machine performance reaches its limit and data requests fail, the retry mechanism of InfluxDB-proxy further increases the load on the machine.

To solve the above problems, an embodiment of the present application provides a time series data synchronization device 1. As shown in Figure 1, the device includes: a message queue 10, a data collector 11, a load balancer 12 of the time series database, time series database instances 13, a load balancer 14 of the synchronization time series database, and a synchronization time series database instance 15.

The data collector 11 is configured to consume first data in the message queue 10 and push the first data to the load balancer 12 of the time series database; the message queue 10 stores the received write data.

The load balancer 12 of the time series database is configured to forward the first data to a first time series database instance 13.

The first time series database instance 13 is configured to write the first data into the pre-created default permanent data retention policy, and to push the first data to the load balancer 14 of the synchronization time series database.

The load balancer 14 of the synchronization time series database is configured to forward the first data to the synchronization time series database instance 15.

The synchronization time series database instance 15 is configured to write the first data, in turn, into the pre-created non-default data retention policy of each second time series database instance; a second time series database instance is any time series database instance other than the first time series database instance.

In the embodiments of the present application, time series data refers to a data series in which a single indicator is recorded in chronological order. All values in the same series must be measured on the same basis so that they are comparable. Time series data may be period values or point-in-time values. The purpose of time series analysis is to identify the statistical characteristics and regularities of the in-sample series, build a time series model, and make out-of-sample predictions. Storing and processing time series data in a relational database makes efficient storage and querying impossible. Time series data solutions use a specialized storage layout so that massive volumes of time series data can be stored efficiently and processed quickly, which makes them an important technology for large-scale data processing. According to this application, such a storage layout greatly improves the processing of time-related data: compared with a relational database, storage space is halved, query speed is greatly increased, and the query performance of time series functions far exceeds that of a relational database.

In the embodiments of the present application, before the data collector consumes the first data in the message queue, the time series data synchronization device converts the received write data into the line protocol format of the time series database and stores the write data in the corresponding topic of the message queue; the message queue stores data for at least one topic.

In the embodiments of the present application, when a data writer wants to write time series data into the time series database, the time series data synchronization device first converts the received write data into the line protocol format of the time series database.

In the embodiments of the present application, a time series database is a database mainly used to process time-stamped data, which is also called time series data. Common open-source time series databases currently include InfluxDB, the distributed and scalable Open Time Series Database (OpenTSDB), Prometheus, and Graphite.

In the embodiments of the present application, InfluxDB is taken as an example. InfluxDB is an open-source time series database developed by InfluxData. It is written in Go, a compiled language developed by Google, and focuses on high-performance querying and storage of time series data. InfluxDB is widely used for scenarios such as storing system monitoring data and real-time data in the IoT industry. A common use case for InfluxDB is monitoring statistics: for example, the memory usage of a computer is recorded every millisecond, and a line chart of memory usage is then drawn from the collected data in a graphical interface. In other words, data is recorded over time and then charted for statistical purposes.

In the embodiments of the present application, the line protocol of the time series database is the format in which time series data is read and written.

For example, in the embodiments of the present application, when a data writer wants to write time series data into the InfluxDB time series database, the time series data synchronization device first converts the received write data into the InfluxDB line protocol format.
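The conversion step can be illustrated with a minimal Python sketch that serializes one data point into InfluxDB 1.x line protocol (`measurement,tags fields timestamp`). The function names, the tag and field names, and the use of a nanosecond timestamp are assumptions made for illustration; the application does not specify how the conversion is implemented.

```python
def _escape(value, chars):
    """Backslash-escape every character of `chars` that occurs in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    # bool is tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslashes and double quotes are escaped.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one point; `tags` and `fields` are plain dicts (hypothetical shape)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # tags are emitted sorted by key
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(val) for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line_protocol(
    "device_analog", {"device": "d 1"}, {"value": 21.5, "ok": True, "n": 3},
    1700000000000000000,
)
```

The space in the tag value `d 1` is escaped, the integer field gets the `i` suffix, and the boolean is written as `true`.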

In the embodiments of the present application, after the time series data synchronization device converts the received write data into the line protocol format of the time series database, it further stores the write data in the corresponding topic of the message queue.

In the embodiments of the present application, a message queue is an asynchronous means of communication between services and an important component for exchanging information between distributed applications. A message queue can reside in memory or on disk, and it stores messages until they are read by an application. With a message queue, applications can process messages independently without knowing each other's location, and a sender does not need to wait for a message to be received before continuing its own work. Message queues address problems such as application decoupling, asynchronous messaging, and traffic peak shaving, and are an indispensable part of high-performance, highly available, scalable, and eventually consistent architectures. Common message queue products include ActiveMQ, RabbitMQ, ZeroMQ, Kafka, and RocketMQ.

For example, in the embodiments of the present application, taking the Kafka message queue as an example, after the time series data synchronization device converts the received write data into the line protocol format of the time series database, a Kafka producer writes the converted data to a fixed Kafka topic.

In the embodiments of the present application, a Kafka producer is a client application that publishes messages to a topic and is used to send messages to a topic continuously. A topic denotes a category of messages: one topic represents one class of messages, so topics in effect classify messages. A topic can be thought of as analogous to a table in a database.

It should be noted that a fixed topic means that the same topic is used every time data is written to Kafka. The fixed topic does not have to exist in Kafka beforehand: when data is written to Kafka for the first time and no topic exists for that type of data, Kafka automatically creates a topic for it.

It should be noted that a message queue can store data for at least one topic.
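The fixed-topic behaviour described above can be modelled with a small in-memory stand-in. This is not the Kafka client API; the class `InMemoryBroker` and the topic name `fsis_points` are hypothetical and only illustrate that the first write to an unknown topic creates it and that every later write lands in the same topic. In a real Kafka deployment, automatic topic creation depends on the broker setting `auto.create.topics.enable`.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy message queue keyed by topic; a stand-in, not a Kafka client."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        # Appending to an unknown topic implicitly creates it.
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1  # offset of the stored message

    def topics(self):
        return sorted(self._topics)

    def read(self, topic, offset=0):
        # .get() avoids creating a topic as a side effect of reading.
        return list(self._topics.get(topic, [])[offset:])


broker = InMemoryBroker()
first = broker.publish("fsis_points", "device_analog value=21.5 1")
second = broker.publish("fsis_points", "device_analog value=21.7 2")
```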

In the embodiments of the present application, after the data writer converts the data into the InfluxDB line protocol format and a Kafka producer writes it to a fixed Kafka topic, the data collector Telegraf consumes the first data from the Kafka message queue and pushes the consumed first data to nginx, the load balancer of the time series database.

In the embodiments of the present application, Telegraf is a data collector developed by InfluxData that is used to collect various kinds of monitoring data.

In the embodiments of the present application, the data collector consumes the first data of the topic it subscribes to in the message queue; the topic subscribed to by the data collector is the same as the topic in which the first data is stored in the message queue.

In the embodiments of the present application, multiple Telegraf data collectors belong to the same consumer group of the Kafka message queue and subscribe to the same topic, which is the topic to which data is written in Kafka. When a Telegraf data collector consumes data from the subscribed topic, it pushes the data to nginx, the load balancer of the InfluxDB time series database.
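Putting several collectors in one consumer group means that each message is handled by only one of them. The sketch below illustrates that idea by distributing topic partitions across group members in round-robin order. It is a simplification: the application does not mention partitions, real Kafka offers several assignment strategies, and the member names are hypothetical.

```python
def assign_partitions(partitions, members):
    """Give every partition to exactly one group member, round-robin."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3, 4], ["telegraf-a", "telegraf-b"])
```

Because no partition appears under two members, no message is pushed to the InfluxDB load balancer twice by the group.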

In the embodiments of the present application, when nginx, the load balancer of InfluxDB, receives data, it forwards the data to an InfluxDB instance according to a load balancing algorithm.

It should be noted that the load balancing algorithm used to forward data from the InfluxDB load balancer nginx to an InfluxDB instance can be chosen according to the actual situation and is not specifically limited in this application.
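Since the application leaves the algorithm open, the following sketch shows one possible choice, plain round-robin, which is also the default balancing method in nginx. The class name and backend addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(list(backends))

    def pick(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["influx-1:8086", "influx-2:8086", "influx-3:8086"])
picked = [balancer.pick() for _ in range(4)]
```

Whichever instance is picked for a write becomes the "first time series database instance" for that piece of data.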

In the embodiments of the present application, all write data is first written to the message queue; Telegraf then consumes the data from the message queue and writes it to InfluxDB. The message queue smooths out peaks in write requests and prevents an excessive number of write requests from putting pressure on the InfluxDB cluster. If data in InfluxDB is lost, it can be recovered by consuming the message queue again.
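Both properties, bounded batches for peak shaving and re-consumption for recovery, can be illustrated with a toy log that tracks a consumer offset. The class `ReplayableLog` is hypothetical and is not a Kafka API; in a real deployment, replay is only possible while the queue still retains the messages.

```python
class ReplayableLog:
    """Append-only log with one consumer offset (illustrative only)."""

    def __init__(self):
        self._records = []
        self._offset = 0

    def append(self, record):
        self._records.append(record)

    def poll(self, max_records):
        # Return at most `max_records`, so a burst of writes is drained gradually.
        batch = self._records[self._offset:self._offset + max_records]
        self._offset += len(batch)
        return batch

    def rewind(self, offset=0):
        # Move the consumer back so earlier records are delivered again.
        self._offset = offset


log = ReplayableLog()
for i in range(5):
    log.append(f"device_analog value={i} {i}")

batches = [log.poll(2), log.poll(2), log.poll(2)]
log.rewind()
replayed = log.poll(10)
```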

In the embodiments of the present application, after the first time series database instance InfluxDB receives the data forwarded by the InfluxDB load balancer nginx, it writes the received data into the default permanent data retention policy autogen.

It should be noted that the default permanent data retention policy autogen is created by InfluxDB automatically once the InfluxDB database instance has been deployed.

In the embodiments of the present application, after the first time series database instance InfluxDB writes the first data into the pre-created default permanent data retention policy, it pushes the first data to the load balancer of the synchronization time series database, and that load balancer then forwards the data to the synchronization time series database instance.

In the embodiments of the present application, a subscription must be created on the InfluxDB database so that data written to the InfluxDB database is pushed to the synchronization time series database instance InfluxDB-sync.

For example, in the embodiments of the present application, the subscription on the default permanent retention policy of the database fsis can be created with the statement influx -execute "CREATE SUBSCRIPTION sync_sub ON fsis.autogen DESTINATIONS ANY '<InfluxDB-sync instance address>'". When data is written to fsis.autogen in InfluxDB, InfluxDB pushes the data to the synchronization time series database instance InfluxDB-sync.
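As a small illustration, the statement above can be assembled programmatically. The helper below is a sketch, not part of any InfluxDB client library, and the destination URL `http://influxdb-sync-lb:8086` is a made-up address. In InfluxDB 1.x, `ANY` sends each write to one of the listed destinations, whereas `ALL` sends it to every destination.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_subscription_stmt(name, database, policy, destinations, mode="ANY"):
    """Build an InfluxQL CREATE SUBSCRIPTION statement from validated parts."""
    for ident in (name, database, policy):
        if not _IDENT.match(ident):
            raise ValueError(f"unsupported identifier: {ident!r}")
    if mode not in ("ANY", "ALL"):
        raise ValueError("mode must be ANY or ALL")
    if not destinations or any("'" in d for d in destinations):
        raise ValueError("destinations must be non-empty and free of quotes")
    quoted = ", ".join(f"'{d}'" for d in destinations)
    return (
        f'CREATE SUBSCRIPTION "{name}" ON "{database}"."{policy}" '
        f"DESTINATIONS {mode} {quoted}"
    )


stmt = create_subscription_stmt(
    "sync_sub", "fsis", "autogen", ["http://influxdb-sync-lb:8086"]
)
```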

In the embodiments of the present application, a data subscription is configured on the default permanent data retention policy autogen of InfluxDB. When data is written into autogen, that InfluxDB instance pushes the data to nginx, the load balancer of InfluxDB-sync. After receiving the data, the InfluxDB-sync load balancer nginx forwards it to an InfluxDB-sync instance according to a load balancing algorithm.

It should be noted that the way in which the InfluxDB-sync load balancer nginx forwards received data to an InfluxDB-sync instance can be chosen according to the actual situation and is not specifically limited here.

In the embodiments of the present application, after the data has been pushed to the synchronization time series database instance InfluxDB-sync, InfluxDB-sync writes the pushed data, in turn, into the pre-created non-default data retention policy of each second time series database instance.

In the embodiments of the present application, InfluxDB creates only the default retention policy autogen automatically after an InfluxDB database instance is deployed, so the non-default retention policy autogen_sync must be created explicitly on every InfluxDB instance.

For example, the non-default retention policy autogen_sync of the database fsis can be created on each InfluxDB instance with the statement influx -execute "CREATE RETENTION POLICY autogen_sync ON fsis DURATION 0s REPLICATION 1".
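Because the policy has to exist on every instance, it can be convenient to generate one `influx` CLI invocation per host. The sketch below only builds the argument lists and does not run them; the host names are hypothetical, and `-host`, `-port`, and `-execute` are options of the InfluxDB 1.x `influx` command-line client.

```python
def create_rp_stmt(policy, database, duration="0s", replication=1):
    """Build the CREATE RETENTION POLICY statement used in the text."""
    return (
        f'CREATE RETENTION POLICY "{policy}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )


def rp_commands(hosts, policy="autogen_sync", database="fsis"):
    """Return one influx CLI argument list per (host, port) pair."""
    stmt = create_rp_stmt(policy, database)
    return [
        ["influx", "-host", host, "-port", str(port), "-execute", stmt]
        for host, port in hosts
    ]


commands = rp_commands([("influx-1", 8086), ("influx-2", 8086)])
```

Each list could be passed to `subprocess.run` on a machine where the `influx` client is installed.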

In the embodiments of the present application, after an InfluxDB-sync instance receives the data, it excludes the InfluxDB instance to which the data has already been written and then writes the data, in turn, into the non-default retention policy autogen_sync of each of the other InfluxDB instances.

In the embodiments of the present application, users do not need to understand the design and principles of the entire InfluxDB cluster solution; they can simply configure and use it.

可选地，时序数据库的负载均衡器，还用于在接收到查询请求的情况下，将查询请求转发至第三时序数据库实例中，以供第三时序数据库实例通过查询默认数据保留策略和非默认数据保留策略以实现数据查询的过程。Optionally, the load balancer of the time series database is also used to forward the query request to the third time series database instance when receiving the query request, so that the third time series database instance can query the default data retention policy and non- Default data retention policy to implement the process of data query.

In this embodiment of the application, data can be queried directly through the InfluxDB load balancer nginx. When nginx receives a query request, it forwards the request to an InfluxDB instance according to the load balancing algorithm, and that instance performs the query.

In this embodiment of the application, when time series data from the data writer is written into an InfluxDB instance, data arriving from the InfluxDB load balancer nginx is written into InfluxDB's default permanent retention policy autogen, while data arriving from InfluxDB-sync is written into the non-default retention policy autogen_sync. A query must therefore cover both autogen and autogen_sync in order to return the complete data stored in the InfluxDB cluster.

For example, the statement for querying the time series data stored in the default permanent retention policy autogen and in the non-default retention policy autogen_sync can be: select * from autogen.device_analog, autogen_sync.device_analog.
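The query above can be generated for any measurement. The following is a minimal Python sketch, not part of the application: the function name build_cluster_query and the restriction to plain alphanumeric identifiers are assumptions made for illustration.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_cluster_query(measurement, policies=("autogen", "autogen_sync")):
    """Build one InfluxQL SELECT that reads the measurement from every policy.

    Hypothetical helper: only plain identifiers are accepted, so no
    escaping of quotes is needed when the names are double-quoted.
    """
    for name in (measurement, *policies):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    sources = ", ".join(f'"{rp}"."{measurement}"' for rp in policies)
    return f"SELECT * FROM {sources}"


print(build_cluster_query("device_analog"))
```

Listing both retention policies in one FROM clause keeps the merge on the database side, so the caller does not have to combine two result sets.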

Optionally, the synchronization time series database instance is further configured to: update the first data from the data format used for writing into the default permanent retention policy to the data format used for writing into the non-default retention policy; exclude the first time series database instance according to the preconfigured time series database addresses; and call the data write interface of the time series database in turn to write the first data into the non-default data retention policy of each second time series database instance.

In this embodiment of the application, when the data writer writes time series data, Telegraf consumes the data of the subscribed topic and pushes it to the InfluxDB load balancer nginx, which forwards it to an InfluxDB instance according to the load balancing algorithm. As shown in Figure 2, when the written data reaches InfluxDB, InfluxDB writes it into the default retention policy autogen and, according to the subscription configured on InfluxDB, pushes it to an InfluxDB-sync instance. On receiving the data, the InfluxDB-sync instance changes its format from the one used for writing into the default permanent retention policy autogen to the one used for writing into the non-default retention policy autogen_sync, excludes the sending InfluxDB instance according to the configured InfluxDB database address information, and finally calls the InfluxDB data write interface in turn to write the data into autogen_sync on the other InfluxDB instances.
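The exclusion and fan-out step can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the retention policy is taken to be selected through the rp query parameter of the InfluxDB 1.x /write endpoint, and the function name sync_write_urls and the host names are hypothetical.

```python
from urllib.parse import urlencode


def sync_write_urls(sender, instances, database="fsis", policy="autogen_sync"):
    """Return the write URLs of every configured instance except the sender.

    Assumption: each entry of `instances` is a base URL such as
    "http://influx-1:8086", and the target retention policy is chosen
    with the `rp` query parameter of the InfluxDB 1.x /write endpoint.
    """
    query = urlencode({"db": database, "rp": policy})
    sender = sender.rstrip("/")
    return [
        f"{base.rstrip('/')}/write?{query}"
        for base in instances
        if base.rstrip("/") != sender  # skip the instance that already holds the data
    ]


cluster = ["http://influx-1:8086", "http://influx-2:8086", "http://influx-3:8086"]
for url in sync_write_urls("http://influx-1:8086", cluster):
    print(url)
```

The received line-protocol body would then be posted unchanged to each returned URL, one instance after another.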

In this embodiment of the application, this synchronization method quickly propagates a data write request to the other InfluxDB instances of the InfluxDB cluster, with almost no delay.

Optionally, the synchronization time series database instance is further configured to: when it detects that synchronizing data to a second time series database instance has failed, write the first data into a file in the directory created for that second time series database instance; and, every first preset period, check whether the second time series database instance is running normally and, if so, write the first data in the file in turn into the non-default data retention policy of the second time series database instance.

In this embodiment of the application, when the InfluxDB-sync instance fails to synchronize data, it first stores the data on the local disk and then sets the failed node to the down state.

In this embodiment of the application, when the InfluxDB-sync instance has failed to synchronize data, it periodically sends a ping (Packet Internet Groper) message to each InfluxDB instance to check whether InfluxDB is down. If the InfluxDB at a failed node is found to be down, the node is set to the down state; if InfluxDB is found to be running normally, the node's state is set to the normal state up.
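The periodic state refresh can be sketched as below. This is a minimal Python sketch, not the application's implementation: the probe is injected as a callable because the text does not fix how the ping is sent, and the names refresh_states and probe are hypothetical.

```python
from typing import Callable, Dict, Iterable


def refresh_states(instances: Iterable[str], probe: Callable[[str], bool]) -> Dict[str, str]:
    """Probe every instance once and return its state, "up" or "down".

    `probe` is a hypothetical callable that returns True when the
    instance answers; a network error is treated as "no answer".
    """
    states = {}
    for host in instances:
        try:
            alive = probe(host)
        except OSError:
            alive = False
        states[host] = "up" if alive else "down"
    return states


# Stand-in probe for illustration: pretend only influx-2 is unreachable.
print(refresh_states(["influx-1", "influx-2"], lambda host: host != "influx-2"))
```

A scheduler would call refresh_states at the chosen interval and use the result to decide which per-host directories the recovery pass may touch.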

It should be noted that the interval at which the InfluxDB-sync instance sends ping messages to each InfluxDB instance can be chosen according to the actual situation and is not specifically limited in this application.

In this embodiment of the application, every message for which a node failure is detected is saved to the local disk as a file. The InfluxDB-sync instance creates a separate directory for each InfluxDB instance, named after that instance's host name, and writes failed messages into the failure file failed.proc in the corresponding directory. When the number of failed messages exceeds a threshold, or a certain time has elapsed, the InfluxDB-sync instance moves the data in failed.proc into a pending file with the extension ready. The ready files are named with monotonically increasing numbers, for example 1.ready, 2.ready, and so on.
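The failure-file rotation described above can be sketched in Python as follows. This is an illustration under stated assumptions: the class name FailureLog, the parameter names max_lines and max_age, and their default values are hypothetical, and one failed message is assumed to occupy one line.

```python
import os
import time


class FailureLog:
    """Per-host failed.proc file that rotates into numbered *.ready files."""

    def __init__(self, root, host, max_lines=1000, max_age=60.0):
        self.dir = os.path.join(root, host)  # one directory per InfluxDB host
        os.makedirs(self.dir, exist_ok=True)
        self.path = os.path.join(self.dir, "failed.proc")
        self.max_lines = max_lines
        self.max_age = max_age
        self.count = 0
        self.opened = time.monotonic()

    def append(self, line):
        """Store one failed message; return the ready file path if rotated."""
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(line.rstrip("\n") + "\n")
        self.count += 1
        too_many = self.count > self.max_lines
        too_old = time.monotonic() - self.opened > self.max_age
        return self.rotate() if too_many or too_old else None

    def rotate(self):
        """Move failed.proc to the next free N.ready name."""
        suffix = ".ready"
        numbers = [
            int(name[: -len(suffix)])
            for name in os.listdir(self.dir)
            if name.endswith(suffix) and name[: -len(suffix)].isdigit()
        ]
        target = os.path.join(self.dir, f"{max(numbers, default=0) + 1}{suffix}")
        os.replace(self.path, target)
        self.count = 0
        self.opened = time.monotonic()
        return target
```

Moving the whole file with a single rename keeps the rotation cheap, and new failures simply start a fresh failed.proc on the next append.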

It should be noted that the threshold for the number of failed messages and the value of the time limit can be chosen according to the actual situation and are not specifically limited in this application.

It should be noted that the file naming scheme can be chosen according to the actual situation. File naming schemes other than the one described in this application also fall within the scope of protection of this application.

In this embodiment of the application, after the InfluxDB-sync instance has moved the data in failed.proc into a ready file (because the number of failed messages exceeded the threshold or the time limit elapsed), the InfluxDB-sync instance periodically checks whether the local disk contains any ready files and, if so, starts trying to recover the data. Specifically, during recovery the InfluxDB-sync instance scans the directory of each InfluxDB instance and skips the directory of any InfluxDB instance whose state is down.

In this embodiment of the application, when InfluxDB-sync recovers data, the InfluxDB-sync instance loads the entire content of one ready file into memory at once and recovers the entries one by one. At the same time it creates a temporary record file, named with the suffix .record (for example 1.ready.record), to log the recovery results: 1 is written to the record file for each entry recovered successfully and 0 otherwise, until all entries of the file have been processed. After a file has been processed, the entries that failed to recover are written back into the original ready file, the record file is deleted, and recovery continues with the next file.
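The recovery pass over a single ready file can be sketched as follows. This is a minimal Python sketch under stated assumptions: entries are stored one per line, write is a hypothetical callable that returns True on a successful write, and a ready file with nothing left to retry is deleted, which the text above does not state explicitly.

```python
import os
from typing import Callable


def recover_ready_file(path: str, write: Callable[[str], bool]) -> int:
    """Replay one ready file and return the number of entries still failing."""
    with open(path, encoding="utf-8") as fh:
        entries = fh.read().splitlines()  # the whole file is loaded at once

    record_path = path + ".record"  # e.g. 1.ready.record
    with open(record_path, "w", encoding="utf-8") as record:
        for entry in entries:
            try:
                ok = write(entry)
            except OSError:
                ok = False
            record.write("1\n" if ok else "0\n")  # 1 = recovered, 0 = failed

    with open(record_path, encoding="utf-8") as record:
        flags = record.read().split()
    failed = [entry for entry, flag in zip(entries, flags) if flag == "0"]

    if failed:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(failed) + "\n")  # keep only what must be retried
    else:
        os.remove(path)  # assumption: nothing left to retry, drop the file
    os.remove(record_path)
    return len(failed)
```

Because the record file is written before the ready file is rewritten, an interrupted pass leaves enough information on disk to tell which entries had already succeeded.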

It should be noted that, because the InfluxDB-sync instance loads the entire content of a ready file into memory at once, the rotation threshold should be chosen according to the actual situation so that files do not become so large that processing slows down.

Optionally, the time series data synchronization device further includes the distributed application coordination service zookeeper, which is used to allocate a distributed lock to a first synchronization time series database instance among multiple synchronization time series database instances.

In this embodiment of the application, zookeeper is an important component of the distributed system infrastructure Hadoop and of the database HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services. zookeeper is an open source framework for distributed coordination and is mainly used to solve consistency problems of application systems in distributed clusters, for example how to avoid dirty reads caused by concurrent operations on the same data. In essence, zookeeper is a distributed storage system for small files: it stores data in a directory tree similar to a file system and manages the nodes in the tree effectively, so that state changes of the stored data can be maintained and monitored. Monitoring these state changes makes data-based cluster management possible.

In this embodiment of the application, when data received by the InfluxDB-sync module is synchronized to the InfluxDB module, the InfluxDB-sync module may contain multiple synchronization time series database instances InfluxDB-sync. To avoid dirty data during synchronization, the synchronization task may run on only one InfluxDB-sync instance at any given moment. A zookeeper-based distributed lock is therefore used to decide which InfluxDB-sync instance executes the scheduled task at the current point in time.

Optionally, the time series data synchronization device further includes a synchronization interface. The first synchronization time series database instance is further configured to, once it has obtained the distributed lock from zookeeper: receive a data synchronization instruction through the synchronization interface; obtain the data to be synchronized in the time series database cluster according to the data synchronization instruction; and synchronize the data to be synchronized to those time series database instances in the cluster that do not contain it. The time series database cluster is composed of multiple time series database instances.

In this embodiment of the application, the time series data synchronization device also provides a synchronization interface, which is mainly used when a single-node InfluxDB database is upgraded to an InfluxDB cluster.

In this embodiment of the application, when a single-node InfluxDB database is upgraded to an InfluxDB cluster, synchronizing only the data newly written by the data writer would leave the data previously held by the single-node InfluxDB unsynchronized to the other InfluxDB instances of the cluster.

In this embodiment of the application, the time series data synchronization device exposes a synchronization interface through which the table whose data is to be synchronized across the InfluxDB cluster can be specified manually.

In this embodiment of the application, when the time series data synchronization device receives an externally supplied data synchronization instruction through the synchronization interface, it first tries to obtain the distributed lock from zookeeper. If the lock is obtained, the synchronization task starts; if not, the current task ends without running.
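The lock-then-run behaviour can be sketched as below. This is an illustrative Python sketch only: a threading.Lock stands in for the zookeeper distributed lock so that the example stays self-contained, and the function name run_if_lock_acquired is hypothetical. A real deployment would substitute a ZooKeeper client lock that offers a non-blocking acquire.

```python
import threading
from typing import Callable


def run_if_lock_acquired(lock, task: Callable[[], None]) -> bool:
    """Run `task` only if the lock can be taken immediately.

    Returns True when the task ran, False when another holder owns the
    lock and this run is skipped. `lock` must offer acquire(blocking=False)
    and release(); threading.Lock is used here as a local stand-in.
    """
    if not lock.acquire(blocking=False):
        return False  # another InfluxDB-sync instance is already synchronizing
    try:
        task()
        return True
    finally:
        lock.release()


lock = threading.Lock()
print(run_if_lock_acquired(lock, lambda: print("synchronizing")))
```

Skipping instead of waiting matches the described behaviour: an instance that loses the race simply ends its run and tries again at the next trigger.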

In this embodiment of the application, once the lock has been obtained, the data to be synchronized in the time series database cluster has to be determined.

In this embodiment of the application, the data to be synchronized can be determined by interval synchronization. Specifically, interval synchronization checks, for two InfluxDB instances, whether autogen.<table name> under the default permanent retention policy and autogen_sync.<table name> under the non-default retention policy contain the same number of data records within a given time period. If the counts differ, synchronization is performed. During synchronization it is first determined whether the number of records in the time period exceeds the default maximum. If it does, the time period is split further by recursive bisection. Once the number of records in a resulting time period is below the default maximum, data synchronization begins: the data of autogen.<table name> for that time period is queried first, then the data of autogen_sync.<table name>, and the two are compared record by record. Any records that differ are compensated.
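Interval synchronization with recursive bisection can be sketched as follows. This is a minimal Python sketch under stated assumptions: the two tables are modelled as in-memory dictionaries from integer timestamp to serialized record instead of being queried from InfluxDB, the count check is re-applied to every sub-interval, and the names find_missing and max_rows are hypothetical.

```python
from typing import Dict, List, Tuple

Rows = Dict[int, str]  # timestamp -> serialized record (illustrative model)


def find_missing(source: Rows, replica: Rows, start: int, end: int,
                 max_rows: int) -> List[Tuple[int, str]]:
    """Return records of `source` in [start, end) that `replica` lacks or differs on."""
    src = {t: v for t, v in source.items() if start <= t < end}
    dst = {t: v for t, v in replica.items() if start <= t < end}
    if len(src) == len(dst):
        return []  # equal counts: the interval is treated as consistent
    if len(src) > max_rows and end - start > 1:
        mid = (start + end) // 2  # too many records: bisect and recurse
        return (find_missing(source, replica, start, mid, max_rows)
                + find_missing(source, replica, mid, end, max_rows))
    # Small enough: compare record by record and collect the differences.
    return sorted((t, v) for t, v in src.items() if dst.get(t) != v)


autogen = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"}
autogen_sync = {1: "a", 2: "b", 4: "d", 6: "f"}
print(find_missing(autogen, autogen_sync, 0, 8, max_rows=2))
```

Comparing counts first keeps the expensive record-by-record comparison confined to the small intervals that actually disagree. Note that a count check alone cannot detect an interval where both sides hold the same number of records with different content.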

In another embodiment of the application, the data to be synchronized can also be determined by record-by-record comparison. Compared with interval synchronization, record-by-record comparison omits the step of checking whether the record counts match; the remaining steps are the same as for interval synchronization and are not repeated here.

It should be noted that the way of determining the data to be synchronized is not limited to the interval synchronization and record-by-record comparison described in this application. It can be chosen according to the actual situation and is not specifically limited in this application.

In this embodiment of the application, a switch to an InfluxDB cluster can be made flexibly at any stage of a project, early or late, and when data in the cluster becomes abnormal, data compensation can quickly restore it.

Optionally, the first synchronization time series database instance is further configured to, once it has obtained the distributed lock from zookeeper: compare, every second preset period, the data in the multiple time series database instances of the time series database cluster to obtain the difference data between them, the difference data being the data to be synchronized; and synchronize the difference data to those time series database instances that do not contain it.

In this embodiment of the application, when a single-node InfluxDB database is upgraded to an InfluxDB cluster, time series data can also be synchronized by a scheduled task that compares, at a fixed frequency, whether the data of the InfluxDB instances in the cluster is consistent and, if not, compensates the difference data.

In this embodiment of the application, when the scheduled task runs, it first tries to obtain the distributed lock from zookeeper. If the lock is obtained, the synchronization task starts; if not, the current run of the scheduled task ends.

In this embodiment of the application, once the lock has been obtained and the synchronization task starts, the tables to be synchronized and the InfluxDB instance information are first read from the configuration. The synchronization time period is then split into multiple smaller periods according to the synchronization step, and the data of each table in each period is synchronized by separate threads. When synchronization is complete, the results of all threads are aggregated per table. If all periods of a table were synchronized successfully, the table's synchronization time is updated to the time of the latest successful synchronization. If several periods of a table failed, the table's latest synchronization time is set to the time of the earliest failed period, so that the next run of the synchronization task resynchronizes the data that previously failed.
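The splitting and aggregation logic for one table can be sketched as follows. This is an illustrative Python sketch under stated assumptions: times are integers, sync_slice is a hypothetical callable that synchronizes one sub-period and returns True on success, and the "time of the earliest failed period" is taken to be the start of that period.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_range(start: int, end: int, step: int) -> List[Tuple[int, int]]:
    """Split [start, end) into consecutive sub-periods of at most `step`."""
    return [(s, min(s + step, end)) for s in range(start, end, step)]


def sync_table(start: int, end: int, step: int,
               sync_slice: Callable[[int, int], bool], workers: int = 4) -> int:
    """Synchronize every sub-period in parallel and return the new sync time.

    If all sub-periods succeed the new sync time is `end`; otherwise it is
    the start of the earliest failed sub-period, so the next run retries
    everything from that point on.
    """
    slices = split_range(start, end, step)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: sync_slice(*s), slices))
    failed = [s for s, ok in zip(slices, results) if not ok]
    return min(f[0] for f in failed) if failed else end


print(split_range(0, 10, 4))
print(sync_table(0, 10, 4, lambda s, e: s != 4))
```

Rewinding to the earliest failure may repeat work for later periods that already succeeded, which is harmless as long as the per-period synchronization is idempotent.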

It should be noted that, during this synchronization, the data to be synchronized is determined in the same way as when data is synchronized through the synchronization interface, which is not repeated here.

Optionally, the first synchronization time series database instance is further configured to synchronize the data to be synchronized, record by record, to those time series database instances in the cluster that do not contain it; and/or the first synchronization time series database instance is further configured to synchronize the data to be synchronized, based on a preset synchronization time period, to those time series database instances in the cluster that do not contain it.

In this embodiment of the application, the first synchronization time series database instance can synchronize data by interval synchronization. Interval synchronization checks, for two InfluxDB instances, whether autogen.<table name> and autogen_sync.<table name> contain the same number of data records within a given time period. If the counts differ, synchronization is performed. During synchronization it is first determined whether the number of records in the time period exceeds the default maximum. If it does, the time period is split further by recursive bisection. Once the number of records in a resulting time period is below the default maximum, data synchronization begins: the data of autogen.<table name> for that time period is queried first, then the data of autogen_sync.<table name>, and the two are compared record by record. Any records that differ are compensated.

In another embodiment of the application, the first synchronization time series database instance can also synchronize data by record-by-record comparison. Compared with interval synchronization, record-by-record comparison omits the step of checking whether the record counts match; the remaining steps are the same as for interval synchronization and are not repeated here.

Optionally, the time series data synchronization device is further configured to convert received write data into the line protocol format of the time series database and to store the write data in the corresponding topic of the message queue, where the message queue stores data for at least one topic. The data collector is further configured to consume, from the message queue, the first data of the topic to which the data collector subscribes; the topic subscribed to by the data collector is the same as the topic under which the first data was stored in the message queue.

In this embodiment of the application, when the data writer writes data, the time series data synchronization device converts the data into the format of the time series database line protocol and then writes it to the corresponding topic of the message queue.
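The conversion step can be illustrated with a small formatter for the InfluxDB line protocol. This is a simplified Python sketch, not the application's converter: the function name to_line_protocol and the sample values are hypothetical, tags are sorted by key, and backslashes inside names are not specially handled.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integers carry an "i" suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:
            text = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{text}"'  # strings are double-quoted
        parts.append(_escape(key, ",= ") + "=" + rendered)
    return f"{head} {','.join(parts)} {timestamp_ns}"


print(to_line_protocol("device_analog", {"device": "d1"},
                       {"value": 21.5, "count": 3}, 1653868800000000000))
```

Because each point becomes a single self-describing text line, the message queue and the data collector can pass it along without knowing the schema.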

It should be noted that the message queue stores data for at least one topic.

In this embodiment of the application, after the data has been written to the corresponding topic of the message queue, the data collector consumes the corresponding first data from the message queue according to the topic it subscribes to.

In this embodiment of the application, multiple Telegraf data collectors belong to the same kafka consumer group and subscribe to the same topic, which is the topic to which the data is written in kafka.

It can be understood that, in the time series data synchronization device provided by the embodiments of this application, after the data collector has written the time series data from the message queue into a time series database instance, that instance writes the data into its pre-created default permanent data retention policy and at the same time pushes the data to the load balancer of the synchronization time series database. When a synchronization time series database instance receives the message forwarded by that load balancer, it identifies the time series database instance that sent the data and stores the received message in the pre-created non-default data retention policy of every other time series database instance. The written data is thus stored not only in the default permanent data retention policy of the sending instance but also, synchronously, in the non-default data retention policy of the other time series database instances. This ensures the consistency and integrity of the data stored in the InfluxDB cluster and improves the accuracy of InfluxDB cluster data storage.

Based on the above embodiments, this application provides a time series data synchronization device, shown in Figure 3, which includes a data writer, a message queue, a data collector, a load balancer of the time series database, time series database instances, a load balancer of the synchronization time series database, synchronization time series database instances, and a query interface.

The data writer is a user that needs to store data in the InfluxDB time series database;

the message queue is used to obtain and store the message data written by the data writer;

the data collector is used to consume the first data in the message queue and to push the first data to the load balancer of the time series database;

the load balancer of the time series database is used to forward the first data to a first time series database instance;

the first time series database instance is used to write the first data into the pre-created default permanent data retention policy and to push the first data to the load balancer of the synchronization time series database;

the load balancer of the synchronization time series database is used to forward the first data to a synchronization time series database instance;

the synchronization time series database instance is used to write the first data in turn into the pre-created non-default data retention policy of each second time series database instance.

An embodiment of this application provides a time series database synchronization method, applied to the time series data synchronization device. As shown in Figure 4, the method includes:

S101. Consume the first data in the message queue through the data collector, so as to forward the first data to the first time series database instance.

In this embodiment of the application, a time series database is a database mainly used to process time-stamped data, which is also called time series data. Common open source time series databases at present include InfluxDB, OpenTSDB, Prometheus, and Graphite.

In this embodiment of the application, InfluxDB is taken as an example. InfluxDB is an open source time series database developed by InfluxData. It is written in Go and focuses on high-performance querying and storage of time series data. InfluxDB is widely used for system monitoring data, real-time data in the IoT industry, and similar scenarios. A common use case is monitoring statistics: the memory usage of a computer is recorded every millisecond, and the collected data is then used to draw a line chart of memory usage in a graphical interface. In other words, data is recorded over time and then charted for statistical purposes.

In this embodiment of the application, a message queue is an asynchronous means of communication between services and an important component for exchanging information between distributed applications. A message queue can reside in memory or on disk, and it stores messages until they are read by an application. With a message queue, applications can process messages independently without knowing each other's location, and a sender does not have to wait for a message to be received before continuing. Message queues address application decoupling, asynchronous messaging, and traffic peak shaving, and are an indispensable part of architectures that aim for high performance, high availability, scalability, and eventual consistency. Common message queue products currently include ActiveMQ, RabbitMQ, ZeroMQ, Kafka, and RocketMQ.

For example, in this embodiment of the application, taking the kafka message queue as an example, after the time series data synchronization device has converted the received write data into the line protocol format of the time series database, a kafka producer writes the converted data to a fixed kafka topic.

In this embodiment of the application, Telegraf is a data collector developed by InfluxData that is used to collect various kinds of monitoring data.

In this embodiment of the application, multiple Telegraf data collectors belong to the same Kafka consumer group and subscribe to the same topic, which is the topic to which the data was written. After a Telegraf collector consumes data from the subscribed topic, it pushes the data to nginx, the load balancer in front of InfluxDB.

S102. Write the first data into a pre-created default data retention policy through the first time series database instance, and forward the first data to a synchronization time series database instance.

In this embodiment of the application, after the first time series database instance (InfluxDB) receives the first data forwarded by the InfluxDB load balancer nginx, it writes the received first data into the default permanent data retention policy, autogen.

In this embodiment of the application, after the first time series database instance (InfluxDB) writes the first data into the pre-created default permanent data retention policy, it also pushes the first data to the load balancer of the synchronization time series database, which then forwards the data to a synchronization time series database instance.

Illustratively, in this embodiment of the application, a subscription on an InfluxDB database can be created with the statement influx -execute "CREATE SUBSCRIPTION sync_sub ON fsis.autogen DESTINATIONS ANY <InfluxDB-sync instance address>", which creates a subscription on the default permanent retention policy of the database fsis. When data is written to fsis.autogen in InfluxDB, InfluxDB pushes the data to the InfluxDB-sync instance.

S103. Through the synchronization time series database instance, write the first data in turn into the pre-created non-default data retention policy corresponding to each second time series database instance, where a second time series database instance is any time series database instance other than the first time series database instance.

In this embodiment of the application, when the synchronization time series database instance (InfluxDB-sync) synchronizes the received data, it excludes the InfluxDB instance that has already written the data into its default permanent retention policy, and then writes the data in turn into the non-default retention policy autogen_sync of each of the other InfluxDB instances.
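The target selection described above can be sketched as follows. The function name `sync_targets` and the host names are hypothetical; only the retention policy name autogen_sync comes from the text.

```python
from typing import List, Tuple


def sync_targets(instances: List[str], source: str,
                 retention_policy: str = "autogen_sync") -> List[Tuple[str, str]]:
    """Return (host, retention policy) pairs for every instance except the source.

    The source instance already holds the data in its default policy (autogen),
    so it is excluded from the list of synchronization targets.
    """
    return [(host, retention_policy) for host in instances if host != source]


targets = sync_targets(["influx-a", "influx-b", "influx-c"], source="influx-b")
print(targets)
```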

In this embodiment of the application, when the synchronization time series database instance detects that synchronizing data to a second time series database instance has failed, the synchronization time series database instance writes the first data into a file in the directory created for that second time series database instance. At every first preset period, the synchronization time series database instance checks whether the second time series database instance is running normally. If the second time series database instance is found to be running normally, the synchronization time series database instance writes the first data in the file, in order, into the non-default data retention policy corresponding to the second time series database instance.

In this embodiment of the application, when the InfluxDB-sync instance fails to synchronize data, it first stores the data on the local disk and then sets the state of the failed node to down.

In this embodiment of the application, when the InfluxDB-sync instance fails to synchronize data, it periodically sends a ping message to each InfluxDB instance to check whether that InfluxDB instance is down. If the InfluxDB instance at the failed node is found to be down, the state of the failed node is set to down; if the InfluxDB instance is found to be running normally, the state of the failed node is set to up.

It should be noted that the interval at which the InfluxDB-sync instance periodically sends ping messages to each InfluxDB instance can be chosen according to actual conditions and is not specifically limited in this application.
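One round of the health check described above might look like the following sketch. The function `refresh_node_states` and the injected `ping` callable are assumptions for illustration; a real probe would typically be an HTTP request to each instance, which is left out here so that the example stays self-contained. A scheduler would call this function at the chosen interval.

```python
from typing import Callable, Dict, Iterable, Optional


def refresh_node_states(hosts: Iterable[str],
                        ping: Callable[[str], bool],
                        states: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Run one round of health checks and return the updated state table.

    `ping` is a hypothetical probe that returns True when the instance answers.
    A network error raised by the probe is treated the same as no answer.
    """
    updated = dict(states or {})
    for host in hosts:
        try:
            alive = ping(host)
        except OSError:  # connection refused, timeout, and similar errors
            alive = False
        updated[host] = "up" if alive else "down"
    return updated


def fake_ping(host: str) -> bool:
    """Stand-in probe for the example: 'a' answers, 'b' refuses, 'c' is silent."""
    if host == "b":
        raise ConnectionError("connection refused")
    return host == "a"


print(refresh_node_states(["a", "b", "c"], fake_ping))
```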

In this embodiment of the application, all messages for which a node failure is detected are saved to the local disk as files. The InfluxDB-sync instance creates a separate directory for each InfluxDB instance, and the directory path is named after the host name of that InfluxDB instance. Failed messages are written to the failed.proc file in the corresponding directory. When the number of failed messages exceeds a threshold, or after a certain time has elapsed, the InfluxDB-sync instance moves the data in failed.proc into a ready file. Ready files are named with monotonically increasing numbers, for example 1.ready, 2.ready, and so on.
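The buffering and rotation scheme described above can be sketched as follows. The class name `FailureBuffer`, the `threshold` parameter, and the host name are hypothetical. Only count-based rotation is shown; the time-based trigger mentioned in the text is omitted for brevity.

```python
import os
import tempfile


class FailureBuffer:
    """Hypothetical per-host buffer: appends to failed.proc, rotates to N.ready."""

    def __init__(self, root: str, host: str, threshold: int = 1000):
        self.directory = os.path.join(root, host)  # one directory per InfluxDB host
        os.makedirs(self.directory, exist_ok=True)
        self.proc_path = os.path.join(self.directory, "failed.proc")
        self.threshold = threshold
        self.count = 0

    def append(self, line: str) -> None:
        """Store one failed message and rotate once the count exceeds the threshold."""
        with open(self.proc_path, "a", encoding="utf-8") as fh:
            fh.write(line.rstrip("\n") + "\n")
        self.count += 1
        if self.count > self.threshold:
            self.rotate()

    def rotate(self):
        """Move failed.proc to the next free numbered ready file."""
        if not os.path.exists(self.proc_path):
            return None
        suffix = ".ready"
        numbers = [
            int(name[: -len(suffix)])
            for name in os.listdir(self.directory)
            if name.endswith(suffix) and name[: -len(suffix)].isdigit()
        ]
        target = os.path.join(self.directory, f"{max(numbers, default=0) + 1}{suffix}")
        os.replace(self.proc_path, target)
        self.count = 0
        return target


root = tempfile.mkdtemp()
buffer = FailureBuffer(root, "influx-a", threshold=2)
for i in range(6):
    buffer.append(f"mem used={i}i")
names = sorted(os.listdir(os.path.join(root, "influx-a")))
print(names)
```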

In this embodiment of the application, after the InfluxDB-sync instance has moved the data in failed.proc into a ready file (because the number of failed messages exceeded the threshold or a certain time elapsed), it periodically checks whether any ready files exist on the local disk and, if so, starts attempting data recovery. Specifically, during data recovery the InfluxDB-sync instance scans the directory corresponding to each InfluxDB instance, and skips any directory whose InfluxDB instance is in the down state.

In this embodiment of the application, when the InfluxDB-sync instance performs data recovery, it loads the entire content of one ready file into memory at once and recovers the entries one by one. At the same time, it creates a temporary file ending in .record, for example 1.ready.record, to record the recovery results: for each entry that is recovered successfully it writes 1 to the record file, and otherwise it writes 0, until all data in the file has been processed. After a file has been processed, the entries that failed to be recovered are written back into the original ready file, the record file is deleted, and recovery proceeds to the next file.

It should be noted that, because the InfluxDB-sync instance loads the entire content of a ready file into memory at once, the threshold needs to be chosen according to actual conditions so that ready files do not become so large that processing slows down.
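The recovery of a single ready file described above can be sketched as follows. The function `recover_ready_file` and the injected `write` callable are hypothetical. The text does not say what happens to a ready file when every entry is recovered; deleting it in that case is an assumption of this sketch.

```python
import os
import tempfile
from typing import Callable


def recover_ready_file(path: str, write: Callable[[str], bool]) -> int:
    """Replay one ready file entry by entry and return the number of failures.

    `write` is a hypothetical callable that sends one entry to the target
    instance and returns True on success.
    """
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()  # the whole file is loaded at once
    record_path = path + ".record"      # for example 1.ready.record
    failed = []
    with open(record_path, "w", encoding="utf-8") as record:
        for line in lines:
            try:
                ok = write(line)
            except OSError:
                ok = False
            record.write("1" if ok else "0")  # 1 = recovered, 0 = failed
            if not ok:
                failed.append(line)
    if failed:
        # Keep only the failed entries in the original ready file.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(failed) + "\n")
    else:
        os.remove(path)  # assumption: nothing left to retry, so drop the file
    os.remove(record_path)
    return len(failed)


directory = tempfile.mkdtemp()
ready = os.path.join(directory, "1.ready")
with open(ready, "w", encoding="utf-8") as fh:
    fh.write("a\nb\nc\n")
remaining = recover_ready_file(ready, lambda entry: entry != "b")
print(remaining)
```

Because the failed entries stay in the same ready file, the next periodic scan retries them without any extra bookkeeping.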

Optionally, in another embodiment of the time series data synchronization method provided by this application, the method includes:

determining a first synchronization time series database instance from among multiple synchronization time series database instances;

receiving, by the first synchronization time series database instance, a data synchronization instruction through a synchronization interface, and obtaining the data to be synchronized in the time series database cluster according to the data synchronization instruction;

synchronizing, by the first synchronization time series database instance, the data to be synchronized to those time series database instances in the time series database cluster that do not contain the data to be synchronized.

In this embodiment of the application, interval synchronization can be used when obtaining the data to be synchronized. Specifically, interval synchronization checks whether, between two InfluxDB instances, the number of data points in autogen.<table name> and in autogen_sync.<table name> is the same for a given time period, and performs synchronization if it is not. When performing synchronization, it determines whether the number of data points in the time period is greater than the default maximum; if so, the time period is split further by recursive bisection. Once the number of data points in a split time period is less than the default maximum, data synchronization begins. During synchronization, the data in autogen.<table name> for that time period is queried first, then the data in autogen_sync.<table name>, and the two are compared entry by entry. If any data differs, the differing data is compensated.
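The recursive bisection described above can be sketched as follows. The function `ranges_to_sync` and the two injected count callables are hypothetical, time periods are treated as half-open intervals `[start, end)`, and the point count of the autogen side is the one compared against the maximum. Re-checking each half for equal counts before splitting again is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Count = Callable[[int, int], int]


def ranges_to_sync(start: int, end: int, count_src: Count, count_dst: Count,
                   max_points: int) -> List[Tuple[int, int]]:
    """Return the sub-ranges of [start, end) whose point counts differ.

    A range that holds more than `max_points` points is bisected recursively
    until each mismatching range is small enough to compare entry by entry.
    """
    n_src = count_src(start, end)
    if n_src == count_dst(start, end):
        return []  # counts agree, nothing to synchronize in this range
    if n_src > max_points and end - start > 1:
        mid = (start + end) // 2
        return (ranges_to_sync(start, mid, count_src, count_dst, max_points)
                + ranges_to_sync(mid, end, count_src, count_dst, max_points))
    return [(start, end)]


# Example data: timestamps held by autogen (source) and autogen_sync (copy).
source_points = list(range(10))
copy_points = [t for t in source_points if t != 7]  # one point is missing


def count_in(points: List[int]) -> Count:
    return lambda lo, hi: sum(lo <= t < hi for t in points)


pending = ranges_to_sync(0, 16, count_in(source_points), count_in(copy_points), 3)
missing = [t for lo, hi in pending for t in source_points
           if lo <= t < hi and t not in copy_points]
print(pending, missing)
```

Comparing counts first keeps the expensive entry-by-entry comparison confined to the small ranges that actually differ.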

Optionally, in yet another embodiment of the time series data synchronization method provided by this application, the method includes:

comparing, by the first synchronization time series database instance at every second preset period, the data in multiple time series database instances in the time series database cluster to obtain the difference data between the multiple time series database instances;

synchronizing, by the first synchronization time series database instance, the difference data to those time series database instances among the multiple time series database instances that do not contain the difference data.

In this embodiment of the application, once the lock has been acquired and the synchronization task starts, the information about the tables to be synchronized and about the InfluxDB instances is first read from the configuration. The synchronization time range is then split into multiple smaller synchronization time periods according to the configured step size, and separate threads synchronize the data of each table for each of the split time periods. After synchronization completes, the results of each thread for each table are aggregated. If all time periods of a table were synchronized successfully, the synchronization time of that table is updated to the time of the latest successful synchronization. If several time periods of a table failed to synchronize, the latest synchronization time of that table is updated to the time of the earliest failed time period, so that the next run of the synchronization task resynchronizes the data that previously failed.
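The splitting and aggregation described above can be sketched for a single table as follows. The names `split_range`, `run_sync`, and `sync_window` are hypothetical, and lock acquisition and configuration loading are omitted. Taking the start of the earliest failed window as the new synchronization time is one reading of the text and is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Window = Tuple[int, int]


def split_range(start: int, end: int, step: int) -> List[Window]:
    """Split [start, end) into consecutive windows of at most `step` units."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [(lo, min(lo + step, end)) for lo in range(start, end, step)]


def run_sync(start: int, end: int, step: int,
             sync_window: Callable[[Window], bool], workers: int = 4) -> int:
    """Synchronize every window on a thread pool and return the new sync time.

    `sync_window` is a hypothetical callable that synchronizes one window of
    one table and returns True on success.
    """
    windows = split_range(start, end, step)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(sync_window, windows))  # order matches windows
    failed_starts = [window[0] for window, ok in zip(windows, outcomes) if not ok]
    # All windows succeeded: advance to `end`. Otherwise restart from the
    # earliest failed window on the next run.
    return min(failed_starts) if failed_starts else end


windows = split_range(0, 100, 30)
next_time = run_sync(0, 100, 30, lambda window: window[0] not in (30, 60))
print(windows, next_time)
```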

It can be understood that, in the time series data synchronization method provided by the embodiments of this application, after a data collector writes the time series data from the message queue into a time series database instance, that instance writes the data into its pre-created default permanent data retention policy and also pushes the data to the load balancer of the synchronization time series database. When a synchronization time series database instance receives the message forwarded by that load balancer, it identifies the time series database instance that sent the data and stores the received message in the pre-created non-default data retention policy of every time series database instance other than the sender. The written data is therefore stored not only in the pre-created default permanent data retention policy of the sending time series database instance, but also, through synchronization, in the non-default data retention policies of the other time series database instances. This ensures the consistency and integrity of the data stored in the InfluxDB cluster and improves the accuracy of data storage in the InfluxDB cluster.

An embodiment of this application provides a storage medium on which a computer program is stored. The computer-readable storage medium stores one or more programs that can be executed by one or more processors and is applied to the time series data synchronization device 1. When executed, the computer program implements the time series data synchronization method described above.

It should be noted that, in this document, the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause an image display device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present disclosure.

The above are only specific embodiments of this application, and the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.