CN113779048A

Info

Publication number: CN113779048A
Application number: CN202010561160.9A
Authority: CN
Inventors: 杨均达; 韩有凰; 任维; 赖耀宇; 鲁强
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-06-18
Filing date: 2020-06-18
Publication date: 2021-12-10
Anticipated expiration: 2040-06-18
Also published as: CN113779048B

Abstract

The invention discloses a data processing method and apparatus, and relates to the technical field of computers. A specific implementation of the method includes: parsing a log file in a source library to obtain first incremental data, and obtaining first incremental information from a message header of the first incremental data; determining second incremental data in a target library according to the primary key identifier and the table name in the first incremental information; and comparing whether the first incremental information is greater than second incremental information of the second incremental data, and, according to the comparison result, either replacing the second incremental data in the target library with the first incremental data or not replacing it. This implementation filters historical data by means of the incremental information, ensures in-order execution of data synchronization under high throughput, reduces unnecessary data update operations, and prevents historical data from overwriting new data.

Description

Data processing method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.

Background

Currently, when Binlog is used for master-slave replication of a database or incremental recovery of data, Binlog parsing messages are processed in either a sequential message mode or an out-of-order message mode. Binlog parsing messages support several routing modes: single topic with a single partition, single topic with multiple partitions, multiple topics with a single partition, and multiple topics with multiple partitions. Because the performance and the number of writes per second of a single partition are limited, multi-partition routing outperforms single-partition routing.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

1. the sequential message mode imposes strict requirements on the processing order, so consumption of Binlog parsing messages is prone to delay;

2. although the out-of-order message mode processes messages faster than the sequential message mode, when data is operated on frequently, historical data can easily overwrite new data, which causes data synchronization errors.

Disclosure of Invention

In view of this, embodiments of the present invention provide a data processing method and apparatus, which can at least solve the problem in the prior art that Binlog analysis message consumption is slow or data synchronization is wrong.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data processing method including:

analyzing a log file in a source library to obtain first incremental data, and acquiring first incremental information from a message header of the first incremental data;

determining second incremental data in the target library according to the primary key identification and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

and comparing whether the first incremental information is larger than second incremental information of the second incremental data or not, and replacing the second incremental data in the target library with the first incremental data or not according to a comparison result.

Optionally, the first incremental information includes a first execution time in the source library;

the comparing whether the first incremental information is larger than second incremental information of the second incremental data, and then replacing the second incremental data in the target library with the first incremental data or not according to a comparison result includes:

comparing whether the first execution time is larger than a second execution time of the second incremental data;

if so, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation.

Optionally, the first incremental information further includes a first file sequence number, where the first file sequence number is obtained by splitting a file name of the log file;

comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data or not under the condition that the first execution time is the same as the second execution time;

Optionally, the incremental information further includes a first file offset, where the first file offset represents a storage order of the first incremental data in the log file;

comparing whether the first file offset is greater than a second file offset of the second incremental data or not under the condition that the first execution time is the same as the second execution time and the first file sequence number is the same as the second file sequence number;

Optionally, the method further includes: setting the key name of the distributed lock as the primary key identifier and the table name, so as to lock the first incremental data by using the distributed lock; and

releasing the distributed lock upon determining that the first delta data is up-to-date data or not up-to-date data.

To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a data processing apparatus including:

the analysis module is used for analyzing the log file in the source library to obtain first incremental data and acquiring first incremental information from a message header of the first incremental data;

the determining module is used for determining second incremental data in the target library according to the primary key identification and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

and the comparison module is used for comparing whether the first incremental information is larger than second incremental information of the second incremental data or not, and replacing the second incremental data in the target library with the first incremental data or not according to a comparison result.

the comparison module is used for:

Optionally, the system further comprises a distributed lock module, configured to:

setting the key name of the distributed lock as the primary key identifier and the table name, so as to lock the first incremental data by using the distributed lock; and

To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a data processing electronic device.

The electronic device of the embodiment of the invention comprises: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement any of the data processing methods described above.

To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing any of the data processing methods described above when executed by a processor.

According to the scheme provided by the invention, one embodiment of the invention has the following advantages or beneficial effects: historical data is filtered by using the SQL execution time in the source library, the Binlog file, and the corresponding file offset, all of which are obtained by parsing the message header of the first incremental data from the Binlog, so that data synchronization is executed in order under high throughput, unnecessary data update operations are reduced, and historical data is prevented from overwriting new data.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic main flow chart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a flow diagram illustrating an alternative data processing method according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart diagram of an alternative data processing method according to an embodiment of the invention;

FIG. 4 is a schematic flow chart diagram of yet another alternative data processing method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the main blocks of a data processing apparatus according to an embodiment of the present invention;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 7 is a schematic block diagram of a computer system suitable for use with a mobile device or server implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Referring to fig. 1, a main flowchart of a data processing method according to an embodiment of the present invention is shown, which includes the following steps:

S101: analyzing a log file in a source library to obtain first incremental data, and acquiring first incremental information from a message header of the first incremental data;

S102: determining second incremental data in the target library according to the primary key identification and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

S103: comparing whether the first incremental information is larger than second incremental information of the second incremental data or not, and replacing the second incremental data in the target library with the first incremental data or not according to a comparison result.

In the above embodiment, for step S101, the Binlog is a log file in binary format that records the SQL (Structured Query Language) statements by which users insert, delete, or modify data in the database. Binlog is generally used for master-slave synchronization of a database and incremental recovery of data, so this scheme can be regarded as master-slave synchronization of incremental data.

One Binlog file contains multiple Binlog records generated by SQL executions. A Binlog message is a message sent through an MQ (message queue) after a Binlog record in the Binlog file is parsed, and one Binlog message corresponds to the Binlog record generated by one SQL execution.

The Binlog file is parsed to obtain multiple Binlog records. For a single Binlog record (namely the first incremental data), its message header is parsed to obtain the execution time execTime of the SQL in the MySQL source library, the file name log_name of the file containing the Binlog record row, and the file offset fileOffset. The file sequence number fileNum is then cut from log_name, for example 002067 from mysql-bin.002067.log.
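As an illustration of cutting the file sequence number out of the file name, here is a minimal Python sketch. The function name `extract_file_num`, the regular expression, and the decision to convert the number to an integer (so that later comparisons are numeric rather than textual) are assumptions made for this example; the source only states that the number is split from the file name.

```python
import re


def extract_file_num(log_name: str) -> int:
    """Return the numeric file sequence number embedded in a Binlog file name.

    Hypothetical helper: accepts names such as "mysql-bin.002067.log" or
    "mysql-bin.101" and returns the digits after the dot as an integer.
    """
    match = re.search(r"\.(\d+)(?:\.log)?$", log_name)
    if match is None:
        raise ValueError(f"no file sequence number found in {log_name!r}")
    return int(match.group(1))


print(extract_file_num("mysql-bin.002067.log"))
```

Converting to an integer drops the leading zeros, which keeps the ordering correct even if two file names were to use different padding widths.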

For step S102, the primary key identifier is similar to a line number in Excel, and corresponds to a line of data in Excel, so as to represent the position of the Binlog record in the Binlog file in the MySQL database. Multiple Binlog files may exist in the MySQL database, the primary key identifiers in the same Binlog file may not be repeated, but the primary key identifiers of different Binlog files may be repeated.

For this situation, the primary key identifier and the table name in the first incremental information are used to look up, in Redis (other cache databases may also be used), the execution information of the corresponding Binlog parsing message (i.e. the second incremental data), in order to obtain the second incremental information: the execution time of the SQL in the source library, the file sequence number of the file containing the Binlog record row, and the file offset, in the following format: execTime-fileNum-fileOffset.

It should be noted that the incremental data in Redis is stored in key-value form, where the key is the primary key identifier and table name, for example com:xx:online:order:100000000, and the value is the Binlog record, including incremental information such as execTime, fileNum, and fileOffset.
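The key and value layout described above can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, a plain dictionary stands in for Redis, and execTime is assumed to be an integer number of seconds, which the source does not specify.

```python
def build_key(table_name: str, primary_key: int) -> str:
    """Compose the cache key from the table name and primary key identifier."""
    return f"{table_name}:{primary_key}"


def build_value(exec_time: int, file_num: int, file_offset: int) -> str:
    """Serialize the incremental information as execTime-fileNum-fileOffset."""
    return f"{exec_time}-{file_num}-{file_offset}"


def parse_value(value: str) -> tuple:
    """Split a stored value back into (execTime, fileNum, fileOffset)."""
    exec_time, file_num, file_offset = (int(part) for part in value.split("-"))
    return exec_time, file_num, file_offset


# A dictionary is used here as a stand-in for the cache (assumption).
cache = {}
key = build_key("com:xx:online:order", 100000000)
cache[key] = build_value(1592460000, 2067, 4096)
print(key, parse_value(cache[key]))
```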

For step S103, if there is no record in Redis, that is, the second incremental data is empty, this shows that the first incremental data has not yet been processed; the first incremental data is stored directly and the service logic processing is completed.

If a record exists, execTime, fileNum and fileOffset need to be compared one by one to determine whether the second incremental data in Redis needs to be replaced by the first incremental data; for the specific operations, refer to the descriptions of FIG. 2 to FIG. 4 below, which are not repeated here.

In addition, to ensure that the synchronization of the incremental data is not interfered with by other data synchronization processes, the first incremental data is locked with a distributed lock after it is obtained by parsing, and the lock is released as soon as the judgment flow finishes, so that normal business logic can proceed. When locking, the lock key may be set to the primary key identifier + table name, and this information can be deleted directly after the lock is released.
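The lock-then-release pattern can be sketched as below. Note the assumptions: `InMemoryLockService` is a hypothetical in-process stand-in and is not a distributed lock, the `lock:` key prefix is invented for the example, and a real deployment would delegate `try_acquire` and `release` to an actual distributed lock service.

```python
import threading
from contextlib import contextmanager


class InMemoryLockService:
    """In-process stand-in for a distributed lock service (assumption)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, key: str) -> bool:
        # Succeeds only if nobody currently holds the key.
        with self._guard:
            if key in self._held:
                return False
            self._held.add(key)
            return True

    def release(self, key: str) -> None:
        # Releasing deletes the lock entry.
        with self._guard:
            self._held.discard(key)


@contextmanager
def row_lock(service, primary_key, table_name):
    """Hold a lock keyed by primary key identifier + table name."""
    key = f"lock:{primary_key}:{table_name}"
    if not service.try_acquire(key):
        raise RuntimeError(f"row is being synchronized elsewhere: {key}")
    try:
        yield key
    finally:
        # Released as soon as the latest/not-latest decision is finished.
        service.release(key)


service = InMemoryLockService()
with row_lock(service, 100000000, "order") as held_key:
    print("comparing incremental information under", held_key)
```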

In the method provided by this embodiment, the execTime, fileNum and fileOffset parsed from the message header of the first incremental data in the Binlog are used to filter historical data, so that data synchronization is executed in order under high throughput, unnecessary data update operations are reduced, and historical data is prevented from overwriting new data.

Referring to fig. 2, a schematic flow chart of an alternative data processing method according to an embodiment of the present invention is shown, including the following steps:

S201: analyzing a log file in a source library to obtain first incremental data, and acquiring first incremental information from a message header of the first incremental data; wherein the first incremental information comprises a primary key identifier, a table name, and a first execution time in a source library;

S202: determining second incremental data in the target library according to the primary key identification and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

S203: comparing whether the first execution time is larger than a second execution time of the second incremental data;

S204: if it is larger, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

S205: if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation.

In the above embodiment, for steps S201 and S202, reference may be made to the description of steps S101 and S102 shown in fig. 1, and details are not repeated here.

In the above embodiment, in steps S203 to S205, if the second incremental data corresponding to the primary key identifier and the table name of the first incremental data is recorded in the Redis, the execution time execTime of SQL is first compared:

1) the second execTime is larger than the first execTime, namely the first incremental data is not latest, and the first incremental data is directly skipped without processing;

2) the second execTime is less than the first execTime, which indicates that the first incremental data is up-to-date, and the business logic can be processed normally, i.e., the second incremental data in the Redis is replaced by the first incremental data.

It should be noted that, for the same primary key identifier, the Redis only stores the key information of the latest record in the Binlog parsing message that has been processed by the primary key identifier.

The following is a specific example of the present embodiment:

For the same data in the MySQL source library, one SQL statement (A for short) is executed at 6:00 and another (B for short) is executed at 6:01, so 2 Binlog messages are obtained in total. The execution time of A is 6:00, the file sequence number is 101, and the file offset is 100000. The execution time of B is 6:01, the file sequence number is 101 (or 102, which is not limited here), and the file offset is 200000.

The scheme adopts the out-of-order message mode, so A and B reach Redis in no fixed order. Suppose B reaches Redis first, which means that B is stored in Redis. When A is subsequently obtained by parsing the Binlog file, the execution time 6:01 of B, the latest record in Redis, is greater than 6:00, which shows that A is not the latest data, so no processing is performed. Conversely, if A reaches Redis first, A in Redis needs to be replaced by B.

The method provided by the embodiment is suitable for the situation that the incremental data have different execution times, and whether the analyzed first incremental data is latest or not can be determined only by comparing the execution times of the first incremental data and the second incremental data, so that the purpose of data synchronization is achieved, and the situation of data synchronization errors is avoided.
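The execution-time check and the A/B example can be replayed with a short Python sketch. The function names, the dictionary standing in for Redis, and the encoding of 6:00 and 6:01 as seconds since midnight are assumptions made for illustration.

```python
KEY = "com:xx:online:order:100000000"

# 6:00 and 6:01 encoded as seconds since midnight (assumption).
A = {"name": "A", "execTime": 6 * 3600, "fileNum": 101, "fileOffset": 100000}
B = {"name": "B", "execTime": 6 * 3600 + 60, "fileNum": 101, "fileOffset": 200000}


def apply_by_exec_time(store: dict, key: str, record: dict) -> bool:
    """Store `record` only if nothing is stored yet or its execTime is larger."""
    current = store.get(key)
    if current is None or record["execTime"] > current["execTime"]:
        store[key] = record
        return True
    # Not newer by execution time: skip. Equal times need further
    # tie-breaking, which this sketch deliberately does not attempt.
    return False


def replay(messages) -> str:
    """Consume messages in the given arrival order; return the surviving name."""
    store = {}
    for message in messages:
        apply_by_exec_time(store, KEY, message)
    return store[KEY]["name"]


print(replay([B, A]), replay([A, B]))
```

Whichever message arrives first, the record that remains stored is B, the one with the later execution time.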

Referring to fig. 3, a schematic flow chart of another alternative data processing method according to the embodiment of the present invention is shown, which includes the following steps:

S301: analyzing a log file in a source library to obtain first incremental data, and acquiring first incremental information from a message header of the first incremental data; the first incremental information comprises a primary key identifier, a table name, a first execution time in a source library and a first file sequence number; the first file sequence number is obtained by splitting the file name of the log file;

S302: determining second incremental data in the target library according to the primary key identification and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

S303: comparing whether the first execution time is larger than a second execution time of the second incremental data;

S304: if it is larger, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

S305: if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation;

S306: if they are equal, comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data;

S307: if it is larger, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

S308: if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation.

In the above embodiment, for steps S301 and S302, reference may be made to the description of steps S101 and S102 shown in fig. 1, and for steps S303 to S305, reference may be made to the description of steps S203 to S205 shown in fig. 2, which is not described herein again.

In the foregoing embodiment, for steps S306 to S308, the Binlog file number increases with the increase of data, the amount of data that can be carried by one Binlog file is limited, and when the upper limit is reached, a new Binlog file is created, and the file number in the new Binlog file is greater than the file number of the current Binlog file.

If the second execTime of the second incremental data in Redis is equal to the first execTime of the first incremental data, this shows that the two SQL statements were executed almost simultaneously (within the same second) in the source library, and the file sequence numbers fileNum of the two are then compared:

1) if the second fileNum is larger than the first fileNum, the first incremental data is historical data, and processing is skipped;

2) if the second fileNum is smaller than the first fileNum, the first incremental data is the latest, and the second incremental data in Redis is replaced by the first incremental data to complete the normal service logic.

The following is a specific example of the present embodiment:

For the same data in the MySQL source library, one SQL statement (A for short) is executed at 6:00 and another (B for short) is also executed at 6:00, with A executed first, so 2 Binlog messages are obtained.

Within the same Binlog file, the file offset gradually increases with recording time. Suppose that after A is executed it is recorded into the Binlog file mysql-bin.101, and the file reaches its upper limit after this record, so that A is located at the end of the file and has the largest file offset in the file. A new Binlog file mysql-bin.102 is then created and B is recorded into it, so that B is at the beginning of the file and has a small file offset.

Suppose that for the resulting A and B, the execution time of A is 6:00, the file sequence number is 101, and the file offset is 100000; the execution time of B is 6:00, the file sequence number is 102, and the file offset is 1000. Compared with B, A has a smaller file sequence number but a larger file offset.

The Binlog messages are processed in the out-of-order mode. If A reaches Redis first, A is recorded in Redis. After B arrives, A needs to be replaced with B, since file sequence number 102 (B) > 101 (A), which indicates that B is the latest. However, if B arrives first and A arrives afterwards, A is discarded.

The method provided by the embodiment is mainly applied to the situation that the execution time is the same second level but the file numbers are different, and whether the currently analyzed first incremental data is latest or not is judged by comparing the file numbers, so that data synchronization errors are avoided, and the purpose of data synchronization is achieved.
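The two-level comparison for this case can be sketched as follows in Python. The function name `is_newer` and the dictionary layout are assumptions for illustration; 6:00 is encoded as seconds since midnight. The sketch also shows why the file offset cannot be compared directly across different Binlog files.

```python
def is_newer(first: dict, second: dict) -> bool:
    """Compare execTime first; on a tie, fall back to the file sequence number."""
    if first["execTime"] != second["execTime"]:
        return first["execTime"] > second["execTime"]
    return first["fileNum"] > second["fileNum"]


# Both statements ran at 6:00; A is at the end of file 101, B at the start of 102.
A = {"execTime": 21600, "fileNum": 101, "fileOffset": 100000}
B = {"execTime": 21600, "fileNum": 102, "fileOffset": 1000}

print(is_newer(B, A))  # B arrives after A: B replaces A
print(is_newer(A, B))  # A arrives after B: A is discarded
# The raw offsets point the other way, so they must not be used across files.
print(A["fileOffset"] > B["fileOffset"])
```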

Referring to fig. 4, a schematic flow chart of another alternative data processing method according to the embodiment of the present invention is shown, which includes the following steps:

S401: analyzing a log file in a source library to obtain first incremental data, and acquiring first incremental information from a message header of the first incremental data; the first incremental information comprises a primary key identifier, a table name, a first execution time in a source library, a first file sequence number and a first file offset; the first file sequence number is obtained by splitting a file name of the log file, and the first file offset represents the storage ordering of the first incremental data in the log file;

S402: determining second incremental data in the target library according to the primary key identification and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

S403: comparing whether the first execution time is larger than a second execution time of the second incremental data;

S404: if it is larger, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

S405: if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation;

S406: if they are equal, comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data;

S407: if it is larger, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

S408: if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation;

S409: if they are equal, comparing whether the first file offset is larger than a second file offset of the second incremental data;

S410: if it is larger, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;

S411: if it is smaller, determining that the first incremental data is not the latest data, and not executing the replacement operation.

In the above embodiment, for steps S401 and S402, reference may be made to the description of steps S101 and S102 shown in fig. 1, reference may be made to the description of steps S203 to S205 shown in fig. 2 for steps S403 to S405, and reference may be made to the description of steps S306 to S308 shown in fig. 3 for steps S406 to S408, which are not described again here.

In the foregoing embodiment, for steps S409 to S411, when the second fileNum of the second incremental data in Redis is equal to the first fileNum of the first incremental data from the MySQL source library, this indicates that the two SQL statements were executed at similar times.

In the same Binlog file, the file offset increases with the insertion of the data record, so the file offset of the new record data in the same file is always larger than the historical file offset. Therefore, in the case where the second execTime is equal to the first execTime and the second fileNum is equal to the first fileNum, the file offsets fileOffset of the two are compared:

1) the first fileOffset is larger than the second fileOffset, which indicates that the first incremental data is the latest, the business logic can be processed normally, and the second incremental data is replaced by the first incremental data;

2) the first fileOffset is less than the second fileOffset, indicating that the first delta data is not current and that processing may be skipped.

The following is a specific example of the present embodiment:

for the same data in the MySQL source library, 1 SQL (A for short) is executed at 6:00, and 1 SQL (B for short) is executed at 6:00, wherein the A is executed first and 2 Binlog messages are received: the execution time of A is 6:00, the file sequence number is 101, and the file offset is 100000; the execution time of B is 6:00, the file sequence number is 101, and the file offset is 100100.

If A reaches Redis first, A is recorded in Redis; when B arrives and it is judged that 100100 (B) > 100000 (A), A in Redis is replaced by B. However, if B arrives first and A arrives afterwards, A is discarded.

The method provided by the embodiment is mainly applied to the situation that the execution time at the same second level is the same as the file sequence number but different file offsets are obtained, and the latest incremental data in the same Binlog file can be judged by comparing the file offsets, so that the purpose of data synchronization is achieved.
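Taken together, the three checks amount to a lexicographic comparison of (execTime, fileNum, fileOffset). A minimal Python sketch, assuming a hypothetical `IncrementInfo` tuple type and 6:00 encoded as seconds since midnight, relies on the fact that Python compares tuples field by field in order:

```python
from typing import NamedTuple, Optional


class IncrementInfo(NamedTuple):
    """Incremental information in comparison order (hypothetical type)."""
    exec_time: int
    file_num: int
    file_offset: int


def should_replace(first: IncrementInfo, second: Optional[IncrementInfo]) -> bool:
    """Return True if the stored (second) data should be replaced by the first.

    No stored record means the first data is stored directly. Otherwise the
    tuples are compared field by field: execution time, then file sequence
    number, then file offset.
    """
    return second is None or first > second


A = IncrementInfo(exec_time=21600, file_num=101, file_offset=100000)
B = IncrementInfo(exec_time=21600, file_num=101, file_offset=100100)

print(should_replace(A, None))  # nothing stored yet
print(should_replace(B, A))     # 100100 > 100000
print(should_replace(A, B))     # A is historical
```

An identical tuple is not treated as newer, so a message that is delivered twice does not trigger a second write in this sketch.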

The method provided by the embodiment of the invention compares the SQL execution time, the file serial number and the file offset of the analyzed first incremental data with the second incremental data in Redis on the basis of using the Binlog out-of-order message and ensuring the message consumption performance so as to judge whether the first incremental data is the latest or not, and compared with the prior art, the method at least has the following beneficial effects:

1) unnecessary processing of Binlog parsing messages is reduced, and the consumption rate of Binlog parsing messages is improved; for example, the same data changes frequently within one second as A -> B -> C, while Redis receives the messages in the order A -> C -> B; the scheme determines that B can be discarded without processing, which saves one data processing operation and ensures that Redis stores the latest data;

2) when the volume of data to be processed is large, waste of server resources is reduced, because resources such as server CPU (central processing unit) and memory are not spent on messages that do not need to be processed;

3) because each Binlog parsing message carries the full data of the data row, the same consumption result as with sequential messages can be achieved even when a single data row changes frequently within one second, and cases in which historical data overwrites new data under highly concurrent data operations can be reduced.
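The A -> B -> C example in item 1) can be replayed with a small Python sketch. The `consume` function, the concrete offsets, and the tuple encoding of (execTime, fileNum, fileOffset) are assumptions chosen for illustration.

```python
def consume(messages):
    """Process (name, info) messages in arrival order.

    `info` is an (execTime, fileNum, fileOffset) tuple. Returns the name of
    the record left in the store and the names of the skipped messages.
    """
    latest = None
    skipped = []
    for name, info in messages:
        if latest is None or info > latest[1]:
            latest = (name, info)   # newer than what is stored: write it
        else:
            skipped.append(name)    # historical data: no write is performed
    return latest[0], skipped


T = 21600  # all three changes happen within the same second (assumption)
A = ("A", (T, 101, 100000))
B = ("B", (T, 101, 100100))
C = ("C", (T, 101, 100200))

# Executed as A -> B -> C in the source library, received as A -> C -> B.
print(consume([A, C, B]))
```

In this replay B is skipped and C remains stored, so the final state matches what in-order consumption would have produced, with one fewer write.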

Referring to FIG. 5, a schematic diagram of the main modules of a data processing apparatus 500 according to an embodiment of the present invention is shown, including:

an analysis module 501, configured to analyze a log file in a source library to obtain first incremental data, and obtain first incremental information from a message header of the first incremental data;

a determining module 502, configured to determine second incremental data in the target library according to the primary key identifier and the table name in the first incremental information; wherein the primary key identification represents a location in the log file where the first delta information is located;

a comparing module 503, configured to compare whether the first incremental information is greater than second incremental information of the second incremental data, and replace the second incremental data in the target library with the first incremental data or not according to a comparison result.

In the implementation device of the invention, the first incremental information comprises a first execution time in the source library;

the comparing module 503 is configured to:

In the implementation device of the present invention, the first incremental information further includes a first file sequence number, where the first file sequence number is obtained by splitting a file name of the log file;

the comparing module 503 is configured to:

In the implementation device of the present invention, the incremental information further includes a first file offset, where the first file offset represents a storage order of the first incremental data in the log file;

the comparing module 503 is configured to:

The apparatus further comprises a distributed lock module 504 (not shown) for:

In addition, the detailed implementation of the apparatus in the embodiment of the present invention has already been described in the method above, so it is not repeated here.

FIG. 6 illustrates an exemplary system architecture 600 to which embodiments of the invention may be applied.

As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605 (by way of example only). The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 601, 602, 603.

The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping websites browsed by users using the terminal devices 601, 602, 603.

It should be noted that the method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the apparatus is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising an analysis module, a determining module, and a comparing module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the comparing module may also be described as an "incremental information comparing module".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise:

According to the technical solution of the embodiment of the present invention, the Binlog is parsed and historical data is filtered using three values taken from the message header of the first incremental data: the SQL execution time in the source library, the Binlog file, and the offset within that file. This ensures that data synchronization is executed in order under high throughput, reduces unnecessary data update operations, and avoids historical data overwriting new data.
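One way to picture the three header values working together is as a single ordering key. The sketch below is illustrative only: the header field names (`execute_time`, `binlog_file`, `offset`), the `mysql-bin.000012` style file name, and the lexicographic priority of execution time, then file sequence number, then file offset are assumptions made for the example rather than details confirmed by this passage.

```python
from typing import NamedTuple


class IncrementalInfo(NamedTuple):
    """Ordering key; tuples compare field by field, left to right."""
    execute_time: int   # execution time in the source library (assumed epoch ms)
    file_sequence: int  # numeric suffix of the Binlog file name
    file_offset: int    # position of the event inside that Binlog file


def file_sequence_of(binlog_file: str) -> int:
    """Split a name such as 'mysql-bin.000012' and return the number 12."""
    return int(binlog_file.rsplit(".", 1)[1])


def info_from_header(header: dict) -> IncrementalInfo:
    """Build the ordering key from a message header (field names are hypothetical)."""
    return IncrementalInfo(
        execute_time=header["execute_time"],
        file_sequence=file_sequence_of(header["binlog_file"]),
        file_offset=header["offset"],
    )


def is_historical(first: IncrementalInfo, second: IncrementalInfo) -> bool:
    """True if the incoming info is not greater than the stored info."""
    return first <= second
```

Under these assumptions, two changes executed within the same timestamp are still ordered by file sequence number and then by offset, so a message that is not newer than the stored row can be discarded instead of overwriting it.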

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.