Movatterモバイル変換


[0]ホーム

URL:


CN113779048B - A data processing method and device - Google Patents

A data processing method and device
Download PDF

Info

Publication number
CN113779048B
CN113779048BCN202010561160.9ACN202010561160ACN113779048BCN 113779048 BCN113779048 BCN 113779048BCN 202010561160 ACN202010561160 ACN 202010561160ACN 113779048 BCN113779048 BCN 113779048B
Authority
CN
China
Prior art keywords
data
increment
information
incremental
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010561160.9A
Other languages
Chinese (zh)
Other versions
CN113779048A (en
Inventor
杨均达
韩有凰
任维
赖耀宇
鲁强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co LtdfiledCriticalBeijing Jingdong Century Trading Co Ltd
Priority to CN202010561160.9ApriorityCriticalpatent/CN113779048B/en
Publication of CN113779048ApublicationCriticalpatent/CN113779048A/en
Application grantedgrantedCritical
Publication of CN113779048BpublicationCriticalpatent/CN113779048B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Classifications

Landscapes

Abstract

The invention discloses a data processing method and device, and relates to the technical field of computers. The method comprises the steps of analyzing log files in a source library to obtain first increment data, obtaining first increment information from a message header of the first increment data, determining second increment data in a target library according to a main key identification and a table name in the first increment information, comparing whether the first increment information is larger than the second increment information of the second increment data, and replacing the second increment data in the target library with the first increment data or not according to a comparison result. According to the embodiment, the historical data is filtered through the incremental information, so that the sequential execution of data synchronization under high throughput is ensured, unnecessary data updating operation is reduced, and the situation that the historical data covers new data is avoided.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
Currently, when applied to master-slave replication of databases or incremental recovery of data, binlog parsing messages are typically processed using sequential or out-of-order messaging. Binlog parsing messages support multiple routing modes, single-topic single partition, single-topic multi-partition, multi-topic single partition, and multi-topic multi-partition. The performance of multiple partitions is superior to single partitions because of the limited performance per partition and number of writes per second.
In carrying out the present invention, the inventors have found that at least the following problems exist in the prior art:
1. the sequential message mode has strict processing sequence requirements, and the problem of Binlog analysis message consumption delay is easy to occur;
2. Although the out-of-order message mode is superior to the sequential message mode in processing speed, when the data operation is more frequent, the problem that the history data is covered with new data to cause data synchronization errors is easily caused.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and apparatus, which at least can solve the problem in the prior art that Binlog parsing message consumes slowly or data synchronization errors.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data processing method including:
analyzing a log file in a source library to obtain first incremental data, and obtaining first incremental information from a message header of the first incremental data;
determining second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, wherein the primary key identifier represents the position of the first incremental information in the log file;
And comparing whether the first increment information is larger than the second increment information of the second increment data, and replacing the second increment data in the target library with the first increment data or not according to a comparison result.
Optionally, the first incremental information includes a first execution time in a source library;
The comparing whether the first increment information is larger than the second increment information of the second increment data or not, and further replacing the second increment data in the target library with the first increment data or not according to a comparison result, includes:
Comparing whether the first execution time is greater than a second execution time of the second incremental data;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
Optionally, the first incremental information further includes a first file sequence number, where the first file sequence number is obtained by splitting a file name of the log file;
The comparing whether the first increment information is larger than the second increment information of the second increment data or not, and further replacing the second increment data in the target library with the first increment data or not according to a comparison result, includes:
comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data or not under the condition that the first execution time and the second execution time are the same;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
Optionally, the incremental information further includes a first file offset, the first file offset representing a storage ordering of the first incremental data in the log file;
The comparing whether the first increment information is larger than the second increment information of the second increment data or not, and further replacing the second increment data in the target library with the first increment data or not according to a comparison result, includes:
comparing whether the first file offset is larger than a second file offset of the second incremental data or not under the condition that the first execution time is the same as the second execution time and the first file sequence number is the same as the second file sequence number;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
Optionally, setting key names of distributed locks as the primary key identification and the table names to lock the first increment data by the distributed locks, and
The distributed lock is released after determining that the first delta data is up-to-date data or not.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided a data processing apparatus including:
the analysis module is used for analyzing the log file in the source library to obtain first increment data, and acquiring first increment information from the message header of the first increment data;
the determining module is used for determining second incremental data in a target library according to a main key identifier and a table name in the first incremental information, wherein the main key identifier represents the position of the first incremental information in the log file;
and the comparison module is used for comparing whether the first increment information is larger than the second increment information of the second increment data or not, and further replacing the second increment data in the target library with the first increment data or not according to the comparison result.
Optionally, the first incremental information includes a first execution time in a source library;
the comparison module is used for:
Comparing whether the first execution time is greater than a second execution time of the second incremental data;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
Optionally, the first incremental information further includes a first file sequence number, where the first file sequence number is obtained by splitting a file name of the log file;
the comparison module is used for:
comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data or not under the condition that the first execution time and the second execution time are the same;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
Optionally, the incremental information further includes a first file offset, the first file offset representing a storage ordering of the first incremental data in the log file;
the comparison module is used for:
comparing whether the first file offset is larger than a second file offset of the second incremental data or not under the condition that the first execution time is the same as the second execution time and the first file sequence number is the same as the second file sequence number;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
Optionally, the system further comprises a distributed lock module for:
setting key names of distributed locks as the primary key identification and the table names to lock the first incremental data by the distributed locks, and
The distributed lock is released after determining that the first delta data is up-to-date data or not.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a data processing electronic device.
The electronic device comprises one or more processors and a storage device, wherein the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize any one of the data processing methods.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described data processing methods.
According to the scheme provided by the invention, one embodiment of the scheme has the advantages that the SQL source library execution time, the Binlog file and the corresponding file offset in the first incremental data message header are analyzed through Binlog, historical data are filtered, sequential execution of data synchronization under high throughput is guaranteed, unnecessary data updating operation is reduced, and the condition that the historical data cover new data is avoided.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic flow diagram of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative data processing method according to an embodiment of the invention;
FIG. 3 is a flow chart of another alternative data processing method according to an embodiment of the invention;
FIG. 4 is a flow chart of yet another alternative data processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the main modules of a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing a mobile device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, a main flowchart of a data processing method provided by an embodiment of the present invention is shown, including the following steps:
S101, analyzing a log file in a source library to obtain first incremental data, and acquiring first incremental information from a message header of the first incremental data;
s102, determining second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, wherein the primary key identifier represents the position of the first incremental information in the log file;
s103, comparing whether the first increment information is larger than the second increment information of the second increment data, and replacing the second increment data in the target library with the first increment data or not according to a comparison result.
In the above embodiment, for step S101, binlog is a log file in binary format, and is used to record SQL (Structured Query Language ) statement information that is added or subtracted to or from the database by the user. Binlog is commonly used for master-slave synchronization of databases and incremental recovery of data, and thus the present scheme can be regarded as a master-slave synchronization of incremental data.
The Binlog message is a message sent out through the MQ (message queue) by analyzing the Binlog records in the Binlog file, and the Binlog message corresponds to the Binlog records generated by one SQL execution.
Analyzing the Binlog file to obtain a plurality of Binlog records. For a single Binlog record (i.e. the first incremental data), the message header is analyzed, and the file name log_name and the file offset fileOffset of the file where the line is recorded at execution time execTime, binlog of SQL in the MySQL source library are obtained. File number fileNum is then cut from log_name, e.g., 002067 is cut from mysql-bin.002067. Log.
For step S102, the primary key identification is similar to the line number in Excel, and corresponds to a line of data in Excel, and is used to represent the position of the Binlog record in the Binlog file in the MySQL database. There may be multiple Binlog files in the MySQL database, the primary key identification in the same Binlog file may not be repeated, but the primary key identification of a different Binlog file may be repeated.
For this case, the primary key identifier and the table name in the first incremental information are adopted to determine R edis (other cache type databases can be adopted) corresponding Binlog analysis information (namely second incremental data) execution information, so as to obtain second incremental information, namely, the execution time of SQL in the source library, the file serial number and the file offset where the Binlog record line is located, and the format is execTime-file Num-fileOffset.
It should be noted that, the incremental data in Redis is stored in the form of key-value, where key is a primary key identifier and a table name, for example com: xx: online: order:100000000, and value is Bi nlog record, including incremental information, such as execTime, fileNum, fileOffset.
For step S103, if there is no record in the Redis, that is, the second incremental data is empty, it is proved that the first incremental data is not processed, and the first incremental data is directly stored, so as to complete the service logic processing.
However, if so, execTime, fileNum, fileOffset needs to be compared one by one to determine whether the second incremental data in the dis needs to be replaced with the first incremental data or not, and specific operations are described with reference to the following fig. 2 to 4, which are not repeated here.
In addition, in order to ensure that the synchronization of the incremental data is not interfered by other data synchronization processes, after the first incremental data is obtained through analysis, the first incremental data is locked by using a distributed lock, and after the judgment process is finished, the lock is released immediately so as to execute normal business logic. When locking, the key mark bit main key mark+list name will be deleted directly after releasing the lock.
According to the method provided by the embodiment, the historical data is filtered through the execTime, fileNum and fileOffset in the Binlog analysis first incremental data message header, so that sequential execution of data synchronization under high throughput is guaranteed, unnecessary data updating operation is reduced, and the condition that the historical data covers new data is avoided.
Referring to fig. 2, an alternative flow chart of a data processing method according to an embodiment of the invention is shown, comprising the following steps:
S201, analyzing a log file in a source library to obtain first increment data, and acquiring first increment information from a message header of the first increment data, wherein the first increment information comprises a primary key identifier, a table name and first execution time in the source library;
S202, determining second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, wherein the primary key identifier represents the position of the first incremental information in the log file;
s203, comparing whether the first execution time is greater than a second execution time of the second incremental data;
S204, if the first incremental data is larger than the first incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
and S205, if the first increment data is smaller than the latest data, determining that the first increment data is not the latest data, and not executing replacement operation.
In the above embodiment, for the steps S201 and S202, reference may be made to the descriptions of the steps S101 and S102 shown in fig. 1, and the descriptions are not repeated here.
In the above embodiment, for steps S203 to S205, if the second incremental data corresponding to the primary key identifier and the table name of the first incremental data is recorded in the Redis, the SQL execution time execTime is compared first:
1) The second execTime is greater than the first execTime, indicating that the first delta data is not up to date, and directly skipping without processing;
2) The second execTime is less than the first execTime, indicating that the first delta data is up-to-date, and business logic can be handled normally, i.e., the second delta data in Redis is replaced with the first delta data.
It should be noted that, for the same main key identifier, only the key information of the latest record in the Binlog analysis message processed by the main key identifier is stored in the Redis.
The following is a specific example of the present embodiment:
For the same data in the MySQL source library, 1 SQL (short for A) is executed at 6:00, and 1 SQL (short for B) is executed at 6:01, so that a total of 2 Binlog messages are obtained. The execution time of A is 6:00, the file sequence number is 101, and the file offset is 100000.B has an execution time of 6:01, a file sequence number of 101 (or 102, without limitation, and a file offset of 200000).
The scheme adopts an out-of-order message mode, so that A, B received by Redis has an indefinite sequence. Let B arrive at the dis first, i.e. it means that B is stored in the current dis. After the Binlog file is analyzed to obtain the A, the execution time 6:01 of the latest recorded B in the Redis is greater than 6:00, which indicates that the A is not the latest data, and the processing is not performed. Otherwise, if a reaches Redis first, a in Redis needs to be replaced by B.
The method provided by the embodiment is suitable for the situation of different execution times of the incremental data, and can determine whether the analyzed first incremental data is up to date only by comparing the execution times of the first incremental data and the second incremental data, so that the data synchronization purpose is realized, and the situation of data synchronization errors is avoided.
Referring to FIG. 3, another alternative flow chart of a data processing method according to an embodiment of the invention is shown, comprising the steps of:
S301, analyzing a log file in a source library to obtain first increment data, and obtaining first increment information from a message header of the first increment data, wherein the first increment information comprises a main key identifier, a table name, a first execution time in the source library and a first file serial number;
S302, determining second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, wherein the primary key identifier represents the position of the first incremental information in the log file;
s303, comparing whether the first execution time is greater than a second execution time of the second incremental data;
S304, if the first incremental data is larger than the first incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
s305, if the first increment data is smaller than the latest data, determining that the first increment data is not the latest data, and not executing replacement operation;
s306, if so, comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data;
s307, if the first incremental data is larger than the first incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
And S308, if the first increment data is smaller than the latest data, determining that the first increment data is not the latest data, and not executing replacement operation.
In the above embodiment, for the steps S301 and S302, the descriptions of the steps S101 and S102 shown in fig. 1 may be referred to, and the descriptions of the steps S303 to S305 may be referred to the descriptions of the steps S203 to S205 shown in fig. 2, which are not repeated here.
In the above embodiment, for steps S306 to S308, the Binlog file sequence number is continuously increased along with the growth of data, but the amount of data that can be carried by a Binlog file is limited, when the upper limit is reached, a new Binlog file is created, and the file sequence number in the new Binlog file is greater than the file sequence number of the current Binlog file.
If the second execTime of the second incremental data in Redis is equal to the first execTime of the first incremental data, it is proved that the two pieces of incremental data SQL are executed almost simultaneously (in seconds) in the source library, and the file numbers fileNum of the two pieces of incremental data are compared at this time:
1) If the second fileNum is greater than the first fileNum, proving that the first incremental data is historical data, and skipping processing;
2) And if the second fileNum is smaller than the first fileNum, proving that the first incremental data is the latest, and replacing the second incremental data in the Redis with the first incremental data to complete normal business logic.
The following is a specific example of the present embodiment:
for the same data in the MySQL source library, 1 SQL (A for short) is executed at 6:00, 1 SQL (B for short) is executed at 6:00, and 2 Binlog messages are received after A is executed first.
In the same Binlog file, the file offset gradually increases with the recording time. It is assumed that after the execution of A is completed, the recorded file is recorded in a Binlog file mysql-bin.101, and after the recording is completed, the file storage reaches the upper limit, so that A is positioned at the end of the file, and the file offset is the largest in the file. Then a new Binlog file mysql-bin.102 is created and B is recorded into the file, so that B is located in the front of the file ordering and the file offset is small.
Assuming that the generated A and B have the execution time of 6:00, the file sequence number of 101, the file offset of 100000, the execution time of B is 6:00, the file sequence number of 102, the file offset of 1000, and the file offset is larger than B but smaller than B compared with the A file sequence number.
Binlog messages are processed in an out-of-order manner, and if A reaches Redis first, A is recorded in Redis. After B arrives, since file number 102 (B) >101 (a), it indicates that B is up to date, a needs to be replaced with B. But if B arrives first at a and then at a, a is discarded.
The method provided by the embodiment is mainly applied to the condition of executing time in the same second level but different file serial numbers, and judges whether the first increment data analyzed currently is up to date by comparing the file serial numbers, so that data synchronization errors are avoided, and the data synchronization purpose is realized.
Referring to FIG. 4, another alternative flow chart of a data processing method according to an embodiment of the invention is shown, comprising the steps of:
S401, analyzing a log file in a source library to obtain first increment data, and acquiring first increment information from a message header of the first increment data, wherein the first increment information comprises a main key identifier, a table name, a first execution time, a first file sequence number and a first file offset in the source library, the first file sequence number is obtained by splitting the file name of the log file, and the first file offset represents storage ordering of the first increment data in the log file;
S402, determining second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, wherein the primary key identifier represents the position of the first incremental information in the log file;
S403, comparing whether the first execution time is greater than a second execution time of the second incremental data;
S404, if the first increment data is larger than the first increment data, determining that the first increment data is the latest data, and replacing the second increment data in the target library with the first increment data;
s405, if the first increment data is smaller than the latest data, determining that the first increment data is not the latest data, and not executing replacement operation;
S406, if so, comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data;
S407, if the first increment data is larger than the first increment data, determining that the first increment data is the latest data, and replacing the second increment data in the target library with the first increment data;
S408, if the first increment data is smaller than the latest data, determining that the first increment data is not the latest data, and not executing replacement operation;
S409, if so, comparing whether the first file offset is larger than a second file offset of the second incremental data;
S410, if the first increment data is larger than the first increment data, determining that the first increment data is the latest data, and replacing the second increment data in the target library with the first increment data;
and S411, if the first increment data is smaller than the latest data, determining that the first increment data is not the latest data, and not executing replacement operation.
In the above embodiment, for the steps S401 and S402, the descriptions of the steps S101 and S102 shown in fig. 1 may be referred to, the descriptions of the steps S403 to S405 may be referred to, the descriptions of the steps S203 to S205 shown in fig. 2, and the descriptions of the steps S406 to S408 may be referred to, the descriptions of the steps S306 to S308 shown in fig. 3, which are not repeated here.
In the above embodiment, for steps S409 to S411, when the second fileNum of the second incremental data in the Redis is equal to the first fileNum of the first incremental data in the MySQL source library, it indicates that the two SQLs are executed at similar times.
In the same Binlog file, the file offset will increase with the insertion of the data record, so the file offset of the new record data under the same file must be greater than the history file offset. Thus, with second execTime equal to first execTime and second fileNum equal to first fileNum, the file offsets fileOffset of the two are compared:
1) The first fileOffset is greater than the second fileOffset, indicating that the first incremental data is up-to-date, business logic can be processed normally, and the second incremental data is replaced with the first incremental data;
2) The first fileOffset is less than the second fileOffset, indicating that the first delta data is not up to date, and processing may be skipped.
The following is a specific example of the present embodiment:
For the same data in the MySQL source library, 1 SQL (A for short) is executed at 6:00, 1 SQL (B for short) is executed at 6:00, and the A is executed first, wherein 2 Binlog messages are received, the A execution time is 6:00, the file serial number is 101, the file offset is 100000, the B execution time is 6:00, the file serial number is 101, and the file offset is 100100.
If A reaches Redis first, A is recorded in Redis, B reaches 100100 (B) >100000 (A) after B, A in Redis is replaced by B, but B reaches after A, A is discarded without processing.
The method provided by the embodiment is mainly applied to the situation that the execution time of the same second level is the same as the file serial number but different file offsets, and the latest increment data in the same Binlog file can be judged by comparing the file offsets, so that the data synchronization purpose is realized.
The method provided by the embodiment of the invention compares the SQL execution time, the file sequence number, the file offset and the second incremental data in Redis of the analyzed first incremental data on the basis of using the Binlog disorder message and guaranteeing the consumption performance of the message, so as to judge whether the first incremental data is up-to-date, and has at least the following beneficial effects compared with the prior art:
1) For example, A- > B- > C is frequently changed in the same data second level, and Redis receives A- > C- > B, and the scheme determines that B can be discarded and not processed, so that one-time data processing is reduced, and the Redis is ensured to store the latest data;
2) When the data volume to be processed is large, the resource waste of the corresponding server is reduced, and the utilization of resources such as a CPU (Central processing Unit) and a memory of the server without processing messages is reduced;
3) Because Binlog analysis messages are all data line full data, under the condition of single data line second-level frequent change, the same consumption purpose as that of sequential messages can be achieved, and the condition that historical data cover new data under the operation of high concurrence of data can be reduced.
Referring to fig. 5, a schematic diagram of main modules of a data processing apparatus 500 according to an embodiment of the present invention is shown, including:
The parsing module 501 is configured to parse the log file in the source library to obtain first incremental data, and obtain first incremental information from a message header of the first incremental data;
A determining module 502, configured to determine second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, where the primary key identifier represents a location where the first incremental information is located in the log file;
And the comparison module 503 is configured to compare whether the first incremental information is greater than the second incremental information of the second incremental data, and further replace the second incremental data in the target library with the first incremental data or not according to a comparison result.
In the embodiment of the invention, the first increment information comprises a first execution time in a source library;
the comparison module 503 is configured to:
Comparing whether the first execution time is greater than a second execution time of the second incremental data;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
In the embodiment of the invention, the first increment information further comprises a first file sequence number, and the first file sequence number is obtained by splitting the file name of the log file;
the comparison module 503 is configured to:
comparing whether the first file sequence number is larger than a second file sequence number of the second incremental data or not under the condition that the first execution time and the second execution time are the same;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
In the embodiment of the invention, the incremental information further comprises a first file offset, and the first file offset represents the storage ordering of the first incremental data in the log file;
the comparison module 503 is configured to:
comparing whether the first file offset is larger than a second file offset of the second incremental data or not under the condition that the first execution time is the same as the second execution time and the first file sequence number is the same as the second file sequence number;
If the first incremental data is larger than the second incremental data, determining that the first incremental data is the latest data, and replacing the second incremental data in the target library with the first incremental data;
If the first increment data is smaller than the latest data, the first increment data is determined not to be the latest data, and the replacement operation is not executed.
The implementation of the present invention further includes a distributed lock module 504 (not shown) for:
setting key names of distributed locks as the primary key identification and the table names to lock the first incremental data by the distributed locks, and
The distributed lock is released after determining that the first delta data is up-to-date data or not.
In addition, the implementation of the apparatus in the embodiments of the present invention has been described in detail in the above method, so that the description is not repeated here.
Fig. 6 illustrates an exemplary system architecture 600 in which embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605 (by way of example only). The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications can be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using terminal devices 601, 602, 603.
It should be noted that, the method provided by the embodiment of the present invention is generally performed by the server 605, and accordingly, the apparatus is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Connected to the I/O interface 705 are an input section 706 including a keyboard, a mouse, and the like, an output section 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like, a storage section 708 including a hard disk, and the like, and a communication section 709 including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the internet. The drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, a processor may be described as including an parsing module, a determining module, and an alignment module. The names of these modules do not in some way limit the module itself, and for example, the comparison module may also be described as an "incremental information comparison module".
As a further aspect, the invention also provides a computer readable medium which may be comprised in the device described in the above embodiments or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to include:
analyzing a log file in a source library to obtain first incremental data, and obtaining first incremental information from a message header of the first incremental data;
determining second incremental data in a target library according to a primary key identifier and a table name in the first incremental information, wherein the primary key identifier represents the position of the first incremental information in the log file;
And comparing whether the first increment information is larger than the second increment information of the second increment data, and replacing the second increment data in the target library with the first increment data or not according to a comparison result.
According to the technical scheme of the embodiment of the invention, the historical data is filtered through analyzing the SQL source library execution time, the Binlog file and the corresponding file offset in the first incremental data message header by Binlog, so that the sequential execution of data synchronization under high throughput is ensured, unnecessary data updating operation is reduced, and the condition that the historical data covers new data is avoided.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

CN202010561160.9A2020-06-182020-06-18 A data processing method and deviceActiveCN113779048B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202010561160.9ACN113779048B (en)2020-06-182020-06-18 A data processing method and device

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN202010561160.9ACN113779048B (en)2020-06-182020-06-18 A data processing method and device

Publications (2)

Publication NumberPublication Date
CN113779048A CN113779048A (en)2021-12-10
CN113779048Btrue CN113779048B (en)2025-02-21

Family

ID=78835008

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202010561160.9AActiveCN113779048B (en)2020-06-182020-06-18 A data processing method and device

Country Status (1)

CountryLink
CN (1)CN113779048B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN114116842B (en)*2021-11-252023-05-19上海柯林布瑞信息技术有限公司Multidimensional medical data real-time acquisition method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
KR20200056357A (en)*2020-03-172020-05-22주식회사 실크로드소프트Technique for implementing change data capture in database management system

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US8510270B2 (en)*2010-07-272013-08-13Oracle International CorporationMYSQL database heterogeneous log based replication
CN102841897B (en)*2011-06-232016-03-02阿里巴巴集团控股有限公司A kind of method, Apparatus and system realizing incremental data and extract
CN104881494B (en)*2015-06-122019-02-19北京奇虎科技有限公司 Method, device and system for data synchronization with Redis server
CN105488187A (en)*2015-12-022016-04-13北京四达时代软件技术股份有限公司Method and device for extracting multi-source heterogeneous data increment
US11281706B2 (en)*2016-09-262022-03-22Splunk Inc.Multi-layer partition allocation for query execution
CN108984564A (en)*2017-06-022018-12-11北京京东尚科信息技术有限公司Data-storage system, method and apparatus
CN109885617A (en)*2019-01-292019-06-14中国工商银行股份有限公司The method of data synchronization and device of Distributed Heterogeneous Database system
CN110297866A (en)*2019-05-202019-10-01平安普惠企业管理有限公司Method of data synchronization and data synchronization unit based on log analysis
CN110879813B (en)*2019-11-202024-04-12浪潮软件股份有限公司 A method for implementing incremental synchronization of MySQL database based on binary log parsing
CN111125214B (en)*2019-12-022023-08-25武汉虹信技术服务有限责任公司Lightweight incremental data synchronization method, device and computer readable medium
CN110990365A (en)*2019-12-032020-04-10北京奇艺世纪科技有限公司Data synchronization method, device, server and storage medium
CN111259081A (en)*2020-02-042020-06-09杭州数梦工场科技有限公司Data synchronization method and device, electronic equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
KR20200056357A (en)*2020-03-172020-05-22주식회사 실크로드소프트Technique for implementing change data capture in database management system

Also Published As

Publication numberPublication date
CN113779048A (en)2021-12-10

Similar Documents

PublicationPublication DateTitle
CN112445626B (en)Data processing method and device based on message middleware
CN111190888A (en)Method and device for managing graph database cluster
WO2014123564A1 (en)Cloud-based streaming data receiver and persister
CN112948138B (en) A method and device for processing messages
CN113760982B (en)Data processing method and device
CN113761001B (en) A cross-system data synchronization method and device
CN111753019B (en)Data partitioning method and device applied to data warehouse
CN110019123B (en)Data migration method and device
US20220138074A1 (en)Method, electronic device and computer program product for processing data
US11599425B2 (en)Method, electronic device and computer program product for storage management
CN111125107A (en)Data processing method, device, electronic equipment and medium
CN110209662A (en)A kind of method and apparatus of automation load data
CN110502317B (en)Transaction management method and device
CN112214500A (en)Data comparison method and device, electronic equipment and storage medium
CN112748866B (en)Incremental index data processing method and device
CN113297222B (en)Report data acquisition method and device, electronic equipment and storage medium
CN111858586A (en)Data processing method and device
CN113779082B (en)Method and device for updating data
CN113760966B (en) Data processing method and device based on heterogeneous database system
CN113779048B (en) A data processing method and device
CN116226134A (en)Method and device for writing data into file and data writing database
CN112910855B (en)Sample message processing method and device
CN113760860B (en)Data reading method and device
CN113347052B (en)Method and device for counting user access data through access log
CN116450622B (en)Method, apparatus, device and computer readable medium for data warehouse entry

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp