Disclosure of Invention
The present invention proposes the following technical solution to one or more of the above technical drawbacks of the prior art.
A data matching method based on a temporary table of a database, the method comprising:
an obtaining step: obtaining a first data table of a source database, a second data table of a target database, and a data matching range;
a judging step: judging whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold; if not, directly matching the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; if so, performing temporary table matching;
and a temporary table matching step: establishing a first temporary data table in the source database, establishing a second temporary data table and a third temporary data table in the target database, and performing matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.
Further, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result is: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join or an inner join on the second temporary data table and the third temporary data table to obtain the matching result.
Still further, the matching result includes at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that the second data table has in excess of the first data table.
Further, after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized, wherein the diff thread outputs the data that differs between the first data table and the second data table, the missing thread outputs the data that is missing from the second data table relative to the first data table, the extra thread outputs the data that the second data table has in excess of the first data table, and the match thread outputs the data that is the same in the first data table and the second data table.
Furthermore, a diff queue, a missing queue, an extra queue and a match queue are initialized in memory for the diff thread, the missing thread, the extra thread and the match thread respectively, to implement a producer-consumer relationship pool, and the memory size occupied by the relationship pool is:
if [symbol not recoverable] is greater than [symbol not recoverable], [formula not recoverable];
otherwise, [formula not recoverable];
where the index takes the values 1, 2, 3, 4, representing the diff queue, the missing queue, the extra queue and the match queue respectively, and the remaining symbols represent, for the corresponding queue: the memory size occupied by the producer-consumer relationship pool implemented by the queue; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data the queue needs to output; and the total time required to output that total amount of data.
The invention also provides a data matching device based on the temporary table of the database, which comprises:
an obtaining unit, configured to obtain a first data table of a source database, a second data table of a target database, and a data matching range;
a judging unit, configured to judge whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold; if not, to directly match the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; and if so, to perform temporary table matching;
and a temporary table matching unit, configured to establish a first temporary data table in the source database, establish a second temporary data table and a third temporary data table in the target database, and perform matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.
Further, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result is: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join or an inner join on the second temporary data table and the third temporary data table to obtain the matching result.
Still further, the matching result includes at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that the second data table has in excess of the first data table.
Further, after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized, wherein the diff thread outputs the data that differs between the first data table and the second data table, the missing thread outputs the data that is missing from the second data table relative to the first data table, the extra thread outputs the data that the second data table has in excess of the first data table, and the match thread outputs the data that is the same in the first data table and the second data table.
Furthermore, a diff queue, a missing queue, an extra queue and a match queue are initialized in memory for the diff thread, the missing thread, the extra thread and the match thread respectively, to implement a producer-consumer relationship pool, and the memory size occupied by the relationship pool is:
if [symbol not recoverable] is greater than [symbol not recoverable], [formula not recoverable];
otherwise, [formula not recoverable];
where the index takes the values 1, 2, 3, 4, representing the diff queue, the missing queue, the extra queue and the match queue respectively, and the remaining symbols represent, for the corresponding queue: the memory size occupied by the producer-consumer relationship pool implemented by the queue; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data the queue needs to output; and the total time required to output that total amount of data.
The invention also proposes a computer readable storage medium having stored thereon computer program code which, when executed by a computer, performs any of the methods described above.
The technical effects of the invention are as follows. The invention discloses a data matching method, device and storage medium based on a database temporary table. The method comprises: obtaining a first data table of a source database, a second data table of a target database, and a data matching range; judging whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold, and if not, directly matching the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result, and if so, performing temporary table matching; and a temporary table matching step of establishing a first temporary data table in the source database, establishing a second temporary data table and a third temporary data table in the target database, and performing matching based on the three temporary data tables to obtain a matching result. Because the data matching range is specified by the user, matching all data in all data tables is avoided, the amount of computation during data matching is reduced, and data matching efficiency is improved.
Using temporary tables brings the following effects. Space saving: after the client exits the session, the temporary tables are dropped automatically, and no data remains to occupy database space. Privacy: the temporary tables established by the client serve only specific transactions; they are dedicated and private and need not be shared with other transactions. In addition, the invention calculates the MD5 values of the data of the first data table and the second data table within the corresponding data matching range, inserts the MD5 values in the first temporary data table of the source database (the source end) into the third temporary data table located in the target database (the target end), and matches the MD5 values in the second temporary data table and the third temporary data table on the target database, thereby completing the matching of the data of the first data table and the second data table within the data matching range. Because the temporary data tables store the MD5 values of the data rather than the data itself, data security and data matching efficiency are further improved.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a data matching method based on a temporary table of a database, the method comprising:
Step S101: obtaining a first data table of the source database, a second data table of the target database, and a data matching range. The data matching range may be two data tables with the same table name specified by the user, two data tables with different table names, or a row or a column in the two data tables; for example, row 5 of the first data table is matched against row 7 of the second data table, or column 3 of the first data table is matched against column 7 of the second data table.
Step S102: judging whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold; if not, directly matching the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; if so, performing temporary table matching.
Step S103: establishing a first temporary data table in the source database, establishing a second temporary data table and a third temporary data table in the target database, and performing matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.
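As a minimal sketch of how the data matching range from step S101 might be represented, the following uses a small value type. The class name `MatchRange`, its fields and the `describe` helper are hypothetical and are not taken from the invention; they only illustrate the three kinds of range mentioned above (whole tables, a row pair, a column pair).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MatchRange:
    """Hypothetical description of a user-specified data matching range."""
    source_table: str                               # first data table (source database)
    target_table: str                               # second data table (target database)
    row_pair: Optional[Tuple[int, int]] = None      # (source row, target row), if rows are matched
    column_pair: Optional[Tuple[int, int]] = None   # (source column, target column), if columns are matched

    def describe(self) -> str:
        # Report the narrowest range that was specified.
        if self.row_pair is not None:
            return (f"row {self.row_pair[0]} of {self.source_table} "
                    f"vs row {self.row_pair[1]} of {self.target_table}")
        if self.column_pair is not None:
            return (f"column {self.column_pair[0]} of {self.source_table} "
                    f"vs column {self.column_pair[1]} of {self.target_table}")
        return f"all of {self.source_table} vs all of {self.target_table}"


# The row example from step S101: row 5 of the first table against row 7 of the second.
row_range = MatchRange("orders", "orders_copy", row_pair=(5, 7))
print(row_range.describe())
```

The table names `orders` and `orders_copy` are placeholders chosen for the example.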
In the invention, a first data table of the source database, a second data table of the target database and a data matching range are first obtained. It is then judged whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold. If not, the data of the first data table within the data matching range is matched directly against the data of the second data table within the data matching range to obtain a matching result. If so, a first temporary data table is established in the source database, a second temporary data table and a third temporary data table are established in the target database, and matching is performed based on the three temporary data tables to obtain the matching result. Because the data matching range is specified by the user, matching all data in all data tables is avoided, the amount of computation during data matching is reduced, and data matching efficiency is improved. The user may set the data matching range through a GUI (graphical user interface), a command line, or other means.
In the invention, the matching strategy also depends on the volume of data to be matched: when the data volume is small, matching is performed directly; when the data volume is large, matching is performed based on temporary tables, and the temporary table matching compares MD5 values. Using temporary table matching achieves the following technical effects. Space saving: after the client exits the session, the temporary tables are dropped automatically, and no data remains to occupy database space. Privacy: the temporary tables established by the client serve only specific transactions; they are dedicated and private and need not be shared with other transactions. High efficiency: the temporary tables established by the client are operated on, read and written independently, so processing speed and processing efficiency are higher. This is another important inventive point of the invention.
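The threshold decision of step S102 can be sketched as a small function. The function name `choose_strategy` and the returned labels are hypothetical; the sketch assumes that "data volume" means a row count within the matching range and that temporary table matching is chosen only when both counts exceed the first threshold.

```python
def choose_strategy(source_count: int, target_count: int, first_threshold: int) -> str:
    """Pick the matching strategy from the data volumes within the matching range.

    Returns "temp_table" only when BOTH volumes exceed the threshold,
    otherwise "direct" (assumed reading of step S102).
    """
    if source_count > first_threshold and target_count > first_threshold:
        return "temp_table"
    return "direct"


# Small ranges are matched directly; large ranges go through temporary tables.
print(choose_strategy(500, 480, first_threshold=10_000))
print(choose_strategy(2_000_000, 1_999_000, first_threshold=10_000))
```

The threshold value of 10,000 is arbitrary; the invention does not state a value.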
In a further embodiment, the first temporary data table, the second temporary data table and the third temporary data table are set up in memory so that they can be accessed only by the process that created them; other processes cannot access them. That is, the temporary tables established by the clients (the client of the source database and the client of the target database) serve only specific transactions; they are dedicated and private and need not be shared with other transactions, which improves data security. This is another important inventive point of the invention.
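The isolation property can be illustrated with SQLite, used here only as a stand-in because the invention does not name a database engine. In SQLite a `TEMP` table is visible only to the connection that created it and is removed when that connection closes; other engines scope temporary tables to a session or process in their own ways. The table name `tmp_md5` is hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    db_path = os.path.join(workdir, "target.db")

    # Two independent sessions on the same database file.
    session_a = sqlite3.connect(db_path)
    session_b = sqlite3.connect(db_path)

    # Session A creates and fills a temporary table.
    session_a.execute("CREATE TEMP TABLE tmp_md5 (pk INTEGER, digest TEXT)")
    session_a.execute("INSERT INTO tmp_md5 VALUES (1, 'abc')")
    session_a.commit()
    rows_seen_by_a = session_a.execute("SELECT COUNT(*) FROM tmp_md5").fetchone()[0]

    # Session B cannot see session A's temporary table.
    try:
        session_b.execute("SELECT COUNT(*) FROM tmp_md5")
        visible_to_b = True
    except sqlite3.OperationalError:
        visible_to_b = False

    session_a.close()
    session_b.close()

print(rows_seen_by_a, visible_to_b)
```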
In a further embodiment, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result is: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join, an inner join or a right join on the second temporary data table and the third temporary data table to obtain the matching result.
In the invention, the MD5 values of the data of the first data table and the second data table within the corresponding data matching range are calculated and written into the first temporary data table and the second temporary data table respectively. The MD5 values in the first temporary data table of the source database (the source end) are then inserted into the third temporary data table located in the target database (the target end), and the MD5 values in the second temporary data table and the third temporary data table are matched on the target database, which completes the matching of the data of the first data table and the second data table within the data matching range.
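The temporary table flow described above can be sketched end to end with two in-memory SQLite databases standing in for the source and target databases. Everything concrete here is an assumption made for illustration: the table names (`t_src`, `t_dst`, `tmp1`, `tmp2`, `tmp3`), the column layout, the use of a primary key to pair rows, and the way a row is serialized before hashing. The MD5 values are computed in Python with `hashlib` because SQLite has no built-in MD5 function.

```python
import hashlib
import sqlite3


def row_md5(values) -> str:
    """MD5 of a row's values joined with a separator (serialization is an assumption)."""
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def fill_md5_table(conn, data_table, temp_table):
    """Create temp_table(pk, digest) on conn and fill it from data_table(id, name, amount)."""
    conn.execute(f"CREATE TEMP TABLE {temp_table} (pk INTEGER PRIMARY KEY, digest TEXT)")
    rows = conn.execute(f"SELECT id, name, amount FROM {data_table}").fetchall()
    conn.executemany(
        f"INSERT INTO {temp_table} VALUES (?, ?)",
        [(row[0], row_md5(row[1:])) for row in rows],
    )


# Stand-ins for the source and target databases, with sample data.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn, table, rows in (
    (source, "t_src", [(1, "a", 10), (2, "b", 20), (3, "c", 30)]),
    (target, "t_dst", [(1, "a", 10), (2, "b", 99), (4, "d", 40)]),
):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# First temporary table on the source, second temporary table on the target.
fill_md5_table(source, "t_src", "tmp1")
fill_md5_table(target, "t_dst", "tmp2")

# Third temporary table on the target receives the source-side MD5 values.
target.execute("CREATE TEMP TABLE tmp3 (pk INTEGER PRIMARY KEY, digest TEXT)")
target.executemany(
    "INSERT INTO tmp3 VALUES (?, ?)",
    source.execute("SELECT pk, digest FROM tmp1").fetchall(),
)

# Left join tmp2 with tmp3: every target row, with the source digest if one exists.
joined = target.execute(
    "SELECT t2.pk, t2.digest, t3.digest FROM tmp2 AS t2 "
    "LEFT JOIN tmp3 AS t3 ON t2.pk = t3.pk ORDER BY t2.pk"
).fetchall()
same = [pk for pk, d2, d3 in joined if d3 is not None and d2 == d3]
different = [pk for pk, d2, d3 in joined if d3 is not None and d2 != d3]
extra = [pk for pk, d2, d3 in joined if d3 is None]

# Rows missing from the target: join from the tmp3 side (equivalent to a right join).
missing = [pk for (pk,) in target.execute(
    "SELECT t3.pk FROM tmp3 AS t3 LEFT JOIN tmp2 AS t2 ON t2.pk = t3.pk "
    "WHERE t2.pk IS NULL ORDER BY t3.pk"
)]

print(same, different, missing, extra)
```

Only the digests cross from the source to the target in this sketch; the row contents themselves stay in their own database, which is the property the paragraph above relies on.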
In a further embodiment, the matching result comprises at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that the second data table has in excess of the first data table. Based on these matching results, data synchronization between the source end and the target end can be performed.
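One possible way to turn the four result categories into a synchronization plan is sketched below. The invention only states that synchronization can be performed based on the results; the mapping used here (insert missing rows, update differing rows, delete extra rows, leave identical rows alone) assumes the source end is authoritative, and the function name `plan_sync` is hypothetical.

```python
from typing import Dict, List, Tuple


def plan_sync(result: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Map matching-result categories to target-side actions (source assumed authoritative)."""
    actions: List[Tuple[str, int]] = []
    for key in result.get("missing", []):
        actions.append(("INSERT", key))   # row exists only at the source
    for key in result.get("diff", []):
        actions.append(("UPDATE", key))   # row exists at both ends with different content
    for key in result.get("extra", []):
        actions.append(("DELETE", key))   # row exists only at the target
    return actions                        # rows in "match" need no action


plan = plan_sync({"match": [1], "diff": [2], "missing": [3], "extra": [4]})
print(plan)
```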
In a further embodiment, after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized. The diff thread outputs the data that differs between the first data table and the second data table, the missing thread outputs the data that is missing from the second data table relative to the first data table, the extra thread outputs the data that the second data table has in excess of the first data table, and the match thread outputs the data that is the same in the first data table and the second data table. Because the initialized threads can run in parallel, the different categories of matching results are output concurrently and data output efficiency is improved. This is another important inventive point of the invention.
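A minimal sketch of the four output threads follows, assuming the matching result is available as a dictionary keyed by category. The function name `run_output_threads` and the output format are hypothetical; each thread writes only to its own list, so no locking is needed in this simplified form.

```python
import threading
from typing import Dict, List


def run_output_threads(result: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Start one thread per result category and collect what each one outputs."""
    outputs: Dict[str, List[str]] = {name: [] for name in ("diff", "missing", "extra", "match")}

    def emit(name: str) -> None:
        # Each thread appends only to its own list.
        for key in result.get(name, []):
            outputs[name].append(f"{name}:{key}")

    threads = [
        threading.Thread(target=emit, args=(name,), name=f"{name}-thread")
        for name in outputs
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outputs


report = run_output_threads({"diff": [2], "missing": [3], "extra": [4], "match": [1]})
print(report)
```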
In a further embodiment, a diff queue, a missing queue, an extra queue and a match queue are initialized in memory for the diff thread, the missing thread, the extra thread and the match thread respectively; the queues are used to implement a producer-consumer relationship pool, and the memory size occupied by the relationship pool is:
if [symbol not recoverable] is greater than [symbol not recoverable], [formula not recoverable];
otherwise, [formula not recoverable];
where the index takes the values 1, 2, 3, 4, representing the diff queue, the missing queue, the extra queue and the match queue respectively, and the remaining symbols represent, for the corresponding queue: the memory size occupied by the producer-consumer relationship pool implemented by the queue; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data the queue needs to output; and the total time required to output that total amount of data.
In the invention, to prevent data loss during output, a producer-consumer working mode is simulated through the initialized queues, which achieves the technical effects of peak shaving, valley filling and decoupling.
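The producer-consumer mode can be sketched with a bounded queue from the Python standard library. The bound `max_size` is a hypothetical parameter chosen by the caller; it does not reproduce the memory size formula of the invention. When the producer is faster than the consumer, `put` blocks once the queue is full, so bursts are smoothed and no item is dropped.

```python
import queue
import threading
from typing import List

_DONE = object()  # sentinel telling the consumer that production has finished


def drain_through_queue(items: List[int], max_size: int) -> List[int]:
    """Pass items from a producer thread to a consumer thread through a bounded queue."""
    buffer: "queue.Queue" = queue.Queue(maxsize=max_size)
    consumed: List[int] = []

    def producer() -> None:
        for item in items:
            buffer.put(item)      # blocks while the queue is full
        buffer.put(_DONE)

    def consumer() -> None:
        while True:
            item = buffer.get()   # blocks while the queue is empty
            if item is _DONE:
                break
            consumed.append(item)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return consumed


received = drain_through_queue(list(range(100)), max_size=8)
print(len(received))
```

The queue also decouples the two sides: the producer does not need to know how or when the consumer writes its output.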
Fig. 2 shows a data matching device based on a temporary table of a database of the present invention, the device comprising:
An obtaining unit 201, configured to obtain a first data table of the source database, a second data table of the target database, and a data matching range. The data matching range may be two data tables with the same table name specified by the user, two data tables with different table names, or a row or a column in the two data tables; for example, row 5 of the first data table is matched against row 7 of the second data table, or column 3 of the first data table is matched against column 7 of the second data table.
A judging unit 202, configured to judge whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold; if not, to directly match the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; and if so, to perform temporary table matching.
A temporary table matching unit 203, configured to establish a first temporary data table in the source database, establish a second temporary data table and a third temporary data table in the target database, and perform matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.
In the invention, a first data table of the source database, a second data table of the target database and a data matching range are first obtained. It is then judged whether the data volumes of the first data table and the second data table within the data matching range are both greater than a first threshold. If not, the data of the first data table within the data matching range is matched directly against the data of the second data table within the data matching range to obtain a matching result. If so, a first temporary data table is established in the source database, a second temporary data table and a third temporary data table are established in the target database, and matching is performed based on the three temporary data tables to obtain the matching result. Because the data matching range is specified by the user, matching all data in all data tables is avoided, the amount of computation during data matching is reduced, and data matching efficiency is improved. The user may set the data matching range through a GUI (graphical user interface), a command line, or other means.
In the invention, the matching strategy also depends on the volume of data to be matched: when the data volume is small, matching is performed directly; when the data volume is large, matching is performed based on temporary tables, and the temporary table matching compares MD5 values. Using temporary table matching achieves the following technical effects. Space saving: after the client exits the session, the temporary tables are dropped automatically, and no data remains to occupy database space. Privacy: the temporary tables established by the client serve only specific transactions; they are dedicated and private and need not be shared with other transactions. High efficiency: the temporary tables established by the client are operated on, read and written independently, so processing speed and processing efficiency are higher. This is another important inventive point of the invention.
In a further embodiment, the first temporary data table, the second temporary data table and the third temporary data table are set up in memory so that they can be accessed only by the process that created them; other processes cannot access them. That is, the temporary tables established by the clients (the client of the source database and the client of the target database) serve only specific transactions; they are dedicated and private and need not be shared with other transactions, which improves data security. This is another important inventive point of the invention.
In a further embodiment, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result is: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join, an inner join or a right join on the second temporary data table and the third temporary data table to obtain the matching result.
In the invention, the MD5 values of the data of the first data table and the second data table within the corresponding data matching range are calculated and written into the first temporary data table and the second temporary data table respectively. The MD5 values in the first temporary data table of the source database (the source end) are then inserted into the third temporary data table located in the target database (the target end), and the MD5 values in the second temporary data table and the third temporary data table are matched on the target database, which completes the matching of the data of the first data table and the second data table within the data matching range.
In a further embodiment, the matching result comprises at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that the second data table has in excess of the first data table. Based on these matching results, data synchronization between the source end and the target end can be performed.
In a further embodiment, after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized. The diff thread outputs the data that differs between the first data table and the second data table, the missing thread outputs the data that is missing from the second data table relative to the first data table, the extra thread outputs the data that the second data table has in excess of the first data table, and the match thread outputs the data that is the same in the first data table and the second data table. Because the initialized threads can run in parallel, the different categories of matching results are output concurrently and data output efficiency is improved. This is another important inventive point of the invention.
In a further embodiment, a diff queue, a missing queue, an extra queue and a match queue are initialized in memory for the diff thread, the missing thread, the extra thread and the match thread respectively; the queues are used to implement a producer-consumer relationship pool, and the memory size occupied by the relationship pool is:
if [symbol not recoverable] is greater than [symbol not recoverable], [formula not recoverable];
otherwise, [formula not recoverable];
where the index takes the values 1, 2, 3, 4, representing the diff queue, the missing queue, the extra queue and the match queue respectively, and the remaining symbols represent, for the corresponding queue: the memory size occupied by the producer-consumer relationship pool implemented by the queue; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data the queue needs to output; and the total time required to output that total amount of data.
In the invention, to prevent data loss during output, a producer-consumer working mode is simulated through the initialized queues, which achieves the technical effects of peak shaving, valley filling and decoupling.
In one embodiment of the invention, a computer storage medium is provided on which a computer program is stored. The computer storage medium may be a hard disk, a DVD, a CD, a flash memory or the like, and the computer program, when executed by a processor, carries out the above method.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments, or in some parts of the embodiments, of the present application.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the invention. Although the invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the invention, and all such modifications and substitutions are intended to be covered by the claims.