CN111966660B


Info

Publication number: CN111966660B
Application number: CN202010648840.4A
Authority: CN
Inventors: 李永刚; 侯亚威; 胡上成; 郭力兵; 汪毅; 毛文; 吴云; 杨海民; 伊瑞海; 邓育民; 黄为
Original assignee: No63686 Troops Pla
Current assignee: No63686 Troops Pla
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2024-07-05
Anticipated expiration: 2040-07-08
Also published as: CN111966660A

Abstract

The invention relates to a database comparison method based on ODBC access. Built on ODBC access technology, the method is mainly used to compare the structure and the content of two relational databases. It consists of four parts: an extended text similarity calculation method, a formatted text comparison method based on weighted preference, a database content comparison method based on ODBC access, and a database comparison method based on ODBC access. The invention achieves database comparison that is independent of the database vendor and yields a relatively good comparison scheme, so it can be applied in fields related to relational databases.

Description

Database comparison method based on ODBC access

Technical Field

The invention belongs to the field of computer software application, relates to application of a relational database, can be used for content comparison of the relational database, and particularly relates to a database comparison method based on ODBC access.

Background

In the context of database applications, it is often necessary to compare two databases, which may be the same type of database or heterogeneous databases, in order to understand the data change or data consistency of different services or different periods.

In general, a database vendor provides functions such as data migration, data import and export, and data backup. If the databases to be compared are of the same type from the same vendor, the two databases can exchange data with each other, but most databases do not provide a database comparison function. Databases from different vendors generally cannot exchange data directly, so their data cannot be compared directly.

Open Database Connectivity (ODBC) is the part of Microsoft's open services architecture that concerns databases. It establishes a set of specifications and provides a standard API for accessing databases. Through the ODBC API and the drivers provided by the database vendors, an upper-layer application can access a database directly without regard to the differences between vendors.

The method realizes a database comparison method irrelevant to manufacturers by utilizing the ODBC technology, and can be applied to the related fields of the relational database.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a database comparison method based on ODBC access aiming at the prior art, so that the database comparison method irrelevant to manufacturers is realized, and the method can be applied to the fields related to relational databases.

The invention solves the above problem with the following technical scheme: a database comparison method based on ODBC access, in which the input data is two sets of database connection information, the database content is obtained through ODBC access technology, and the output is the comparison result of the two sets of database tables. The method specifically comprises the following steps:

Step 1), establishing an ODBC connection and initializing the comparison environment;

Step 2), acquiring database table field information and database table contents;

Step 3), constructing a formatted text for each table and calculating the comparison result of the database table structure; if the overall similarity is smaller than 1, the tables are judged different and the comparison continues with the next table; otherwise, go to step 4);

Step 4), calculating the comparison result of the database table contents for each table; if the contents compare as the same, the tables are judged the same; otherwise, they are judged different.

Preferably, the formatted text in step 3) includes text keys and text content. The text keys are the column name, type, length, whether the column can be empty, whether the column is a primary key, default value and description; the text content is the corresponding database information.

Preferably, step 3) compares the formatted texts constructed for each table with a formatted text comparison method based on weighted preference. The input of this method is two formatted texts Rx and Ry and the comparison parameters Spx and Spy, each of which comprises comparison columns and key columns. The output is the two formatted texts arranged according to the optimal comparison strategy, the comparison similarity of each row of formatted text, and the overall similarity.

Preferably, the formatted text comparison method based on weighted preference comprises the following steps:

Step 1) organizing the data and initializing the comparison environment;

Step 2) calculating the similarity of the comparison units for the data columns to be compared, using an extended text similarity calculation method;

Step 3) constructing the similarity matrix of the two formatted texts;

Step 4) searching for and sorting the optimal comparison strategy for each line of text;

Step 5) outputting the comparison strategy and the text similarity.

Preferably, step 2) uses an extended text similarity calculation method based on the longest common substring (LCS) and the edit distance (LD): for two input strings, the longest common substring and the edit distance are obtained, and the similarity of the two strings is calculated from them.

Preferably, the extended text similarity calculation method comprises the following steps:

Step 1) initializing an LCS matrix and an LD matrix;

Step 2) calculating the first-row data of the LCS matrix and of the LD matrix, respectively;

Step 3) looping over each character of the strings to calculate the remaining values of the LCS matrix and the LD matrix, and obtaining the LCS and the LD of the two strings, respectively;

Step 4) calculating the similarity between the two strings.

Compared with the prior art, the invention has the advantages that:

When databases are compared in the corresponding service scenario, the invention can ignore the differences between database vendors and requires no vendor-specific adaptation or software development; it compares database contents directly and can save considerable software development cost. In addition, software that implements the content comparison and matching strategy provided by this scheme can quickly produce a relatively good comparison result, which addresses the cost, efficiency and accuracy problems of manual comparison.

Drawings

Fig. 1 is a flowchart of an implementation of the extended text similarity calculation method, part one of the present invention.

Fig. 2 is a flowchart of an implementation of the formatted text comparison method based on weighted preference, part two of the present invention.

Fig. 3 is an example of the similarity matrix described in part two of the present invention, in which the content on a red background is the optimal comparison scheme selected step by step.

Figs. 4, 5 and 6 are examples of the running interface of database comparison software developed in accordance with the present invention.

Detailed Description

The invention is described in further detail below with reference to the embodiments of the drawings.

The embodiment provides a database comparison method based on ODBC access, which obtains database contents through an ODBC access technology and gives a comparison result of a relational database. The input data of the method is two groups of ODBC connection information, and the output information is the comparison result of two groups of database tables.

The method comprises the following basic steps:

Step 1, if no ODBC connection has been established in the environment, the ODBC environment is configured through SQLSetEnvAttr, and an error is returned directly if this fails. The database connection is then established through SQLConnect.

Step 2, calling SQLTables to obtain information on all tables in the database, and calling SQLColumns to obtain the field information of each table.

Step 3, constructing a formatted text for each table, in which the text keys are { "column name", "type", "length", "whether it can be empty", "whether it is a primary key", "default value", "description" } and the text content is the corresponding database information. The comparison result of the database table structure is calculated by the formatted text comparison method based on weighted preference; if the overall similarity is smaller than 1, the tables are judged different and the comparison continues with the next table; otherwise, go to step 4.

The formatted text is a text string with a self-describing structure and has the following characteristics:

(1) The text string consists of multiple lines, separated by line-feed characters;

(2) Each line of the text string consists of several columns separated by a delimiter (a comma in the example below), and every line contains the same number of columns;

(3) The first line of the text string holds the column information (also called keys), and all lines after the first hold data.

The following is an example of formatted text:

ID,NAME,BIRTHDAY,SCHOOLID,CLASSNAME

0,Lucy wang,1970-05-16,0,Math

1,Teacher Liu,1979-03-16,1,English

2,Teacher Zhao,1958-03-16,3,History
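The three characteristics above can be checked mechanically. The following is a minimal sketch, not the patented implementation: the function name `parse_formatted_text` is hypothetical, and it assumes commas as the column delimiter and newlines as the line delimiter, as in the example.

```python
import csv
import io


def parse_formatted_text(text):
    """Split formatted text into (keys, rows).

    Assumes comma-separated columns and newline-separated lines;
    blank lines are ignored.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    records = list(csv.reader(io.StringIO("\n".join(lines))))
    if not records:
        raise ValueError("formatted text has no key line")
    keys, rows = records[0], records[1:]
    # Characteristic (2): every line must have the same number of columns.
    for number, row in enumerate(rows, start=2):
        if len(row) != len(keys):
            raise ValueError(
                f"line {number} has {len(row)} columns, expected {len(keys)}"
            )
    return keys, rows


SAMPLE = """ID,NAME,BIRTHDAY,SCHOOLID,CLASSNAME
0,Lucy wang,1970-05-16,0,Math
1,Teacher Liu,1979-03-16,1,English
2,Teacher Zhao,1958-03-16,3,History"""

keys, rows = parse_formatted_text(SAMPLE)
print(keys)
print(len(rows), "data rows")
```

Keeping the keys separate from the data rows makes it straightforward to address a cell by column index, which is how the comparison parameters Spx and Spy refer to columns.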

Step 4, calculating, for each table, the comparison result of the database table contents by the database content comparison method based on ODBC access; if the contents compare as the same, the tables are judged the same; otherwise, they are judged different.

The database content comparison method based on ODBC access obtains the database content through ODBC access technology and gives the comparison result of the relational database content. The input of the method is two sets of ODBC connection information, the database tables to be compared and an optional, manually specified field priority; the output is the comparison result of the contents of the database tables to be compared. The method comprises the following steps:

Step 2, acquiring the database table field information through SQLColumns, and executing SELECT * FROM table_name through SQLExecDirect or SQLExecute to acquire the entire content of the database table.

Step 3, calling part two, the formatted text comparison method based on weighted preference, to calculate the comparison result of the table contents of the two databases; if the overall similarity is smaller than 1, the databases are judged different.
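Fetching a whole table and turning it into formatted text can be sketched as below. This is an illustration under stated assumptions rather than the patent's code: the function name `table_to_formatted_text` is hypothetical, a query of the form `SELECT * FROM table_name` is assumed, and Python's built-in `sqlite3` module stands in for an ODBC connection so that the example runs without a driver. An ODBC binding that follows the Python DB-API, such as pyodbc, exposes the same `execute`, `description` and `fetchall` cursor members.

```python
import csv
import io
import sqlite3


def table_to_formatted_text(cursor, table_name):
    """Return a table as formatted text: one key line, then data lines.

    `cursor` is any DB-API 2.0 cursor. `table_name` is interpolated into
    the SQL, so it must come from trusted metadata, not from user input.
    """
    cursor.execute(f"SELECT * FROM {table_name}")
    keys = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(keys)
    for row in cursor.fetchall():
        # NULL values are written as empty strings (an assumption).
        writer.writerow("" if value is None else str(value) for value in row)
    return buffer.getvalue()


# Stand-in database so the sketch runs without an ODBC driver.
connection = sqlite3.connect(":memory:")
cur = connection.cursor()
cur.execute("CREATE TABLE teacher (ID INTEGER, NAME TEXT, CLASSNAME TEXT)")
cur.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(0, "Lucy wang", "Math"), (1, "Teacher Liu", "English")],
)
text = table_to_formatted_text(cur, "teacher")
print(text)
```

Because only the generic cursor interface is used, the same function can be pointed at two databases from different vendors, which is the property the method relies on.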

In the formatted text comparison method based on weighted preference, the input is two formatted texts Rx and Ry and the comparison parameters Spx and Spy, each of which comprises comparison columns and key columns. The output is the two formatted texts arranged according to the optimal comparison strategy, the comparison similarity of each row of formatted text, and the overall similarity. The method specifically comprises the following steps:

Step 1, organizing the data: the consistency of the parameters Spx and Spy is checked, as are the match between Spx and Rx and the match between Spy and Ry; if any of these checks fails, an error is returned directly. Space for the similarity matrix is allocated. If the length of Rx is less than the length of Ry, the two texts are exchanged.

Step 2, for each row Rx[i] of Rx and each row Ry[j] of Ry, the following loop is performed:

For each pair of input comparison columns Spx[k] and Spy[l], the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] is calculated by the extended text similarity calculation method, and the similarities of the comparison columns are summed.

For each pair of input key columns Spx[k] and Spy[l], the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] is calculated by the extended text similarity calculation method, and the similarities of the key columns are summed.

The comparison column similarity and the key column similarity are weighted and summed to obtain S_ij, where s₁ and s₂ are the comparison column similarity and the key column similarity, respectively, and len(spx) and len(spx') are the number of comparison columns and the number of key columns, respectively.

Step 3, the similarity values S_ij form a similarity matrix S[i][j] = S_ij (0 ≤ i < len(Rx), 0 ≤ j < len(Ry)).

All elements of the similarity matrix S are marked as not searched, and step 4 is then applied repeatedly.

Step 4, searching for the largest unmarked element in the similarity matrix S, recording the row and column numbers (i, j) of its position, and marking row i and column j of the matrix as searched. If the number of unmarked rows or the number of unmarked columns in the current matrix S is 0, go to step 5; otherwise, repeat step 4.

Step 5, sorting the recorded row-column pairs (i, j)_h, h = 1, 2, ..., len(Ry), by i to obtain the optimal comparison strategy R_len(Ry)×2 of the formatted text: line R_i,1 of text 1 is compared in turn with line R_i,2 of text 2, for i = 1, 2, ..., len(Ry). The overall similarity after this comparison is the highest, and the similarity value of each compared pair and the overall similarity value are obtained.
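Steps 1 to 5 can be sketched as follows. This is a hedged illustration, not the patented implementation. The weighting formula for S_ij and the overall similarity formula are not reproduced in this text, so the sketch assumes equal default weights (`w_compare`, `w_key`), a per-column average for s₁ and s₂, and an overall similarity equal to the sum of matched scores divided by the number of rows in the longer text. The function name `compare_rows` and the `similarity` callback are hypothetical, and rows are indexed from 0.

```python
def compare_rows(rx, ry, compare_cols, key_cols, similarity,
                 w_compare=0.5, w_key=0.5):
    """Greedy row matching over a similarity matrix.

    rx, ry: lists of rows (lists of strings).
    compare_cols, key_cols: lists of (column index in rx, column index in ry).
    similarity: function (str, str) -> float in [0, 1].
    Returns (pairs, scores, overall, swapped).
    """
    swapped = len(rx) < len(ry)
    if swapped:  # Step 1: make rx the longer text.
        rx, ry = ry, rx
        compare_cols = [(ky, kx) for kx, ky in compare_cols]
        key_cols = [(ky, kx) for kx, ky in key_cols]

    def mean_similarity(row_x, row_y, cols):
        if not cols:  # Nothing to compare counts as no difference (assumption).
            return 1.0
        total = sum(similarity(row_x[kx], row_y[ky]) for kx, ky in cols)
        return total / len(cols)

    # Steps 2 and 3: weighted cell similarities form the matrix S.
    matrix = [
        [w_compare * mean_similarity(row_x, row_y, compare_cols)
         + w_key * mean_similarity(row_x, row_y, key_cols)
         for row_y in ry]
        for row_x in rx
    ]

    # Step 4: repeatedly take the largest element whose row and column
    # are both still unmarked, then mark that row and column.
    free_rows = set(range(len(rx)))
    free_cols = set(range(len(ry)))
    pairs = []
    while free_rows and free_cols:
        i, j = max(
            ((r, c) for r in sorted(free_rows) for c in sorted(free_cols)),
            key=lambda pair: matrix[pair[0]][pair[1]],
        )
        pairs.append((i, j))
        free_rows.remove(i)
        free_cols.remove(j)

    # Step 5: sort by row number and report the similarities.
    pairs.sort()
    scores = [matrix[i][j] for i, j in pairs]
    overall = sum(scores) / len(rx) if rx else 1.0
    return pairs, scores, overall, swapped


def exact(a, b):
    """Stand-in cell similarity: 1.0 for identical strings, else 0.0."""
    return 1.0 if a == b else 0.0


rx = [["1", "Ann", "Math"], ["2", "Bob", "Art"]]
ry = [["2", "Bob", "Art"], ["1", "Ann", "Music"]]
pairs, scores, overall, swapped = compare_rows(
    rx, ry, compare_cols=[(1, 1), (2, 2)], key_cols=[(0, 0)], similarity=exact
)
print(pairs, scores, overall)
```

In this example row 0 of `rx` is paired with row 1 of `ry` even though the rows appear in a different order, and the overall similarity falls below 1 because one cell differs. Dividing by the length of the longer text makes unmatched rows count as zero similarity, which is one possible reading of the overall value.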

The extended text similarity calculation method draws on the concepts of the longest common substring (LCS) and the edit distance (LD). The input data are two strings A = a₁a₂...a_m and B = b₁b₂...b_n, and the output is the similarity S of the two strings. The method comprises the following steps:

Step 1, constructing an LCS matrix LCS_m×n and an LD matrix LD_m×n.

Step 2, setting the first-row data of the LCS matrix and the LD matrix, respectively: LCS_1,i = 0, i = 1, 2, ..., n; LD_1,i = i, i = 1, 2, ..., n.

Step 3, for i = 2, ..., m and j = 2, ..., n, the following loop is performed:

This gives LCS(A, B) = LCS_m,n and LD(A, B) = LD_m,n.

Step 4, calculating the similarity of the character strings
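The recurrences of step 3 and the similarity formula of step 4 are not reproduced in this text, so the following sketch fills them with standard choices and should be read as an assumption, not as the patented formulas. Because the result is read from the last matrix cell, LCS(A, B) = LCS_m,n, the sketch uses the longest-common-subsequence recurrence, for which that property holds; it uses the Levenshtein recurrence for LD; and it combines the two as S = LCS / (LCS + LD), a commonly used combination. The matrices here carry an extra row and column for the empty prefix, so indices are shifted by one relative to the text.

```python
def lcs_and_ld(a, b):
    """Return (LCS length, edit distance) for strings a and b.

    Both matrices are filled in one pass over the characters.
    """
    m, n = len(a), len(b)
    # Row 0 and column 0 represent the empty prefix.
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    ld = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        ld[0][j] = j
    for i in range(m + 1):
        ld[i][0] = i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
                ld[i][j] = ld[i - 1][j - 1]
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
                ld[i][j] = 1 + min(
                    ld[i - 1][j],      # deletion
                    ld[i][j - 1],      # insertion
                    ld[i - 1][j - 1],  # substitution
                )
    return lcs[m][n], ld[m][n]


def text_similarity(a, b):
    """Assumed similarity: LCS / (LCS + LD), with 1.0 for two empty strings."""
    common, distance = lcs_and_ld(a, b)
    if common + distance == 0:
        return 1.0
    return common / (common + distance)


print(lcs_and_ld("kitten", "sitting"))
print(text_similarity("kitten", "sitting"))
```

With this combination, identical strings score 1 and strings with no characters in common score 0, which matches the way the comparison method treats an overall similarity of 1 as "the same".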

In addition to the above embodiments, the present invention also includes other embodiments, and all technical solutions that are formed by equivalent transformation or equivalent substitution should fall within the protection scope of the claims of the present invention.

Claims

1. A database comparison method based on ODBC access, characterized in that the input data of the method is two sets of database connection information, the database contents are obtained through ODBC access technology, and the output is the comparison result of the two sets of database tables, the method specifically comprising the following steps:

Step 3), constructing a formatted text for each table and calculating the comparison result of the database table structure; if the overall similarity is smaller than 1, the tables are judged different and the comparison continues with the next table; otherwise, go to step 4),

The formatted texts constructed for each table are compared with a formatted text comparison method based on weighted preference, the input of which is two formatted texts Rx and Ry and the comparison parameters Spx and Spy, each comprising comparison columns and key columns, and the output of which is the two formatted texts arranged according to the optimal comparison strategy, the comparison similarity of each row of formatted text, and the overall similarity;

The formatted text comparison method based on weighting preference comprises the following steps:

Step 3.1) organizing the data and initializing the comparison environment;

Verifying the consistency of the comparison parameters Spx and Spy, the match between Spx and Rx, and the match between Spy and Ry, and returning an error directly if any of these checks fails; allocating space for the similarity matrix; if the length of Rx is smaller than the length of Ry, exchanging the two texts;

Step 3.2) calculating the similarity of the comparison units by adopting an expanded text similarity calculation method for the data columns to be compared;

For each row Rx[i] of Rx and each row Ry[j] of Ry, the following loop is performed:

For each pair of input comparison columns Spx[k] and Spy[l], calculating the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] with the extended text similarity calculation method, and summing the similarities of the comparison columns;

For each pair of input key columns Spx[k] and Spy[l], calculating the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] with the extended text similarity calculation method, and summing the similarities of the key columns;

Weighting and summing the comparison column similarity and the key column similarity to obtain S_ij, wherein

s₁ and s₂ are the comparison column similarity and the key column similarity, respectively, and len(spx) and len(spx') are the number of comparison columns and the number of key columns, respectively;

Step 3.3) constructing two groups of formatted text similarity matrixes;

Step 3.4) searching for and sorting the optimal comparison strategy for each line of text;

Searching for the largest unmarked element in the similarity matrix S, recording the row and column numbers (i, j) of its position, and marking row i and column j of the matrix as searched; if the number of unmarked rows or the number of unmarked columns in the current matrix S is 0, go to step 3.5); otherwise, repeat step 3.4);

Step 3.5) outputting a comparison strategy and the text similarity;

The recorded row-column pairs (i, j)_h, h = 1, 2, ..., len(Ry), are sorted by i to obtain the optimal comparison strategy R_len(Ry)×2 of the formatted texts; line R_i,1 of text Rx is compared in turn with line R_i,2 of text Ry, for i = 1, 2, ..., len(Ry); the overall similarity after this comparison is the highest, and the similarity value of each compared pair and the overall similarity value are obtained;

Step 4) calculating the comparison result of the database table contents for each table; if the contents compare as the same, the tables are judged the same; otherwise, they are judged different.

2. The database comparison method based on ODBC access according to claim 1, wherein the formatted text in step 3) comprises text keys and text content, the text keys comprising the column name, type, length, whether the column can be empty, whether the column is a primary key, default value and description, and the text content being the corresponding database information.

3. The database comparison method based on ODBC access according to claim 1, wherein step 2) uses an extended text similarity calculation method based on the longest common substring (LCS) and the edit distance (LD), which obtains the longest common substring and the edit distance of two input strings and calculates the similarity of the two strings from them.

4. The database comparison method based on ODBC access according to claim 3, wherein the text similarity calculation method comprises the following steps:

Step 1) initializing an LCS matrix and an LD matrix;

Step 4) calculating the similarity between the two strings.

5. The database comparison method based on ODBC access according to claim 4, wherein said text similarity calculation method comprises the steps of:

Step 1) constructing an LCS matrix LCS_m×n and an LD matrix LD_m×n;

Step 2) setting the first-row data of the LCS matrix and the LD matrix, respectively: LCS_1,i = 0, i = 1, 2, ..., n; LD_1,i = i, i = 1, 2, ..., n;

Step 3) for i = 2, ..., m and j = 2, ..., n, respectively, performing the following loop:

which gives LCS(A, B) = LCS_m,n and LD(A, B) = LD_m,n;

Step 4) calculating the similarity of the character strings