Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the prior art, a database comparison method based on ODBC access that is independent of database vendors and can be applied in fields involving relational databases.
The invention solves this problem with the following technical scheme: a database comparison method based on ODBC access, in which the input data are the connection information of two databases, the database content is obtained through ODBC access, and the output information is the comparison result of the two sets of database tables. The method specifically comprises the following steps:
Step 1), establishing the ODBC connection and initializing the comparison environment;
Step 2), acquiring the database table field information and the database table contents;
Step 3), constructing a formatted text for each table and calculating the comparison result of the database table structure; if the overall similarity is less than 1, the table comparison result is different and the method continues with the next table; otherwise, proceeding to step 4);
Step 4), calculating the comparison result of the database table contents for each table; if the contents are the same, the table comparison result is the same; otherwise, the table comparison result is different.
Preferably, the formatted text in step 3) comprises text keys and text content; the text keys comprise column name, type, length, whether the column can be empty, whether the column is a primary key, default value and description, and the text content is the corresponding database information.
Preferably, step 3) compares the formatted texts constructed for each table by a weighted-preference formatted text comparison method. The input of this method is two sets of formatted text Rx and Ry and the comparison parameters Spx and Spy, where Spx and Spy each comprise comparison columns and key columns; the output is the two sets of formatted text arranged according to the optimal comparison strategy, the comparison similarity of each line of formatted text, and the overall similarity.
Preferably, the weighted-preference formatted text comparison method comprises the following steps:
Step 1) organizing the data and initializing the comparison environment;
Step 2) calculating the similarity of the comparison units for the data columns to be compared, using an extended text similarity calculation method;
Step 3) constructing the similarity matrix of the two sets of formatted text;
Step 4) searching for and sorting the optimal comparison strategy for each line of text;
Step 5) outputting the comparison strategy and the text similarity.
Preferably, the extended text similarity calculation method used in step 2) is based on the longest common substring (LCS) and the edit distance (LD): for two input character strings, it obtains the longest common substring and the edit distance, and from them calculates the similarity of the two strings.
Preferably, the extended text similarity calculation method comprises the following steps:
Step 1) initializing an LCS matrix and an LD matrix;
Step 2) calculating the first-row data of the LCS matrix and of the LD matrix respectively;
Step 3) looping over each character of the character strings to calculate the remaining values of the LCS matrix and the LD matrix, thereby obtaining the LCS and the LD of the two character strings;
Step 4) calculating the similarity between the two character strings.
Compared with the prior art, the invention has the advantages that:
When databases are compared in the corresponding service scenario, the invention is insensitive to differences between database vendors: no vendor-specific adaptation or software development is needed, database contents are compared directly, and considerable software development cost can be saved. In addition, software that implements the content comparison and adaptation strategy provided by this scheme can produce a good comparison result quickly, avoiding the cost, efficiency and accuracy problems of manual comparison.
Detailed Description
The invention is described in further detail below with reference to the embodiments of the drawings.
The embodiment provides a database comparison method based on ODBC access, which obtains database contents through an ODBC access technology and gives a comparison result of a relational database. The input data of the method is two groups of ODBC connection information, and the output information is the comparison result of two groups of database tables.
The method comprises the following basic steps:
Step 1, if no ODBC connection has been established in the environment, the ODBC environment is set up through SQLSetEnvAttr, and if this fails, an error is returned directly. The database connection is then established through SQLConnect.
Step 2, calling SQLTables to obtain information on all tables in the database, and calling SQLColumns to obtain the field information of each table.
Step 3, constructing a formatted text for each table, in which the text keys are {"column name", "type", "length", "whether the column can be empty", "whether the column is a primary key", "default value", "description"} and the text content is the corresponding database information. The comparison result of the database table structure is calculated by the weighted-preference formatted text comparison method; if the overall similarity is less than 1, the table comparison result is different and the method continues with the next table; otherwise, it proceeds to step 4.
The formatted text is a text string with a self-describing structure and has the following characteristics:
(1) The text string consists of multiple lines, separated by line-feed characters;
(2) Each line of the text string consists of several columns separated by a delimiter (a comma in the example below), and every line contains the same number of columns;
(3) The first line of the text string holds the column information (also called keys), and every line after the first holds data.
The following is an example of formatted text:
ID,NAME,BIRTHDAY,SCHOOLID,CLASSNAME
0,Lucy wang,1970-05-16,0,Math
1,Teacher Liu,1979-03-16,1,English
2,Teacher Zhao,1958-03-16,3,History
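The three characteristics above can be illustrated with a minimal parsing sketch. The function name parse_formatted_text and the choice of a comma as the default separator are assumptions made for illustration; the text does not prescribe a parser.

```python
def parse_formatted_text(text, sep=","):
    """Split a formatted text into its key line and its data lines.

    Hypothetical helper: the separator is assumed to be a comma,
    as in the example above.
    """
    lines = [line for line in text.split("\n") if line]
    if not lines:
        raise ValueError("formatted text must contain a key line")
    keys = lines[0].split(sep)                      # first line: column information
    rows = [line.split(sep) for line in lines[1:]]  # other lines: data
    for row in rows:
        if len(row) != len(keys):
            raise ValueError("every line must contain the same number of columns")
    return keys, rows


example = (
    "ID,NAME,BIRTHDAY,SCHOOLID,CLASSNAME\n"
    "0,Lucy wang,1970-05-16,0,Math\n"
    "1,Teacher Liu,1979-03-16,1,English\n"
    "2,Teacher Zhao,1958-03-16,3,History"
)
keys, rows = parse_formatted_text(example)
```

A field value that itself contains the separator would need escaping, which this sketch does not attempt.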
Step 4, calculating, for each table, the comparison result of the database table contents by the database content comparison method based on ODBC access; if the contents are the same, the table comparison result is the same; otherwise, the table comparison result is different.
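The two-stage decision in steps 3 and 4 can be sketched as follows. The function compare_table and its parameters are hypothetical names; the content comparison is passed as a callable so that it only runs when the structures already match, which is one possible reading of the flow described above.

```python
def compare_table(structure_similarity, contents_equal):
    """Return "same" or "different" for one table.

    structure_similarity: overall similarity of the table structures (step 3).
    contents_equal: zero-argument callable performing the content
    comparison (step 4); it is called only if the structures match.
    """
    if structure_similarity < 1:
        # Structures differ: the table is different, skip the content comparison.
        return "different"
    return "same" if contents_equal() else "different"
```

For example, compare_table(0.8, ...) reports a difference without ever touching the table contents.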
The database content comparison method based on ODBC access obtains the database content through ODBC access and gives the comparison result of the relational database content. The input data of the method are two sets of ODBC connection information, the database table to be compared and an optional, manually specified field priority; the output information is the comparison result of the contents of the database table to be compared. The method comprises the following steps:
Step 1, if no ODBC connection has been established in the environment, the ODBC environment is set up through SQLSetEnvAttr, and if this fails, an error is returned directly. The database connection is then established through SQLConnect.
Step 2, acquiring the database table field information through SQLColumns, and executing "select * from table_name" through SQLExecDirect or SQLExecute to acquire the entire content of the database table.
Step 3, calling the weighted-preference formatted text comparison method to calculate the comparison result of the table contents of the two databases; if the overall similarity is less than 1, the database comparison result is different.
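Step 2 of the content comparison can be sketched as below. To keep the example runnable without an ODBC driver, Python's built-in sqlite3 module is used as a stand-in for the ODBC connection; this substitution is an assumption of the sketch, and an ODBC binding that follows the Python DB-API is assumed to be usable in the same way. The helper name table_to_formatted_text is hypothetical.

```python
import sqlite3


def table_to_formatted_text(conn, table_name, sep=","):
    """Read a whole table and render it as a formatted text (keys + data lines)."""
    cur = conn.cursor()
    # table_name is interpolated directly, so it must come from a trusted
    # source such as the table list obtained from the catalog.
    cur.execute(f"select * from {table_name}")
    keys = [col[0] for col in cur.description]  # column names from the result set
    lines = [sep.join(keys)]
    for row in cur.fetchall():
        lines.append(sep.join("" if v is None else str(v) for v in row))
    return "\n".join(lines)


# Stand-in database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table teacher (ID integer, NAME text, CLASSNAME text)")
conn.executemany(
    "insert into teacher values (?, ?, ?)",
    [(0, "Lucy wang", "Math"), (1, "Teacher Liu", "English")],
)
text = table_to_formatted_text(conn, "teacher")
```

The resulting text has the key line first and one line per table row, which is the input expected by the formatted text comparison.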
The weighted-preference formatted text comparison method takes as input two sets of formatted text Rx and Ry and the comparison parameters Spx and Spy, where Spx and Spy each comprise comparison columns and key columns; its output is the two sets of formatted text arranged according to the optimal comparison strategy, the comparison similarity of each line of formatted text, and the overall similarity. The method specifically comprises the following steps:
Step 1, organizing the data: checking that the parameters Spx and Spy are consistent with each other, that Spx matches Rx and that Spy matches Ry, and returning an error directly if any check fails; allocating the space for the similarity matrix; and exchanging the two sets of text if the length of Rx is less than the length of Ry.
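A minimal sketch of this preparation step is given below. The representation of Spx and Spy as dictionaries with "compare" and "key" lists of column indices is an assumption, as is the decision to exchange the parameters together with the texts; the function name prepare_comparison is hypothetical.

```python
def prepare_comparison(rx, ry, spx, spy):
    """Validate the parameters, swap the texts if needed, allocate the matrix.

    rx, ry: lists of data rows (each row a list of strings).
    spx, spy: dicts {"compare": [...], "key": [...]} of column indices
    (assumed representation of the comparison parameters).
    """
    # Consistency of Spx and Spy: same number of columns of each kind.
    for kind in ("compare", "key"):
        if len(spx[kind]) != len(spy[kind]):
            raise ValueError(f"Spx and Spy disagree on the number of {kind} columns")
    # Matching of Spx with Rx and of Spy with Ry: indices must exist.
    for rows, sp, name in ((rx, spx, "Spx"), (ry, spy, "Spy")):
        width = len(rows[0]) if rows else 0
        for col in sp["compare"] + sp["key"]:
            if not 0 <= col < width:
                raise ValueError(f"{name} refers to a missing column {col}")
    # Exchange the two sets so that Rx is never shorter than Ry.
    swapped = len(rx) < len(ry)
    if swapped:
        rx, ry, spx, spy = ry, rx, spy, spx
    # Similarity matrix of size len(Rx) x len(Ry), filled in later.
    matrix = [[0.0] * len(ry) for _ in rx]
    return rx, ry, spx, spy, matrix, swapped
```

The swapped flag lets a caller map the final comparison strategy back to the original order of the two inputs.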
Step 2, for each row Rx[i] of Rx and each row Ry[j] of Ry, the following loop is performed:
For each pair of input comparison columns Spx[k] and Spy[l], the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] is calculated by the extended text similarity calculation method, and the similarities of all comparison columns are summed.
For each pair of input key columns Spx[k] and Spy[l], the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] is calculated by the extended text similarity calculation method, and the similarities of all key columns are summed.
The comparison column similarity and the key column similarity are weighted and summed to obtain Sij, where s1 and s2 are the comparison column similarity and the key column similarity, respectively, and len(spx) and len(spx') are the number of comparison columns and the number of key columns, respectively.
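The row-level computation of step 2 can be sketched as follows. The exact weighting formula is not given in the text above, so the sketch assumes Sij = w * s1 / len(spx) + (1 - w) * s2 / len(spx') with a hypothetical weight w = 0.5; the pairing of the k-th comparison column of Spx with the k-th of Spy, the dictionary layout of the parameters, and the exact-match placeholder for the text similarity are likewise assumptions.

```python
def exact_match(a, b):
    """Placeholder text similarity: 1.0 for identical strings, else 0.0."""
    return 1.0 if a == b else 0.0


def row_similarity(row_x, row_y, spx, spy, text_sim=exact_match, w=0.5):
    """Weighted similarity S_ij of one row of Rx and one row of Ry.

    Assumed formula: w * s1 / n1 + (1 - w) * s2 / n2, where n1 and n2
    are the numbers of comparison columns and key columns.
    """
    # s1: summed similarity over the comparison columns.
    s1 = sum(text_sim(row_x[k], row_y[l])
             for k, l in zip(spx["compare"], spy["compare"]))
    # s2: summed similarity over the key columns.
    s2 = sum(text_sim(row_x[k], row_y[l])
             for k, l in zip(spx["key"], spy["key"]))
    n1, n2 = len(spx["compare"]), len(spx["key"])
    if n1 == 0 and n2 == 0:
        raise ValueError("at least one comparison or key column is required")
    if n2 == 0:
        return s1 / n1
    if n1 == 0:
        return s2 / n2
    return w * s1 / n1 + (1 - w) * s2 / n2
```

With these assumptions, two rows that agree on the key column and on one of two comparison columns score 0.5 * 0.5 + 0.5 * 1.0 = 0.75.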
Step 3, the similarity values Sij form the similarity matrix S, with S[i][j] = Sij (0 ≤ i < len(Rx), 0 ≤ j < len(Ry)).
All elements of the similarity matrix S are marked as not searched, and processing continues with step 4.
Step 4, searching for the largest unmarked element in the similarity matrix S, recording the row and column numbers (i, j) of its position, and marking row i and column j of the matrix as searched. If the number of unmarked rows or the number of unmarked columns in the current matrix S is 0, proceeding to step 5; otherwise, repeating step 4.
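The search described in step 4 can be sketched as a simple loop over the matrix. The function name greedy_pairs is hypothetical, and ties between equal elements are broken here in favour of the first one found in row-major order, which the text does not specify.

```python
def greedy_pairs(S):
    """Repeatedly pick the largest unmarked element of S.

    Returns the recorded (i, j) positions in the order they were found.
    """
    n_rows = len(S)
    n_cols = len(S[0]) if S else 0
    row_used = [False] * n_rows
    col_used = [False] * n_cols
    pairs = []
    # Stop as soon as no unmarked row or no unmarked column remains.
    while len(pairs) < min(n_rows, n_cols):
        best = None
        for i in range(n_rows):
            if row_used[i]:
                continue
            for j in range(n_cols):
                if col_used[j]:
                    continue
                if best is None or S[i][j] > S[best[0]][best[1]]:
                    best = (i, j)
        pairs.append(best)
        row_used[best[0]] = True   # mark row i as searched
        col_used[best[1]] = True   # mark column j as searched
    return pairs
```

For a matrix with three rows and two columns, the loop records two positions and leaves one row unmatched.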
Step 5, sorting the recorded row and column numbers (i, j)_h, h = 1, 2, ..., len(Ry), by the value of i to obtain the optimal comparison strategy R of the formatted texts, a matrix of size len(Ry) × 2. Line R_{i,1} of text 1 is compared in turn with line R_{i,2} of text 2, for i = 1, 2, ..., len(Ry); this pairing gives the highest overall similarity after comparison. The similarity value of each compared pair of lines and the overall similarity value are output together with the strategy.
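Step 5 can be sketched as a sort followed by a lookup in the similarity matrix. The formula for the overall similarity is not given in the text above; the sketch assumes it is the sum of the paired similarities divided by the number of rows of the longer text, so that unmatched rows keep the overall value below 1. The function name comparison_strategy is hypothetical.

```python
def comparison_strategy(pairs, S):
    """Sort the recorded (i, j) positions by i and derive the similarities.

    Returns (ordered pairs, per-line similarities, overall similarity).
    Assumed overall similarity: sum of per-line similarities / len(S).
    """
    ordered = sorted(pairs)                        # sort by row number i
    line_sims = [S[i][j] for i, j in ordered]      # similarity of each compared pair
    overall = sum(line_sims) / len(S) if S else 1.0
    return ordered, line_sims, overall


# Hypothetical 3 x 2 similarity matrix and recorded positions.
S = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3]]
ordered, line_sims, overall = comparison_strategy([(1, 1), (0, 0)], S)
```

Under this assumption two identical tables reach an overall similarity of exactly 1, which is the condition tested in the table comparison steps.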
The extended text similarity calculation method draws on the concepts of the longest common substring (LCS) and the edit distance (LD). Its input data are two character strings A = a1a2...am and B = b1b2...bn, and its output is the similarity S of the two character strings. The method comprises the following steps:
Step 1, constructing an LCS matrix LCS of size m × n and an LD matrix LD of size m × n.
Step 2, setting the first-row data of the LCS matrix and of the LD matrix respectively: LCS_{1,i} = 0 and LD_{1,i} = i, for i = 1, 2, ..., n.
Step 3, for i = 2, ..., m and j = 2, ..., n, the entries LCS_{i,j} and LD_{i,j} are calculated in a loop, which gives
LCS(A, B) = LCS_{m,n}, LD(A, B) = LD_{m,n}
Step 4, calculating the similarity S of the two character strings from LCS(A, B) and LD(A, B).
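The four steps can be sketched as a single dynamic-programming pass. Several points are assumptions, because the recurrences and the final formula are not reproduced in the text above: the LCS matrix is filled with the standard longest-common-subsequence recurrence (chosen because the result is read from the last cell LCS_{m,n}), the LD matrix with the standard Levenshtein recurrence, and the similarity is taken as S = LCS / (LCS + LD). The matrices here also carry an extra row and column for the empty prefix, so they are (m + 1) × (n + 1) rather than m × n.

```python
def lcs_ld_similarity(a, b):
    """Similarity of two strings from their LCS length and edit distance.

    Assumed formula: S = LCS / (LCS + LD); two empty strings give 1.0.
    """
    m, n = len(a), len(b)
    if m == 0 and n == 0:
        return 1.0
    # Step 1: allocate both matrices.
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    ld = [[0] * (n + 1) for _ in range(m + 1)]
    # Step 2: first row (and first column) of the LD matrix; LCS stays 0.
    for j in range(n + 1):
        ld[0][j] = j
    for i in range(1, m + 1):
        ld[i][0] = i
        # Step 3: fill the remaining cells character by character.
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
                ld[i][j] = ld[i - 1][j - 1]
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
                ld[i][j] = 1 + min(ld[i - 1][j], ld[i][j - 1], ld[i - 1][j - 1])
    # Step 4: combine the two measures into one similarity value.
    return lcs[m][n] / (lcs[m][n] + ld[m][n])
```

Identical strings have LD = 0 and therefore similarity 1, while strings with no character in common have LCS = 0 and similarity 0.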
In addition to the above embodiments, the present invention also includes other embodiments, and all technical solutions that are formed by equivalent transformation or equivalent substitution should fall within the protection scope of the claims of the present invention.