CN111966660B - Database comparison method based on ODBC access - Google Patents

Database comparison method based on ODBC access

Info

Publication number
CN111966660B
Authority
CN
China
Prior art keywords
comparison
similarity
database
text
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010648840.4A
Other languages
Chinese (zh)
Other versions
CN111966660A (en)
Inventor
李永刚
侯亚威
胡上成
郭力兵
汪毅
毛文
吴云
杨海民
伊瑞海
邓育民
黄为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
No63686 Troops Pla
Original Assignee
No63686 Troops Pla
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by No63686 Troops Pla
Priority to CN202010648840.4A
Publication of CN111966660A
Application granted
Publication of CN111966660B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a database comparison method based on ODBC access, used mainly to compare the structure and content of two relational databases. The method consists of four parts: an extended text similarity calculation method, a formatted text comparison method based on weighted preference, a database content comparison method based on ODBC access, and a database comparison method based on ODBC access. The invention provides database comparison that is independent of the database vendor and yields a preferred comparison result, so it can be applied in fields related to relational databases.

Description

Database comparison method based on ODBC access
Technical Field
The invention belongs to the field of computer software applications and relates to relational databases. It can be used to compare the contents of relational databases, and in particular relates to a database comparison method based on ODBC access.
Background
In database applications, it is often necessary to compare two databases, which may be of the same type or heterogeneous, in order to understand how data has changed or whether it is consistent across different services or different periods.
Database vendors generally provide functions such as data migration, data import and export, and data backup. If the databases to be compared come from the same vendor, they can exchange data with each other, but most database products still provide no comparison function. Databases from different vendors generally cannot exchange data directly, so their data cannot be compared directly.
Open Database Connectivity (ODBC) is the database component of Microsoft's Windows Open Services Architecture (WOSA). It establishes a set of specifications and provides a standard API for accessing databases. Through the ODBC API and the drivers supplied by database vendors, an application can access a database directly without regard to vendor differences.
The present method uses ODBC to realize vendor-independent database comparison and can be applied in fields related to relational databases.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the prior art, a database comparison method based on ODBC access that is independent of the database vendor and can be applied in fields related to relational databases.
The invention solves this problem with the following technical scheme. In the database comparison method based on ODBC access, the input data is the connection information of two databases, the database content is obtained through ODBC access, and the output is the comparison result of the two sets of database tables. The method comprises the following steps:
Step 1) establish the ODBC connections and initialize the comparison environment;
Step 2) obtain the field information and the contents of the database tables;
Step 3) for each table, construct a formatted text and calculate the comparison result of the table structure; if the overall similarity is less than 1, the tables are reported as different and the comparison moves on to the next table; otherwise go to step 4);
Step 4) for each table, calculate the comparison result of the table contents; if the contents are the same, the tables are reported as the same; otherwise they are reported as different.
Preferably, the formatted text in step 3) comprises text keys and text content; the text keys are column name, type, length, nullable, primary key, default value and description, and the text content is the corresponding database information.
Preferably, step 3) compares the formatted texts constructed for each table using a formatted text comparison method based on weighted preference. The inputs of that method are two formatted texts Rx and Ry and comparison parameters Spx and Spy, each of which comprises comparison columns and key columns. The outputs are the two formatted texts arranged by the optimal comparison strategy, the similarity of each compared row, and the overall similarity.
Preferably, the formatted text comparison method based on weighted preference comprises the following steps:
Step 1) organize the data and initialize the comparison environment;
Step 2) for the data columns to be compared, calculate the similarity of the comparison units using an extended text similarity calculation method;
Step 3) construct the similarity matrix of the two formatted texts;
Step 4) search for and sort the optimal comparison strategy for each line of text;
Step 5) output the comparison strategy and the text similarities.
Preferably, the extended text similarity calculation method used in step 2) is based on the longest common substring (LCS) and the edit distance (LD): for two input strings, it obtains the LCS and the edit distance and calculates the similarity of the two strings from them.
Preferably, the extended text similarity calculation method comprises the following steps:
Step 1) initialize an LCS matrix and an LD matrix;
Step 2) calculate the first-row data of the LCS matrix and of the LD matrix;
Step 3) loop over the characters of the two strings to calculate the remaining values of the LCS matrix and the LD matrix, and obtain the LCS and LD of the two strings;
Step 4) calculate the similarity between the two strings.
Compared with the prior art, the invention has the following advantages:
When databases are compared in the corresponding service scenario, the invention makes vendor differences irrelevant. No vendor-specific adaptation or software development is needed, database contents are compared directly, and considerable software development cost can be saved. In addition, software that implements the content comparison and adaptation strategy of this scheme can quickly produce a good comparison result, which avoids the cost, efficiency and accuracy problems of manual comparison.
Drawings
FIG. 1 is a flowchart of an implementation of the extended text similarity calculation method, the first part of the invention.
FIG. 2 is a flowchart of an implementation of the formatted text comparison method based on weighted preference, the second part of the invention.
FIG. 3 is an example of the similarity matrix described in the second part of the invention, in which the cells with a red background are the optimal comparison scheme selected step by step.
FIGS. 4, 5 and 6 are examples of the running interface of database comparison software developed in accordance with the invention.
Detailed Description
The invention is described in further detail below with reference to the embodiments of the drawings.
This embodiment provides a database comparison method based on ODBC access, which obtains database contents through ODBC access and gives a comparison result for relational databases. The input data of the method is two sets of ODBC connection information, and the output is the comparison result of the two sets of database tables.
The method comprises the following basic steps:
Step 1: if no ODBC connection has been established in the environment, set up the ODBC environment through SQLSetEnvAttr, and return an error immediately if this fails. Then establish the database connection through SQLConnect.
Step 2: call SQLTables to obtain information on all tables in the database, and call SQLColumns to obtain the field information of each table.
Step 3: for each table, construct a formatted text whose text keys are {"column name", "type", "length", "nullable", "primary key", "default value", "description"} and whose text content is the corresponding database information. Calculate the comparison result of the table structure using the formatted text comparison method based on weighted preference; if the overall similarity is less than 1, the tables are reported as different and the comparison moves on to the next table; otherwise go to step 4.
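As a minimal sketch of how the structure text of Step 3 might be assembled, the following Python function joins per-column metadata into one formatted text. The function name, the shortened key names, the dictionary input and the comma separator are assumptions made for illustration; in practice the metadata would come from the ODBC catalog calls named in Step 2.

```python
from typing import Iterable, Mapping

# The seven text keys listed in Step 3 (names shortened for this sketch).
STRUCTURE_KEYS = ["column name", "type", "length", "nullable",
                  "primary key", "default value", "description"]


def structure_text(columns: Iterable[Mapping[str, object]], sep: str = ",") -> str:
    """Build a formatted text that describes one table's structure.

    Each mapping in `columns` is assumed to hold the seven STRUCTURE_KEYS;
    a missing or None value is written as an empty field.
    """
    lines = [sep.join(STRUCTURE_KEYS)]  # first line: the keys
    for col in columns:
        fields = ["" if col.get(k) is None else str(col.get(k))
                  for k in STRUCTURE_KEYS]
        lines.append(sep.join(fields))  # one line per table column
    return "\n".join(lines)
```

Two tables have the same structure under this representation only if their structure texts compare with an overall similarity of 1.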
The formatted text is a text string with a self-describing structure and has the following characteristics:
(1) The text string consists of multiple lines, separated by line-feed characters;
(2) Each line of the text string consists of several columns separated by a delimiter, and every line contains the same number of columns;
(3) The first line of the text string holds the column information (also called keys), and all other lines hold data.
The following is an example of formatted text:
ID,NAME,BIRTHDAY,SCHOOLID,CLASSNAME
0,Lucy wang,1970-05-16,0,Math
1,Teacher Liu,1979-03-16,1,English
2,Teacher Zhao,1958-03-16,3,History
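The three characteristics above can be checked mechanically. The following sketch, with a hypothetical function name and an assumed comma delimiter, splits a formatted text into its keys and data rows and rejects any line whose column count differs from the key line.

```python
from typing import List, Tuple


def parse_formatted_text(text: str, sep: str = ",") -> Tuple[List[str], List[List[str]]]:
    """Split a formatted text into (keys, rows), enforcing equal column counts."""
    lines = [ln for ln in text.split("\n") if ln != ""]
    if not lines:
        raise ValueError("formatted text must contain at least the key line")
    keys = lines[0].split(sep)  # first line holds the column information
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split(sep)
        if len(fields) != len(keys):
            raise ValueError(
                f"line {number}: expected {len(keys)} columns, got {len(fields)}")
        rows.append(fields)
    return keys, rows


example = (
    "ID,NAME,BIRTHDAY,SCHOOLID,CLASSNAME\n"
    "0,Lucy wang,1970-05-16,0,Math\n"
    "1,Teacher Liu,1979-03-16,1,English\n"
    "2,Teacher Zhao,1958-03-16,3,History"
)
keys, rows = parse_formatted_text(example)
```

A plain split on the delimiter is enough for this example; data that may itself contain the delimiter would need an escaping rule, which the description does not specify.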
Step 4: for each table, calculate the comparison result of the table contents using the database content comparison method based on ODBC access; if the contents are the same, the tables are reported as the same; otherwise they are reported as different.
The database content comparison method based on ODBC access obtains the database content through ODBC access and gives a comparison result for the contents of relational databases. Its inputs are two sets of ODBC connection information, the database table to be compared, and an optional manually specified field priority; its output is the comparison result of the contents of the table being compared. The method comprises the following steps:
Step 1: if no ODBC connection has been established in the environment, set up the ODBC environment through SQLSetEnvAttr, and return an error immediately if this fails. Then establish the database connection through SQLConnect.
Step 2: obtain the field information of the database table through SQLColumns, and execute the statement SELECT * FROM table_name through SQLExecDirect or SQLExecute to obtain the entire content of the table.
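To make Step 2 concrete, here is a hedged sketch that reads a whole table through a Python DB-API 2.0 connection and renders it as formatted text. It does not call the ODBC C API directly: the standard-library `sqlite3` module is used only as a stand-in data source so that the example runs anywhere, and the function name and comma delimiter are assumptions.

```python
import sqlite3


def table_to_formatted_text(conn, table: str, sep: str = ",") -> str:
    """Read every row of `table` and return it as formatted text."""
    cur = conn.cursor()
    # The table name is interpolated directly, so it must come from trusted
    # catalog metadata and never from user input.
    cur.execute(f"SELECT * FROM {table}")
    keys = [column[0] for column in cur.description]  # column names
    lines = [sep.join(keys)]
    for row in cur.fetchall():
        lines.append(sep.join("" if value is None else str(value) for value in row))
    return "\n".join(lines)


# Stand-in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (ID INTEGER, NAME TEXT, CLASSNAME TEXT)")
conn.execute("INSERT INTO teacher VALUES (0, 'Lucy wang', 'Math')")
conn.execute("INSERT INTO teacher VALUES (1, 'Teacher Liu', 'English')")
print(table_to_formatted_text(conn, "teacher"))
```

Row order is not significant here, because the comparison method that follows pairs rows by similarity rather than by position.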
Step 3: call the second part, the formatted text comparison method based on weighted preference, to calculate the comparison result of the table contents of the two databases; if the overall similarity is less than 1, the databases are reported as different.
In the formatted text comparison method based on weighted preference, the inputs are two formatted texts Rx and Ry and comparison parameters Spx and Spy, where Spx and Spy each comprise comparison columns and key columns. The outputs are the two formatted texts arranged by the optimal comparison strategy, the similarity of each compared row, and the overall similarity. The method comprises the following steps:
Step 1: organize the data. Check that the parameters Spx and Spy are consistent with each other, that Spx matches Rx, and that Spy matches Ry; if any check fails, return an error immediately. Allocate space for the similarity matrix. If Rx is shorter than Ry, swap the two texts.
Step 2: for each row Rx[i] of Rx and each row Ry[j] of Ry, perform the following:
For each pair of input comparison columns Spx[k] and Spy[l], calculate the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] using the extended text similarity calculation method, and add up the similarities of all comparison columns.
For each pair of input key columns Spx[k] and Spy[l], calculate the similarity of the texts Rx[i][Spx[k]] and Ry[j][Spy[l]] using the extended text similarity calculation method, and add up the similarities of all key columns.
The comparison-column similarity and the key-column similarity are then combined in a weighted sum to obtain Sij (the weighting formula is missing from this text), where s1 and s2 are the comparison-column similarity and the key-column similarity, and len(spx) and len(spx') are the number of comparison columns and the number of key columns, respectively.
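The weighting formula for Sij is not reproduced in the text, so the following is only a sketch under stated assumptions: s1 and s2 are each divided by their column counts and then mixed with a hypothetical `key_weight` parameter. The exact-match `exact_sim` function is a stand-in for the extended text similarity calculation method, and all names are illustrative.

```python
from typing import Callable, Sequence


def exact_sim(a: str, b: str) -> float:
    """Stand-in similarity: 1.0 for identical strings, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def row_similarity(row_x: Sequence[str], row_y: Sequence[str],
                   cmp_x: Sequence[int], cmp_y: Sequence[int],
                   key_x: Sequence[int], key_y: Sequence[int],
                   sim: Callable[[str, str], float] = exact_sim,
                   key_weight: float = 0.5) -> float:
    """Combine comparison-column and key-column similarity into one S_ij.

    cmp_x/cmp_y and key_x/key_y are paired column indices into the two rows.
    ASSUMPTION: each sum is averaged over its column count, then the two
    averages are mixed as (1 - key_weight) * s1_mean + key_weight * s2_mean.
    """
    s1 = sum(sim(row_x[k], row_y[l]) for k, l in zip(cmp_x, cmp_y))
    s2 = sum(sim(row_x[k], row_y[l]) for k, l in zip(key_x, key_y))
    if not cmp_x and not key_x:
        raise ValueError("at least one comparison or key column is required")
    if not key_x:
        return s1 / len(cmp_x)
    if not cmp_x:
        return s2 / len(key_x)
    return (1.0 - key_weight) * s1 / len(cmp_x) + key_weight * s2 / len(key_x)
```

With this normalization Sij stays in the range 0 to 1 whenever `sim` does, which keeps the "overall similarity smaller than 1" test of the method meaningful.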
Step 3: the similarity values Sij form the similarity matrix S[i][j] = Sij (0 ≤ i < len(Rx), 0 ≤ j < len(Ry)).
Mark all elements of the similarity matrix S as unsearched, then proceed to step 4.
Step 4: find the largest unmarked element in the similarity matrix S, record the row and column numbers (i, j) of its position, and mark row i and column j of the matrix as searched. If the number of unmarked rows or the number of unmarked columns in S is now 0, go to step 5; otherwise repeat step 4.
Step 5: sort the recorded row and column pairs (i, j)h, h = 1, 2, ..., len(Ry), by i to obtain the optimal comparison strategy R for the formatted texts, a table of size len(Ry) × 2. Line R[i,1] of text 1 is compared with line R[i,2] of text 2 in turn, for i = 1, 2, ..., len(Ry), which gives the highest overall similarity after comparison. The similarity value of each compared pair and the overall similarity value are given by formulas that are missing from this text.
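Steps 3 to 5 describe a greedy selection over the similarity matrix, which can be sketched as follows. The function names are hypothetical. Because the overall similarity formula is not reproduced in the text, `overall_similarity` is an assumption: it averages the matched similarities over the row count of the longer text, so unmatched rows pull the result below 1.

```python
from typing import List, Sequence, Tuple


def greedy_match(S: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Repeatedly pick the largest unmarked S[i][j], then mark row i and column j."""
    free_rows = set(range(len(S)))
    free_cols = set(range(len(S[0]))) if S else set()
    pairs = []
    while free_rows and free_cols:
        # Sorted iteration makes tie-breaking deterministic (first maximum wins).
        i, j = max(((r, c) for r in sorted(free_rows) for c in sorted(free_cols)),
                   key=lambda rc: S[rc[0]][rc[1]])
        pairs.append((i, j, S[i][j]))
        free_rows.discard(i)
        free_cols.discard(j)
    pairs.sort(key=lambda p: p[0])  # Step 5: order the strategy by row index i
    return pairs


def overall_similarity(pairs: Sequence[Tuple[int, int, float]], n_rows: int) -> float:
    """ASSUMED aggregate: mean matched similarity over the longer text's rows."""
    if n_rows == 0:
        return 1.0
    return sum(s for _, _, s in pairs) / n_rows


S = [[0.2, 0.9],
     [0.8, 0.7],
     [0.1, 0.3]]
strategy = greedy_match(S)
```

Greedy selection is simple and fast but is not guaranteed to maximize the total similarity in every case; an assignment solver such as the Hungarian algorithm would be the usual alternative when a global optimum is required.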
The extended text similarity calculation method draws on the concepts of the longest common substring (LCS) and the edit distance (LD). Its inputs are two strings A = a1a2...am and B = b1b2...bn, and its output is their similarity S. The method comprises the following steps:
Step 1: construct an LCS matrix LCS of size m × n and an LD matrix LD of size m × n.
Step 2: set the first-row data of the LCS matrix and the LD matrix: LCS[1,i] = 0 and LD[1,i] = i, for i = 1, 2, ..., n.
Step 3: for i = 2, ..., m and j = 2, ..., n, fill in the remaining entries of both matrices by their recurrences (the recurrence formulas are missing from this text), which gives
LCS(A,B) = LCS[m,n], LD(A,B) = LD[m,n]
Step 4: calculate the similarity S of the two strings from LCS(A,B) and LD(A,B) (the formula is missing from this text).
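Since the recurrences and the final similarity formula are not reproduced in the text, the following sketch fills them with assumptions. The LCS matrix uses the standard longest-common-subsequence recurrence, chosen because the result is read from the last cell of the matrix; the LD matrix uses the standard Levenshtein recurrence; and the similarity is taken as LCS / (LCS + LD), one commonly used combination of the two quantities. The matrices carry an extra leading row and column for the empty prefix, which differs slightly from the m × n layout in the description.

```python
def lcs_ld_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] built from LCS length and edit distance (assumed formula)."""
    m, n = len(a), len(b)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    ld = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        ld[0][j] = j  # distance from the empty prefix of a
    for i in range(m + 1):
        ld[i][0] = i  # distance to the empty prefix of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
                ld[i][j] = ld[i - 1][j - 1]
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
                ld[i][j] = 1 + min(ld[i - 1][j],      # deletion
                                   ld[i][j - 1],      # insertion
                                   ld[i - 1][j - 1])  # substitution
    common, distance = lcs[m][n], ld[m][n]
    if common + distance == 0:
        return 1.0  # both strings are empty
    return common / (common + distance)
```

Under this assumed formula identical strings score 1, strings with no character in common score 0, and "kitten" versus "sitting" scores 4/7, from a common subsequence of length 4 and an edit distance of 3.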
In addition to the above embodiment, the invention includes other embodiments; all technical solutions formed by equivalent transformation or equivalent substitution fall within the scope of protection of the claims of the invention.

Claims (5)

CN202010648840.4A | 2020-07-08 | 2020-07-08 | Database comparison method based on ODBC access | Active | CN111966660B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010648840.4A, CN111966660B (en) | 2020-07-08 | 2020-07-08 | Database comparison method based on ODBC access

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010648840.4A, CN111966660B (en) | 2020-07-08 | 2020-07-08 | Database comparison method based on ODBC access

Publications (2)

Publication Number | Publication Date
CN111966660A (en) | 2020-11-20
CN111966660B (en) | 2024-07-05

Family

ID=73361879

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010648840.4A, Active, CN111966660B (en) | Database comparison method based on ODBC access | 2020-07-08 | 2020-07-08

Country Status (1)

Country | Link
CN (1) | CN111966660B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114625710A (en) * | 2022-05-12 | 2022-06-14 | 深圳市巨力方视觉技术有限公司 | Visual integration system capable of taking historical data for identification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107784102A (en) * | 2017-10-27 | 2018-03-09 | 中国电子科技集团公司第二十八研究所 | A kind of data difference comparative approach based on oracle database
CN111046043A (en) * | 2019-12-11 | 2020-04-21 | 北京西骏数据科技股份有限公司 | A Fast and Accurate Verification Method for Database Tables

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
AU2001281111A1 (en) * | 2000-08-04 | 2002-02-18 | Infoglide Corporation | System and method for comparing heterogeneous data sources
GB0217201D0 (en) * | 2002-07-24 | 2002-09-04 | Beach Solutions Ltd | XML database differencing engine
US7788282B2 (en) * | 2004-09-16 | 2010-08-31 | International Business Machines Corporation | Methods and computer programs for database structure comparison

Also Published As

Publication number | Publication date
CN111966660A (en) | 2020-11-20

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
