Technical Field
The present invention relates to the technical field of database processing, and in particular to a database migration method and system based on data verification.
Background
Owing to business-system upgrades, capacity expansions, and consolidations, data migration and synchronization are routine database tasks. Chinese patent CN113986825B discloses a data migration system, method, apparatus, electronic device, and readable storage medium in which migration is performed through a remote cloud server, significantly improving the efficiency of routine migrations. That method, however, must compress the database files and therefore requires the database to be in a static working state. Chinese patent application CN112256675A discloses a data migration method, apparatus, terminal device, and storage medium that keeps the database accessible during migration by enabling dual writes to the old and new databases. Dual writes, however, can leave the two databases inconsistent because of write latency.
Furthermore, beyond the writes performed during migration, an untrusted port on the migration server may steal or corrupt the data content, so data verification must be performed after migration completes. Chinese patent application CN117290332A discloses a data migration verification method, apparatus, device, and storage medium that checks data integrity by comparing the hash of the source file with the hash of the target file. A single hash algorithm, however, exposes the data content to the server; if the server hosts an untrusted terminal, that terminal may steal or damage the content. What is needed is a migration method that permits database access during migration and that, while keeping the data secure, uses data verification to prevent inconsistency between the data before and after migration.
Summary of the Invention
To address the above problems, the present invention provides a database migration method and system based on data verification. The method migrates data in batches by data type, shortening the time during which the access interface must be closed. Priorities are updated according to how data is accessed and modified, preventing the inconsistency that arises when frequently accessed ("hot") data is migrated repeatedly. Furthermore, the migration server of the present invention provides only the migration verification service and takes no part in reading or writing the data, guaranteeing data security while performing integrity verification.
The objectives of this application are achieved by the following technical means:
A database migration method based on data verification comprises the following steps:
Step 1: the source database closes its access interface, scans the migration object, and identifies within it schema-based first data and content-based second data;
Step 2: the migration server creates a migration mapping between the source database and the target database, and the source database migrates the first data to the target database according to the migration mapping;
Step 3: the source database generates a first hash value of the first data, the target database generates a second hash value of the first data, and the migration server receives both hash values; if the first hash value matches the second hash value, proceed to step 4, otherwise return to step 2;
Step 4: the source database opens its access interface, divides the second data into multiple data fragments, generates a migration queue of the fragments based on their priorities, and creates a first storage log for each fragment; the migration server receives the first storage log and creates a second storage log in the target database;
Step 5: the source database migrates the data fragments to the target database according to the migration queue, and the target database stores the fragments according to the second storage log;
Step 6: the access interface receives a modification request for a data fragment, the source database modifies at least one fragment accordingly, updates that fragment's priority, and reinserts it into the migration queue; if the migration queue has been fully consumed, proceed to step 7, otherwise return to step 5;
Step 7: the migration server determines the groups of data fragments that require verification, the source database generates a first digest for each such fragment, and the target database generates a second digest for each;
Step 8: the migration server receives and matches each corresponding first digest and second digest; if they match, the task ends, otherwise the fragment is added back to the migration queue and the method returns to step 5.
In the present invention, in step 1, the first data comprises one or more of the migration object's tables, indexes, storage rules, and triggers, and the second data comprises one or more of text, video, and images.
In the present invention, in step 3, the source database defines eight initial hash values h0, h1, ..., h7, converts the first data into a byte sequence and pads it to generate message blocks, iterates the initial hash values over those message blocks to produce the final hash values h'0, h'1, ..., h'7, and concatenates the eight final hash values into the first hash value.
In the present invention, in step 4, a fragment count M is preset, the content data of the second data is taken modulo M, and the remainder determines the fragment to which each piece of content data belongs, with M ≥ I.
In the present invention, in step 4, the source database assigns an identification code to each data fragment; the first storage log contains the identification code and a first storage index, and the second storage log contains the identification code and a second storage index.
In the present invention, in step 6, the priority of data fragment i = 1/Ti, and the data heat Ti = Σj αj·e^(tj−t), where J is the number of accesses to data fragment i, tj is the time of the j-th access, αj is the decay factor of the j-th access, t is the current time, and j ≤ J.
In the present invention, in step 7, the migration server generates, for the I groups of data fragments to be verified, a verification request comprising a verification identifier set {ni} and a random parameter λ, where i ≤ I and ni is the identification code of the i-th group of data fragments.
In the present invention, in step 7, the source database receives the verification identifier set, extracts the content data Fi of the i-th group of data fragments according to the first storage log, determines a first parameter pi from Fi, and computes the first digest σi = sig(λ, Fi, pi), where sig() is the digest generation algorithm.
In the present invention, in step 7, the target database receives the verification identifier set, extracts the content data F'i of the i-th group of data fragments according to the second storage log, determines a second parameter qi from F'i, and computes the second digest δi = sig(λ, F'i, qi), where sig() is the digest generation algorithm.
A database migration system implementing the above database migration method based on data verification comprises:
a source database configured to send the first data and the second data and to generate the first hash value of the first data and the first digest of the second data;
a target database configured to receive the first data and the second data and to generate the second hash value of the first data and the second digest of the second data;
an access interface configured to receive modification requests for the second data; and
a migration server configured to match the first hash value against the second hash value and the first digest against the second digest, wherein
the second data comprises multiple groups of data fragments, and the source database generates the migration queue from the fragments' priorities, updates a fragment's priority upon a modification request, and reinserts it into the migration queue.
In the present invention, the migration server generates a verification request comprising the verification identifier set and the random parameter; the source database determines the first parameter and generates the first digest from the first parameter and the random parameter, and the target database determines the second parameter and generates the second digest from the second parameter and the random parameter.
Implementing the database migration method and system based on data verification of the present invention has the following beneficial effects:
Comparing the hash values of the first data before and after migration, and the digest values of the second data before and after migration, builds a multi-layer integrity verification mechanism that accurately detects and corrects potential data inconsistencies both while migrating the database schema and while migrating the data content. During migration the migration server does not read or write the data directly; it monitors the migration by receiving and verifying hash values and data digests, which improves data security and strengthens resistance to tampering while data is read, written, and migrated. Furthermore, because heavily accessed data may be modified repeatedly and therefore migrated repeatedly, the present invention uses a prioritized migration queue that reverse-adjusts the priority of such hot data, avoiding the inconsistency caused by repeated migration.
Brief Description of the Drawings
FIG. 1 is a flow chart of the database migration method based on data verification of the present invention;
FIG. 2 is a sequence diagram of the database migration method based on data verification of the present invention;
FIG. 3 is a schematic diagram of the source database migrating data fragments according to the migration queue in the present invention;
FIG. 4 is a schematic diagram of generating the first hash value in the present invention;
FIG. 5 is a functional block diagram of the database migration system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings.
A source database may need a capacity-expansion migration to gain capabilities such as window functions, JSON support, full-text search, or GIS geospatial data processing. For business reasons, users are allowed to access, and even modify, part of the data during migration; frequent insertions and deletions could then leave the data incompletely transferred to the target database. The present invention first copies the source database schema to the target database, builds the runtime environment there, and begins migrating the content data only after the schema has been confirmed correct. While the content data is migrating, the access interface stays in service, and migration priority is adjusted according to data heat so that hot data is not migrated over and over. The migration server provides only the migration verification service and takes no part in reading or copying the data, guaranteeing data security while performing integrity verification.
Embodiment 1
This embodiment discloses a database migration method based on data verification for migrating first data and second data between a source database and a target database. In this embodiment the source and target databases are, for example, existing MySQL systems. Before migration, the migration environment is checked against the database's characteristics: for example, run the SHOW GLOBAL STATUS command on the source database's console to inspect server status and performance metrics, and run the CHECK TABLE command to verify table integrity and ensure there are no long-running transactions or deadlocks. Likewise, check that the target database has enough disk space, memory, and CPU to hold the migrated data, and improve network quality to relieve congestion and target-side pressure during migration. As shown in FIGS. 1 to 3, the database migration method based on data verification of the present invention comprises the following steps.
Step 1: the source database closes its access interface, scans the migration object, and identifies within it schema-based first data and content-based second data. The migration object is the data in the source database that needs to be migrated; it comprises both the data schema and the data content. The first data includes one or more of the migration object's tables, indexes, storage rules, and triggers; the second data includes one or more of text, video, and images. In this embodiment, SQL commands can be used directly to inspect and copy the schema.
Step 2: the migration server creates a migration mapping between the source and target databases, and the source database migrates the first data to the target database according to it. The migration mapping defines the correspondence between the data structures and data types of the two databases. The source database is audited comprehensively to identify all relevant tables, fields, primary keys, foreign-key constraints, indexes, and views, together with each field's data type, size, nullability, and any special format requirements; the target database is then configured according to its storage requirements and technical specifications, and the data migration mapping is established. The mapping includes constraint mappings, table mappings, object mappings, and so on. To keep the database runtime environment accurate, the access interface remains closed while the first data is migrated.
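A minimal sketch of such a migration mapping follows. The column definitions and the type-conversion table are illustrative assumptions, not taken from the patent; the point is only that each audited source column is translated into the target database's convention through a lookup established before migration.

```python
# Illustrative source-to-target type mapping; the entries below are
# assumptions for this sketch, not an exhaustive conversion table.
TYPE_MAP = {"TINYINT(1)": "BOOLEAN", "DATETIME": "TIMESTAMP"}

def map_column(column):
    """Translate one audited source column (name, type, nullable) into the
    target database's convention via the migration mapping."""
    name, src_type, nullable = column
    return (name, TYPE_MAP.get(src_type, src_type), nullable)

# Hypothetical result of auditing the source database.
source_columns = [("id", "INT", False), ("active", "TINYINT(1)", True)]
target_columns = [map_column(c) for c in source_columns]
print(target_columns)  # [('id', 'INT', False), ('active', 'BOOLEAN', True)]
```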
Step 3: the source database generates a first hash value of the first data, the target database generates a second hash value of the first data, and the migration server receives both; if the first hash value matches the second hash value, proceed to step 4, otherwise return to step 2. Although the first data is schema data of many different types, the hash algorithm itself does not depend on the kind of data: it derives a fixed-length first hash value from the binary string of the first data according to preset logic. Because the algorithm consumes a binary stream, even a tiny change in the first data at the target database produces a second hash value that differs markedly from the first, completing the verification of the first data.
Step 4: the source database opens its access interface, divides the second data into multiple data fragments, generates a migration queue of the fragments based on priority, and creates a first storage log for each fragment; the migration server receives the first storage log and creates a second storage log in the target database. This embodiment does not restrict how the data is partitioned. A fragment count M may be preset and the content data of the second data (in this embodiment, the binary form of the data's ASCII codes) taken modulo M, the remainder determining which fragment each piece of content data belongs to, with M ≥ I. In another embodiment the second data may instead be partitioned by storage location or data type. Further, the source database assigns each fragment an identification code, which may be the fragment's sequence number or the hash of its primary key. The first storage log contains the identification code and a first storage index; the second storage log contains the identification code and a second storage index.
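The modulo partitioning and the hash-based identification codes can be sketched as follows. The fragment count M = 8, the sample records, and the choice of a truncated SHA-256 as the identification code are all assumptions for illustration; the patent only requires M ≥ I and permits either a sequence number or a primary-key hash as the code.

```python
import hashlib

M = 8  # preset fragment count (an assumed value; the method only needs M >= I)

def fragment_of(content: bytes) -> int:
    """Assign content data to a fragment: interpret its bytes (the binary
    form of its ASCII codes) as an integer and take the remainder modulo M."""
    return int.from_bytes(content, "big") % M

def identification_code(content: bytes) -> str:
    """One admissible identification code: a short hash of the record."""
    return hashlib.sha256(content).hexdigest()[:16]

records = [b"alice", b"bob", b"carol", b"dave"]
fragments: dict[int, list[bytes]] = {}
for r in records:
    fragments.setdefault(fragment_of(r), []).append(r)

# First storage log: identification code -> storage index in the source database.
first_storage_log = {identification_code(r): i for i, r in enumerate(records)}
```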
Step 5: the source database migrates the data fragments to the target database according to the migration queue, and the target database stores them according to the second storage log. A data replication channel is established between the two databases; during migration, users can access the source database's fragments through the access interface. The migration queue stores the fragments' identification codes, and once a fragment has been migrated its identification code is removed from the queue.
Step 6: the access interface receives a modification request for a data fragment; the source database modifies at least one fragment accordingly, updates that fragment's priority, and reinserts it into the migration queue. If the queue has been fully consumed, proceed to step 7, otherwise return to step 5. A modified fragment must be migrated again so that the target database holds its latest version. If the same fragment appears in the queue with several priorities, only the lowest priority value is retained.
This embodiment derives each fragment's priority from its data heat and orders the fragments by priority to obtain the migration queue. The priority of data fragment i = 1/Ti, with data heat Ti = Σj αj·e^(tj−t), where J is the number of accesses to fragment i, tj is the time of the j-th access, αj is the decay factor of the j-th access, t is the current time, and j ≤ J. Data requests at the access interface comprise modification requests and access requests. αj ∈ [0,1]; the decay factor controls how quickly data cools, and this embodiment does not fix its value. If the j-th access modified the fragment, αj ≤ 0.5; if it did not, αj ≥ 0.6. Further, attributes such as the fragment's complexity, its dependencies, and its criticality in the business process may also be examined, with a weighted scoring scheme assigning a weight to each criterion to determine the decay factor.
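The heat and priority computation can be sketched as below. The exponential attenuation e^(tj−t) is one concrete reading of a decay factor that "controls how quickly data cools"; the timestamps and decay values are illustrative. Ordering the queue by descending priority value puts cold fragments first and hot, recently modified fragments last, matching the reverse adjustment described above.

```python
import math

def heat(accesses, now):
    """Data heat T_i: each of the J accesses (time t_j, decay factor a_j)
    contributes a_j attenuated by its age, here via e^(t_j - now)."""
    return sum(a_j * math.exp(t_j - now) for t_j, a_j in accesses)

def priority(T):
    """Priority of fragment i = 1/T_i: hotter fragments get a smaller value."""
    return 1.0 / T

now = 100.0
cold = [(10.0, 0.9)]                            # one stale read long ago
hot = [(99.0, 0.9), (99.5, 0.9), (99.9, 0.4)]   # recent reads plus a modification (a_j <= 0.5)

# Migration queue ordered by descending priority value: cold data migrates
# first, hot data is pushed to the back to avoid repeated migration.
queue = sorted([("cold", priority(heat(cold, now))),
                ("hot", priority(heat(hot, now)))],
               key=lambda item: -item[1])
```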
Step 7: the migration server determines the groups of data fragments to verify; the source database generates a first digest for each, and the target database generates a second digest for each. For the I groups to be verified, the migration server generates a verification request containing a verification identifier set {ni} and a random parameter λ, where i ≤ I and ni is the identification code of the i-th group. The source database receives the identifier set and extracts the content data Fi of the i-th group via the first storage index of the first storage log; the target database receives the identifier set and extracts the content data F'i via the second storage index of the second storage log. In this embodiment only the fragments' digest values are sent to the migration server, preventing an untrusted port on the server from stealing the data content.
The source database determines a first parameter pi from the content data Fi, so pi is fixed by the fragment. This embodiment does not restrict how the first parameter is generated: for example, an integer may be formed from the binary code of a specific character segment of the content, and the smallest prime greater than that integer taken as the first parameter. By the same method, the target database determines a second parameter qi from the content data F'i. The source database computes the first digest of the i-th group, σi = sig(λ, Fi, pi), where sig() is the digest generation algorithm; the target database computes the second digest δi = sig(λ, F'i, qi).
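The prime-based parameter example can be sketched as follows. Which character segment supplies the integer is left open by the text, so the choice of the first four bytes here is an assumption; trial division suffices for integers of this size.

```python
def smallest_prime_above(n: int) -> int:
    """Return the smallest prime strictly greater than n (trial division)."""
    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        if k % 2 == 0:
            return k == 2
        f = 3
        while f * f <= k:
            if k % f == 0:
                return False
            f += 2
        return True
    k = n + 1
    while not is_prime(k):
        k += 1
    return k

def first_parameter(content: bytes) -> int:
    """p_i per the example in the text: form an integer from a specific
    character segment of the content (assumed here: its first four bytes)
    and take the smallest prime above it."""
    return smallest_prime_above(int.from_bytes(content[:4], "big"))
```

The target database derives q_i from F'_i the same way, so intact fragments yield q_i = p_i without the parameter ever being transmitted.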
Step 8: the migration server receives and matches the corresponding first and second digests; on a successful match the task ends, otherwise the fragment is added back to the migration queue and the method resumes at step 5. Because qi = pi, if a fragment is intact then F'i = Fi and σi = δi; if it is damaged, F'i ≠ Fi and σi ≠ δi. A further embodiment of the present invention provides a stricter matching method: compute the first digest set σ = Σi ni·σi and the second digest set δ = Σi ni·δi. If σ = δ, the task ends; if σ ≠ δ, the I groups of fragments are added to the migration queue. Because ni enters the computation of the digest sets, any modified fragment perturbs the second digest by a factor of ni, producing a pronounced difference between the first and second digest sets.
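The stricter batched comparison can be sketched by treating each per-fragment digest as an integer and weighting it by its identification code. The numeric codes, digest values, and the reduction modulus are assumptions for the sketch; the mechanism shown is only the amplification effect, where one damaged fragment shifts the aggregate by ni times the digest difference.

```python
def digest_set(codes, digests, modulus=(1 << 61) - 1):
    """Aggregate per-fragment digests into one value: each digest is weighted
    by its identification code n_i (the modulus is an assumed detail)."""
    return sum(n * d for n, d in zip(codes, digests)) % modulus

codes = [3, 5, 7]                    # identification codes n_i (illustrative)
source_digests = [1111, 2222, 3333]  # sigma_i at the source
target_digests = [1111, 2222, 3333]  # delta_i at the target

sigma = digest_set(codes, source_digests)
delta = digest_set(codes, target_digests)
assert sigma == delta                # every fragment intact: the sets match

target_digests[1] = 2223             # one damaged fragment
assert digest_set(codes, target_digests) != sigma  # mismatch detected
```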
Embodiment 2
The first and second hash values of step 3 in Embodiment 1 verify the integrity of the first data before and after migration; algorithms such as MD5, SHA-1, or SHA-2 may be used. Referring to FIG. 4, this embodiment discloses a preferred hash algorithm.
Initialization. The source database defines eight initial hash values h0, h1, ..., h7, negotiated between the source database and the target database.
Padding. The first data is converted into a byte sequence; a single 1 bit is appended to its end, followed by enough 0 bits to reach the next 512-bit boundary.
Blocking. The padded byte sequence is split into multiple 512-bit message blocks.
Iteration. Each message block serves as a parameter of a nonlinear function into which the current hash values are substituted; the state after all iterations is the final hash values h'0, h'1, ..., h'7.
Output. The eight final hash values are concatenated to form the first hash value.
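The construction above (eight chaining values, 1-then-0 padding to a 512-bit boundary, per-block compression, concatenated output) is the shape of SHA-256, which additionally appends the 64-bit message length to the padding. As a sketch of the step-3 check, hashlib's SHA-256 stands in for the preferred algorithm; the schema string is illustrative.

```python
import hashlib

def first_hash(first_data: bytes) -> str:
    """Fixed-length hash of the schema data; SHA-256 is used here as a
    stand-in for the eight-value construction described in this embodiment."""
    return hashlib.sha256(first_data).hexdigest()

schema = b"CREATE TABLE t (id INT PRIMARY KEY);"  # illustrative first data
h_source = first_hash(schema)   # first hash value, computed at the source
h_target = first_hash(schema)   # second hash value, computed at the target
assert h_source == h_target     # intact schema: the migration proceeds to step 4

# Even a one-byte change yields a markedly different digest, so the
# mismatch sends the method back to step 2.
assert first_hash(schema + b" ") != h_source
```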
Embodiment 3
This embodiment further discloses a method of computing the first digest, which is constructed with a one-way hash function.
First, the random parameter λ and the first parameter pi are padded to suit the hash operation; XOR operations may be applied during padding to strengthen the security properties of each byte.
Next, an authentication code mac is constructed from the random parameter λ and the content data Fi of fragment i: mac = H(λ‖Fi), where H() is a hash function (that of Embodiment 2 or any existing hash algorithm) and ‖ denotes concatenation of binary strings.
Finally, the first digest is constructed from the first parameter pi and the authentication code: H(pi‖mac) = H(pi‖H(λ‖Fi)); that is, the first digest σi = sig(λ, Fi, pi) = H(pi‖H(λ‖Fi)). In a still further embodiment a parameter β may be introduced, giving σi = sig(λ, Fi, pi) = H(pi‖H(λ‖Fi)^β).
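The nested construction can be sketched directly. SHA-256 again stands in for H(), and the 8-byte encoding of pi is an assumed detail; the sample λ, content, and prime are illustrative.

```python
import hashlib

def H(data: bytes) -> bytes:
    """Hash function H(); SHA-256 stands in for the Embodiment 2 algorithm."""
    return hashlib.sha256(data).digest()

def sig(lam: bytes, F: bytes, p: int) -> bytes:
    """sigma_i = H(p_i || H(lambda || F_i)): an inner authentication code over
    the random parameter and the fragment content, wrapped under the
    fragment-derived parameter (8-byte encoding of p_i is assumed)."""
    mac = H(lam + F)                       # mac = H(lambda || F_i)
    return H(p.to_bytes(8, "big") + mac)   # first digest

lam = b"\x13\x37-random"   # random parameter from the verification request
F = b"fragment content"    # content data F_i
p = 1009                   # fragment-derived prime p_i (illustrative)

sigma = sig(lam, F, p)     # computed at the source database
delta = sig(lam, F, p)     # computed at the target (q_i = p_i, F'_i = F_i)
assert sigma == delta
assert sig(lam, b"tampered content", p) != sigma  # damage is detected
```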
Further, because the generation of the first digest involves the random parameter λ and the first parameter, an external user cannot recover the content data from the digest; and since the migration server holds neither the first parameter pi nor the fragment's content data, it cannot recover the content either. The same method is used to compute the second digest.
Embodiment 4
As shown in FIG. 5, the database migration system of the present invention implementing the above database migration method based on data verification comprises a source database, a target database, an access interface, and a migration server.
The source database is configured to send the first data and the second data, and it generates the first hash value of the first data and the first digest of the second data. The target database is configured to receive the first data and the second data, and it generates the second hash value of the first data and the second digest of the second data. The access interface is configured to receive modification requests for the second data. The migration server is configured to match the first hash value against the second hash value and the first digest against the second digest. The second data comprises multiple groups of data fragments; the source database generates the migration queue from the fragments' priorities, and upon a modification request it updates the affected fragment's priority and reinserts it into the queue.
In this embodiment, the migration server generates a verification request comprising a verification identification set and a random parameter; the source database determines a first parameter and generates the first digest from the first parameter and the random parameter, while the target database determines a second parameter and generates the second digest from the second parameter and the random parameter. Because digest generation involves the random parameter λ together with the first and second parameters, an external user cannot recover the content data from a digest. Both parameters are derived from the data fragment, so when the fragment is intact the digests generated by the source database and the target database are identical, thereby completing the post-migration verification.
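A sketch of this challenge-and-response round, under stated assumptions: SHA-256 stands in for H, and a single shared per-fragment parameter stands in for the first and second parameters (which the scheme requires to coincide when the fragment is intact); all identifiers are hypothetical:

```python
import hashlib
import secrets

def make_digests(db, ids, lam, params):
    """A database answers the challenge (ids, lam) with one digest per
    requested fragment: H(param || H(lam || fragment))."""
    out = {}
    for i in ids:
        mac = hashlib.sha256(lam + db[i]).digest()
        out[i] = hashlib.sha256(params[i] + mac).digest()
    return out

def verify(first, second):
    """Migration-server side: compare first digests with second digests
    without ever seeing fragment contents or the parameters."""
    return {i: first[i] == second[i] for i in first}

source = {1: b"alpha", 2: b"beta"}
target = {1: b"alpha", 2: b"corrupted"}     # fragment 2 damaged in transit
params = {1: b"p1", 2: b"p2"}
lam = secrets.token_bytes(16)               # random parameter in the challenge
result = verify(make_digests(source, [1, 2], lam, params),
                make_digests(target, [1, 2], lam, params))
# result[1] is True (fragment intact), result[2] is False (mismatch)
```

A fresh random λ per verification request prevents either side from replaying digests precomputed for an earlier challenge.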
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410612735.3A (CN118210785B) | 2024-05-17 | 2024-05-17 | Database migration method and system based on data verification |
| Publication Number | Publication Date |
|---|---|
| CN118210785A | 2024-06-18 |
| CN118210785B | 2024-08-13 |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US 2016/0006629 A1 | 2013-07-07 | 2016-01-07 | George Ianakiev | Appliance clearinghouse with orchestrated logic fusion and data fabric: architecture, system and method |
| US 10649952 B1 | 2019-01-23 | 2020-05-12 | Cohesity, Inc. | Using a secondary storage system to maintain functionality of a database during database migration |
| WO 2022/205544 A1 | 2021-04-01 | 2022-10-06 | 中山大学 | Cuckoo hashing-based file system directory management method and system |
| CN 114153820 A | 2021-12-07 | 2022-03-08 | 山东省齐鲁大数据研究院 | Database migration checking method |
| CN 117290332 A | 2023-09-13 | 2023-12-26 | 中国建设银行股份有限公司 | Data migration verification method, device, equipment and storage medium |
| CN 117527180 A | 2023-11-09 | 2024-02-06 | 北京理工大学 | Quantum-password-migration-resistant method for blockchain |
| CN 118034588 A | 2024-01-26 | 2024-05-14 | 鹏城实验室 | Data migration method, device, computer equipment and readable storage medium |
| Title |
|---|
| Junbin Kang: "Remus: Efficient Live Migration for Distributed Databases with Snapshot Isolation", Proceedings of the 2022 International Conference on Management of Data, 11 June 2022, p. 2232, XP059072979, DOI: 10.1145/3514221.3526047 |
| 王雪丽 (Wang Xueli): "Research on an Intelligent Storage Algorithm Based on Dynamic Migration", Journal of Anyang Institute of Technology, no. 04, 20 July 2017 |
| 秦秀磊 (Qin Xiulei) et al.: "Cost-Sensitive Data Migration Method for Cloud Key/Value Storage Systems", Journal of Software, no. 06, 15 June 2013 |
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |