WO2025190209A1

Movatterモバイル変換

Info

Publication number: WO2025190209A1
Application number: PCT/CN2025/081570
Authority: WO
Inventors: 黄华; 覃博
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2024-03-14
Filing date: 2025-03-10
Publication date: 2025-09-18
Anticipated expiration: 2026-09-14
Also published as: CN117891414A; CN117891414B

Abstract

Provided in the present specification are a data storage method based on perfect hashing, and a related device. The method comprises: dividing a target data set to be stored into a plurality of data segments that match the data volume of key-value data included in said target data set, wherein each data segment includes a portion of the key-value data in said target data set; calculating perfect hash slots respectively corresponding to a plurality of pieces of key-value data comprised in each data segment, and dividing each data segment into a plurality of data blocks that match the data volume of the key-value data included in the data segment, wherein each data block includes several perfect hash slots arranged in sequence, and each perfect hashing slot includes at least one piece of key-value data corresponding to the perfect hash slot; and sequentially storing the plurality of data segments in a disk, and when each data segment is stored, sequentially storing the plurality of data blocks in the data segment according to the arrangement order of the perfect hash slots.

Description

Translated fromChinese

基于完美哈希的数据存储Perfect hashing-based data storage

技术领域Technical Field

本说明书一个或多个实施例涉及数据库技术领域，尤其涉及基于完美哈希的数据存储的方法及相关设备。One or more embodiments of the present specification relate to the field of database technology, and in particular to a method and related device for storing data based on perfect hashing.

背景技术Background Art

键值对(key-value，KV)存储系统作为很多服务系统的后端存储，其每次更新数据时会指定相应的key以及对应的value数据。在后续的查找过程中，可以通过输入的key来获取此前写入的value数据。Key-value (KV) storage systems serve as the backend storage for many service systems. Each time data is updated, a corresponding key and value are specified. Subsequent searches can retrieve previously written value data by entering the key.

通常情况下，KV存储系统往往包含了海量的KV数据集合，如何高效的组织并存储这些KV数据，既关系到整体数据集的最终存储空间大小，也会影响到整个KV存储系统的数据查询效率。Typically, a KV storage system contains massive KV data sets. How to efficiently organize and store these KV data affects not only the final storage space size of the entire data set, but also the data query efficiency of the entire KV storage system.

发明内容Summary of the Invention

有鉴于此，本说明书一个或多个实施例提供一种基于完美哈希的数据存储方法及相关设备。In view of this, one or more embodiments of this specification provide a data storage method and related devices based on perfect hashing.

第一方面，本说明书提供了一种基于完美哈希的数据存储方法，所述方法包括：将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段，每个数据片段中包含所述目标数据集中的部分key-value数据；计算与每个数据片段中包含的多个key-value数据分别对应的完美哈希槽位，并将每个数据片段划分为与其包含的key-value数据的数据量相匹配的多个数据块；其中，每个数据块中包含按序排列的若干完美哈希槽位，每个完美哈希槽位中包含与其对应的至少一个key-value数据；将所述多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，按照完美哈希槽位的排列顺序依次存储其中的多个数据块。In a first aspect, the present specification provides a data storage method based on perfect hashing, the method comprising: dividing a target data set to be stored into a plurality of data fragments that match the amount of key-value data contained therein, each data fragment containing part of the key-value data in the target data set; calculating perfect hash slots corresponding to the plurality of key-value data contained in each data fragment, and dividing each data fragment into a plurality of data blocks that match the amount of key-value data contained therein; wherein each data block contains a plurality of perfect hash slots arranged in sequence, and each perfect hash slot contains at least one key-value data corresponding thereto; storing the plurality of data fragments in sequence on a disk, and, when storing each data fragment, storing the plurality of data blocks therein in sequence according to the arrangement order of the perfect hash slots.

第二方面，本说明书提供了一种基于完美哈希的数据存储装置，所述装置包括：数据片段划分单元，用于将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段，每个数据片段中包含所述目标数据集中的部分key-value数据；完美哈希计算单元，用于计算与每个数据片段中包含的多个key-value数据分别对应的完美哈希槽位，并将每个数据片段划分为与其包含的key-value数据的数据量相匹配的多个数据块；其中，每个数据块中包含按序排列的若干完美哈希槽位，每个完美哈希槽位中包含与其对应的至少一个key-value数据；存储单元，用于将所述多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，按照完美哈希槽位的排列顺序依次存储其中的多个数据块。In a second aspect, the present specification provides a data storage device based on perfect hashing, the device comprising: a data segmentation unit, for dividing a target data set to be stored into multiple data segments that match the amount of key-value data contained therein, each data segment containing part of the key-value data in the target data set; a perfect hash calculation unit, for calculating perfect hash slots corresponding to the multiple key-value data contained in each data segment, and dividing each data segment into multiple data blocks that match the amount of key-value data contained therein; wherein each data block contains a number of perfect hash slots arranged in sequence, and each perfect hash slot contains at least one key-value data corresponding thereto; a storage unit, for storing the multiple data segments in sequence on a disk, and, when storing each data segment, storing the multiple data blocks therein in sequence according to the arrangement order of the perfect hash slots.

第三方面，本说明书提供了一种基于完美哈希的数据查询方法，所述方法包括：解析接收到的查询语句，得到所述查询语句中包含的与待查询的key-value数据对应的目标key；计算所述目标key的目标哈希值，并在磁盘中存储的多个数据片段中确定出与所述目标哈希值对应的目标数据片段；获取所述目标数据片段中包含的完美哈希参数，并基于所述完美哈希参数，计算与所述目标key对应的目标完美哈希槽位；以及，从所述目标数据片段包含的多个数据块中读取出所述目标完美哈希槽位所在的目标数据块，并从所述目标数据块的所述目标完美哈希槽位中获取所述待查询的key-value数据。In a third aspect, the present specification provides a data query method based on perfect hashing, the method comprising: parsing a received query statement to obtain a target key contained in the query statement and corresponding to the key-value data to be queried; calculating a target hash value of the target key, and determining a target data fragment corresponding to the target hash value from multiple data fragments stored on the disk; obtaining a perfect hash parameter contained in the target data fragment, and calculating a target perfect hash slot corresponding to the target key based on the perfect hash parameter; and reading a target data block where the target perfect hash slot is located from multiple data blocks contained in the target data fragment, and obtaining the key-value data to be queried from the target perfect hash slot of the target data block.

第四方面，本说明书提供了一种基于完美哈希的数据查询装置，所述装置包括：语句解析单元，用于解析接收到的查询语句，得到所述查询语句中包含的与待查询的key-value数据对应的目标key；确定单元，用于计算所述目标key的目标哈希值，并在磁盘中存储的多个数据片段中确定出与所述目标哈希值对应的目标数据片段；完美哈希计算单元，用于获取所述目标数据片段中包含的完美哈希参数，并基于所述完美哈希参数，计算与所述目标key对应的目标完美哈希槽位；以及，获取单元，用于从所述目标数据片段包含的多个数据块中读取出所述目标完美哈希槽位所在的目标数据块，并从所述目标数据块的所述目标完美哈希槽位中获取所述待查询的key-value数据。In a fourth aspect, the present specification provides a data query device based on perfect hashing, the device comprising: a statement parsing unit, used to parse a received query statement to obtain a target key contained in the query statement and corresponding to the key-value data to be queried; a determination unit, used to calculate a target hash value of the target key, and determine a target data segment corresponding to the target hash value from multiple data fragments stored on the disk; a perfect hash calculation unit, used to obtain a perfect hash parameter contained in the target data segment, and calculate a target perfect hash slot corresponding to the target key based on the perfect hash parameter; and an acquisition unit, used to read the target data block where the target perfect hash slot is located from multiple data blocks contained in the target data segment, and obtain the key-value data to be queried from the target perfect hash slot of the target data block.

相应地，本说明书还提供了一种计算机设备，包括：存储器和处理器；所述存储器上存储有可由所述处理器运行的计算机程序；所述处理器运行所述计算机程序时，执行上述第一方面所述的基于完美哈希的数据存储方法，或者第三方面所述的基于完美哈希的数据查询方法。Accordingly, this specification also provides a computer device, comprising: a memory and a processor; the memory stores a computer program that can be run by the processor; when the processor runs the computer program, it executes the data storage method based on perfect hashing described in the first aspect above, or the data query method based on perfect hashing described in the third aspect.

相应地，本说明书还提供了一种计算机可读存储介质，其上存储有计算机程序，所述计算机程序被处理器运行时，执行如上述第一方面所述的基于完美哈希的数据存储方法，或者第三方面所述的基于完美哈希的数据查询方法。Accordingly, this specification also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it executes the data storage method based on perfect hashing as described in the first aspect above, or the data query method based on perfect hashing as described in the third aspect.

相应地，本说明书还提供了一种计算机程序产品，所述计算机程序产品包括计算机程序/指令，所述计算机程序/指令被处理器执行时，执行如上述第一方面所述的基于完美哈希的数据存储方法，或者第三方面所述的基于完美哈希的数据查询方法。Accordingly, this specification also provides a computer program product, which includes a computer program/instructions. When the computer program/instructions are executed by a processor, it executes the data storage method based on perfect hashing as described in the first aspect above, or the data query method based on perfect hashing as described in the third aspect.

综上所述，本申请基于待存储的KV数据集中的数据量大小，将其切分为多个数据片段。进一步地，针对每个数据片段内的多个key-value数据分别计算与其对应的完美哈希槽位。并且，根据每个数据片段中包含的key-value数据的数据量大小，按照完美哈希槽位的顺序，将每个数据片段中相邻的若干哈希槽位中的key-value数据打包成一个数据块。如此，本申请在存储上述KV数据集时，可以按照完美哈希槽位的顺序，依次将每个数据块存入磁盘，从而实现整体数据集以顺序追加写入的方式存储至磁盘中。后续进行数据查询时，可以根据输入的key计算与其对应的完美哈希槽位，并根据该完美哈希槽位从磁盘中读取出相应的数据块，从而实现了索引空间大小和查询效率之间的合理优化。In summary, the present application divides the KV data set to be stored into multiple data fragments based on the amount of data in the KV data set. Furthermore, the perfect hash slots corresponding to the multiple key-value data in each data fragment are calculated respectively. Moreover, according to the amount of key-value data contained in each data fragment, the key-value data in several adjacent hash slots in each data fragment are packaged into a data block in the order of the perfect hash slots. In this way, when storing the above-mentioned KV data set, the present application can store each data block in the disk in sequence according to the order of the perfect hash slots, thereby realizing the storage of the entire data set in the disk in a sequential append-write manner. When performing subsequent data queries, the perfect hash slot corresponding to the input key can be calculated, and the corresponding data block can be read from the disk according to the perfect hash slot, thereby achieving a reasonable optimization between the index space size and the query efficiency.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是一示例性实施例提供的一种基于完美哈希的数据存储方法的流程示意图。FIG1 is a flow chart of a data storage method based on perfect hashing provided by an exemplary embodiment.

图2是一示例性实施例提供的一种数据集的分段示意图。FIG2 is a segmented schematic diagram of a data set provided by an exemplary embodiment.

图3是一示例性实施例提供的一种数据片段的内部结构示意图。FIG3 is a schematic diagram of the internal structure of a data segment provided by an exemplary embodiment.

图4是一示例性实施例提供的一种数据块的内部结构示意图。FIG4 is a schematic diagram of the internal structure of a data block provided by an exemplary embodiment.

图5是一示例性实施例提供的一种完美哈希槽位的内部数据组织示意图。FIG5 is a schematic diagram of internal data organization of a perfect hash slot provided by an exemplary embodiment.

图6是一示例性实施例提供的另一种完美哈希槽位的内部数据组织示意图。FIG6 is a schematic diagram of internal data organization of another perfect hash slot provided by an exemplary embodiment.

图7是一示例性实施例提供的一种基于完美哈希的数据存储装置的结构示意图。FIG7 is a schematic structural diagram of a data storage device based on perfect hashing provided by an exemplary embodiment.

图8是一示例性实施例提供的一种计算机设备的结构示意图。FIG8 is a schematic structural diagram of a computer device provided by an exemplary embodiment.

具体实施方式DETAILED DESCRIPTION

这里将详细地对示例性实施例进行说明，其示例表示在附图中。下面的描述涉及附图时，除非另有表示，不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本说明书一个或多个实施例相一致的所有实施方式。相反，它们仅是与如所附权利要求书中所详述的、本说明书一个或多个实施例的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, with examples illustrated in the accompanying drawings. In the following description, when referring to the drawings, identical numerals in different figures represent identical or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments are not intended to represent all implementations consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatuses and methods consistent with certain aspects of one or more embodiments of this specification, as detailed in the appended claims.

需要说明的是，在其他实施例中并不一定按照本说明书示出和描述的顺序来执行相应方法的步骤。在一些其他实施例中，其方法所包括的步骤可以比本说明书所描述的更多或更少。此外，本说明书中所描述的单个步骤，在其他实施例中可能被分解为多个步骤进行描述；而本说明书中所描述的多个步骤，在其他实施例中也可能被合并为单个步骤进行描述。It should be noted that, in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than those described in this specification. In addition, a single step described in this specification may be broken down into multiple steps for description in other embodiments; and multiple steps described in this specification may be combined into a single step for description in other embodiments.

需要说明的是，本申请中所述的“多个”是指两个或者两个以上。It should be noted that the “plurality” mentioned in this application refers to two or more.

在一示出的实施方式中，常规的基于日志结构合并树(The Log-Structured Merge Tree，LSM-Tree)来组织数据集的存储引擎，通常采用排序队列表(Sorted String Table，SST)格式，将KV数据集中的所有key-value数据按照key的前缀进行排序，并顺序存储所有的key-value数据，以便后续采用二分查找的方式进行数据查询。然而，二分查找的复杂度较高，需要消耗较多的CPU资源，并且查找过程中可能存在索引和实际key-value数据的多次IO操作，效率低下，无法满足用户的实际需求。In one illustrated embodiment, a conventional storage engine that organizes data sets based on a Log-Structured Merge Tree (LSM-Tree) typically uses a Sorted String Table (SST) format to sort all key-value data in a KV data set by the prefix of the key and store all key-value data sequentially so that data can be subsequently queried using a binary search. However, binary search is complex and consumes a lot of CPU resources. In addition, the search process may involve multiple IO operations between the index and the actual key-value data, resulting in low efficiency and failure to meet the actual needs of users.

基于此，本说明书提供了一种技术方案，将全量的KV数据集划分为多个数据片段，并针对每个数据片段中的多个key-value数据计算完美哈希索引，并且将每个数据片段细分为合适的多个数据块依次保存，从而实现索引空间大小和查询效率之间的合理优化。Based on this, this specification provides a technical solution that divides the full KV data set into multiple data fragments, calculates a perfect hash index for the multiple key-value data in each data fragment, and subdivides each data fragment into multiple appropriate data blocks and saves them in sequence, thereby achieving a reasonable optimization between the index space size and query efficiency.

在实现时，本申请先获取待存储的目标数据集，该目标数据集中可以包含多个key-value数据。然后将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段，每个数据片段可以包括目标数据集中的部分key-value数据。进一步地，本申请可以计算与每个数据片段中包括的多个key-value数据分别对应的完美哈希槽位，并将每个数据片段划分为与其包含的key-value数据的数据量相匹配的多个数据块；其中，每个数据块中包含按序排列的若干完美哈希槽位，每个完美哈希槽位中包含与其对应的至少一个key-value数据。最后，本申请可以将上述多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，按照完美哈希槽位的顺序依次存储其中的多个数据块。During implementation, the present application first obtains the target data set to be stored, which may contain multiple key-value data. The target data set to be stored is then divided into multiple data fragments that match the amount of key-value data contained therein, and each data fragment may include part of the key-value data in the target data set. Furthermore, the present application may calculate the perfect hash slots corresponding to the multiple key-value data included in each data fragment, and divide each data fragment into multiple data blocks that match the amount of key-value data contained therein; wherein each data block contains several perfect hash slots arranged in sequence, and each perfect hash slot contains at least one key-value data corresponding thereto. Finally, the present application may store the above-mentioned multiple data fragments in sequence to the disk, and when storing each data fragment, store the multiple data blocks therein in sequence according to the order of the perfect hash slots.

在以上技术方案中，本申请基于待存储的KV数据集中的数据量大小，将其切分为多个数据片段。进一步地，针对每个数据片段内的多个key-value数据分别计算与其对应的完美哈希槽位。并且，根据每个数据片段中包含的key-value数据的数据量大小，按照完美哈希槽位的顺序，将每个数据片段中相邻的若干哈希槽位中的key-value数据打包成一个数据块。如此，本申请在存储上述KV数据集时，可以按照完美哈希槽位的顺序，依次将每个数据块存入磁盘，从而实现整体数据集以顺序追加写入的方式存储至磁盘中。后续进行数据查询时，可以根据输入的key计算与其对应的完美哈希槽位，并根据该完美哈希槽位从磁盘中读取出相应的数据块，从而实现了索引空间大小和查询效率之间的合理优化。In the above technical solution, the present application divides the KV data set to be stored into multiple data fragments based on the amount of data in the KV data set. Furthermore, the perfect hash slots corresponding to the multiple key-value data in each data fragment are calculated respectively. Moreover, according to the amount of key-value data contained in each data fragment, the key-value data in several adjacent hash slots in each data fragment are packaged into a data block in the order of the perfect hash slots. In this way, when storing the above-mentioned KV data set, the present application can store each data block in the disk in sequence according to the order of the perfect hash slots, so that the entire data set is stored in the disk in a sequential append-write manner. When performing data query subsequently, the perfect hash slot corresponding to the input key can be calculated, and the corresponding data block can be read from the disk according to the perfect hash slot, thereby achieving a reasonable optimization between the index space size and the query efficiency.

请参阅图1，图1是一示例性实施例提供的一种基于完美哈希的数据存储方法的流程示意图。该方法可以应用于基于完美哈希索引的存储引擎中。如图1所示，该方法具体可以包括如下步骤S101-步骤S103。Please refer to Figure 1, which is a flow chart of a data storage method based on perfect hashing, provided by an exemplary embodiment. This method can be applied to a storage engine based on perfect hashing indexes. As shown in Figure 1, the method may specifically include the following steps S101-S103.

步骤S101，将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段，每个数据片段中包含所述目标数据集中的部分key-value数据。Step S101 : dividing a target data set to be stored into a plurality of data segments that match the amount of key-value data contained therein, wherein each data segment contains a portion of the key-value data in the target data set.

首先，数据库中的存储引擎可以先获取当前待存储的目标数据集，该目标数据集中包含的多个key-value数据可以为当前待存储的全量key-value数据。First, the storage engine in the database may first obtain the target data set to be stored. The multiple key-value data contained in the target data set may be the full amount of key-value data to be stored.

需要说明的是，本说明书对上述数据库的具体类型不作特别限定，在一示出的实施方式中，该数据库可以是以key-value结构存储数据的任何类型的数据库，比如图数据库，等等，本说明书对此不做具体限定。It should be noted that this specification does not specifically limit the specific type of the above-mentioned database. In one embodiment shown, the database can be any type of database that stores data in a key-value structure, such as a graph database, etc. This specification does not specifically limit this.

在一示出的实施方式中，该待存储的目标数据集可以位于存储引擎管理的内存中，以待写入磁盘中实现数据的持久化。In an illustrated embodiment, the target data set to be stored may be located in a memory managed by a storage engine, to be written to a disk to achieve data persistence.

需要说明的是，上述磁盘可以是存储引擎管理的本地磁盘，也可以是基于分布式文件系统的云磁盘，等等，本说明书对此不做具体限定。It should be noted that the above disk can be a local disk managed by a storage engine, or a cloud disk based on a distributed file system, etc. This specification does not make specific limitations on this.

进一步地，存储引擎可以将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段(Segment)，或者说多个分区。其中，每个数据片段中可以包含目标数据集中的部分key-value数据。Furthermore, the storage engine can divide the target dataset to be stored into multiple data segments (segments), or partitions, that match the amount of key-value data it contains. Each data segment can contain part of the key-value data in the target dataset.

在一示出的实施方式中，存储引擎具体可以将目标数据集划分为2^N个数据片段，其中，N可以为大于或者等于0的整数，也就是说，存储引擎可以将数据集划分为2个、4个、8个或者16个数据片段，等等，本说明书对此不做具体限定。In one illustrated embodiment, the storage engine may specifically divide the target data set into^2N data segments, where N may be an integer greater than or equal to 0. That is, the storage engine may divide the data set into 2, 4, 8 or 16 data segments, and so on. This specification does not impose any specific limitations on this.

应理解，一般情况下，目标数据集中包含的数据量越大，划分的数据片段可以越多，以避免单个数据片段数据过多的问题；相应的，目标数据集中包含的数据量越小，划分的数据片段可以越少，甚至在目标数据集极小的情况下，可以不对其进行划分，或者说仅将其划分为单个数据片段(即N等于0)。It should be understood that, generally speaking, the larger the amount of data contained in the target data set, the more data segments that can be divided, so as to avoid the problem of too much data in a single data segment; correspondingly, the smaller the amount of data contained in the target data set, the fewer data segments that can be divided, and even when the target data set is extremely small, it may not be divided, or it may be divided into only a single data segment (that is, N is equal to 0).

在一示出的实施方式中，存储引擎可以计算与目标数据集中的每个key-value数据的key对应的哈希值。需要说明的是，本申请对计算哈希值的具体算法不作特别限定。In one embodiment, the storage engine can calculate a hash value corresponding to the key of each key-value data in the target data set. It should be noted that the present application does not specifically limit the specific algorithm for calculating the hash value.

进一步地，存储引擎可以根据计算得到的哈希值，确定与每个key-value数据对应的数据片段，并将每个key-value数据划分到对应的数据片段中。Furthermore, the storage engine can determine the data segment corresponding to each key-value data according to the calculated hash value, and divide each key-value data into the corresponding data segment.

在一示出的实施方式中，本申请可以选取与每个key-value数据的key对应的哈希值中的若干个比特(bit)为作为索引，将每个key-value数据划分到对应的数据片段中，等等，本说明书对此不做具体限定。In one illustrated embodiment, the present application may select several bits in the hash value corresponding to the key of each key-value data as an index, divide each key-value data into corresponding data segments, and so on. This specification does not make specific limitations on this.

请参阅图2，图2是一示例性实施例提供的一种数据集的分段示意图。如图2所示，存储引擎可以将目标数据集划分为4个数据片段，分别为Segment0、Segment1、Segment2、Segment3。Please refer to Figure 2, which is a schematic diagram of segmentation of a data set provided by an exemplary embodiment. As shown in Figure 2, the storage engine can divide the target data set into four data segments, namely Segment0, Segment1, Segment2, and Segment3.

如图2所示，该多个Segment可以存储于磁盘的一个文件中，并且，在该文件的尾部，即最后一个数据片段后面，可以存储与该多个数据片段对应的脚注信息(Footer)。As shown in FIG2 , the multiple segments may be stored in a file on a disk, and footer information corresponding to the multiple data segments may be stored at the end of the file, that is, after the last data segment.

在一示出的实施方式中，该脚注信息可以包括该多个数据片段的数量，每个数据片段在文件中的偏移地址(offset)或者说起始地址，每个数据片段的长度，等等，本说明书对此不做具体限定。In one illustrated embodiment, the footer information may include the number of the multiple data fragments, the offset address (offset) or starting address of each data fragment in the file, the length of each data fragment, etc., which is not specifically limited in this specification.

示例性的，图2所示的脚注信息具体可以包含如下信息：
Footer{
uint32_t SegmentCount＝4；
uint32_t Segment0Offset＝0；
uint32_t Segment0Length＝10240；
uint32_t Segment1Offset＝10240；
uint32_t Segment1Length＝4096；
uint32_t Segment2Offset＝14336；
uint32_t Segment2Length＝8192；
uint32_t Segment3Offset＝22528；
uint32_t Segment3Length＝16384；
}；For example, the footnote information shown in FIG2 may specifically include the following information:
Footer
uint32_t SegmentCount = 4;
uint32_t Segment0Offset＝0;
uint32_t Segment0Length=10240;
uint32_t Segment1Offset=10240;
uint32_t Segment1Length=4096;
uint32_t Segment2Offset=14336;
uint32_t Segment2Length=8192;
uint32_t Segment3Offset=22528;
uint32_t Segment3Length=16384;
};

如上所述，脚注信息描述了数据片段的总数量为4，第一个数据片段的偏移地址为0，长度为10240字节；第二个数据片段的偏移地址为10240，长度为4096字节；第三个数据片段的偏移地址为14336，长度为8192字节；第四个数据片段的偏移地址为22528，长度为16384字节。As mentioned above, the footer information describes that the total number of data fragments is 4, the offset address of the first data fragment is 0, and the length is 10240 bytes; the offset address of the second data fragment is 10240, and the length is 4096 bytes; the offset address of the third data fragment is 14336, and the length is 8192 bytes; the offset address of the fourth data fragment is 22528, and the length is 16384 bytes.

基于上述对目标数据集的划分，相应的，在存储该目标数据集时，可以依次将每个数据片段存储至磁盘中，并在已存储的最后一个数据片段后面进一步存储相应的脚注信息。Based on the above division of the target data set, when storing the target data set, each data segment may be stored in the disk in sequence, and corresponding footer information may be further stored after the last stored data segment.

进一步地，在完成上述多个数据片段的存储之后，存储引擎可以根据查询时输入的key计算其哈希值，然后根据该哈希值在多个数据片段中确定出对应的数据片段，并在该数据片段内部查找与输入的key对应的数据，具体可参考后续实施例的描述，此处不再展开详述。Furthermore, after completing the storage of the above-mentioned multiple data fragments, the storage engine can calculate the hash value based on the key entered during the query, and then determine the corresponding data fragment among the multiple data fragments based on the hash value, and search for the data corresponding to the input key within the data fragment. For details, please refer to the description of the subsequent embodiments, which will not be elaborated here.

步骤S102，计算与每个数据片段中包括的多个key-value数据分别对应的完美哈希槽位，并将每个数据片段划分为与其包含的key-value数据的数据量相匹配的多个数据块；其中，每个数据块中包含按序排列的若干完美哈希槽位，每个完美哈希槽位中包含与其对应的至少一个key-value数据。Step S102, calculate the perfect hash slots corresponding to the multiple key-value data included in each data fragment, and divide each data fragment into multiple data blocks that match the amount of key-value data it contains; wherein each data block contains a number of perfect hash slots arranged in sequence, and each perfect hash slot contains at least one key-value data corresponding thereto.

进一步地，在每个数据片内部，存储引擎可以基于该数据片段内的多个key-value数据的key进行完美哈希计算，具体可以计算与该多个key-value数据分别对应的完美哈希槽位(slot)。示例性的，若当前数据片段中包含M个key-value数据，则完美哈希映射表的完美哈希槽位的数量应该大于或者等于M，可以将完美哈希槽位的总数标记为SlotCount。Furthermore, within each data slice, the storage engine can perform a perfect hash calculation based on the keys of the multiple key-value data in the data slice, specifically calculating the perfect hash slots corresponding to the multiple key-value data. For example, if the current data slice contains M key-value data, the number of perfect hash slots in the perfect hash map should be greater than or equal to M, and the total number of perfect hash slots can be marked as SlotCount.

需要说明的是，本申请对完美哈希映射的具体实现方法不作特别限定，本申请主要阐述基于完美哈希映射后的数据组织和存储方式。It should be noted that this application does not specifically limit the specific implementation method of perfect hash mapping. This application mainly describes the data organization and storage method based on perfect hash mapping.

在一示出的实施方式中，本申请可以先计算与每个key-value数据的key对应的普通哈希值，该普通哈希值例如可以为上述步骤S101中计算得到的哈希值。进一步地，本申请可以在该普通哈希值的基础上进行进一步计算，从而得到与每个key-value数据对应的完美哈希槽位，也就是说，本申请可以将计算得到的普通哈希值作为新的key来计算完美哈希。In one illustrated embodiment, the present application may first calculate a common hash value corresponding to the key of each key-value data. The common hash value may be, for example, the hash value calculated in step S101 above. Furthermore, the present application may perform further calculations based on the common hash value to obtain a perfect hash slot corresponding to each key-value data. In other words, the present application may use the calculated common hash value as a new key to calculate a perfect hash.

通常情况下，完美哈希映射可以使得每个key-value数据对应的完美哈希槽位互不冲突，但是，在一些极端情况下，若多个key的普通哈希值相同，则与这多个key对应的完美哈希槽位也会相同，即一个完美哈希槽位会对应多个key-value数据。Normally, a perfect hash mapping ensures that the perfect hash slots corresponding to each key-value data do not conflict with each other. However, in some extreme cases, if the ordinary hash values of multiple keys are the same, the perfect hash slots corresponding to these multiple keys will also be the same, that is, one perfect hash slot will correspond to multiple key-value data.

综上，每个数据片段中可以包含按序排列的多个哈希槽位，例如slot0、slot1、slot2、slot3，等等，每个哈希槽位中可以包含至少一个key-value数据，具体地，可以包含对至少一个key-value数据进行编码后得到的编码数据(EncodedKVM)。其中，M代表与KV数据对应的元数据(meta)。In summary, each data fragment can contain multiple hash slots arranged in sequence, such as slot0, slot1, slot2, slot3, and so on. Each hash slot can contain at least one key-value data, specifically, encoded data (EncodedKVM) obtained by encoding at least one key-value data. M represents the metadata corresponding to the KV data.

相应的，在一示出的实施方式中，本申请在存储每个数据片段时，可以顺序保存每个完美哈希槽位中的key-value数据。进一步地，在具体的保存过程中，存储引擎可以将多个哈希槽位中的key-value数据统一打包成一个数据块(DataBlock)，示例性的，可以将连续的2^k个哈希槽位中的所有key-value数据打包为一个数据块。其中，k可以为大于或者等于0的整数，例如，k为2，即表示将每四个连续的完美哈希槽位打包成一个数据块。Accordingly, in one embodiment shown, the present application can sequentially save the key-value data in each perfect hash slot when storing each data fragment. Furthermore, in the specific saving process, the storage engine can uniformly package the key-value data in multiple hash slots into a data block (DataBlock). For example, all key-value data in^2k consecutive hash slots can be packaged into a data block. Wherein, k can be an integer greater than or equal to 0. For example, k is 2, which means that every four consecutive perfect hash slots are packaged into a data block.

在一示出的实施方式中，存储引擎可以根据每个数据片段中包含的key-value数据的数据量，例如每个数据片段中包含的所有key-value数据的平均长度，确定应该把数据片段划分为多少个数据块，相当于确定数据片段中的每个数据块所包含的完美哈希槽位的数量。示例性的，若key-value数据较小，则可以尽可能地将更多的哈希槽位打包到单个DataBlock内，即划分更少的数据块，从而实现更高的压缩效率，降低整体的内存索引空间的大小。示例性的，若key-value数据较大，则可以尽可能地将更少的哈希槽位打包到单个DataBlock内，即划分更多的数据块，以避免查询时单次IO过大，从而导致查询效率降低的问题。In one illustrated embodiment, the storage engine can determine how many data blocks a data segment should be divided into based on the amount of key-value data contained in each data segment, such as the average length of all key-value data contained in each data segment, which is equivalent to determining the number of perfect hash slots contained in each data block in the data segment. For example, if the key-value data is small, more hash slots can be packed into a single DataBlock as much as possible, that is, fewer data blocks can be divided, thereby achieving higher compression efficiency and reducing the size of the overall memory index space. For example, if the key-value data is large, fewer hash slots can be packed into a single DataBlock as much as possible, that is, more data blocks can be divided, to avoid the problem of a single IO being too large during querying, thereby reducing query efficiency.

如上所述，本申请在将目标数据集划分为多个数据片段的基础上，还可以将每个数据片段划分为多个数据块，在数据片段内部，所有的key-value数据将以数据块为单位顺序存放，每个数据块中可以包括按序排列的多个完美哈希槽位，每个完美哈希槽位中可以包含至少一个key-value数据。相应的，在存储每个数据片段时，可以依次保存其中的DataBlock0、DataBlock1、DataBlock2……，等等。As described above, in addition to dividing the target dataset into multiple data segments, this application also divides each data segment into multiple data blocks. Within each data segment, all key-value data is stored sequentially in data blocks. Each data block can include multiple perfectly hashed slots arranged in sequence, and each perfect hash slot can contain at least one key-value data. Accordingly, when storing each data segment, DataBlock0, DataBlock1, DataBlock2, and so on can be stored sequentially.

请参阅图3，图3是一示例性实施例提供的一种数据片段的内部结构示意图。如图3所示，每个数据片段内部可以包括按序排列的多个数据区域(或者说字段)，具体可以包括：完美哈希索引区、多个数据块、数据分片区、数据块偏移地址(DataBlockOffsets)区、哈希值区和数据片段脚注信息(Segmentfooter)区。Please refer to Figure 3, which is a schematic diagram of the internal structure of a data segment provided by an exemplary embodiment. As shown in Figure 3, each data segment may include multiple data areas (or fields) arranged in sequence, specifically including: a perfect hash index area, multiple data blocks, a data shard area, a data block offset address (DataBlockOffsets) area, a hash value area, and a data segment footer information (Segmentfooter) area.

如图3所示，位于头部的完美哈希索引区中保存了与该数据片段中的多个key-value数据对应的完美哈希参数(phcontext)，该完美哈希参数用于计算与该多个key-value数据分别对应的完美哈希槽位。As shown in FIG3 , the perfect hash index area located in the header stores perfect hash parameters (phcontext) corresponding to multiple key-value data in the data segment. The perfect hash parameters are used to calculate the perfect hash slots corresponding to the multiple key-value data.

如图3所示，完美哈希索引区之后包括多个按序排列的数据块，例如DataBlock1、DataBlock2等。在一示出的实施方式中，存储引擎保存每个数据块时还可以对数据块进行整体压缩或者对数据块中的部分key-value数据单独进行压缩，等等，以减少占用的存储空间，本说明书对此不做具体限定。As shown in FIG3 , the perfect hash index area includes multiple sequentially arranged data blocks, such as DataBlock1, DataBlock2, etc. In one illustrated embodiment, the storage engine may further compress each data block as a whole or separately compress some key-value data in the data block, etc., to reduce the occupied storage space, although this specification does not impose specific limitations on this.

如图3所示，在最后一个数据块之后紧跟着一个附加数据(ExtraData)区域，该附加数据区域可以作为数据分片区，用于存储较大的key-value数据。在一示出的实施方式中，响应于上述目标数据集中的目标key-value数据的长度大于预设阈值(例如256字节或者1024字节等)，存储引擎可以对该目标key-value数据进行切分，得到多个数据分片。其中，该多个数据分片中的目标数据分片(例如目标key-value数据中长度未超出预设阈值的部分)存储于与该目标key-value数据对应的完美哈希槽位中，而该多个数据分片中的其他数据分片可以存储于该ExtraData区域中。进一步地，与该目标key-value数据对应的完美哈希槽位中还可以保存其他数据分片在该ExtraData区域中的偏移地址。As shown in Figure 3, an additional data (ExtraData) area follows the last data block, and the additional data area can be used as a data sharding area for storing larger key-value data. In one embodiment shown, in response to the length of the target key-value data in the above-mentioned target data set being greater than a preset threshold (for example, 256 bytes or 1024 bytes, etc.), the storage engine can split the target key-value data to obtain multiple data shards. Among them, the target data shards in the multiple data shards (for example, the portion of the target key-value data whose length does not exceed the preset threshold) are stored in the perfect hash slot corresponding to the target key-value data, and the other data shards in the multiple data shards can be stored in the ExtraData area. Furthermore, the perfect hash slot corresponding to the target key-value data can also save the offset addresses of other data shards in the ExtraData area.

示例性的，如果某条key-value数据小于所有key-value数据的平均长度的两倍，则可以将该key-value数据直接顺序保存在数据块的完美哈希槽位中。若某条key-value数据超过所有key-value数据的平均长度的两倍，则可以将其进行截断，仅在数据块的完美哈希槽位中保留部分数据，并将额外的被截断的多余数据分片保存到ExtraData区域，以及在其完美哈希槽位中记录该被截断的多余数据分片在ExtraData区域的偏移地址。For example, if a key-value data is less than twice the average length of all key-value data, the key-value data can be directly stored in the perfect hash slot of the data block. If a key-value data exceeds twice the average length of all key-value data, it can be truncated, and only part of the data can be retained in the perfect hash slot of the data block. The additional truncated data fragments are saved in the ExtraData area, and the offset address of the truncated data fragments in the ExtraData area is recorded in its perfect hash slot.

如图3所示，哈希值区中保存前面每个完美哈希槽位中的key-value数据中的key的哈希值(即普通哈希值)。在一示出的实施方式中，每个哈希值的长度具体可以是32bit或者64bit，等等，本说明书对此不做具体限定。As shown in Figure 3, the hash value area stores the hash value (i.e., the normal hash value) of the key in the key-value data in each perfect hash slot. In one embodiment shown, the length of each hash value can be 32 bits or 64 bits, etc., which is not specifically limited in this specification.

如图3所示，在数据片段的尾部，存储了与该数据片段对应的脚注信息。在一示出的实施方式中，该脚注信息中可以包含以下示出的一个或多个信息的组合：前面各个数据区域(包括完美哈希索引区、各个DataBlock、ExtraData区域和哈希值区)在该数据片段中的偏移地址以及长度；每个数据块是否被压缩的信息；每个哈希槽位中的key-value数据是否被独立压缩的信息，等等，本说明书对此不做具体限定。通过该脚注信息中汇总的各项信息，可以获取数据片段中各个数据区域的偏移地址，从而可以在该数据片段中快速读取出各个数据区域中具体的数据内容。As shown in Figure 3, at the end of the data segment, footer information corresponding to the data segment is stored. In one embodiment shown, the footer information may include a combination of one or more of the following information: the offset address and length of each of the previous data areas (including the perfect hash index area, each DataBlock, ExtraData area and hash value area) in the data segment; information on whether each data block is compressed; information on whether the key-value data in each hash slot is independently compressed, etc. This specification does not make specific restrictions on this. Through the various information summarized in the footer information, the offset address of each data area in the data segment can be obtained, so that the specific data content in each data area in the data segment can be quickly read.

进一步地，请参阅图4，图4是一示例性实施例提供的一种数据块的内部结构示意图。如图4所示，每个数据块内部可以包括按序排列的多个数据区域(或者说字段)，具体可以包括：按序排列的多个完美哈希槽位、哈希槽位偏移地址区、哈希槽位压缩标识(slotCompressFlags)区和数据块控制信息区。Further, please refer to Figure 4, which is a schematic diagram of the internal structure of a data block provided by an exemplary embodiment. As shown in Figure 4, each data block may include multiple data areas (or fields) arranged in sequence, specifically including: multiple perfect hash slots arranged in sequence, a hash slot offset address area, a hash slot compression flag (slotCompressFlags) area, and a data block control information area.

如图4所示，在每个数据块内部，首先顺序存入了与每个完美哈希槽位对应的key-value数据。在一示出的实施方式中，每个完美哈希槽位中存储的key-value数据具体可以是基于预设的编码规则将key-value数据进行编码后得到的编码数据，即EncodedKVM。As shown in Figure 4, the key-value data corresponding to each perfect hash slot is first stored sequentially within each data block. In one embodiment, the key-value data stored in each perfect hash slot can be encoded data obtained by encoding the key-value data based on a preset encoding rule, namely, EncodedKVM.

如图4所示，在最后一个哈希槽位之后可以紧跟着哈希槽位偏移地址区，该哈希槽位偏移地址区中保存了前面所有完美哈希槽位在该数据块中的偏移地址。在一示出的实施方式中，完美哈希槽位偏移地址的大小可以是1字节，2字节，或者4字节，等等，本说明书对此不做具体限定。若所有EncodedKVM的总长度小于256字节，则每个offset可以仅占1个字节，足以表示所有完美哈希槽位的偏移地址；若所有EncodedKVM的总长度大于或等于256字节，且小于或等于65536字节，则每个offset可以占2个字节；否则，若所有EncodedKVM的总长度大于65536字节，则每个offset占4个字节，如此可以根据实际需求减少偏移地址对内存的占用。As shown in Figure 4, the last hash slot may be followed by a hash slot offset address area, in which the offset addresses of all previous perfect hash slots in the data block are stored. In one embodiment shown, the size of the perfect hash slot offset address may be 1 byte, 2 bytes, or 4 bytes, etc., and this specification does not specifically limit this. If the total length of all EncodedKVMs is less than 256 bytes, each offset may only occupy 1 byte, which is sufficient to represent the offset addresses of all perfect hash slots; if the total length of all EncodedKVMs is greater than or equal to 256 bytes and less than or equal to 65536 bytes, each offset may occupy 2 bytes; otherwise, if the total length of all EncodedKVMs is greater than 65536 bytes, each offset may occupy 4 bytes, so that the memory occupied by the offset address can be reduced according to actual needs.

如图4所示，在哈希槽位偏移地址区之后可以紧跟着哈希槽位压缩标识区，该哈希槽位压缩标识区中包括的多个哈希槽位压缩标识可以用于指示每个完美哈希槽位中的key-value数据是否被压缩。其中，每个完美哈希槽位的压缩标识可以对应一个bit，标识为0可以指示该完美哈希槽位的数据未被压缩，标识为1可以指示该完美哈希槽位的数据已被压缩，本说明书对此不做具体限定。As shown in FIG4 , the hash slot offset address area may be followed by a hash slot compression identification area. The multiple hash slot compression identification areas included in the hash slot compression identification area may be used to indicate whether the key-value data in each perfect hash slot is compressed. The compression identification of each perfect hash slot may correspond to one bit, where a bit of 0 indicates that the data in the perfect hash slot is not compressed, and a bit of 1 indicates that the data in the perfect hash slot is compressed. This specification does not impose any specific restrictions on this.

如图4所示，位于数据块尾部的数据块控制信息区(CtrlByte)中可以存储整个数据块的控制信息。在一示出的实施方式中，该CtrlByte可以占据1个字节。As shown in Figure 4, the data block control information area (CtrlByte) located at the end of the data block can store control information of the entire data block. In an embodiment shown, the CtrlByte can occupy 1 byte.

如图4所示，CtrlByte中的最低两比特(bit1和bit0)可以表示前述哈希槽位压缩标记区的总长度，若bit1和bit0为00(即0x0)，则表示不存在哈希槽位压缩标记，相当于表示该数据块内的所有key-value数据均未被压缩；若bit1和bit0为01(即0x1)，则表示哈希槽位压缩标记区总共占据1个字节，相当于表示每个数据块中包含了8个完美哈希槽位；若bit1和bit0为10(即0x2)，则表示哈希槽位压缩标记区总共占据2个字节，相当于表示每个数据块中包含了16个完美哈希槽位；若bit1和bit0为11(即0x3)，则表示哈希槽位压缩标记区总共占据4个字节，相当于表示每个数据块中包含了32个完美哈希槽位。As shown in Figure 4, the lowest two bits (bit1 and bit0) in CtrlByte can represent the total length of the aforementioned hash slot compression mark area. If bit1 and bit0 are 00 (i.e., 0x0), it means that there is no hash slot compression mark, which is equivalent to indicating that all key-value data in the data block are not compressed; if bit1 and bit0 are 01 (i.e., 0x1), it means that the hash slot compression mark area occupies a total of 1 byte, which is equivalent to indicating that each data block contains 8 perfect hash slots; if bit1 and bit0 are 10 (i.e., 0x2), it means that the hash slot compression mark area occupies a total of 2 bytes, which is equivalent to indicating that each data block contains 16 perfect hash slots; if bit1 and bit0 are 11 (i.e., 0x3), it means that the hash slot compression mark area occupies a total of 4 bytes, which is equivalent to indicating that each data block contains 32 perfect hash slots.

如图4所示，CtrlByte中的第4和第3个比特位(bit3和bit2)可以用于记录每个哈希槽位偏移地址所占的空间大小。如图4所示，bit3和bit2为00(即0x0)可以表示每个哈希槽位偏移地址占1个字节；bit3和bit2为01(即0x1)可以表示每个哈希槽位偏移地址占2个字节；bit3和bit2为10(即0x2)可以表示每个哈希槽位偏移地址占4个字节。As shown in Figure 4, the fourth and third bits (bit 3 and bit 2) in the CtrlByte can be used to record the size of the space occupied by each hash slot offset address. As shown in Figure 4, if bits 3 and 2 are 00 (i.e., 0x0), each hash slot offset address occupies 1 byte; if bits 3 and 2 are 01 (i.e., 0x1), each hash slot offset address occupies 2 bytes; and if bits 3 and 2 are 10 (i.e., 0x2), each hash slot offset address occupies 4 bytes.

如图4所示，CtrlByte中剩余的第8至第5个比特位(bit7至bit4)为预留(reserved)区，可以用于描述其他任何可能的信息，本说明书对此不做具体限定。As shown in FIG4 , the remaining bits 8 to 5 (bit 7 to bit 4) in CtrlByte are reserved areas and can be used to describe any other possible information, which is not specifically limited in this specification.

进一步地，请参阅图5，图5是一示例性实施例提供的一种完美哈希槽位的内部数据组织示意图。如图5所示，若单个完美哈希槽位仅对应一条key-value数据，则该完美哈希槽位存储的编码数据中具体可以包括以下多个内容：控制信息区、键数据区、分片信息区、值数据区、元数据区。Further, please refer to Figure 5, which is a schematic diagram of the internal data organization of a perfect hash slot provided by an exemplary embodiment. As shown in Figure 5, if a single perfect hash slot corresponds to only one key-value data item, the encoded data stored in the perfect hash slot may specifically include the following multiple contents: a control information area, a key data area, a shard information area, a value data area, and a metadata area.

如图5所示，控制信息区中包含与该完美哈希槽位中的编码数据所包含的key-value数据对应的控制信息。在一示出的实施方式中，该控制信息可以占据1个字节。As shown in Figure 5, the control information area includes control information corresponding to the key-value data included in the encoded data in the perfect hash slot. In one embodiment shown, the control information may occupy 1 byte.

示例性的，控制信息中的最高比特位(即bit7)可以用于指示该完美哈希槽位中是否仅包含一个key-value数据。示例性的，若该bit7为0，则表示该完美哈希槽位中仅包含一个key-value数据，若该bit7为1，则表示该完美哈希槽位中包含多个key-value数据。应理解，图5所示为单个完美哈希槽位仅对应一条key-value数据时的数据组织示意图，因此图5中的bit7应为0。Exemplarily, the highest bit (i.e., bit 7) in the control information can be used to indicate whether the perfect hash slot contains only one key-value data. Exemplarily, if bit 7 is 0, it means that the perfect hash slot contains only one key-value data. If bit 7 is 1, it means that the perfect hash slot contains multiple key-value data. It should be understood that Figure 5 shows a data organization diagram when a single perfect hash slot corresponds to only one key-value data, so bit 7 in Figure 5 should be 0.

示例性的，控制信息中的bit6可以用于指示该完美哈希槽位中包含的key-value数据是否具有对应的元数据，该元数据例如为key-value数据的写入时间，等等，本说明书对此不做具体限定。示例性的，若该bit6为0，则表示该完美哈希槽位中的key-value数据不包含对应的元数据，若该bit6为1，则表示该完美哈希槽位中的key-value数据包含对应的元数据。Exemplarily, bit 6 in the control information can be used to indicate whether the key-value data contained in the perfect hash slot has corresponding metadata, such as the write time of the key-value data, etc. This specification does not specifically limit this. Exemplarily, if bit 6 is 0, it indicates that the key-value data in the perfect hash slot does not contain corresponding metadata. If bit 6 is 1, it indicates that the key-value data in the perfect hash slot does contain corresponding metadata.

示例性的，控制信息中的bit5可以用于指示该完美哈希槽位中包含的key-value数据是否为切分过的数据分片。示例性的，若该bit5为0，则表示该完美哈希槽位中的key-value数据不是切分过的数据分片，相应的，数据片段中可能不包含ExtraData区域；若该bit5为1，则表示该完美哈希槽位中的key-value数据是切分过的数据分片，相应的，数据片段中包含ExtraData区域。Exemplarily, bit 5 in the control information can be used to indicate whether the key-value data contained in the perfect hash slot is a data fragment that has been segmented. Exemplarily, if bit 5 is 0, it indicates that the key-value data in the perfect hash slot is not a data fragment that has been segmented, and accordingly, the data segment may not contain an ExtraData area; if bit 5 is 1, it indicates that the key-value data in the perfect hash slot is a data fragment that has been segmented, and accordingly, the data segment contains an ExtraData area.

示例性的，控制信息中的bit4可以用于指示键数据区中的键数据长度(keysize)字段所占用的长度，0表示keysize字段占1个字节，1表示keysize字段占2个字节。Exemplarily, bit 4 in the control information can be used to indicate the length occupied by the key data length (keysize) field in the key data area, 0 indicates that the keysize field occupies 1 byte, and 1 indicates that the keysize field occupies 2 bytes.

示例性的，控制信息中的bit2和bit3可以用于表示其他任何可能的信息，本说明书对此不做具体限定。Illustratively, bit 2 and bit 3 in the control information may be used to represent any other possible information, and this specification does not impose any specific limitation on this.

示例性的，控制信息中的bit0和bit1可以用于表示与当前value对应的类型(ValueType)，如图4所示，bit0和bit1为00可以表示该value是更新(update)的数据；bit0和bit1为01可以表示该value是改写(overwrite)的数据；bit0和bit1为10可以表示该value是删除(delete)的数据，本说明书对此不做具体限定。Exemplarily, bit0 and bit1 in the control information can be used to indicate the type (ValueType) corresponding to the current value. As shown in FIG4 , bit0 and bit1 being 00 may indicate that the value is updated data; bit0 and bit1 being 01 may indicate that the value is overwritten data; bit0 and bit1 being 10 may indicate that the value is deleted data. This specification does not impose any specific restrictions on this.

如图5所示，在控制信息区之后紧跟着键数据区，该键数据区中依次包括用于描述key的长度的keysize字段，以及key数据本身。As shown in FIG5 , the control information area is followed by the key data area, which includes a keysize field for describing the length of the key and the key data itself.

如图5所示，在键数据区之后紧跟着分片信息区，该分片信息区中可以存储前述对key-value数据进行切分后得到的数据分片数量(ExtraCnt)，以及每个数据分片在ExtraData区域中的偏移地址(ExtraDataOffsets)。相应的，在进行数据查询时，可以通过这些分片信息(即数据分片的元数据信息)获取完整的key-value数据。应理解，若该完美哈希槽位中的key-value数据不是特别大，则不会进行数据切分，相应的，编码数据中也不会存在该分片信息区。As shown in Figure 5, the key data area is followed by the shard information area, which can store the number of data shards (ExtraCnt) obtained after the aforementioned sharding of the key-value data, as well as the offset address (ExtraDataOffsets) of each data shard in the ExtraData area. Accordingly, when performing data queries, the complete key-value data can be obtained through these shard information (i.e., metadata information of the data shards). It should be understood that if the key-value data in the perfect hash slot is not particularly large, data sharding will not be performed, and accordingly, the shard information area will not exist in the encoded data.

如图5所示，在分片信息区之后紧跟着值数据区，该值数据区中依次包括用于描述经变长编码后的value的长度的VarValueSize字段，以及value数据本身。As shown in FIG5 , the value data area follows the fragment information area, and the value data area includes the VarValueSize field for describing the length of the value after variable-length encoding, and the value data itself.

如图5所示，在值数据区之后紧跟着元数据区，该元数据区中可以存储与该完美哈希槽位中的key-value数据对应的元数据，例如key-value数据的写入时间，等等，本说明书对此不做具体限定。应理解，若该完美哈希槽位中的key-value数据不包含对应的元数据(即上述控制信息中的bit6为0)，则可以不存在该元数据区。As shown in FIG5 , the metadata area follows the value data area. The metadata area may store metadata corresponding to the key-value data in the perfect hash slot, such as the write time of the key-value data, etc. This specification does not specifically limit this. It should be understood that if the key-value data in the perfect hash slot does not contain corresponding metadata (i.e., bit 6 in the above control information is 0), the metadata area may not exist.

进一步地，请参阅图6，图6是一示例性实施例提供的另一种完美哈希槽位的内部数据组织示意图。如图6所示，若单个完美哈希槽位对应多个key-value数据，则该完美哈希槽位存储的数据中具体可以包括以下多个内容：控制信息区，以及多个key-value数据各自的编码数据(可参考上述图5对应实施例的描述)，并且，在每个key-value数据的编码数据之前还包含用于描述该编码数据的长度的KVsize字段。示例性的，如图6所示，每个KVsize字段可以占据4个字节，本说明书对此不做具体限定。Further, please refer to Figure 6, which is a schematic diagram of the internal data organization of another perfect hash slot provided by an exemplary embodiment. As shown in Figure 6, if a single perfect hash slot corresponds to multiple key-value data, the data stored in the perfect hash slot may specifically include the following multiple contents: a control information area, and the encoding data of each of the multiple key-value data (refer to the description of the embodiment corresponding to Figure 5 above), and a KVsize field for describing the length of the encoded data is also included before the encoded data of each key-value data. For example, as shown in Figure 6, each KVsize field can occupy 4 bytes, and this specification does not make specific restrictions on this.

如图6所示，该控制信息区中包含与该完美哈希槽位中的编码数据所包含的key-value数据对应的控制信息。在一示出的实施方式中，该控制信息可以占据一个字节。As shown in Figure 6, the control information area includes control information corresponding to the key-value data included in the encoded data in the perfect hash slot. In one embodiment shown, the control information may occupy one byte.

如图6所示，控制信息中的最高比特位(即bit7)可以用于指示该完美哈希槽位中是否仅包含一个key-value数据。示例性的，若该bit7为0，则表示该完美哈希槽位中仅包含一个key-value数据，若该bit7为1，则表示该完美哈希槽位中包含多个key-value数据。应理解，图6所示为单个完美哈希槽位对应多条key-value数据时的数据组织示意图，因此图6中的bit7应为1。As shown in Figure 6, the highest bit (i.e., bit 7) in the control information can be used to indicate whether the perfect hash slot contains only one key-value data. For example, if bit 7 is 0, it means that the perfect hash slot contains only one key-value data. If bit 7 is 1, it means that the perfect hash slot contains multiple key-value data. It should be understood that Figure 6 shows a data organization diagram when a single perfect hash slot corresponds to multiple key-value data, so bit 7 in Figure 6 should be 1.

如图6所示，控制信息中的bit0至bit3可以表示该完美哈希槽位包含的key-value数据的数量。此外，控制信息中的bit4至bit6可以表示任何可能的信息，本说明书对此不做具体限定。As shown in Figure 6, bits 0 to 3 in the control information can represent the number of key-value data contained in the perfect hash slot. In addition, bits 4 to 6 in the control information can represent any possible information, and this specification does not specifically limit this.

步骤S103，将所述多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，按照完美哈希槽位的排列顺序依次存储其中的多个数据块。Step S103: storing the multiple data fragments in the disk in sequence, and when storing each data fragment, storing multiple data blocks therein in sequence according to the arrangement order of the perfect hash slots.

按照上述数据组织方式，本申请可以将目标数据集的多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，可以按照完美哈希槽位的顺序依次将其中的多个数据块存储至磁盘中，从而实现整体数据集以顺序追加写入的方式存储至磁盘中。According to the above data organization method, the present application can store multiple data fragments of the target data set on the disk in sequence, and when storing each data fragment, multiple data blocks therein can be stored on the disk in sequence according to the order of perfect hash slots, so that the entire data set can be stored on the disk in a sequential append-write manner.

下面，将基于上述数据组织和存储方式，对本申请提供的数据查询方法进行阐述。Below, the data query method provided by this application will be explained based on the above data organization and storage method.

首先，本申请可以接收用户发起的查询语句，并对该查询语句进行解析，得到该查询语句中包含的与待查询的key-value数据对应的目标key。First, the application can receive a query statement initiated by a user, and parse the query statement to obtain a target key contained in the query statement that corresponds to the key-value data to be queried.

然后，存储引擎可以采用上述划分数据片段时相同的哈希计算方法，计算该目标key的目标哈希值，并在磁盘中已存储的多个数据片段中确定出与该目标哈希值对应的目标数据片段。示例性的，可以选取与该目标key的哈希值中的若干个比特(bit)作为索引，确定与该目标哈希值对应的目标数据片段，等等，本说明书对此不做具体限定。The storage engine can then use the same hash calculation method used when dividing the data segments to calculate the target hash value of the target key and determine the target data segment corresponding to the target hash value from the multiple data segments stored on the disk. For example, several bits in the hash value of the target key can be selected as indexes to determine the target data segment corresponding to the target hash value, and so on. This specification does not specifically limit this.

进一步地，存储引擎可以先读取该目标数据片段的完美哈希索引区中存储的完美哈希参数，并基于该完美哈希参数，计算与该目标key对应的目标完美哈希槽位，例如SlotIndex。Furthermore, the storage engine may first read the perfect hash parameters stored in the perfect hash index area of the target data segment, and calculate the target perfect hash slot corresponding to the target key, such as SlotIndex, based on the perfect hash parameters.

进一步地，存储引擎可以读取该目标数据片段的哈希值区中存储的与该目标完美哈希槽位对应的哈希值，即Hashes[SlotIndex]。Furthermore, the storage engine may read the hash value corresponding to the target perfect hash slot stored in the hash value area of the target data segment, ie, Hashes[SlotIndex].

在一示出的实施方式中，存储引擎可以先读取该目标数据片段尾部的脚注信息(Segmentfooter)，并根据该脚注信息中记录的哈希值区在该数据片段中的偏移地址以及长度，在该目标数据片段中定位哈希值区，并进一步从该哈希值区中获取与该目标完美哈希槽位对应的哈希值Hashes[SlotIndex]，等等，本说明书对此不做具体限定。In one illustrated embodiment, the storage engine may first read the footer information (Segmentfooter) at the end of the target data segment, and locate the hash value area in the target data segment based on the offset address and length of the hash value area in the data segment recorded in the footer information, and further obtain the hash value Hashes[SlotIndex] corresponding to the target perfect hash slot from the hash value area, etc. This specification does not make any specific limitations on this.

进一步地，存储引擎可以确定当前读取出的哈希值Hashes[SlotIndex]与上述计算得到的目标哈希值是否一致。Furthermore, the storage engine can determine whether the currently read hash value Hashes[SlotIndex] is consistent with the target hash value calculated above.

在一示出的实施方式中，若读取出的哈希值Hashes[SlotIndex]与计算得到的目标哈希值不一致，则说明该数据集中并不存在待查询的目标key，查询失败。In one illustrated embodiment, if the read hash value Hashes[SlotIndex] is inconsistent with the calculated target hash value, it means that the target key to be queried does not exist in the data set and the query fails.

在一示出的实施方式中，若读取出的哈希值Hashes[SlotIndex]与计算得到的目标哈希值一致，则进一步地，存储引擎可以从该目标数据片段包含的多个数据块中确定出该目标完美哈希槽位所在的目标数据块。In one illustrated embodiment, if the read hash value Hashes[SlotIndex] is consistent with the calculated target hash value, the storage engine can further determine the target data block where the target perfect hash slot is located from the multiple data blocks contained in the target data segment.

进一步地，存储引擎可以从该数据片段的数据块偏移地址区中获取该目标数据块在该数据片段中的偏移地址，以根据该偏移地址读取出该目标数据块。Furthermore, the storage engine may obtain the offset address of the target data block in the data segment from the data block offset address area of the data segment, so as to read out the target data block according to the offset address.

例如，若将每4个哈希槽位打包成一个数据块，可以通过DataBlockOffsets[SlotIndex/4-1]获取数据块的起始位置，DataBlockOffsets[SlotIndex/4]-DataBlockOffsets[SlotIndex/4-1]则表示该数据块的长度。For example, if every 4 hash slots are packaged into a data block, the starting position of the data block can be obtained through DataBlockOffsets[SlotIndex/4-1], and DataBlockOffsets[SlotIndex/4]-DataBlockOffsets[SlotIndex/4-1] represents the length of the data block.

进一步地，存储引擎在读取出该目标数据块后，便可以从该目标数据块的目标完美哈希槽位中获取待查询的key-value数据。具体地，存储引擎可以先获取目标数据块中存储的与该目标完美哈希槽位对应的偏移地址，并根据该偏移地址从目标数据块中读取出该目标完美哈希槽位中的编码数据EncodedKVM。Furthermore, after reading the target data block, the storage engine can obtain the key-value data to be queried from the target perfect hash slot of the target data block. Specifically, the storage engine can first obtain the offset address corresponding to the target perfect hash slot stored in the target data block, and then read the encoded data EncodedKVM in the target perfect hash slot from the target data block based on the offset address.

进一步地，存储引擎可以基于与上述编码规则相对应的解码规则，对该编码数据EncodedKVM进行解码，获得该目标完美哈希槽位中包含的至少一个key-value数据。Furthermore, the storage engine may decode the encoded data EncodedKVM based on a decoding rule corresponding to the above encoding rule to obtain at least one key-value data contained in the target perfect hash slot.

进一步地，存储引擎可以确定该至少一个key-value数据中的key是否与查询输入的目标key相同。Furthermore, the storage engine may determine whether a key in the at least one key-value data is the same as a target key input by the query.

在一示出的实施方式中，若该目标完美哈希槽位中仅包含一个key-value数据，则可以直接比较该key-value数据中保存的key是否与输入的目标key相匹配，即比较二者是否相同。示例性的，若该key-value数据中的key与输入的目标key相匹配，则可以确定该key-value数据即为本次需要查询的key-value数据，直接返回其中的Value和Meta数据等。否则，若该key-value数据中的key与输入的目标key不匹配，则查询失败，返回查询失败的指示信息。In one illustrated embodiment, if the target perfect hash slot contains only one key-value data, the key stored in the key-value data can be directly compared to see if it matches the input target key, i.e., to see if they are the same. For example, if the key in the key-value data matches the input target key, then it can be determined that the key-value data is the key-value data to be queried this time, and the Value and Meta data therein can be directly returned. Otherwise, if the key in the key-value data does not match the input target key, the query fails, and an indication of the query failure is returned.

在一示出的实施方式中，若该目标完美哈希槽位中包含多个key-value数据，则依次比较每个key-value数据中的key是否与输入的目标Key相匹配。若任一key-value数据中的key与输入的目标Key相匹配，则可以确定该key-value数据即为本次需要查询的key-value数据，直接返回其中的Value和Meta数据等。若多个key-value数据中的key均与目标Key不匹配，则查询失败，返回查询失败的指示信息。In one illustrated embodiment, if the target perfect hash slot contains multiple key-value data, the key in each key-value data is compared in turn to see if it matches the input target key. If the key in any key-value data matches the input target key, it can be determined that the key-value data is the key-value data to be queried this time, and the Value and Meta data therein are directly returned. If the keys in multiple key-value data do not match the target key, the query fails and an indication of the query failure is returned.

与上述方法流程实现对应，本说明书的实施例还提供了一种基于完美哈希的数据存储装置。请参阅图7，图7是一示例性实施例提供的一种基于完美哈希的数据存储装置的结构示意图，该装置70可以应用于数据库中的存储引擎。如图7所示，该装置70包括：数据片段划分单元701，用于将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段，每个数据片段中包含所述目标数据集中的部分key-value数据；完美哈希计算单元702，用于计算与每个数据片段中包含的多个key-value数据分别对应的完美哈希槽位，并将每个数据片段划分为与其包含的key-value数据的数据量相匹配的多个数据块；其中，每个数据块中包含按序排列的若干完美哈希槽位，每个完美哈希槽位中包含与其对应的至少一个key-value数据；存储单元703，用于将所述多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，按照完美哈希槽位的排列顺序依次存储其中的多个数据块。Corresponding to the implementation of the above-mentioned method flow, an embodiment of the present specification also provides a data storage device based on perfect hashing. Please refer to Figure 7, which is a structural schematic diagram of a data storage device based on perfect hashing provided by an exemplary embodiment. The device 70 can be applied to a storage engine in a database. As shown in Figure 7, the device 70 includes: a data segmentation unit 701, which is used to divide the target data set to be stored into multiple data segments that match the amount of key-value data contained therein, each data segment contains part of the key-value data in the target data set; a perfect hash calculation unit 702, which is used to calculate the perfect hash slots corresponding to the multiple key-value data contained in each data segment, and divide each data segment into multiple data blocks that match the amount of key-value data contained therein; wherein each data block contains a number of perfect hash slots arranged in sequence, and each perfect hash slot contains at least one key-value data corresponding thereto; a storage unit 703, which is used to store the multiple data segments in sequence on a disk, and when storing each data segment, the multiple data blocks therein are stored in sequence according to the arrangement order of the perfect hash slots.

在一示出的实施方式中，所述数据片段划分单元701，具体用于：根据待存储的目标数据集包含的key-value数据的数据量，确定与其相匹配的数据片段的数量；计算与所述目标数据集中包含的每个key-value数据的key对应的哈希值；根据计算得到的哈希值，将每个key-value数据划分到对应的数据片段中。In one illustrated embodiment, the data segmentation unit 701 is specifically used to: determine the number of data segments that match the target data set to be stored based on the amount of key-value data contained in the target data set; calculate the hash value corresponding to the key of each key-value data contained in the target data set; and divide each key-value data into corresponding data segments based on the calculated hash value.

在一示出的实施方式中，所述装置70还包括脚注信息存储单元704，用于：在磁盘中已存储的最后一个数据片段之后，存储与所述多个数据片段对应的脚注信息；其中，所述脚注信息包括以下示出的一个或多个信息的组合：所述多个数据片段的数量、每个数据片段的偏移地址、每个数据片段的长度。In an illustrated embodiment, the device 70 also includes a footer information storage unit 704, which is used to: store footer information corresponding to the multiple data fragments after the last data fragment stored in the disk; wherein the footer information includes a combination of one or more information shown below: the number of the multiple data fragments, the offset address of each data fragment, and the length of each data fragment.

在一示出的实施方式中，每个数据片段中还包括位于第一个数据块之前的完美哈希索引区；其中，所述完美哈希索引区中包含用于计算与该数据片段中的多个key-value数据分别对应的完美哈希槽位的完美哈希参数。In one illustrated embodiment, each data fragment also includes a perfect hash index area located before the first data block; wherein the perfect hash index area contains perfect hash parameters for calculating perfect hash slots corresponding to multiple key-value data in the data fragment.

在一示出的实施方式中，每个数据片段中还包括：位于最后一个数据块之后的数据块偏移地址区和哈希值区；其中，所述数据块偏移地址区中包含每个数据块在该数据片段中的偏移地址；哈希值区中包含与每个数据片段中的多个key-value数据的key分别对应的哈希值。In one illustrated embodiment, each data fragment also includes: a data block offset address area and a hash value area located after the last data block; wherein the data block offset address area contains the offset address of each data block in the data fragment; and the hash value area contains hash values corresponding to the keys of multiple key-value data in each data fragment.

在一示出的实施方式中，所述数据片段中还包括数据分片区；所述装置70还包括数据分片单元705，用于：响应于所述目标数据集中的目标key-value数据的长度大于预设阈值，对所述目标key-value数据进行切分，得到多个数据分片；其中，所述多个数据分片中的目标数据分片存储于与所述目标key-value数据对应的完美哈希槽位中，其他分片存储于所述数据分片区中。In an illustrated embodiment, the data fragment also includes a data sharding area; the device 70 also includes a data sharding unit 705, which is used to: in response to the length of the target key-value data in the target data set being greater than a preset threshold, split the target key-value data to obtain multiple data shards; wherein the target data shard among the multiple data shards is stored in the perfect hash slot corresponding to the target key-value data, and the other shards are stored in the data sharding area.

在一示出的实施方式中，每个完美哈希槽位中包含的至少一个key-value数据包括：基于预设的编码规则对该至少一个key-value数据进行编码后得到的编码数据。In an illustrated embodiment, the at least one key-value data included in each perfect hash slot includes: encoded data obtained by encoding the at least one key-value data based on a preset encoding rule.

在一示出的实施方式中，每个完美哈希槽位中的编码数据中包含与key-value数据对应的控制信息，所述控制信息包含以下示出的一个或多个信息的组合：该完美哈希槽位中是否仅包含一个key-value数据；该完美哈希槽位中包含的key-value数据是否具有对应的元数据；该完美哈希槽位中包含的key-value数据是否为切分过的数据分片。In one illustrated embodiment, the encoded data in each perfect hash slot includes control information corresponding to the key-value data, and the control information includes a combination of one or more of the following information: whether the perfect hash slot contains only one key-value data; whether the key-value data contained in the perfect hash slot has corresponding metadata; and whether the key-value data contained in the perfect hash slot is a segmented data fragment.

在一示出的实施方式中，每个数据块中还包括：位于最后一个完美哈希槽位之后的哈希槽位偏移地址区；其中，所述哈希槽位偏移地址区包含每个完美哈希槽位在该数据块中的偏移地址。In one illustrated embodiment, each data block further includes: a hash slot offset address area located after the last perfect hash slot; wherein the hash slot offset address area includes the offset address of each perfect hash slot in the data block.

在一示出的实施方式中，所述装置70还包括数据查询单元706，用于：解析接收到的查询语句，得到所述查询语句中包含的与待查询的key-value数据对应的目标key；计算所述目标key的目标哈希值，并在所述多个数据片段中确定出与所述目标哈希值对应的目标数据片段；获取所述目标数据片段中的完美哈希索引区中存储的完美哈希参数，并基于所述完美哈希参数，计算与所述目标key对应的目标完美哈希槽位；以及，从所述目标数据片段包含的多个数据块中读取出所述目标完美哈希槽位所在的目标数据块，并从所述目标数据块的所述目标完美哈希槽位中获取所述待查询的key-value数据。In an illustrated embodiment, the device 70 also includes a data query unit 706, which is used to: parse the received query statement to obtain the target key corresponding to the key-value data to be queried contained in the query statement; calculate the target hash value of the target key, and determine the target data segment corresponding to the target hash value in the multiple data fragments; obtain the perfect hash parameters stored in the perfect hash index area in the target data segment, and calculate the target perfect hash slot corresponding to the target key based on the perfect hash parameters; and read the target data block where the target perfect hash slot is located from the multiple data blocks contained in the target data segment, and obtain the key-value data to be queried from the target perfect hash slot of the target data block.

在一示出的实施方式中，所述数据查询单元706，具体用于：获取所述目标数据块中存储的与所述目标完美哈希槽位对应的偏移地址，并基于所述偏移地址从所述目标数据块中读取出所述目标完美哈希槽位中的编码数据；基于与所述编码规则相对应的解码规则，对所述编码数据进行解码，获得至少一个key-value数据；确定所述至少一个key-value数据中的key是否与所述目标key相同；若任一key-value数据中的key与所述目标key相同，则确定该key-value数据为待查询的key-value数据。In an illustrated embodiment, the data query unit 706 is specifically used to: obtain the offset address corresponding to the target perfect hash slot stored in the target data block, and read the encoded data in the target perfect hash slot from the target data block based on the offset address; decode the encoded data based on a decoding rule corresponding to the encoding rule to obtain at least one key-value data; determine whether the key in the at least one key-value data is the same as the target key; if the key in any key-value data is the same as the target key, determine that the key-value data is the key-value data to be queried.

上述装置70中各个单元的功能和作用的实现过程具体详见上述图1-图6对应实施例的描述，在此不再进行赘述。应理解，上述装置70可以通过软件实现，也可以通过硬件或者软硬件结合的方式实现。以软件实现为例，作为逻辑意义上的装置，是通过所在设备的处理器(CPU)将对应的计算机程序指令读取到内存中运行形成的。从硬件层面而言，除了CPU以及存储器之外，上述装置所在的设备通常还包括用于进行无线信号收发的芯片等其他硬件，和/或用于实现网络通信功能的板卡等其他硬件。The implementation process of the functions and effects of each unit in the above-mentioned device 70 is specifically described in the description of the corresponding embodiments of Figures 1 to 6 above, and will not be repeated here. It should be understood that the above-mentioned device 70 can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor (CPU) of the device where it is located reading the corresponding computer program instructions into the memory and running them. From the hardware level, in addition to the CPU and memory, the device where the above-mentioned device is located usually also includes other hardware such as chips for wireless signal transmission and reception, and/or other hardware such as boards and cards for implementing network communication functions.

以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理模块，即可以位于一个地方，或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部单元或模块来实现本说明书方案的目的。本领域普通技术人员在不付出创造性劳动的情况下，即可以理解并实施。The device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the units or modules may be selected according to actual needs to achieve the purpose of the scheme of this specification. Those of ordinary skill in the art can understand and implement the present invention without inventive effort.

上述实施例阐明的装置、单元、模块，具体可以由计算机芯片或实体实现，或者由具有某种功能的产品来实现。一种典型的实现设备为计算机，计算机的具体形式可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备、车载计算机或者这些设备中的任意几种设备的组合。The devices, units, and modules described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email transceiver, game console, tablet computer, wearable device, vehicle-mounted computer, or any combination of these devices.

与上述方法实施例相对应，本说明书的实施例还提供了一种计算机设备。请参阅图8，图8是一示例性实施例提供的一种计算机设备的结构示意图。如图8所示，该计算机设备包括处理器1001和存储器1002，进一步还可以包括输入设备1004(例如键盘等)和输出设备1005(例如显示器等)。处理器1001、存储器1002、输入设备1004和输出设备1005之间可以通过总线或其他方式连接。如图8所示，存储器1002包括计算机可读存储介质1003，该计算机可读存储介质1003存储有能够由处理器1001运行的计算机程序。处理器1001可以是CPU，微处理器，或用于控制以上方法实施例执行的集成电路。处理器1001在运行存储的计算机程序时，可以执行本说明书实施例中基于完美哈希的数据存储方法的各个步骤，包括：将待存储的目标数据集划分为与其包含的key-value数据的数据量相匹配的多个数据片段，每个数据片段中包含所述目标数据集中的部分key-value数据；计算与每个数据片段中包含的多个key-value数据分别对应的完美哈希槽位，并将每个数据片段划分为与其包含的key-value数据的数据量相匹配的多个数据块；其中，每个数据块中包含按序排列的若干完美哈希槽位，每个完美哈希槽位中包含与其对应的至少一个key-value数据；将所述多个数据片段依次存储至磁盘中，并且，在存储每个数据片段时，按照完美哈希槽位的排列顺序依次存储其中的多个数据块，等等。Corresponding to the above-mentioned method embodiment, the embodiment of this specification also provides a computer device. Please refer to Figure 8, which is a structural diagram of a computer device provided by an exemplary embodiment. As shown in Figure 8, the computer device includes a processor 1001 and a memory 1002, and may further include an input device 1004 (such as a keyboard, etc.) and an output device 1005 (such as a display, etc.). The processor 1001, the memory 1002, the input device 1004 and the output device 1005 can be connected via a bus or other means. As shown in Figure 8, the memory 1002 includes a computer-readable storage medium 1003, which stores a computer program that can be run by the processor 1001. The processor 1001 can be a CPU, a microprocessor, or an integrated circuit for controlling the execution of the above method embodiment. When the processor 1001 runs the stored computer program, it can execute the various steps of the data storage method based on perfect hashing in the embodiments of this specification, including: dividing the target data set to be stored into multiple data fragments that match the amount of key-value data contained therein, each data fragment contains part of the key-value data in the target data set; calculating the perfect hash slots corresponding to the multiple key-value data contained in each data fragment, and dividing each data fragment into multiple data blocks that match the amount of key-value data contained therein; wherein each data block contains several perfect hash slots arranged in sequence, and each perfect hash slot contains at least one key-value data corresponding thereto; storing the multiple data fragments in sequence on the disk, and, when storing each data fragment, storing the multiple data blocks therein in sequence according to the arrangement order of the perfect hash slots, and so on.

对上述基于完美哈希的数据存储方法的各个步骤的详细描述请参见之前的内容，此处不再进行赘述。For a detailed description of each step of the above-mentioned perfect hash-based data storage method, please refer to the previous content and will not be repeated here.

与上述方法实施例相对应，本说明书的实施例还提供了一种计算机可读存储介质，该存储介质上存储有计算机程序，这些计算机程序在被处理器运行时，执行本说明书实施例中基于完美哈希的数据存储方法的各个步骤。具体请参见上述图1-图6对应实施例的描述，此处不再进行赘述。Corresponding to the above method embodiments, embodiments of this specification also provide a computer-readable storage medium storing a computer program that, when executed by a processor, performs the steps of the perfect hash-based data storage method of the embodiments of this specification. For details, please refer to the description of the embodiments corresponding to Figures 1-6 above, and will not be repeated here.

以上所述仅为本说明书的较佳实施例而已，并不用以限制本说明书，凡在本说明书的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本说明书保护的范围之内。The above description is only a preferred embodiment of this specification and is not intended to limit this specification. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of this specification should be included in the scope of protection of this specification.

在一个典型的配置中，终端设备包括一个或多个CPU、输入/输出接口、网络接口和内存。In a typical configuration, a terminal device includes one or more CPUs, input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules or other data.

计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory media such as modulated data signals and carrier waves.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprises," "includes," or any other variations thereof are intended to encompass non-exclusive inclusion, such that a process, method, commodity, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or includes elements inherent to such process, method, commodity, or apparatus. In the absence of further limitations, an element defined by the phrase "comprises a ..." does not exclude the presence of other identical elements in the process, method, commodity, or apparatus that includes the element.

本领域技术人员应明白，本说明书的实施例可提供为方法、系统或计算机程序产品。因此，本说明书的实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本说明书的实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of this specification may be provided as methods, systems, or computer program products. Thus, the embodiments of this specification may take the form of entirely hardware embodiments, entirely software embodiments, or embodiments combining software and hardware. Furthermore, the embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.