Movatterモバイル変換


[0]ホーム

URL:


CN118312515A - Collaborative invalid key value pair confirmation method and garbage collection method applied to WiscKey - Google Patents

Collaborative invalid key value pair confirmation method and garbage collection method applied to WiscKey
Download PDF

Info

Publication number
CN118312515A
CN118312515ACN202410726237.1ACN202410726237ACN118312515ACN 118312515 ACN118312515 ACN 118312515ACN 202410726237 ACN202410726237 ACN 202410726237ACN 118312515 ACN118312515 ACN 118312515A
Authority
CN
China
Prior art keywords
key
value
value pair
invalid
hash table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410726237.1A
Other languages
Chinese (zh)
Other versions
CN118312515B (en
Inventor
潘玉彪
许杰
张惠臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao UniversityfiledCriticalHuaqiao University
Priority to CN202410726237.1ApriorityCriticalpatent/CN118312515B/en
Publication of CN118312515ApublicationCriticalpatent/CN118312515A/en
Application grantedgrantedCritical
Publication of CN118312515BpublicationCriticalpatent/CN118312515B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Classifications

Landscapes

Abstract

The invention discloses a collaborative invalid key value pair confirmation method and a garbage collection method applied to WiscKey, which relate to the field of computer storage, wherein the collaborative invalid key value pair confirmation method comprises the following steps: if the value offset address is found in the hash table and is equal to the address where the key value pair is located, the key value pair is a new version, namely valid data; if the value offset address is found in the hash table, but is not equal to the address where the key value pair is located, the key value pair is old version, namely invalid data; if the value offset address is not found in the hash table and the corresponding bit in the bit array is 1, the key value pair is valid data; if the value offset address is not found in the hash table and the corresponding bit in the bit array is 0, then the key value pair is an invalid number. The method does not need to access the external memory, can effectively control the memory occupation, and realizes the garbage collection operation with low time cost and low space cost.

Description

Translated fromChinese
应用于WiscKey的协同无效键值对确认方法及垃圾回收方法Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey

技术领域Technical Field

本发明涉及计算机存储领域,具体涉及一种应用于WiscKey的协同无效键值对确认方法及垃圾回收方法。The present invention relates to the field of computer storage, and in particular to a collaborative invalid key-value pair confirmation method and a garbage collection method applied to WiscKey.

背景技术Background technique

近年来随着互联网的兴起,云计算、大数据等应用越来越广泛,NoSQL数据库日益繁荣。而键值存储作为NoSQL中的一种,出现了许多成功的商业化产品,如FaceBook的RocksDB,以及Google的LevelDB和BigTable。In recent years, with the rise of the Internet, cloud computing, big data and other applications have become more and more widespread, and NoSQL databases have become increasingly prosperous. As a type of NoSQL, key-value storage has many successful commercial products, such as FaceBook's RocksDB, Google's LevelDB and BigTable.

目前,键值存储系统的实现多种多样。其中一种是基于LSM树的键值存储系统。该系统是当下比较流行的键值存储实现。与其他实现结构相比,LSM树的主要优点是它使用了顺序的写入模式,从根本上提升了写吞吐量。由于要支持多键的范围查询和单键的点查询,LSM树会维护多个有序键范围。LSM树的第0层有多个有序键范围,而第1至第n层每层为一个有序键范围。为此,LSM树采用了一种名为层间合并的机制,将多个键值对排序,并回收其中无效的键值对。需要指出的是,LSM树的层间合并机制会占用较多的I/O带宽,与用户的读写操作产生带宽竞争,并且数据排序后的反复写入会加剧固态硬盘的寿命损耗。为了解决LSM树存在的问题,键值分离应运而生。作为键值分离的经典设计,WiscKey的主要方法就是将值使用其它的方式存储于一个值日志中,避免值被牵涉入层间合并的执行过程。键值分离策略从而显著地提升了系统的吞吐量。At present, there are many different implementations of key-value storage systems. One of them is the key-value storage system based on the LSM tree. This system is a popular key-value storage implementation. Compared with other implementation structures, the main advantage of the LSM tree is that it uses a sequential write mode, which fundamentally improves the write throughput. In order to support multi-key range queries and single-key point queries, the LSM tree maintains multiple ordered key ranges. The 0th layer of the LSM tree has multiple ordered key ranges, and each layer from the 1st to the nth layer has an ordered key range. To this end, the LSM tree uses a mechanism called inter-layer merging to sort multiple key-value pairs and recycle invalid key-value pairs. It should be pointed out that the inter-layer merging mechanism of the LSM tree will occupy more I/O bandwidth, causing bandwidth competition with the user's read and write operations, and the repeated writing of data after sorting will increase the life loss of the solid-state drive. In order to solve the problems of the LSM tree, key-value separation came into being. As a classic design of key-value separation, the main method of WiscKey is to store values in a value log in other ways to avoid the values being involved in the execution process of inter-layer merging. The key-value separation strategy significantly improves the system throughput.

然而,在不同的流行工作负载中,键值分离存储系统仍具有访存优化的空间。具体而言,键值分离存储系统WiscKey 仍然不能在更新密集型工作负载下完全实现高性能。根本原因是 WiscKey 的值日志需要频繁的垃圾回收操作,为了从被新更新而无效的键值对中回收空间。然而,值日志的垃圾回收具有较高的开销,并且如果预留空间有限,垃圾回收开销也会变得更加严重。垃圾回收操作需要查询LSM树,以检查每个键值对对的无效性。即使键值分离在一定程度上缩小了LSM树,使其层数减少,但查询仍然可能需要经过LSM树的多层,具有较高的访存开销。所以,这些查询具有很高的延迟,特别是当LSM树在较大的数据量下变得相当大时。However, in different popular workloads, key-value separation storage systems still have room for memory access optimization. Specifically, the key-value separation storage system WiscKey still cannot fully achieve high performance under update-intensive workloads. The root cause is that WiscKey's value log requires frequent garbage collection operations to reclaim space from key-value pairs that are invalidated due to new updates. However, garbage collection of value logs has high overhead, and if the reserved space is limited, the garbage collection overhead will become more serious. The garbage collection operation requires querying the LSM tree to check the invalidity of each key-value pair. Even though key-value separation shrinks the LSM tree to a certain extent and reduces its number of layers, queries may still need to go through multiple layers of the LSM tree, which has high memory access overhead. Therefore, these queries have high latency, especially when the LSM tree becomes quite large under large data volumes.

为了减少WiscKey值日志中的垃圾回收访存开销,HashKV采用了一种新的基于哈希的数据分组,替代了值日志结构。它的想法是将值存储结构划分为固定大小的分区,并通过哈希来将每个写入的键值对映射到一个分区。因为同一个键的新旧键值对都存在于同一分区中,所以每个垃圾回收操作可以选择一个分区,随后通过扫描整个分区的方式识别出无效键值对,进而回收了空间。这种无效键值对确认方法无需对LSM树执行查询,减少了原先WiscKey固有的垃圾回收访存开销。In order to reduce the garbage collection memory access overhead in the WiscKey value log, HashKV adopts a new hash-based data grouping to replace the value log structure. The idea is to divide the value storage structure into fixed-size partitions and map each written key-value pair to a partition through hashing. Because the old and new key-value pairs of the same key exist in the same partition, each garbage collection operation can select a partition, and then identify invalid key-value pairs by scanning the entire partition, thereby reclaiming space. This method of confirming invalid key-value pairs does not require queries on the LSM tree, reducing the garbage collection memory access overhead inherent in the original WiscKey.

需要指出的是,HashKV虽然通过新的值存储结构,省去了LSM树查询的过程,但其系统中其他操作的时间开销都变得较大。其原因在于,相比于WiscKey的值日志,HashKV值存储结构设计较为复杂,导致了额外开销。因此,可以得出的结论是,若能将WiscKey的键值对无效性检查的时间降至一定水平,同时继承该系统值日志结构所具有的优势,那么在更新密集型负载下,键值分离存储系统的更新性能能够获得进一步提升。It should be pointed out that although HashKV saves the LSM tree query process through the new value storage structure, the time overhead of other operations in the system has become larger. The reason is that compared with the value log of WiscKey, the design of HashKV value storage structure is more complex, resulting in additional overhead. Therefore, it can be concluded that if the time of WiscKey's key-value pair invalidity check can be reduced to a certain level, while inheriting the advantages of the system value log structure, then under update-intensive loads, the update performance of the key-value separation storage system can be further improved.

在当前,随着闪存技术的发展,被广泛使用的固态硬盘的访问速度要大大高于磁盘。但内存的访问速度仍是固态硬盘的三个数量级。因此,如果将内存操作替代WiscKey中访问LSM树的外存操作,减少访存时间开销,就能明显提升键值对无效性检查的速度。At present, with the development of flash memory technology, the access speed of widely used solid-state drives is much higher than that of disks. However, the access speed of memory is still three orders of magnitude slower than that of solid-state drives. Therefore, if the memory operation is used to replace the external memory operation of accessing the LSM tree in WiscKey, the memory access time overhead can be reduced, and the speed of key-value pair invalidity check can be significantly improved.

发明内容Summary of the invention

针对上述提到的技术问题,本申请的实施例的目的在于提出了一种应用于WiscKey的协同无效键值对确认方法及垃圾回收方法,能够在更新密集型工作负载下实现键值分离存储系统WiscKey的高效运行。本发明结合两种基础的无效键值对确认策略(一种是将LSM树层间合并期间识别出的无效数据记录在内存中,另一种是直接查询LSM树以获得数据的有效性)的优势,使两者能够在内存中协同运行,无须访问外存,并有效控制了内存占用,实现了低时间开销和低空间开销的垃圾回收操作。In response to the technical problems mentioned above, the purpose of the embodiments of the present application is to propose a collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey, which can realize the efficient operation of the key-value separation storage system WiscKey under update-intensive workloads. The present invention combines the advantages of two basic invalid key-value pair confirmation strategies (one is to record the invalid data identified during the LSM tree inter-layer merge in the memory, and the other is to directly query the LSM tree to obtain the validity of the data), so that the two can run collaboratively in the memory without accessing the external memory, and effectively control the memory occupancy, and realize garbage collection operations with low time overhead and low space overhead.

本发明的技术方案如下。The technical solution of the present invention is as follows.

第一方面,一种应用于WiscKey的协同无效键值对确认方法,包括:In a first aspect, a collaborative invalid key-value pair confirmation method applied to WiscKey includes:

S101,对于一个要确认有效性的键值对,向哈希表发出其中键的查询操作,到S102;S101, for a key-value pair whose validity is to be confirmed, a query operation of the key in the hash table is issued, and then to S102;

S102,若查找成功,到S103,否则,到S105;S102, if the search is successful, go to S103, otherwise, go to S105;

S103,判断查询哈希表获得的值偏移地址和所述键值对在值日志的偏移地址是否相等,若相等,到S104,若不相等,到S105;S103, determining whether the value offset address obtained by querying the hash table is equal to the offset address of the key-value pair in the value log, if they are equal, proceed to S104, if not, proceed to S105;

S104,确认所述键值对为有效数据,结束确认过程;S104, confirming that the key-value pair is valid data, and ending the confirmation process;

S105,查询比特数组的对应位,到S106;S105, query the corresponding bit of the bit array, and go to S106;

S106,若对应位为1,到S104,若对应位不为1,到S107;S106, if the corresponding bit is 1, go to S104, if the corresponding bit is not 1, go to S107;

S107,确认所述键值对为无效数据。S107, confirming that the key-value pair is invalid data.

优选的,所述哈希表结构的每一个元素存储了键以及拥有键所对应的键值对在值日志中的偏移地址;Preferably, each element of the hash table structure stores a key and an offset address of a key-value pair corresponding to the key in the value log;

键值对写入值日志后,将键和元数据插入LSM树,并将键和值日志的偏移地址插入哈希表之中;具体的,哈希表中存储了LSM树较高层数据的副本;随着层间合并的进行,LSM树较高层的数据会陆续迁移至最底层;此时,后台线程会删除哈希表中不属于LSM树较高层的数据副本,使哈希表始终留存LSM较高层的数据副本;其中,较高层表示高于最底层的所有层。After the key-value pair is written to the value log, the key and metadata are inserted into the LSM tree, and the offset address of the key and value log is inserted into the hash table; specifically, the hash table stores a copy of the higher-level data of the LSM tree; as the inter-layer merge proceeds, the data in the higher layers of the LSM tree will be gradually migrated to the bottom layer; at this time, the background thread will delete the data copies in the hash table that do not belong to the higher layers of the LSM tree, so that the hash table always retains the data copies of the higher layers of the LSM tree; among them, the higher layers refer to all layers higher than the bottom layer.

优选的,所述比特数组是由多个比特位排列而成的循环数组,并且每个比特位与值日志中的每个键值对呈按序对应的关系;比特位为0时代表对应的键值对是无效的;Preferably, the bit array is a circular array composed of a plurality of bits, and each bit corresponds to each key-value pair in the value log in sequence; when the bit is 0, it means that the corresponding key-value pair is invalid;

比特数组的每一个比特位初始值为1,当LSM树层间合并操作识别出一个被更新的旧版本键和元数据时,所述旧版本键和元数据指向的键值对也是旧的,即无效的;通过旧版本键和元数据记录的键值对次序位置找到对应的比特位,并置0。The initial value of each bit of the bit array is 1. When the LSM tree inter-layer merge operation identifies an updated old version key and metadata, the key-value pair pointed to by the old version key and metadata is also old, that is, invalid. The corresponding bit is found through the order position of the key-value pair recorded in the old version key and metadata, and set to 0.

优选的,通过旧版本键和元数据记录的键值对次序位置找到对应的比特位的方法如下:Preferably, the method for finding the corresponding bit position through the key-value pair sequence position of the old version key and metadata record is as follows:

其中,index为比特数组下标,为对应键值对在键和元数据中所记录的次序位置,positionmin为全局最小次序值。Among them, index is the bit array subscript, It is the order position of the corresponding key-value pair recorded in the key and metadata, and positionmin is the global minimum order value.

另一方面,一种应用于WiscKey的垃圾回收方法,其特征在于,基于所述的协同无效键值对确认方法,具体包括:On the other hand, a garbage collection method applied to WiscKey is characterized by being based on the collaborative invalid key-value pair confirmation method, specifically comprising:

S201,值日志的空间不足时,触发垃圾回收操作;S201, when the space of the value log is insufficient, a garbage collection operation is triggered;

S202,从值日志的尾部读取一批键值对;S202, read a batch of key-value pairs from the tail of the value log;

S203,通过所述协同无效键值对确认方法,找出无效键值对并将其删除;S203, finding invalid key-value pairs and deleting them through the collaborative invalid key-value pair confirmation method;

S204,将剩余的有效键值对追加写至值日志的头部;S204, append the remaining valid key-value pairs to the header of the value log;

S205,更新比特数组的状态,以维护比特数组和值日志键值对一一对应的关系;S205, updating the state of the bit array to maintain a one-to-one correspondence between the bit array and the value log key-value pair;

S206,向LSM树写入带有新偏移地址和新次序的键和元数据,再向哈希表写入新偏移地址和新次序;S206, write the key and metadata with the new offset address and the new order to the LSM tree, and then write the new offset address and the new order to the hash table;

S207,释放值日志尾部的相应空间。S207, releasing the corresponding space at the end of the value log.

相比于现有技术,本发明具有以下有益效果:Compared with the prior art, the present invention has the following beneficial effects:

(1)本发明结合两种基础的无效键值对确认策略(一种是将LSM树层间合并期间识别出的无效数据记录在内存中,另一种是直接查询LSM树以获得数据的有效性)的优势,使两者能够在内存中协同运行,无须访问外存,拥有较高的无效键值对确认速度,进而拥有较高的垃圾回收执行速度和更新操作吞吐量;(1) The present invention combines the advantages of two basic invalid key-value pair confirmation strategies (one is to record the invalid data identified during the LSM tree inter-layer merge in the memory, and the other is to directly query the LSM tree to obtain the validity of the data), so that the two can run together in the memory without accessing the external memory, and have a higher invalid key-value pair confirmation speed, thereby having a higher garbage collection execution speed and update operation throughput;

(2)本发明仅需要将LSM树较高层的数据放至内存中,避免了过高的内存空间占用。(2) The present invention only needs to put the data of the higher layers of the LSM tree into the memory, thus avoiding excessive memory space occupation.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简要介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域的普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required for use in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For ordinary technicians in this field, other drawings can be obtained based on these drawings without paying creative work.

图1为本发明实施例的应用于键值分离存储系统WiscKey的协同无效键值对确认方法的框架示意图;FIG1 is a schematic diagram of a framework of a collaborative invalid key-value pair confirmation method applied to a key-value separation storage system WiscKey according to an embodiment of the present invention;

图2为本发明实施例的应用于键值分离存储系统WiscKey的协同无效键值对确认方法的流程图;2 is a flow chart of a collaborative invalid key-value pair confirmation method applied to a key-value separation storage system WiscKey according to an embodiment of the present invention;

图3为本发明实施例的应用于键值分离存储系统WiscKey的垃圾回收方法的流程图。FIG3 is a flow chart of a garbage collection method applied to a key-value separation storage system WiscKey according to an embodiment of the present invention.

具体实施方式Detailed ways

下面结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚,完整的描述。显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部实施例。The following is a clear and complete description of the technical solutions in the embodiments of the present invention in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments.

本发明提出的应用于键值分离存储系统WiscKey的协同无效键值对确认方法包括:如果在哈希表中查找到值偏移地址,并且它与键值对所在的地址相等,则这个键值对为新版本,即有效数据;如果在哈希表中查找到值偏移地址,但是它与键值对所在的地址不相等,则这个键值对为旧版本,即无效数据;如果在哈希表中查找不到值偏移地址,且比特数组中对应位为1,则这个键值对为有效数据;如果在哈希表中查找不到值偏移地址,且比特数组中对应位为0,则这个键值对为无效数。The collaborative invalid key-value pair confirmation method proposed by the present invention and applied to the key-value separation storage system WiscKey includes: if the value offset address is found in the hash table and it is equal to the address where the key-value pair is located, then this key-value pair is a new version, that is, valid data; if the value offset address is found in the hash table, but it is not equal to the address where the key-value pair is located, then this key-value pair is an old version, that is, invalid data; if the value offset address cannot be found in the hash table, and the corresponding bit in the bit array is 1, then this key-value pair is valid data; if the value offset address cannot be found in the hash table, and the corresponding bit in the bit array is 0, then this key-value pair is an invalid number.

参见图1所示,本发明应用于键值分离存储系统WiscKey的协同无效键值对确认方法的系统结构包括一系列内存结构,每个内存结构中包括哈希表和比特数组。As shown in FIG. 1 , the system structure of the collaborative invalid key-value pair confirmation method applied to the key-value separation storage system WiscKey of the present invention includes a series of memory structures, each of which includes a hash table and a bit array.

参见图2所示,为应用于键值分离存储系统WiscKey的协同无效键值对确认方法的执行流程图,包括如下步骤。Referring to FIG. 2 , there is shown a flowchart of an execution of a collaborative invalid key-value pair confirmation method applied to a key-value separation storage system WiscKey, which includes the following steps.

101,对于一个要确认有效性的键值对,向哈希表发出其中键的查询操作,到102;101, for a key-value pair whose validity is to be confirmed, a query operation of the key in the hash table is issued to 102;

102,若查找成功,到103,否则,到105;102, if the search is successful, go to 103, otherwise, go to 105;

103,判断查询哈希表获得的值偏移地址和所述键值对在值日志的偏移地址是否相等,若相等,到104,若不相等,到105;103, determine whether the value offset address obtained by querying the hash table is equal to the offset address of the key-value pair in the value log, if they are equal, go to 104, if not equal, go to 105;

104,确认所述键值对为有效数据,结束确认过程;104, confirming that the key-value pair is valid data, and ending the confirmation process;

105,查询比特数组的对应位,到106;105, query the corresponding bit of the bit array, to 106;

106,若对应位为1,到104,若对应位不为1,到107;106, if the corresponding bit is 1, go to 104, if the corresponding bit is not 1, go to 107;

107,确认所述键值对为无效数据。107, confirming that the key-value pair is invalid data.

本发明中,在垃圾回收过程中,哈希表替代了LSM树,用于完成内存中键值对的无效性检查。在检查每一个键值对是否无效时,只需查询该结构即可,无需查询LSM树。该结构的每一个元素不仅存储了键,还拥有它所对应的键值对在值日志中的偏移地址。这些数据来源于每次的写入操作。当键值对写入值日志后,会将键和元数据插入LSM树,随后将键和值日志的偏移地址插入哈希表之中。实际上,哈希表中存储的是LSM树的数据副本。但需要指出的是,哈希表虽然替代了LSM树在垃圾回收中的功能,但是并不是存储了LSM树的所有数据副本,而仅仅存储了LSM树较高层(高于最底层的所有层)数据的副本。随着层间合并的进行,LSM树较高层的数据会陆续迁移至最底层。此时,一个后台线程会删除哈希表中不属于LSM树较高层的数据副本,使哈希表始终留存LSM较高层的数据副本。注意,哈希表存储了LSM树的部分数据的副本,并不意味着LSM树的基础功能被剥夺。相反,范围查询和点查询仍然需要用到LSM树。In the present invention, during the garbage collection process, the hash table replaces the LSM tree to complete the invalidity check of the key-value pair in the memory. When checking whether each key-value pair is invalid, it is only necessary to query the structure without querying the LSM tree. Each element of the structure not only stores the key, but also has the offset address of the key-value pair corresponding to it in the value log. These data come from each write operation. When the key-value pair is written to the value log, the key and metadata are inserted into the LSM tree, and then the offset address of the key and value log is inserted into the hash table. In fact, what is stored in the hash table is a copy of the data of the LSM tree. However, it should be pointed out that although the hash table replaces the function of the LSM tree in garbage collection, it does not store all the data copies of the LSM tree, but only stores the copies of the data of the higher layers of the LSM tree (all layers higher than the bottom layer). As the inter-layer merge proceeds, the data of the higher layers of the LSM tree will be migrated to the bottom layer one after another. At this time, a background thread will delete the data copies that do not belong to the higher layers of the LSM tree in the hash table, so that the hash table always retains the data copies of the higher layers of the LSM. Note that the fact that the hash table stores a copy of some of the data in the LSM tree does not mean that the basic functions of the LSM tree are deprived. On the contrary, range queries and point queries still require the use of the LSM tree.

比特数组是由多个比特位排列而成的循环数组,并且每个比特位与值日志中的每个键值对呈按序对应的关系。比特位为0时代表对应的键值对是无效的。比特数组的每一个比特位初始值为1,而当LSM树层间合并操作识别出一个被更新的旧版本键和元数据(元数据包括值大小、值偏移地址和次序等字段)时,意味着所述键和元数据指向的键值对也是旧的(即无效的),此时可通过所述键和元数据记录的键值对次序位置找到对应的比特位,并置0,代表着所述键和元数据对应的键值对是旧的(即无效的)。由键和元数据记录的键值对次序位置找到对应比特位的方法如下:The bit array is a circular array composed of multiple bits, and each bit corresponds to each key-value pair in the value log in order. When the bit is 0, it means that the corresponding key-value pair is invalid. The initial value of each bit of the bit array is 1. When the LSM tree inter-layer merge operation identifies an updated old version of the key and metadata (the metadata includes fields such as value size, value offset address and order), it means that the key-value pair pointed to by the key and metadata is also old (that is, invalid). At this time, the corresponding bit can be found through the key-value pair sequence position recorded in the key and metadata, and set to 0, indicating that the key-value pair corresponding to the key and metadata is old (that is, invalid). The method to find the corresponding bit by the key-value pair sequence position recorded in the key and metadata is as follows:

其中,index为比特数组下标,为对应键值对在键和元数据中所记录的次序位置,positionmin为全局最小次序值。Among them, index is the bit array subscript, It is the order position of the corresponding key-value pair recorded in the key and metadata, and positionmin is the global minimum order value.

参见图3所示,为应用于键值分离存储系统WiscKey的垃圾回收方法的执行流程图,包括如下步骤。Referring to FIG. 3 , there is shown a flowchart of an execution of a garbage collection method applied to the key-value separation storage system WiscKey, which includes the following steps.

S201,值日志的空间不足时,触发垃圾回收操作;S201, when the space of the value log is insufficient, a garbage collection operation is triggered;

S202,从值日志的尾部读取一批键值对;S202, read a batch of key-value pairs from the tail of the value log;

S203,通过协同无效键值对确认方法,找出无效键值对并将其删除;S203, finding invalid key-value pairs and deleting them by coordinating the invalid key-value pair confirmation method;

S204,将剩余的有效键值对追加写至值日志的头部;S204, append the remaining valid key-value pairs to the header of the value log;

S205,更新比特数组的状态,以维护比特数组和值日志键值对一一对应的关系;S205, updating the state of the bit array to maintain a one-to-one correspondence between the bit array and the value log key-value pair;

S206,向LSM树写入带有新偏移地址和新次序的键和元数据,再向哈希表写入新偏移地址和新次序;S206, write the key and metadata with the new offset address and the new order to the LSM tree, and then write the new offset address and the new order to the hash table;

S207,释放值日志尾部的相应空间。S207, releasing the corresponding space at the end of the value log.

以上所述,仅为本发明的具体实施方式,但本发明的保护范围不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应该涵盖在本发明的保护范围之内。因此本发明的保护范围应以权利要求的保护范围为准。The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can be easily thought of by a person skilled in the art within the technical scope disclosed by the present invention should be included in the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

Translated fromChinese
1.一种应用于WiscKey的协同无效键值对确认方法,其特征在于,包括:1. A collaborative invalid key-value pair confirmation method applied to WiscKey, characterized by comprising:S101,对于一个要确认有效性的键值对,向哈希表发出其中键的查询操作,到S102;S101, for a key-value pair whose validity is to be confirmed, a query operation of the key in the hash table is issued, and then to S102;S102,若查找成功,到S103,否则,到S105;S102, if the search is successful, go to S103, otherwise, go to S105;S103,判断查询哈希表获得的值偏移地址和所述键值对在值日志的偏移地址是否相等,若相等,到S104,若不相等,到S105;S103, determining whether the value offset address obtained by querying the hash table is equal to the offset address of the key-value pair in the value log, if they are equal, proceed to S104, if not, proceed to S105;S104,确认所述键值对为有效数据,结束确认;S104, confirming that the key-value pair is valid data, and ending the confirmation;S105,查询比特数组的对应位,到S106;S105, query the corresponding bit of the bit array, and go to S106;S106,若对应位为1,到S104,若对应位不为1,到S107;S106, if the corresponding bit is 1, go to S104, if the corresponding bit is not 1, go to S107;S107,确认所述键值对为无效数据。S107, confirming that the key-value pair is invalid data.2.根据权利要求1所述的应用于WiscKey的协同无效键值对确认方法,其特征在于,2. The collaborative invalid key-value pair confirmation method applied to WiscKey according to claim 1, characterized in that:所述哈希表结构的每一个元素存储了键以及拥有键所对应的键值对在值日志中的偏移地址;Each element of the hash table structure stores a key and an offset address of a key-value pair corresponding to the key in the value log;键值对写入值日志后,将键和元数据插入LSM树,并将键和值日志的偏移地址插入哈希表之中;具体的,哈希表中存储了LSM树较高层数据的副本;随着层间合并的进行,LSM树较高层的数据会陆续迁移至最底层;此时,后台线程会删除哈希表中不属于LSM树较高层的数据副本,使哈希表始终留存LSM较高层的数据副本;其中,较高层表示高于最底层的所有层。After the key-value pair is written to the value log, the key and metadata are inserted into the LSM tree, and the offset address of the key and value log is inserted into the hash table; specifically, the hash table stores a copy of the higher-level data of the LSM tree; as the inter-layer merge proceeds, the data in the higher layers of the LSM tree will be gradually migrated to the bottom layer; at this time, the background thread will delete the data copies in the hash table that do not belong to the higher layers of the LSM tree, so that the hash table always retains the data copies of the higher layers of the LSM tree; among them, the higher layers refer to all layers higher than the bottom layer.3.根据权利要求1所述的应用于WiscKey的协同无效键值对确认方法,其特征在于,3. The collaborative invalid key-value pair confirmation method applied to WiscKey according to claim 1, characterized in that:所述比特数组是由多个比特位排列而成的循环数组,并且每个比特位与值日志中的每个键值对呈按序对应的关系;比特位为0时代表对应的键值对是无效的;The bit array is a circular array composed of a plurality of bits, and each bit corresponds to each key-value pair in the value log in sequence; when a bit is 0, it means that the corresponding key-value pair is invalid;比特数组的每一个比特位初始值为1,当LSM树层间合并操作识别出一个被更新的旧版本键和元数据时,所述旧版本键和元数据指向的键值对也是旧的,即无效的;通过旧版本键和元数据记录的键值对次序位置找到对应的比特位,并置0。The initial value of each bit of the bit array is 1. When the LSM tree inter-layer merge operation identifies an updated old version key and metadata, the key-value pair pointed to by the old version key and metadata is also old, that is, invalid. The corresponding bit is found by the order position of the key-value pair recorded in the old version key and metadata, and set to 0.4.根据权利要求3所述的应用于WiscKey的协同无效键值对确认方法,其特征在于,4. The collaborative invalid key-value pair confirmation method applied to WiscKey according to claim 3 is characterized in that:通过旧版本键和元数据记录的键值对次序位置找到对应的比特位的方法如下:The method to find the corresponding bit position through the key-value pair sequence of the old version key and metadata record is as follows: ;其中,index为比特数组下标,为对应键值对在键和元数据中所记录的次序位置,positionmin为全局最小次序值。Among them, index is the bit array subscript, It is the order position of the corresponding key-value pair recorded in the key and metadata, and positionmin is the global minimum order value.5.一种应用于WiscKey的垃圾回收方法,其特征在于,基于如权利要求1~4中任一所述的协同无效键值对确认方法,具体包括:5. A garbage collection method applied to WiscKey, characterized in that, based on the collaborative invalid key-value pair confirmation method as described in any one of claims 1 to 4, specifically comprising:S201,值日志的空间不足时,触发垃圾回收操作;S201, when the space of the value log is insufficient, a garbage collection operation is triggered;S202,从值日志的尾部读取一批键值对;S202, read a batch of key-value pairs from the tail of the value log;S203,通过所述协同无效键值对确认方法,找出无效键值对并将其删除;S203, finding invalid key-value pairs and deleting them through the collaborative invalid key-value pair confirmation method;S204,将剩余的有效键值对追加写至值日志的头部;S204, append the remaining valid key-value pairs to the header of the value log;S205,更新比特数组的状态,以维护比特数组和值日志键值对一一对应的关系;S205, updating the state of the bit array to maintain a one-to-one correspondence between the bit array and the value log key-value pair;S206,向LSM树写入带有新偏移地址和新次序的键和元数据,再向哈希表写入新偏移地址和新次序;S206, write the key and metadata with the new offset address and the new order to the LSM tree, and then write the new offset address and the new order to the hash table;S207,释放值日志尾部的相应空间。S207, releasing the corresponding space at the end of the value log.
CN202410726237.1A2024-06-062024-06-06 Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKeyActiveCN118312515B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202410726237.1ACN118312515B (en)2024-06-062024-06-06 Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN202410726237.1ACN118312515B (en)2024-06-062024-06-06 Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey

Publications (2)

Publication NumberPublication Date
CN118312515Atrue CN118312515A (en)2024-07-09
CN118312515B CN118312515B (en)2024-09-24

Family

ID=91726080

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202410726237.1AActiveCN118312515B (en)2024-06-062024-06-06 Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey

Country Status (1)

CountryLink
CN (1)CN118312515B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN109558084A (en)*2018-11-292019-04-02文华学院A kind of data processing method and relevant device
WO2019156309A1 (en)*2018-02-092019-08-15연세대학교 산학협력단Key-value-based data access device and method using internal parallelism of flash storage device
CN110389942A (en)*2019-06-212019-10-29华中科技大学 A key-value separation storage method and system without garbage collection
CN112131140A (en)*2020-09-242020-12-25北京计算机技术及应用研究所 A key-value separation storage method supporting efficient storage space management based on SSD
CN116909939A (en)*2023-07-042023-10-20浙江大学LSM tree-based key value separation storage engine garbage recycling method, system and equipment
WO2024022330A1 (en)*2022-07-252024-02-01华为云计算技术有限公司Metadata management method based on file system, and related device thereof

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
WO2019156309A1 (en)*2018-02-092019-08-15연세대학교 산학협력단Key-value-based data access device and method using internal parallelism of flash storage device
CN109558084A (en)*2018-11-292019-04-02文华学院A kind of data processing method and relevant device
CN110389942A (en)*2019-06-212019-10-29华中科技大学 A key-value separation storage method and system without garbage collection
CN112131140A (en)*2020-09-242020-12-25北京计算机技术及应用研究所 A key-value separation storage method supporting efficient storage space management based on SSD
WO2024022330A1 (en)*2022-07-252024-02-01华为云计算技术有限公司Metadata management method based on file system, and related device thereof
CN116909939A (en)*2023-07-042023-10-20浙江大学LSM tree-based key value separation storage engine garbage recycling method, system and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
英昌甜;王维庆;于炯;卞琛;国冰磊;祁雷;: "内存计算环境下基于索引结构的内存优化策略", 新疆大学学报(自然科学版), no. 01, 29 January 2018 (2018-01-29)*

Also Published As

Publication numberPublication date
CN118312515B (en)2024-09-24

Similar Documents

PublicationPublication DateTitle
CN113821171B (en)Key value storage method based on hash table and LSM tree
CN110347852B (en) File system and file management method embedded in horizontally scalable key-value storage system
CN110825748B (en)High-performance and easily-expandable key value storage method by utilizing differentiated indexing mechanism
TWI683217B (en)Deduplication memory module using dedupe dram system algorithm architecture and method thereof
US7882304B2 (en)System and method for efficient updates of sequential block storage
US8219749B2 (en)System and method for efficient updates of sequential block storage
US11947839B2 (en)Storage device, system, and method for customizable metadata
CN113535670B (en)Virtual resource mirror image storage system and implementation method thereof
CN112394874B (en)Key value KV storage method and device and storage equipment
WO2018205151A1 (en)Data updating method and storage device
CN113553476A (en)Key value storage method for reducing write pause by utilizing Hash
CN111737261A (en) Compressed log caching method and device based on LSM-Tree
CN114741028B (en) A persistent key-value storage method, device and system based on OCSSD
CN115203079A (en)Method for writing data into solid state disk
CN109407985B (en)Data management method and related device
CN110780806B (en)Method and system for facilitating atomicity guarantee for metadata and data bundled storage
CN116009776A (en)Data storage method, controller and system for NVM and SSD
CN115576956A (en)Data processing method, system, equipment and storage medium
CN113407111B (en) Flash memory controller, flash memory controller method and memory device
TWI715408B (en)Flash memory controller, memory device and method for accessing flash memory module
CN118312515B (en) Collaborative invalid key-value pair confirmation method and garbage collection method applied to WiscKey
US12164480B2 (en)Optimizing file system defrag for deduplicated block storage
MishraA survey of LSM-Tree based Indexes, Data Systems and KV-stores
US20240020014A1 (en)Method for Writing Data to Solid-State Drive
CN117873916A (en) A retrieval method based on disk-level cache data redirection

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp