技术领域Technical Field
本发明涉及计算机存储领域,特别是指一种基于多树转换机制的键值存储方法。The invention relates to the field of computer storage, and in particular to a key value storage method based on a multi-tree conversion mechanism.
背景技术Background Art
随着大数据时代的到来,数据储量不断增加,数据特点发生了明显变化,非结构化数据占比已达到了数据总量的80%。在这样的数据变化趋势下,键值存储被广泛应用,它对数据结构没有明确限制,具有很高的可扩展性。目前已经有一些优秀的键值存储产品出现且被广泛应用,如Chrome浏览器使用的Leveldb,facebook基于Leveldb进一步提出改进的Rocksdb,Twitter使用的Redis和Memcached等。目前的键值存储所使用的数据结构主要有哈希结构、B树以及日志结构合并树(LSM树),这三种结构分别具有不同的优缺点。With the advent of the big data era, data storage continues to increase, data characteristics have changed significantly, and unstructured data accounts for 80% of the total data. Under such data change trends, key-value storage is widely used. It has no clear restrictions on data structure and has high scalability. At present, some excellent key-value storage products have appeared and are widely used, such as Leveldb used by Chrome browser, Rocksdb further improved by Facebook based on Leveldb, Redis and Memcached used by Twitter, etc. The data structures used by current key-value storage mainly include hash structure, B-tree and log structure merge tree (LSM tree), and these three structures have different advantages and disadvantages.
由于基于LSM树键值存储系统具有良好的写性能,并且支持范围扫描,因此出现了许多相关产品。但由于LSM树结构固有的结构特点(外存中的层级结构),该结构的维护和使用会对键值存储系性能造成消极影响。以Leveldb为例,首先,Leveldb的写放大的来源主要是层间合并操作中的数据重写。例如,Leveldb中每一层的大小时上一层的10倍,将i层中的一个文件合并到i+1层时,在最坏的情况下,需要读取i+1层的10个文件,合并排序后再将这些数据写回i+1层,写放大倍数高达10,在这种情况下,将一个SST文件从L0层移动到L6层,写放大最高会达到50。其次,Leveldb的读放大的来源主要有两个方面。一方面,在查找一个键值对时,Leveldb可能需要在多个层级中进行查找,在最坏情况下,需要在L0中查找8个文件,在L1-L6层中每层需要查找一个文件,一共需要查找14个文件;另一方面,在一个SST文件中查找一个键值对时,Leveldb需要读取该文件的多个元数据块,实际读取的数据包括索引块、bloom过滤器块和数据块。例如,当查找1KB的键值对时,Leveldb需要读取16KB的索引块,4KB的过滤器块和4KB的数据块,共24KB的数据。在最差的情况下,需要读取14个SST文件,所以读放大会高达336。Since the LSM tree-based key-value storage system has good write performance and supports range scanning, many related products have emerged. However, due to the inherent structural characteristics of the LSM tree structure (hierarchical structure in external memory), the maintenance and use of this structure will have a negative impact on the performance of the key-value storage system. Taking Leveldb as an example, first of all, the source of Leveldb's write amplification is mainly the data rewriting in the inter-layer merge operation. For example, the size of each layer in Leveldb is 10 times that of the previous layer. When merging a file in layer i to layer i+1, in the worst case, it is necessary to read 10 files in layer i+1, merge and sort, and then write these data back to layer i+1. The write amplification factor is as high as 10. In this case, moving an SST file from layer L0 to layer L6 will result in a write amplification of up to 50. Secondly, the sources of Leveldb's read amplification are mainly in two aspects. On the one hand, when searching for a key-value pair, Leveldb may need to search in multiple levels. In the worst case, it needs to search 8 files in L0 and one file in each layer of L1-L6, for a total of 14 files. On the other hand, when searching for a key-value pair in an SST file, Leveldb needs to read multiple metadata blocks of the file. The data actually read includes index blocks, bloom filter blocks, and data blocks. For example, when searching for a 1KB key-value pair, Leveldb needs to read 16KB index blocks, 4KB filter blocks, and 4KB data blocks, for a total of 24KB of data. In the worst case, 14 SST files need to be read, so the read amplification will be as high as 336.
发明内容Summary of the invention
本发明的主要目的在于克服现有技术中的上述缺陷,提出一种基于多树转换机制的键值存储方法,使用热度策略减少读放大的同时,保证写入性能,实现键值存储系统性能的整体提升。The main purpose of the present invention is to overcome the above-mentioned defects in the prior art and propose a key-value storage method based on a multi-tree conversion mechanism, which uses a heat strategy to reduce read amplification while ensuring write performance, thereby achieving an overall improvement in the performance of the key-value storage system.
本发明采用如下技术方案:The present invention adopts the following technical solution:
一种基于多树转换机制的键值存储方法,包括:A key-value storage method based on a multi-tree conversion mechanism, comprising:
对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;For the key-value data written, it is first saved in the write skip list. When the size reaches the limit, it is converted into a read-only skip list and inserted into the B+ tree in the memory device. When the size of the B+ tree reaches a certain limit, the key-value data is traversed.
根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;According to the heat strategy, the key-value data with low heat is persisted to the 0th layer of the cold tree in the external storage device, and the key-value data with high heat continues to remain in the B+ tree;
若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;If the number of key-value data files in layer 0 of the cold tree reaches the size limit, the layer 0 partition operation is triggered. The layer 0 partition operation is evenly divided according to the existing key-value data range in the layer. If the number of key-value data files in a partition in layer 0 reaches the size limit, the partition operation is triggered again.
当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;When the key value data with lower heat in the B+ tree performs persistence operation, the 0-level partition only receives the key value data that meets the set range; if the key value data in the range in the cold tree reaches a certain heat after being judged by the heat strategy, the key value data in this range will be transferred to the hot tree of the external storage device, and the key value data in the hot tree will be indexed by the B+ tree in the memory device;
当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。When the size of the hot tree reaches a certain limit, the key value data is judged according to the heat strategy, one or more key value data in the low heat range are transferred to the cold tree partition, and the index of these data is deleted from the B+ tree.
具体地,所述热度策略具体为:Specifically, the heat strategy is:
将只读跳表中的键值数据插入至B+树结构中时,保存插入时间字段,用于判断键值数据的热度情况;When inserting the key value data in the read-only skip list into the B+ tree structure, the insertion time field is saved to determine the popularity of the key value data;
根据节点中记录的时间字段判断该键值数据是否需要保留在B+树中,若不需保留,则持久化至外存设备的冷树分区内;Determine whether the key value data needs to be retained in the B+ tree based on the time field recorded in the node. If not, persist it to the cold tree partition of the external storage device.
对于外存设备中的键值数据,使用全局热度表来记录键值数据的操作数,当执行冷树中的合并操作时,若一个位于冷树内的范围单位的操作数达到阈值,则将此范围单位内的所有键值数据转移至热树中;For the key-value data in the external memory device, a global heat table is used to record the number of operations of the key-value data. When performing a merge operation in the cold tree, if the number of operations of a range unit in the cold tree reaches the threshold, all the key-value data in this range unit are transferred to the hot tree;
当热树达到一定大小限制时,使用全局热度表对热树内的键值数据执行热度判断,根据键值数据的范围热度情况,判断该范围内的键值数据是否需要保留热树中,若不需保留,则将热树中这些范围内的键值数据转移至冷树中。When the hot tree reaches a certain size limit, the global heat table is used to perform heat judgment on the key-value data in the hot tree. According to the range heat of the key-value data, it is judged whether the key-value data in the range needs to be retained in the hot tree. If not, the key-value data in these ranges in the hot tree are transferred to the cold tree.
具体地,还包括数据读取步骤,具体为:Specifically, it also includes a data reading step, which is specifically:
S11:首先在缓存区中查找目标数据,若找到,到步骤S12;若未找到,到步骤S13;S11: First, search for the target data in the cache area. If found, go to step S12; if not found, go to step S13;
S12:系统完成用户的读取请求,向用户发送读取到的目标数据;S12: The system completes the user's read request and sends the read target data to the user;
S13:在B+树中查找目标键及值类型,若找到且值类型为实际值,到步骤S12,若找到且值类型为地址信息,到步骤S14;若未找到,到步骤S15;S13: Search the target key and value type in the B+ tree. If found and the value type is an actual value, go to step S12. If found and the value type is address information, go to step S14. If not found, go to step S15.
S14:在外存设备的热树中根据上步得到的地址读取目标数据,到步骤S12;S14: Read the target data in the heat tree of the external storage device according to the address obtained in the previous step, and go to step S12;
S15:在外存设备的冷树中自顶向下按层查找目标数据,若找到,到步骤S12;若未找到,到步骤S16;S15: Search the target data from top to bottom in the cold tree of the external storage device. If found, go to step S12; if not found, go to step S16;
S16:系统完成用户的读取请求,向用户发送结果,未找到目标数据。S16: The system completes the user's read request and sends the result to the user. The target data is not found.
具体地,还包括数据写入步骤,具体为:Specifically, it also includes a data writing step, which is specifically:
S21:首先写入跳表结构中,如果达到大小限制,到步骤S22;S21: first write into the skip list structure, if the size limit is reached, go to step S22;
S22:将此跳表的转换为只读跳表,并创建新的写入跳表结构用于数据写入;S22: converting the jump table into a read-only jump table and creating a new write jump table structure for data writing;
S23:若B+树大小未达到限制,到步骤S25;若达到限制,到步骤S24;S23: If the B+ tree size does not reach the limit, go to step S25; if it reaches the limit, go to step S24;
S24:遍历B+树中的键值数据,将热度较低的键值数据持久化至外存设备的冷树中,到步骤S25;S24: traverse the key-value data in the B+ tree, and persist the key-value data with lower heat to the cold tree of the external storage device, and go to step S25;
S25:将只读跳表结构内的键值数据与当前时间保存至B+树结构中;S25: Save the key value data and the current time in the read-only skip list structure into the B+ tree structure;
S26:系统完成用户的写入请求,向用户发送写入成功的信息。S26: The system completes the user's write request and sends a write success message to the user.
具体地,应用的存储结构具体为:Specifically, the storage structure of the application is as follows:
写入跳表、只读跳表、B+树以及磁盘中的冷树结构与热树结构;其中所述冷树结构为3层日志结构合并树,包括0层日志结构、1层日志结构、2层日志结构,且0层日志结构存在分区结构;热树结构为单层日志结构合并树。Write skip list, read-only skip list, B+ tree and cold tree structure and hot tree structure in disk; wherein the cold tree structure is a 3-layer log structure merge tree, including 0-layer log structure, 1-layer log structure, 2-layer log structure, and the 0-layer log structure has a partition structure; the hot tree structure is a single-layer log structure merge tree.
由上述对本发明的描述可知,与现有技术相比,本发明具有如下有益效果:It can be seen from the above description of the present invention that, compared with the prior art, the present invention has the following beneficial effects:
(1)本发明提出一种基于多树转换机制的键值存储方法,具体包括:对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。本发明提供的方法使用热度策略减少读放大的同时,保证写入性能,实现键值存储系统性能的整体提升。(1) The present invention proposes a key-value storage method based on a multi-tree conversion mechanism, which specifically includes: for the written key-value data, first save it to the write jump list, and when the size reaches the limit, convert it into a read-only jump list and insert it into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, traverse the key-value data; judge according to the heat strategy, persist the low-heat key-value data to the 0th layer of the cold tree in the external storage device, and continue to retain the high-heat key-value data in the B+ tree; if the number of key-value data files in the 0th layer of the cold tree reaches the size limit, trigger the 0th layer partition operation, and the 0th layer partition operation is divided equally according to the existing key-value data range in the layer, if the 0th layer When the number of key-value data files of a partition reaches the size limit, the partition operation is triggered again; when the key-value data with lower heat in the B+ tree performs a persistence operation, the 0-layer partition only receives the key-value data that meets the set range; if the key-value data within the range in the cold tree reaches a certain heat after being judged by the heat strategy, the key-value data within this range is transferred to the hot tree of the external storage device, and the B+ tree in the memory device is used to index the key-value data in the hot tree; when the size of the hot tree reaches a certain limit, the key-value data is judged according to the heat strategy, and one or more key-value data within the low heat range are transferred to the cold tree partition, and the index of these data is deleted from the B+ tree. The method provided by the present invention uses the heat strategy to reduce read amplification while ensuring write performance, thereby achieving an overall improvement in the performance of the key-value storage system.
(2)本发明一方面,通过热度策略决定持久化操作,将热数据暂留在内存中,避免从外存设备中读取造成额外IO开销;另一方面,使用全局热度判断策略,索引外存设备中频繁访问的键值数据,加速键值数据的读取过程,提高读取性能。(2) On the one hand, the present invention determines the persistence operation through the heat strategy, temporarily retains the hot data in the memory, and avoids the additional IO overhead caused by reading from the external storage device; on the other hand, a global heat judgment strategy is used to index the frequently accessed key-value data in the external storage device, accelerate the reading process of the key-value data, and improve the reading performance.
(3)本发明使用内存中的B+树保存热树中的地址信息,能够索引到热树中的所有有效数据,从而减少热数据的读取次数,并且B+树能够保证一定的范围查找性能。通过减少冷树的层级与0层分区的设计,减少冷数据的读取次数,减小了读放大。(3) The present invention uses the B+ tree in the memory to store the address information in the hot tree, which can index all valid data in the hot tree, thereby reducing the number of times the hot data is read, and the B+ tree can ensure a certain range search performance. By reducing the level of the cold tree and the design of the 0-level partition, the number of times the cold data is read is reduced, and the read amplification is reduced.
(4)本发明使用B+树保存热数据,能够避免键值数据执行更新操作时导致的多次持久化操作。传统LSM树的层间合并操作能够实现空间回收,但使得相同数据重复写入。使用冷热树之间的转换替代传统的层间合并操作,既能提高读性能又能减少数据写入量。(4) The present invention uses B+ trees to store hot data, which can avoid multiple persistence operations caused by updating key-value data. The inter-layer merge operation of the traditional LSM tree can achieve space recovery, but it causes the same data to be written repeatedly. Using the conversion between hot and cold trees instead of the traditional inter-layer merge operation can both improve the read performance and reduce the amount of data written.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1为基于多树转换机制的键值存储方法的系统结构图;FIG1 is a system structure diagram of a key-value storage method based on a multi-tree conversion mechanism;
图2为基于多树转换机制的键值存储方法的数据读取流程图;FIG2 is a data reading flow chart of a key-value storage method based on a multi-tree conversion mechanism;
图3为基于多树转换机制的键值存储方法的数据写入流程图。FIG3 is a data writing flow chart of the key-value storage method based on the multi-tree conversion mechanism.
具体实施方式DETAILED DESCRIPTION
以下通过具体实施方式对本发明作进一步的描述。The present invention is further described below through specific implementation modes.
如图1,为本发明实施例提供的基于多树转换机制的键值存储方法的系统结构图;应用的存储结构具体为:As shown in FIG1 , it is a system structure diagram of a key-value storage method based on a multi-tree conversion mechanism provided by an embodiment of the present invention; the storage structure of the application is specifically as follows:
写入跳表、只读跳表、B+树以及磁盘中的冷树结构与热树结构;其中所述冷树结构为3层日志结构合并树,包括0层日志结构、1层日志结构、2层日志结构,且0层日志结构存在分区结构;热树结构为单层日志结构合并树;Write skip list, read-only skip list, B+ tree and cold tree structure and hot tree structure in disk; wherein the cold tree structure is a 3-layer log structure merge tree, including 0-layer log structure, 1-layer log structure, 2-layer log structure, and the 0-layer log structure has a partition structure; the hot tree structure is a single-layer log structure merge tree;
存储方法具体包括:首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,根据热度策略遍历其中的键值数据进行判断;The storage method specifically includes: firstly saving to a write skip list, and when the size reaches a limit, converting to a read-only skip list and inserting it into a B+ tree in a memory device; when the size of the B+ tree reaches a certain limit, traversing the key value data therein for judgment according to the heat strategy;
将热度较低的键值数据持久化至外存设备中的冷树的0层,热度较高的键值数据则继续留存在B+树中;The key-value data with lower popularity is persisted to the 0th layer of the cold tree in the external storage device, and the key-value data with higher popularity continues to remain in the B+ tree;
若冷树的0层中的键值数据文件数量达到大小限制,则触发数据合并操作与0层分区操作;0层分区操作依据层中现有的键值数据范围进行均分;若0层中某一分区的键值数据文件数量达到大小限制,则再次触发数据合并操作与分区操作;If the number of key-value data files in layer 0 of the cold tree reaches the size limit, the data merge operation and layer 0 partition operation are triggered; the layer 0 partition operation is evenly divided according to the existing key-value data range in the layer; if the number of key-value data files in a partition in layer 0 reaches the size limit, the data merge operation and partition operation are triggered again;
当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合自身范围的键值数据。如果冷树中某一范围内的键值数据经热度策略判断达到一定热度时,那么将这一范围内的所有键值数据转移至外存设备的热树中,并在内存设备中的B+树插入键与地址信息用于索引热树中的键值数据。When the key value data with lower heat in the B+ tree performs persistence operation, the 0-level partition only receives the key value data that matches its own range. If the key value data in a certain range in the cold tree reaches a certain heat according to the heat strategy, all the key value data in this range will be transferred to the hot tree of the external storage device, and the key and address information will be inserted into the B+ tree in the memory device to index the key value data in the hot tree.
当热树的大小达到一定限制时,对其中的键值数据根据热度策略进行遍历判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引;When the size of the hot tree reaches a certain limit, the key value data in it is traversed and judged according to the heat strategy, one or more key value data in the low heat range is transferred to the cold tree partition, and the index of these data is deleted from the B+ tree;
具体地,所述热度策略具体为:Specifically, the heat strategy is:
将只读跳表中的键值数据插入至B+树结构中时,保存插入时间字段,用于判断键值数据的热度情况;When inserting the key value data in the read-only skip list into the B+ tree structure, the insertion time field is saved to determine the popularity of the key value data;
根据节点中记录的时间字段判断该键值数据是否需要保留在B+树中,若不需保留,则持久化至外存设备的冷树分区内;Determine whether the key value data needs to be retained in the B+ tree based on the time field recorded in the node. If not, persist it to the cold tree partition of the external storage device.
对于外存设备中的键值数据,使用全局热度表来记录键值数据的操作数,当执行冷树中的合并操作时,若一个位于冷树内的范围单位的操作数达到阈值,则将此范围单位内的所有键值数据转移至热树中;For the key-value data in the external memory device, a global heat table is used to record the number of operations of the key-value data. When performing a merge operation in the cold tree, if the number of operations of a range unit in the cold tree reaches the threshold, all the key-value data in this range unit are transferred to the hot tree;
当热树达到一定大小限制时,使用全局热度表对热树内的键值数据执行热度判断,根据键值数据的范围热度情况,判断该范围内的键值数据是否需要保留热树中,若不需保留,则将热树中这些范围内的键值数据转移至冷树中。When the hot tree reaches a certain size limit, the global heat table is used to perform heat judgment on the key-value data in the hot tree. According to the range heat of the key-value data, it is judged whether the key-value data in the range needs to be retained in the hot tree. If not, the key-value data in these ranges in the hot tree are transferred to the cold tree.
如图2为基于多树转换机制的键值存储方法的数据读取流程图。FIG2 is a data reading flow chart of the key-value storage method based on the multi-tree conversion mechanism.
101、用户向系统发送的读取请求,到102。101. The user sends a read request to the system, to 102.
102、在缓存区中查找目标数据,若找到,获取目标数据,到103,否则到104。102. Search for target data in the cache area. If found, obtain the target data and go to 103. Otherwise, go to 104.
103、向用户返回查询结果。103. Return the query result to the user.
104、在B+树中查找目标数据,若未找到,到105,否则到106。104. Search for the target data in the B+ tree. If not found, go to 105; otherwise, go to 106.
105、在冷树中自顶向下逐层查找数据,若找到,获取目标数据,到103,否则,返回结果不存在,到103。105. Search for data layer by layer from top to bottom in the cold tree. If found, obtain the target data and go to 103. Otherwise, return the result that it does not exist and go to 103.
106、判断值类型是否为地址信息,若不是,获取目标数据,到103,否则到107。106. Determine whether the value type is address information. If not, obtain the target data and go to 103; otherwise, go to 107.
107、根据上步找到的地址信息,获取目标数据,到103。107. According to the address information found in the previous step, obtain the target data and go to 103.
如图3为基于多树转换机制的键值存储方法的数据写入流程图。FIG3 is a data writing flow chart of the key-value storage method based on the multi-tree conversion mechanism.
201、用户向系统发送的写入请求,到202。201. The user sends a write request to the system, to 202.
202、判断写入跳表是否已满,若未满,到211,否则到203。202. Determine whether the write jump table is full. If not, go to 211; otherwise, go to 203.
203、写入跳表转换为只读跳表,到204。203 , the write jump table is converted into a read-only jump table, go to 204 .
204、判断缓存区的大小是否已满,若未满,到205,否则到206。204. Determine whether the size of the cache area is full. If not, go to 205; otherwise, go to 206.
205、创建新的写入跳表,到202。205. Create a new write jump table and go to 202.
206、判断B+树的大小是否达到限制,若是,到207,否则到208。206. Determine whether the size of the B+ tree reaches the limit. If so, go to 207, otherwise go to 208.
207、判断冷树0层分区内文件数是否达到限制,若是,到209,否则到210。207. Determine whether the number of files in the cold tree 0-level partition has reached the limit. If so, go to 209; otherwise, go to 210.
208、将只读跳表插入至B+树中,到204。208. Insert the read-only skip list into the B+ tree, go to 204.
209、对冷树的0层执行合并操作与分区操作,到210。209. Perform a merge operation and a partition operation on layer 0 of the cold tree, and go to 210.
210、依据热度策略持久化B+树中低热度数据至冷树中,到206。210. According to the heat strategy, persist the low-heat data in the B+ tree to the cold tree, to 206.
211、写入数据到写入跳表中,到212。211. Write data into the write jump table, to 212.
212、向用户返回写入结果。212. Return the writing result to the user.
(1)本发明提出一种基于多树转换机制的键值存储方法,具体包括:对于写入的键值数据,首先保存至写入跳表中,当大小达到限制后,转换为只读跳表插入至内存设备中的B+树;当B+树的大小达到一定限制时,遍历键值数据;根据热度策略进行判断,将热度低的键值数据持久化至外存设备中的冷树的0层,热度高的键值数据则继续留存在B+树中;若冷树的0层中的键值数据文件数量达到大小限制,则触发0层分区操作,0层分区操作依据层中现有的键值数据范围进行均分,若0层中某一分区的键值数据文件数量达到大小限制,则再次触发分区操作;当B+树中热度较低的键值数据执行持久化操作时,0层分区只接收符合设定范围的键值数据;若冷树中范围内的键值数据经热度策略判断后达到一定热度时,则将这一范围内的键值数据转移至外存设备的热树中,并使用内存设备中的B+树索引热树中的键值数据;当热树的大小达到一定限制时,对键值数据根据热度策略进行判断,将一个或多个低热度范围内的键值数据转移至冷树分区中,并从B+树中删除这些数据的索引。本发明提供的方法使用热度策略减少读放大的同时,保证写入性能,实现键值存储系统性能的整体提升。(1) The present invention proposes a key-value storage method based on a multi-tree conversion mechanism, which specifically includes: for the written key-value data, first save it to the write jump list, and when the size reaches the limit, convert it into a read-only jump list and insert it into the B+ tree in the memory device; when the size of the B+ tree reaches a certain limit, traverse the key-value data; judge according to the heat strategy, persist the low-heat key-value data to the 0th layer of the cold tree in the external storage device, and continue to retain the high-heat key-value data in the B+ tree; if the number of key-value data files in the 0th layer of the cold tree reaches the size limit, trigger the 0th layer partition operation, and the 0th layer partition operation is divided equally according to the existing key-value data range in the layer, if the 0th layer When the number of key-value data files of a partition reaches the size limit, the partition operation is triggered again; when the key-value data with lower heat in the B+ tree performs a persistence operation, the 0-layer partition only receives the key-value data that meets the set range; if the key-value data within the range in the cold tree reaches a certain heat after being judged by the heat strategy, the key-value data within this range is transferred to the hot tree of the external storage device, and the B+ tree in the memory device is used to index the key-value data in the hot tree; when the size of the hot tree reaches a certain limit, the key-value data is judged according to the heat strategy, and one or more key-value data within the low heat range are transferred to the cold tree partition, and the index of these data is deleted from the B+ tree. The method provided by the present invention uses the heat strategy to reduce read amplification while ensuring write performance, thereby achieving an overall improvement in the performance of the key-value storage system.
(2)本发明一方面,通过热度策略决定持久化操作,将热数据暂留在内存中,避免从外存设备中读取造成额外IO开销;另一方面,使用全局热度判断策略,索引外存设备中频繁访问的键值数据,加速键值数据的读取过程,提高读取性能。(2) On the one hand, the present invention determines the persistence operation through the heat strategy, temporarily retains the hot data in the memory, and avoids the additional IO overhead caused by reading from the external storage device; on the other hand, a global heat judgment strategy is used to index the frequently accessed key-value data in the external storage device, accelerate the reading process of the key-value data, and improve the reading performance.
(3)本发明使用内存中的B+树保存热树中的地址信息,能够索引到热树中的所有有效数据,从而减少热数据的读取次数,并且B+树能够保证一定的范围查找性能。通过减少冷树的层级与0层分区的设计,减少冷数据的读取次数,减小了读放大。(3) The present invention uses the B+ tree in the memory to store the address information in the hot tree, which can index all valid data in the hot tree, thereby reducing the number of times the hot data is read, and the B+ tree can ensure a certain range search performance. By reducing the level of the cold tree and the design of the 0-level partition, the number of times the cold data is read is reduced, and the read amplification is reduced.
(4)本发明使用B+树保存热数据,能够避免键值数据执行更新操作时导致的多次持久化操作。传统LSM树的层间合并操作能够实现空间回收,但使得相同数据重复写入。使用冷热树之间的转换替代传统的层间合并操作,既能提高读性能又能减少数据写入量。(4) The present invention uses B+ trees to store hot data, which can avoid multiple persistence operations caused by updating key-value data. The inter-layer merge operation of the traditional LSM tree can achieve space recovery, but it causes the same data to be written repeatedly. Using the conversion between hot and cold trees instead of the traditional inter-layer merge operation can both improve the read performance and reduce the amount of data written.
上述仅为本发明的具体实施方式,但本发明的设计构思并不局限于此,凡利用此构思对本发明进行非实质性的改动,均应属于侵犯本发明保护范围的行为。The above is only a specific implementation of the present invention, but the design concept of the present invention is not limited to this. Any non-substantial changes to the present invention using this concept shall be deemed as an infringement of the protection scope of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210711424.3ACN114996275B (en) | 2022-06-22 | 2022-06-22 | A key-value storage method based on multi-tree conversion mechanism |
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210711424.3ACN114996275B (en) | 2022-06-22 | 2022-06-22 | A key-value storage method based on multi-tree conversion mechanism |
| Publication Number | Publication Date |
|---|---|
| CN114996275A CN114996275A (en) | 2022-09-02 |
| CN114996275Btrue CN114996275B (en) | 2024-08-16 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210711424.3AActiveCN114996275B (en) | 2022-06-22 | 2022-06-22 | A key-value storage method based on multi-tree conversion mechanism |
| Country | Link |
|---|---|
| CN (1) | CN114996275B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116149550B (en)* | 2022-12-29 | 2024-08-02 | 北京趋动智能科技有限公司 | Data management method, device, storage medium and equipment |
| CN118092812B (en)* | 2024-04-18 | 2024-06-25 | 华侨大学 | Key-value storage and reading and writing method based on memory table index and iterator reduction mechanism |
| CN118626685B (en)* | 2024-08-09 | 2024-11-05 | 杭州新视窗信息技术有限公司 | Multi-layer data node storage indexing method and system |
| CN119961267B (en)* | 2025-04-08 | 2025-07-08 | 华侨大学 | Data processing method and device for Cassandra key-value storage system based on interval tree layering |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111026329A (en)* | 2019-11-18 | 2020-04-17 | 华中科技大学 | Key value storage system based on host management tile record disk and data processing method |
| CN112100293A (en)* | 2020-09-23 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Data processing method, data access method, data processing device, data access device and computer equipment |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10795871B2 (en)* | 2016-09-26 | 2020-10-06 | Vmware, Inc. | Key-value stores implemented using fragmented log-structured merge trees |
| CN113760171A (en)* | 2020-06-01 | 2021-12-07 | 华为技术有限公司 | Metadata storage method and device |
| CN112486994B (en)* | 2020-11-30 | 2024-04-19 | 武汉大学 | Data quick reading method based on key value storage of log structure merging tree |
| CN113626431A (en)* | 2021-07-28 | 2021-11-09 | 浪潮云信息技术股份公司 | LSM tree-based key value separation storage method and system for delaying garbage recovery |
| CN113821171B (en)* | 2021-09-01 | 2024-06-11 | 上海沄熹科技有限公司 | Key value storage method based on hash table and LSM tree |
| CN114416646B (en)* | 2022-01-20 | 2025-09-23 | 上海妃鱼数字科技有限公司 | A data processing method and device for a hierarchical storage system |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111026329A (en)* | 2019-11-18 | 2020-04-17 | 华中科技大学 | Key value storage system based on host management tile record disk and data processing method |
| CN112100293A (en)* | 2020-09-23 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Data processing method, data access method, data processing device, data access device and computer equipment |
| Publication number | Publication date |
|---|---|
| CN114996275A (en) | 2022-09-02 |
| Publication | Publication Date | Title |
|---|---|---|
| CN114996275B (en) | A key-value storage method based on multi-tree conversion mechanism | |
| CN113821171B (en) | Key value storage method based on hash table and LSM tree | |
| CN102646069B (en) | Method for prolonging service life of solid-state disk | |
| CN102981963B (en) | A kind of implementation method of flash translation layer (FTL) of solid-state disk | |
| CN107368436B (en) | Flash memory cold and hot data separated storage method combined with address mapping table | |
| CN105117415B (en) | A kind of SSD data-updating methods of optimization | |
| CN114969069B (en) | A heat-aware local update method for key-value storage systems | |
| CN110825748A (en) | High-performance and easily-expandable key value storage method utilizing differential index mechanism | |
| US10740251B2 (en) | Hybrid drive translation layer | |
| CN104166634A (en) | Management method of mapping table caches in solid-state disk system | |
| CN102364474A (en) | Metadata storage system and management method for cluster file system | |
| CN110347852A (en) | It is embedded in the file system and file management method of key assignments storage system extending transversely | |
| CN103902669B (en) | A kind of separate type file system based on different storage mediums | |
| CN114356877A (en) | A log structure merge tree hierarchical storage method and system based on persistent memory | |
| CN113553476A (en) | Key value storage method for reducing write pause by utilizing Hash | |
| CN113377690B (en) | Solid state disk processing method suitable for user requests of different sizes | |
| CN103677670A (en) | Method and device for reading data | |
| CN114741028B (en) | A persistent key-value storage method, device and system based on OCSSD | |
| CN118092812B (en) | Key-value storage and reading and writing method based on memory table index and iterator reduction mechanism | |
| CN104657461A (en) | File system metadata search caching method based on internal memory and SSD (Solid State Disk) collaboration | |
| CN114398007B (en) | LSM-tree-based caching optimization method for KV storage system read performance | |
| CN115203079A (en) | Method for writing data into solid state disk | |
| CN112732725A (en) | NVM (non volatile memory) hybrid memory-based adaptive prefix tree construction method, system and medium | |
| CN114691041B (en) | Key-value storage system, garbage collection method | |
| CN114415966A (en) | Method for constructing KV SSD storage engine |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |