Movatterモバイル変換


[0]ホーム

URL:


CN106294032A - The caching method of a kind of disk array single-deck recovery and system - Google Patents

The caching method of a kind of disk array single-deck recovery and system
Download PDF

Info

Publication number
CN106294032A
CN106294032ACN201610637551.8ACN201610637551ACN106294032ACN 106294032 ACN106294032 ACN 106294032ACN 201610637551 ACN201610637551 ACN 201610637551ACN 106294032 ACN106294032 ACN 106294032A
Authority
CN
China
Prior art keywords
data block
priority
caching
recovery
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610637551.8A
Other languages
Chinese (zh)
Other versions
CN106294032B (en
Inventor
吴晨涛
过敏意
李璐雨
谭超
贾帅杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong UniversityfiledCriticalShanghai Jiao Tong University
Priority to CN201610637551.8ApriorityCriticalpatent/CN106294032B/en
Publication of CN106294032ApublicationCriticalpatent/CN106294032A/en
Application grantedgrantedCritical
Publication of CN106294032BpublicationCriticalpatent/CN106294032B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Classifications

Landscapes

Abstract

Translated fromChinese

本发明公开了一种磁盘阵列单盘恢复的缓存方法及系统,该方法包括:根据数据块是否已经被应用读取到缓存中,为数据块定义不同的优先级;为每一个出错的数据块枚举出所有可行的恢复方法,针对所有可行的恢复方法,结合恢复方法之间的关系,通过迪杰斯特拉遍历算法进行遍历,选取出I/O最小的、符合要求的恢复方法;针对所选择的恢复方法,选取其中负载相对均衡的方法作为最终采用的磁盘阵列单盘恢复的纠删码恢复方法;根据纠删码恢复方法的读取次数为数据块定义优先级,采用在缓存中优先替换优先级低的数据块的缓存替换方法,通过本发明,解决了现有技术中I/O较大,速度较慢,缓存命中率低,影响使用效率的问题。

The invention discloses a caching method and system for single-disk recovery of a disk array. The method includes: defining different priorities for data blocks according to whether the data blocks have been read into the cache by an application; Enumerate all feasible recovery methods, for all feasible recovery methods, combined with the relationship between recovery methods, traverse through the Dijkstra traversal algorithm, and select the recovery method with the smallest I/O that meets the requirements; for For the selected recovery method, select the method with a relatively balanced load as the erasure code recovery method for the final recovery of a single disk in the disk array; define the priority for the data block according to the number of times the erasure code recovery method is read, and use it in the cache The invention solves the problem of large I/O, slow speed, low cache hit rate and affecting use efficiency in the prior art through the cache replacement method for preferentially replacing data blocks with low priority.

Description

Translated fromChinese
一种磁盘阵列单盘恢复的缓存方法及系统A caching method and system for disk array single disk recovery

技术领域technical field

本发明涉及计算机存储领域,特别是涉及一种磁盘阵列单盘恢复的缓存方法。The invention relates to the field of computer storage, in particular to a caching method for recovery of a disk array single disk.

背景技术Background technique

分布式存储系统,就是把数据分散的存储在多地区的多个设备上。传统的存储系统采用集中式的存储方案,将所有的数据存储在同一个存储服务器上,从而成为整个系统功能提升的瓶颈,也容易产生可靠性和安全性的问题。伴随着存储需求的提升,这种问题愈发突显。分布式存储系统利用可扩展的存储结构,将数据存储在不同的存储服务器上,提高了数据的可靠性、安全性,并且容易扩展。A distributed storage system is to store data dispersedly on multiple devices in multiple regions. The traditional storage system adopts a centralized storage solution and stores all data on the same storage server, which becomes the bottleneck of the improvement of the entire system function, and is also prone to reliability and security problems. With the increase in storage requirements, this problem has become more prominent. The distributed storage system uses a scalable storage structure to store data on different storage servers, which improves data reliability, security, and is easy to expand.

为实现分布式存储系统中数据的可靠性,系统需要从存储阵列中恢复出出错的数据。一般通过在编码时加入冗余校验码的方法来达到容错和纠错的目标,这种方法称为纠删码(Erasure Codes)。常用的纠删码有两种,一种是Reed-Solomon纠删码,一种是XOR编码。纠删码需要从k个块中生成包含n个块的纠删码,从而允许从n个块的子集中恢复k个块。磁盘阵列单盘恢复的情形,就是针对一个磁盘出错的情形进行快速的恢复。伴随着存储需求的增加,如何快速恢复出错的数据块、提高恢复过程中缓存的命中率已经成为人们关注的焦点。In order to realize the reliability of data in the distributed storage system, the system needs to recover the erroneous data from the storage array. Generally, the goal of error tolerance and error correction is achieved by adding redundant check codes during encoding. This method is called erasure codes (Erasure Codes). There are two commonly used erasure codes, one is Reed-Solomon erasure code, and the other is XOR code. Erasure codes need to generate an erasure code containing n blocks from k blocks, allowing recovery of k blocks from a subset of n blocks. The recovery of a single disk in a disk array is to quickly recover from a disk failure. With the increase of storage requirements, how to quickly recover the erroneous data blocks and improve the cache hit rate during the recovery process has become the focus of attention.

发明内容Contents of the invention

为克服上述现有技术存在的不足,本发明之目的在于提供一种磁盘阵列单盘恢复的缓存方法,其采用纠删码的磁盘阵列快速进行单盘恢复,以解决现有的恢复方法中I/O较大,速度较慢,缓存命中率低,影响使用效率的问题。In order to overcome the deficiencies in the above-mentioned prior art, the purpose of the present invention is to provide a cache method for disk array single-disk recovery, which uses erasure-coded disk arrays to quickly perform single-disk recovery to solve the problem of I/O in existing recovery methods. The /O is large, the speed is slow, and the cache hit rate is low, which affects the use efficiency.

为达上述及其它目的,本发明提出一种磁盘阵列单盘恢复的缓存方法,包括如下步骤:In order to achieve the above and other purposes, the present invention proposes a caching method for recovery of a single disk in a disk array, which includes the following steps:

步骤一,根据数据块是否已经被应用读取到缓存中,为数据块定义不同的优先级;Step 1, according to whether the data block has been read into the cache by the application, define different priorities for the data block;

步骤二,为每一个出错的数据块枚举出所有可行的恢复方法,针对所有可行的恢复方法,结合恢复方法之间的关系,通过迪杰斯特拉遍历算法进行遍历,选取出I/O最小的、符合要求的恢复方法;Step 2. Enumerate all feasible recovery methods for each erroneous data block. For all feasible recovery methods, combined with the relationship between recovery methods, traverse through the Dijkstra traversal algorithm and select the I/O Minimal, compliant recovery method;

步骤三,针对所选择的恢复方法,选取其中负载相对均衡的方法作为最终采用的磁盘阵列单盘恢复的纠删码恢复方法。Step 3, for the selected recovery methods, select the method with a relatively balanced load as the final erasure code recovery method for single-disk recovery of the disk array.

步骤四,根据纠删码恢复方法的读取次数为数据块定义优先级,采用在缓存中优先替换优先级低的数据块的缓存替换方法。Step 4: Define priorities for the data blocks according to the number of reading times of the erasure code recovery method, and adopt a cache replacement method that replaces data blocks with lower priority in the cache first.

进一步地,于步骤一中,为在缓存中的数据块赋予较高的优先级,为不在缓存中的数据块赋予较低的优先级。Further, in step 1, a higher priority is given to data blocks in the cache, and a lower priority is given to data blocks not in the cache.

进一步地,步骤二包括:Further, step two includes:

通过迪杰斯特拉方法对所有可行的恢复方法进行遍历,建立一个初节点表示遍历的最初状态;Traverse all feasible recovery methods through the Dijkstra method, and establish an initial node to represent the initial state of the traversal;

循环查询恢复一个出错的数据块可能需要的读取的数据块个数作为节点之间连线的权重;The number of read data blocks that may be required to recover an erroneous data block by loop query is used as the weight of the connection between nodes;

如果需要读取的数据块在缓存中,此数据块的读取将不参与权重的计算,如果需要读取的数据块不在缓存中,此数据块的读取将参与权重的计算;If the data block to be read is in the cache, the reading of this data block will not participate in the calculation of the weight; if the data block to be read is not in the cache, the reading of this data block will participate in the calculation of the weight;

选取出能够有效恢复所有出错的数据块又能将遍历的权重降到最低的方法。Select a method that can effectively recover all erroneous data blocks and minimize the weight of traversal.

进一步地,于步骤二中,在迪杰斯特拉算法遍历过程中,结合已经选取的有效方法特征,及时删除无效方法。Further, in step 2, during the traversal process of the Dijkstra algorithm, combined with the characteristics of the selected effective methods, the invalid methods are deleted in time.

进一步地,步骤三包括:Further, step three includes:

分别计算每一块磁盘上有效I/O的数量;Calculate the number of effective I/Os on each disk separately;

对阵列中每一块磁盘上的I/O进行对比,选取其中I/O最大的磁盘;Compare the I/O on each disk in the array, and select the disk with the largest I/O;

所有方法中I/O最大的磁盘上的I/O数目最小的方法被选作是恢复的最优方法。The method with the least number of I/Os on the disk with the most I/O among all the methods is selected as the optimal method for recovery.

进一步地,于步骤三中,针对选取的负载相对均衡的磁盘阵列单盘恢复情形下的最优方法,根据数据块被访问的次数设置在缓存中的优先级。Further, in step 3, for the selected optimal method in the case of recovering a single disk of a disk array with a relatively balanced load, the priority in the cache is set according to the number of times the data block is accessed.

进一步地,步骤四进一步包括:Further, step four further includes:

根据优先级将数据块放入缓存中不同的队列中去,同一个队列中优先级相同的数据块根据LRU算法进行排列,最近最久未被访问的数据块将优先被替换;Put the data blocks into different queues in the cache according to the priority, and the data blocks with the same priority in the same queue are arranged according to the LRU algorithm, and the data blocks that have not been accessed recently will be replaced first;

优先级高的队列中的数据块如被访问,则添加到优先级低一级的数据块当中,以此类推;If the data block in the queue with high priority is accessed, it will be added to the data block with a lower priority, and so on;

当缓存中空间不够时,最低优先级队列中的最近最久未被访问的数据块将被替换,新加入的数据块根据其优先级添加到相应的队列当中。When there is not enough space in the cache, the data block in the lowest priority queue that has not been accessed for the longest time will be replaced, and the newly added data block will be added to the corresponding queue according to its priority.

为达到上述目的,本发明还提供一种磁盘阵列单盘恢复的缓存系统,包括:In order to achieve the above object, the present invention also provides a caching system for disk array single disk recovery, including:

优先级定义模块,根据数据块是否已经被应用读取到缓存中,为数据块定义不同的优先级;The priority definition module defines different priorities for the data blocks according to whether the data blocks have been read into the cache by the application;

遍历模块,为每个出错的数据块枚举出所有可行的恢复方法,针对所有可行的恢复方法,结合恢复方法之间的关系,通过迪杰斯特拉遍历算法进行遍历,选取出I/O最小的、符合要求的恢复方法;The traversal module enumerates all feasible recovery methods for each erroneous data block. For all feasible recovery methods, combined with the relationship between recovery methods, it traverses through the Dijkstra traversal algorithm and selects the I/O Minimal, compliant recovery method;

恢复方法选取模块,针对所选择的恢复方法,选取其中负载相对均衡的方法作为最终采用的磁盘阵列单盘恢复的纠删码恢复方法;The recovery method selection module, for the selected recovery method, selects the method with relatively balanced load as the erasure code recovery method for the final disk array single disk recovery;

缓存替换模块,根据纠删码恢复方法的读取次数为数据块定义优先级,采用在缓存中优先替换优先级低的数据块的缓存替换方法。The cache replacement module defines priorities for the data blocks according to the number of read times of the erasure code recovery method, and adopts a cache replacement method that replaces data blocks with lower priority in the cache first.

进一步地,该缓存替换模块针对选取的负载相对均衡的磁盘阵列单盘恢复情形下的最优方法,根据数据块被访问的次数设置在缓存中的优先级。Further, the cache replacement module sets the priority in the cache according to the number of times the data blocks are accessed for the selected optimal method in the recovery situation of a disk array with a relatively balanced load.

进一步地,该缓存替换模块根据优先级将数据块放入缓存中不同的队列中去,同一个队列中优先级相同的数据块根据LRU算法进行排列,最近最久未被访问的数据块将优先被替换,优先级高的队列中的数据块如被访问,则添加到优先级低一级的数据块当中,以此类推;当缓存中空间不够时,最低优先级队列中的最近最久未被访问的数据块将被替换,新加入的数据块根据其优先级添加到相应的队列当中。Further, the cache replacement module puts the data blocks into different queues in the cache according to the priority, and the data blocks with the same priority in the same queue are arranged according to the LRU algorithm, and the data blocks that have not been accessed recently will be prioritized. Replacement, if the data block in the queue with high priority is accessed, it will be added to the data block with a lower priority, and so on; when there is not enough space in the cache, the data block in the queue with the lowest priority has not been accessed recently The data block will be replaced, and the newly added data block will be added to the corresponding queue according to its priority.

与现有技术相比,本发明一种磁盘阵列单盘恢复的缓存方法及系统提出了一种新的纠错码单盘出错恢复的情形,能够有效结合应用读取到缓存中的数据块制定相应的恢复策略,由于读取缓存中的数据块的速度比读取磁盘中的数据块在速度上有明显的改进,本发明能够有效减少纠错码单盘恢复情形下的恢复速度,同时保证应用访问的正常进行。在访问请求多、任务密集的存储服务器环境下能够有效提升服务速度和效率,提升分布式存储系统的可靠性和安全性。Compared with the prior art, the present invention provides a cache method and system for single disk recovery of a disk array, and proposes a new error correction code for single disk error recovery, which can effectively formulate data blocks read into the cache by the application. Corresponding recovery strategy, since the speed of reading the data blocks in the cache is significantly improved compared with the speed of reading the data blocks in the disk, the present invention can effectively reduce the recovery speed in the case of error correction code single disk recovery, while ensuring The normal operation of application access. In the storage server environment with many access requests and intensive tasks, it can effectively improve the service speed and efficiency, and improve the reliability and security of the distributed storage system.

附图说明Description of drawings

图1为本发明一种磁盘阵列单盘恢复的缓存方法的步骤流程图;Fig. 1 is a flow chart of the steps of a caching method for disk array single disk recovery of the present invention;

图2说明了本发明具体实施例中根据生成矩阵生成所有恢复指定数据块的方法;Fig. 2 has illustrated in the specific embodiment of the present invention and generates all the methods that restore specified data blocks according to generating matrix;

图3显示了本发明具体实施例中迪杰斯特拉遍历所有可能的恢复方法的过程;Fig. 3 shows Dijkstra's process of traversing all possible recovery methods in a specific embodiment of the present invention;

图4显示了本发明具体实施例中迪杰斯特拉遍历过程中,删除无效方法,提高计算速度的过程;Fig. 4 has shown the Dijkstra traversal process in the specific embodiment of the present invention, delete invalid method, the process that improves calculation speed;

图5说明了本发明具体实施例中根据遍历算法得到的恢复方法选择最优方法的过程;Fig. 5 illustrates the process of selecting the optimal method according to the recovery method obtained by the traversal algorithm in a specific embodiment of the present invention;

图6说明了本发明具体实施例中缓存中添加新数据块的方法;Fig. 6 illustrates the method for adding a new data block in the cache in a specific embodiment of the present invention;

图7说明了本发明具体实施例中数据块被访问后进行降级的过程;Fig. 7 illustrates the process of degrading data blocks after they are accessed in a specific embodiment of the present invention;

图8说明了本发明具体实施例中缓存中数据块替换选择的方法;Fig. 8 illustrates the method for data block replacement selection in the cache in the specific embodiment of the present invention;

图9为本发明一种磁盘阵列单盘恢复的缓存系统的系统架构图。FIG. 9 is a system architecture diagram of a caching system for disk array single disk recovery according to the present invention.

具体实施方式detailed description

以下通过特定的具体实例并结合附图说明本发明的实施方式,本领域技术人员可由本说明书所揭示的内容轻易地了解本发明的其它优点与功效。本发明亦可通过其它不同的具体实例加以施行或应用,本说明书中的各项细节亦可基于不同观点与应用,在不背离本发明的精神下进行各种修饰与变更。The implementation of the present invention is described below through specific examples and in conjunction with the accompanying drawings, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific examples, and various modifications and changes can be made to the details in this specification based on different viewpoints and applications without departing from the spirit of the present invention.

图1为本发明一种磁盘阵列单盘恢复的缓存方法的步骤流程图。如图1所示,本发明一种磁盘阵列单盘恢复的缓存方法,包括如下步骤:FIG. 1 is a flow chart of the steps of a caching method for recovery of a single disk in a disk array according to the present invention. As shown in Figure 1, a kind of caching method of disk array single disk recovery of the present invention comprises the following steps:

步骤101,根据数据块是否已经被应用读取到缓存中,为数据块定义不同的优先级。具体地,为在缓存中的数据块赋予较高的优先级,为不在缓存中的数据块赋予较低的优先级。Step 101, according to whether the data block has been read into the cache by the application, define different priorities for the data block. Specifically, a higher priority is given to data blocks in the cache, and a lower priority is given to data blocks not in the cache.

也就是说,当系统检测到存储服务器出现错误时,首先读取在缓存中的数据块编号,在遍历时针对数据块是否在缓存中,优先地选择已经读取到缓存中的数据,次要地选择尚未被读取到缓存中的数据,进行恢复方法的综合考量That is to say, when the system detects an error in the storage server, it first reads the number of the data block in the cache, and when traversing whether the data block is in the cache, the data that has been read in the cache is preferentially selected, and the second Select the data that has not been read into the cache, and comprehensively consider the recovery method

步骤102,针对所有可能的恢复方法,结合恢复方法之间的关系,通过迪杰斯特拉遍历算法进行遍历,选取出I/O最小的、符合要求的恢复方法。较佳地,在迪杰斯特拉算法遍历过程中,结合已经选取的有效方法特征,及时删除无效方法,提高计算速度,避免无效遍历。Step 102, for all possible recovery methods, combining the relationship between the recovery methods, traversing through the Dijkstra traversal algorithm, and selecting the recovery method with the minimum I/O and meeting the requirements. Preferably, during the traversal process of the Dijkstra algorithm, combined with the characteristics of the selected effective methods, the invalid methods are deleted in time to improve the calculation speed and avoid invalid traversal.

具体地,步骤102进一步包括:Specifically, step 102 further includes:

通过迪杰斯特拉方法对所有可行的恢复方法进行遍历,建立一个初节点表示遍历的最初状态,即所有的数据块尚未被选择。循环查询恢复一个出错的数据块可能需要的读取的数据块个数作为节点之间连线的权重;Through the Dijkstra method to traverse all feasible recovery methods, an initial node is established to represent the initial state of the traversal, that is, all data blocks have not been selected yet. The number of read data blocks that may be required to recover an erroneous data block by loop query is used as the weight of the connection between nodes;

如果需要读取的数据块在缓存中,此数据块的读取将不参与权重的计算;If the data block that needs to be read is in the cache, the reading of this data block will not participate in the calculation of the weight;

相反地,如果需要读取的数据块不在缓存中,此数据块的读取将参与权重的计算。Conversely, if the data block that needs to be read is not in the cache, the read of this data block will participate in the calculation of the weight.

迪杰斯特拉方法遍历的目标是,选取出一种能够有效恢复所有出错的数据块又能将遍历的权重降到最低的方法。这里的权重与实现中的I/O成正比。The goal of the Dijkstra method traversal is to select a method that can effectively recover all erroneous data blocks and minimize the weight of traversal. The weight here is proportional to the I/O in the implementation.

进一步的,本发明对迪杰斯特拉方法遍历的方法进行了改进。针对已经遍历得到的数据,比如最小权重,剩余没有进行遍历的部分需要根据权重进行遍历的考量。对于没有可能得到更小权重的的遍历部分进行及时的舍弃,包括:Further, the present invention improves the traversal method of the Dijkstra method. For the data that has been traversed, such as the minimum weight, the remaining parts that have not been traversed need to be considered according to the weight. Timely discard the traversal part that is not likely to get a smaller weight, including:

能够恢复所有出错数据块的方法,如果超出的最小权重,这个方法被视作无效的方法;A method capable of recovering all erroneous data blocks, if the minimum weight is exceeded, this method is considered invalid;

尚未能恢复所有出错数据块的方法,如果其权重已经超出了已知的最小权重,那么这个方法以及根据这个方法衍生的能够恢复所有出错数据块的方法,都被视作无效的方法A method that has not been able to recover all erroneous data blocks, if its weight has exceeded the known minimum weight, then this method and the method derived from this method that can recover all erroneous data blocks are considered invalid methods

步骤103,针对所选择的恢复方法,选取其中负载相对均衡的方法作为最终采用的磁盘阵列单盘恢复的纠删码恢复方法。Step 103 , for the selected recovery method, select the method with a relatively balanced load as the finally adopted erasure code recovery method for single-disk recovery of the disk array.

具体地,步骤103进一步包括:Specifically, step 103 further includes:

分别计算每一块磁盘上有效I/O的数量,即需要读取的不在缓存中的数据块的数量;Calculate the number of effective I/Os on each disk separately, that is, the number of data blocks that need to be read that are not in the cache;

对阵列中每一块磁盘上的I/O进行对比,选取其中I/O最大的磁盘;Compare the I/O on each disk in the array, and select the disk with the largest I/O;

所有方法中I/O最大的磁盘上的I/O数目最小的方法被选作是恢复的最优方法。The method with the least number of I/Os on the disk with the most I/O among all the methods is selected as the optimal method for recovery.

步骤四,根据纠删码恢复方法的读取次数为数据块定义优先级,采用在缓存中优先替换优先级低的数据块的缓存替换方法。Step 4: Define priorities for the data blocks according to the number of reading times of the erasure code recovery method, and adopt a cache replacement method that replaces data blocks with lower priority in the cache first.

进一步的,针对选取的负载相对均衡的磁盘阵列单盘恢复情形下的最优方法,根据数据块被访问的次数设置在缓存中的优先级,以三盘容错的磁盘阵列为例,访问次数为三的数据块具有最高的优先级,其次是访问次数为二的数据块,再次是访问次数为一的数据块。具体的:Further, for the optimal method in the case of single-disk recovery of a disk array with a relatively balanced load, the priority in the cache is set according to the number of times data blocks are accessed. Taking a three-disk fault-tolerant disk array as an example, the number of accesses is A data block of three has the highest priority, followed by a data block with a visit count of two, and then a data block with a visit count of one. specific:

根据优先级将数据块放入缓存中不同的队列中去,同一个队列中优先级相同的数据块根据LRU算法进行排列,最近最久未被访问的数据块将优先被替换;Put the data blocks into different queues in the cache according to the priority, and the data blocks with the same priority in the same queue are arranged according to the LRU algorithm, and the data blocks that have not been accessed recently will be replaced first;

优先级高的队列中的数据块如被访问,则添加到优先级低一级的数据块当中,以此类推;If the data block in the queue with high priority is accessed, it will be added to the data block with a lower priority, and so on;

当缓存中空间不够时,最低优先级队列中的最近最久未被访问的数据块将被替换,新加入的数据块根据其优先级添加到相应的队列当中。When there is not enough space in the cache, the data block in the lowest priority queue that has not been accessed for the longest time will be replaced, and the newly added data block will be added to the corresponding queue according to its priority.

以下将配合图2-图8以一个纠错码的单盘出错情况下的恢复过程为例,详细描述本发明的过程:In the following, the process of the present invention will be described in detail by taking the recovery process in the case of a single disk error of an error correction code as an example in conjunction with Figures 2-8:

步骤S1:缓存数据块获取与优先级定义;为恢复前的数据块定义不同的优先级;根据数据块是否已经被应用读取到缓存中,为数据块定义不同的优先级,已经在缓存中的数据块赋予较高的优先级,被优先选择,没有读取到缓存中的数据块赋予较低的优先级,不被优先选择。默认已经出错的数据块不会被预读进缓存中,如果应用对出错的数据块发出了请求,则需要对被请求的出错数据块进行恢复之后再进行请求处理。Step S1: Cache data block acquisition and priority definition; define different priorities for data blocks before recovery; define different priorities for data blocks according to whether the data blocks have been read into the cache by the application, and are already in the cache The data blocks that are given higher priority are given priority, and the data blocks that have not been read into the cache are given lower priority and are not preferred. By default, erroneous data blocks will not be pre-read into the cache. If an application sends a request for an erroneous data block, it needs to restore the requested erroneous data block before processing the request.

步骤S2:数据块恢复方法枚举。通过生成矩阵为每一个出错的数据块枚举出所有的恢复等式列表;图2作为数据块恢复方法枚举的一个实例,用以说明利用生成矩阵进行枚举的过程。假设磁盘阵列中的D0和D1出现了错误,为恢复D0,纵向搜索D0所对应的第一列,可以发现R4与R6参与了D0的计算,因此可用于恢复数据块D0。同理R5、R7可以用于恢复数据块D1。Step S2: Enumerate the data block recovery methods. Enumerate all recovery equation lists for each erroneous data block through the generating matrix; Figure 2 is an example of enumerating the data block recovery method to illustrate the enumeration process using the generating matrix. Assuming that D0 and D1 in the disk array have errors, in order to restore D0, search the first column corresponding to D0 vertically, and find that R4 and R6 participate in the calculation of D0, so they can be used to restore data block D0. Similarly, R5 and R7 can be used to restore the data block D1.

步骤S3:恢复方法遍历。将磁盘阵列中的每一个数据块用布尔值0,1表示。0表示恢复过程中不需要用到该数据块,1表示恢复过程中需要用到该数据块。设由数据块数量个的布尔值组成遍历图中的一个节点,起始节点为全零,表示此时没有数据块被用到,亦没有数据块能够被恢复。每增加一层节点即增加一块能够被恢复的数据块。图3中起始节点为全0,为恢复数据块D0枚举4种可行的恢复方法。从中选择路径最短的一个节点,进一步恢复D1。当所有的数据块都能够被有效的恢复时,选取其中距离最短的路径标识为最短路径。在接下来的遍历中,凡超出此路径长度的节点都被视为无效节点,不参与后续的遍历查找。图3中恢复D0而产生的路径长度为4的节点,因为其总长度必然超出了已知的最短路径,故被视为无效节点,其后的所有节点不参与遍历运算。确认路径长度的步骤包括:Step S3: resume method traversal. Each data block in the disk array is represented by a Boolean value 0, 1. 0 means that the data block does not need to be used in the recovery process, and 1 means that the data block needs to be used in the recovery process. Assume that a node in the traversal graph is composed of Boolean values of the number of data blocks, and the starting node is all zeros, which means that no data block is used at this time, and no data block can be recovered. Each additional layer of nodes adds a data block that can be recovered. In Figure 3, the starting node is all 0s, and four feasible recovery methods are enumerated for recovering data block D0. Select a node with the shortest path to further restore D1. When all data blocks can be recovered effectively, the path with the shortest distance is selected as the shortest path. In the next traversal, any node exceeding this path length will be regarded as an invalid node and will not participate in the subsequent traversal search. In Figure 3, the node with a path length of 4 generated by restoring D0 is considered an invalid node because its total length must exceed the known shortest path, and all subsequent nodes do not participate in the traversal operation. The steps to confirm path length include:

设置初始路径长度为零;Set the initial path length to zero;

标记两个节点之间增加的数据块;如果增加的数据块在缓存中,则路径长度保持不变;如果增加的数据块不在缓存中,则路径长度加一;Mark the increased data block between two nodes; if the increased data block is in the cache, the path length remains unchanged; if the increased data block is not in the cache, the path length is increased by one;

默认增加的数据块为可读的未出现错误的数据块;如果增加的数据块出现了错误,新增的节点视为无效节点。By default, the added data block is a readable data block without errors; if an error occurs in the added data block, the newly added node is regarded as an invalid node.

步骤S4:恢复方法筛选。从经过迪杰斯特拉算法遍历得到的具有相同最短路径的恢复方法中选取负载相对均衡的方法。筛选标准为使负载最大磁盘上的负载最小。图5中第一组和第三组除出错磁盘外,每个盘上只读取了一个数据块,负载相对均衡。图5中第二组和第四组除出错磁盘外,两个盘读取了两块数据块,一个盘没有读取数据块,相对于第一、三组而言,负载不均衡。因此最终的方法应当在第一组和第三组之间任选其一。第一组和第三组恢复方法具有对称性,没有优劣之分。Step S4: resume method screening. Select the method with relatively balanced load from the recovery methods with the same shortest path obtained through Dijkstra algorithm traversal. The filter criterion is to minimize the load on the most loaded disk. In the first group and the third group in Figure 5, except for the faulty disk, only one data block is read on each disk, and the load is relatively balanced. In the second group and the fourth group in Figure 5, except for the faulty disk, two disks read two data blocks, and one disk did not read a data block. Compared with the first and third groups, the load is unbalanced. Therefore the final method should choose one between the first group and the third group. The recovery methods of the first group and the third group are symmetrical, and there is no distinction between good and bad.

步骤S5:缓存替换策略。根据迪杰斯特拉算法遍历得到的最优方法,根据恢复过程中数据块被访问的次数设立不同的优先级;以采用了三盘容错的纠删码的磁盘阵列为例,访问次数为三的数据块具有最高的优先级,在缓存中置于优先级最高的队列P1;访问次数为二的数据块优先级次之,在缓存中置于优先级仅次于P1的队列P2;访问次数为一的数据块优先级再次之,在缓存中置于优先级最低的队列P3;队列内部的数据块根据LRU进行排列,通过指针相互连接,便于数据块关系的排列更新。Step S5: cache replacement policy. According to the optimal method obtained by Dijkstra's algorithm traversal, different priorities are set according to the number of times data blocks are accessed during the recovery process; taking a disk array with a three-disk fault-tolerant erasure code as an example, the number of accesses is three The data block has the highest priority and is placed in the queue P1 with the highest priority in the cache; the data block with the number of accesses is second in priority, and is placed in the queue P2 with a priority second only to P1 in the cache; the number of accesses The priority of the data block is 1, and it is placed in the queue P3 with the lowest priority in the cache; the data blocks inside the queue are arranged according to the LRU, and are connected to each other through pointers, which is convenient for the arrangement and update of the data block relationship.

缓存的替换策略分为两种情况,具体为:当需要读取的盘在缓存内部时,如图7所示找到需要读取的盘,读取后将其降级到下一优先级队列的首位,如读取的盘已经在最低优先级的队列当中,则读取后根据LRU规则将其置于队列首位,并不做降级处理;当需要读取的盘不在缓存内部时,需要将磁盘中的数据块读取到缓存中,并插入相应的队列当中,具体为:The replacement strategy of the cache is divided into two situations, specifically: when the disk to be read is in the cache, find the disk to be read as shown in Figure 7, and demote it to the first place in the next priority queue after reading , if the disk to be read is already in the queue with the lowest priority, it will be placed at the top of the queue according to the LRU rule after reading, and will not be downgraded; when the disk to be read is not in the cache, it needs to be read The data block is read into the cache and inserted into the corresponding queue, specifically:

如果缓存中有空余的数据块空间,则读取磁盘数据块到相应的缓存数据块,根据数据块的优先级,插入相应的队列中;If there is free data block space in the cache, read the disk data block to the corresponding cache data block, and insert it into the corresponding queue according to the priority of the data block;

如果缓存中没有空余的数据块空间,则将最低优先级队列的末端数据块替换为需要读取的数据块,并将其插入相应的队列中。If there is no free data block space in the cache, replace the end data block of the lowest priority queue with the data block that needs to be read, and insert it into the corresponding queue.

图9为本发明一种磁盘阵列单盘恢复的缓存系统的系统架构图。如图9所示,本发明一种磁盘阵列单盘恢复的缓存系统,包括:优先级定义模块901、遍历模块902、恢复方法选取模块903、缓存替换模块904。FIG. 9 is a system architecture diagram of a caching system for disk array single disk recovery according to the present invention. As shown in FIG. 9 , a caching system for disk array single disk recovery in the present invention includes: a priority definition module 901 , a traversal module 902 , a recovery method selection module 903 , and a cache replacement module 904 .

优先级定义模块901,根据数据块是否已经被应用读取到缓存中,为数据块定义不同的优先级。具体地,为在缓存中的数据块赋予较高的优先级,为不在缓存中的数据块赋予较低的优先级。也就是说,当系统检测到存储服务器出现错误时,首先读取在缓存中的数据块编号,在遍历时针对数据块是否在缓存中,优先地选择已经读取到缓存中的数据,次要地选择尚未被读取到缓存中的数据,进行恢复方法的综合考量The priority definition module 901 defines different priorities for the data blocks according to whether the data blocks have been read into the cache by the application. Specifically, a higher priority is given to data blocks in the cache, and a lower priority is given to data blocks not in the cache. That is to say, when the system detects an error in the storage server, it first reads the number of the data block in the cache, and when traversing whether the data block is in the cache, the data that has been read in the cache is preferentially selected, and the second Select the data that has not been read into the cache, and comprehensively consider the recovery method

遍历模块902,为每个出错的数据块枚举出所有可能的恢复方法,针对所有可能的恢复方法,结合恢复方法之间的关系,通过迪杰斯特拉遍历算法进行遍历,选取出I/O最小的、符合要求的恢复方法。较佳地,在迪杰斯特拉算法遍历过程中,结合已经选取的有效方法特征,及时删除无效方法,提高计算速度,避免无效遍历。具体地说,遍历模块902通过迪杰斯特拉方法对所有可行的恢复方法进行遍历,建立一个初节点表示遍历的最初状态,即所有的数据块尚未被选择。循环查询恢复一个出错的数据块可能需要的读取的数据块个数作为节点之间连线的权重,如果需要读取的数据块在缓存中,此数据块的读取将不参与权重的计算;相反地,如果需要读取的数据块不在缓存中,此数据块的读取将参与权重的计算。迪杰斯特拉方法遍历的目标是,选取出一种能够有效恢复所有出错的数据块又能将遍历的权重降到最低的方法。这里的权重与实现中的I/O成正比。较佳地,本发明中的遍历模块对迪杰斯特拉方法遍历的方法进行了改进。针对已经遍历得到的数据,比如最小权重,剩余没有进行遍历的部分需要根据权重进行遍历的考量,对于没有可能得到更小权重的的遍历部分进行及时的舍弃,包括:The traversal module 902 enumerates all possible recovery methods for each erroneous data block, and for all possible recovery methods, in combination with the relationship between the recovery methods, traverses through the Dijkstra traversal algorithm, and selects the I/ O Minimal, compliant recovery method. Preferably, during the traversal process of the Dijkstra algorithm, combined with the characteristics of the selected effective methods, the invalid methods are deleted in time to improve the calculation speed and avoid invalid traversal. Specifically, the traversal module 902 traverses all feasible recovery methods through the Dijkstra method, and establishes an initial node to represent the initial state of the traversal, that is, all data blocks have not been selected yet. The number of read data blocks that may be required to recover an erroneous data block in a loop query is used as the weight of the connection between nodes. If the data block to be read is in the cache, the read of this data block will not participate in the calculation of the weight ;Conversely, if the data block to be read is not in the cache, the read of this data block will participate in the calculation of the weight. The goal of the Dijkstra method traversal is to select a method that can effectively recover all erroneous data blocks and minimize the weight of traversal. The weight here is proportional to the I/O in the implementation. Preferably, the traversal module in the present invention improves the Dijkstra method traversal method. For the data that has been traversed, such as the minimum weight, the remaining parts that have not been traversed need to be traversed according to the weight, and the traversed part that is not likely to obtain a smaller weight is discarded in time, including:

能够恢复所有出错数据块的方法,如果超出的最小权重,这个方法被视作无效的方法;A method capable of recovering all erroneous data blocks, if the minimum weight is exceeded, this method is considered invalid;

尚未能恢复所有出错数据块的方法,如果其权重已经超出了已知的最小权重,那么这个方法以及根据这个方法衍生的能够恢复所有出错数据块的方法,都被视作无效的方法A method that has not been able to recover all erroneous data blocks, if its weight has exceeded the known minimum weight, then this method and the method derived from this method that can recover all erroneous data blocks are considered invalid methods

恢复方法选取模块903,针对所选择的恢复方法,选取其中负载相对均衡的方法作为最终采用的磁盘阵列单盘恢复的纠删码恢复方法。The recovery method selection module 903, for the selected recovery method, selects the method with a relatively balanced load as the erasure code recovery method for the final recovery of a single disk of the disk array.

具体地,恢复方法选取模块903进一步包括:Specifically, the recovery method selection module 903 further includes:

分别计算每一块磁盘上有效I/O的数量,即需要读取的不在缓存中的数据块的数量;Calculate the number of effective I/Os on each disk separately, that is, the number of data blocks that need to be read that are not in the cache;

对阵列中每一块磁盘上的I/O进行对比,选取其中I/O最大的磁盘;Compare the I/O on each disk in the array, and select the disk with the largest I/O;

所有方法中I/O最大的磁盘上的I/O数目最小的方法被选作是恢复的最优方法。The method with the least number of I/Os on the disk with the most I/O among all the methods is selected as the optimal method for recovery.

缓存替换模块904,根据纠删码恢复方法的读取次数为数据块定义优先级,采用在缓存中优先替换优先级低的数据块的缓存替换方法。The cache replacement module 904 defines priorities for the data blocks according to the reading times of the erasure code recovery method, and adopts a cache replacement method that preferentially replaces data blocks with lower priority in the cache.

缓存替换模块904针对选取的负载相对均衡的磁盘阵列单盘恢复情形下的最优方法,根据数据块被访问的次数设置在缓存中的优先级。以三盘容错的磁盘阵列为例,访问次数为三的数据块具有最高的优先级,其次是访问次数为二的数据块,再次是访问次数为一的数据块。缓存替换模块的替换策略具体如下:The cache replacement module 904 sets the priority in the cache according to the number of times the data blocks are accessed for the selected optimal method in the case of disk array single-disk recovery with relatively balanced load. Taking a three-disk fault-tolerant disk array as an example, the data block with the access count of three has the highest priority, followed by the data block with the access count of two, and the data block with the access count of one. The replacement strategy of the cache replacement module is as follows:

根据优先级将数据块放入缓存中不同的队列中去,同一个队列中优先级相同的数据块根据LRU算法进行排列,最近最久未被访问的数据块将优先被替换;Put the data blocks into different queues in the cache according to the priority, and the data blocks with the same priority in the same queue are arranged according to the LRU algorithm, and the data blocks that have not been accessed recently will be replaced first;

优先级高的队列中的数据块如被访问,则添加到优先级低一级的数据块当中,以此类推;If the data block in the queue with high priority is accessed, it will be added to the data block with a lower priority, and so on;

当缓存中空间不够时,最低优先级队列中的最近最久未被访问的数据块将被替换,新加入的数据块根据其优先级添加到相应的队列当中。When there is not enough space in the cache, the data block in the lowest priority queue that has not been accessed for the longest time will be replaced, and the newly added data block will be added to the corresponding queue according to its priority.

综上所述,本发明一种磁盘阵列单盘恢复的缓存方法及系统提出了一种新的纠错码单盘出错恢复的情形,其能够有效结合应用读取到缓存中的数据块制定相应的恢复策略,由于读取缓存中的数据块的速度比读取磁盘中的数据块在速度上有明显的改进,本发明能够有效减少纠错码单盘恢复情形下的恢复速度,同时保证应用访问的正常进行。在访问请求多、任务密集的存储服务器环境下能够有效提升服务速度和效率,提升分布式存储系统的可靠性和安全性。To sum up, the present invention provides a cache method and system for disk array single-disk recovery, which proposes a new error correction code for single-disk error recovery, which can effectively formulate corresponding data blocks read into the cache by the application. recovery strategy, since the speed of reading the data blocks in the cache is significantly improved compared with the speed of reading the data blocks in the disk, the present invention can effectively reduce the recovery speed in the case of error correction code single disk recovery, and at the same time ensure that the application The visit proceeds normally. In the storage server environment with many access requests and intensive tasks, it can effectively improve the service speed and efficiency, and improve the reliability and security of the distributed storage system.

上述实施例仅例示性说明本发明的原理及其功效,而非用于限制本发明。任何本领域技术人员均可在不违背本发明的精神及范畴下,对上述实施例进行修饰与改变。因此,本发明的权利保护范围,应如权利要求书所列。The above-mentioned embodiments only illustrate the principles and effects of the present invention, but are not intended to limit the present invention. Any person skilled in the art can modify and change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention should be listed in the claims.

Claims (10)

The caching system that a kind of disk array single-deck the most as claimed in claim 9 recovers, it is characterised in that: this caching is replacedData block is put in queues different in caching according to priority by module, the data block that same queue medium priority is identicalArranging according to lru algorithm, not being accessed for data block the most at most will preferentially be replaced, the number in the queue that priority is highAccording to block as accessed, then add in the middle of the data block of the low one-level of priority, by that analogy;When in caching, space is inadequate,Not being accessed for data block the most at most and will be replaced in Low Priority Queuing, the data block being newly added adds according to its priorityIt is added in the middle of corresponding queue.
CN201610637551.8A2016-08-052016-08-05A kind of caching method and system of the recovery of disk array single-deckActiveCN106294032B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201610637551.8ACN106294032B (en)2016-08-052016-08-05A kind of caching method and system of the recovery of disk array single-deck

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN201610637551.8ACN106294032B (en)2016-08-052016-08-05A kind of caching method and system of the recovery of disk array single-deck

Publications (2)

Publication NumberPublication Date
CN106294032Atrue CN106294032A (en)2017-01-04
CN106294032B CN106294032B (en)2019-06-28

Family

ID=57665495

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201610637551.8AActiveCN106294032B (en)2016-08-052016-08-05A kind of caching method and system of the recovery of disk array single-deck

Country Status (1)

CountryLink
CN (1)CN106294032B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN114968663A (en)*2022-05-202022-08-30成都易我科技开发有限责任公司 Database content recovery method and system
CN118312095A (en)*2024-03-272024-07-09北京邮电大学Distributed cluster disk reconstruction performance optimization method based on bandwidth perception
CN118535375A (en)*2024-03-042024-08-23北京邮电大学 A disk reconstruction performance optimization method based on erasure coding

Citations (8)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN101135955A (en)*2007-10-192008-03-05杭州华三通信技术有限公司Method for realizing magnetic disc redundant array rebuilding and magnetic disc redundant array
US20120079317A1 (en)*2010-09-272012-03-29Chandrashekar NelogalSystem and method for information handling system redundant storage rebuild
CN103744750A (en)*2014-01-132014-04-23杭州华为数字技术有限公司Method and device for recovering data
CN104035830A (en)*2014-06-242014-09-10浙江宇视科技有限公司Method and device for recovering data
CN104536698A (en)*2014-12-102015-04-22华为技术有限公司Disk reconfiguration method based on RAID and related apparatus
CN104866433A (en)*2015-05-312015-08-26上海交通大学Multi-level caching method based on historical information
CN105045894A (en)*2015-07-312015-11-11中国科学院计算技术研究所Cache method and system oriented to distributed sequence list
WO2016048314A1 (en)*2014-09-242016-03-31Hewlett Packard Enterprise Development LpBlock priority information

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN101135955A (en)*2007-10-192008-03-05杭州华三通信技术有限公司Method for realizing magnetic disc redundant array rebuilding and magnetic disc redundant array
US20120079317A1 (en)*2010-09-272012-03-29Chandrashekar NelogalSystem and method for information handling system redundant storage rebuild
CN103744750A (en)*2014-01-132014-04-23杭州华为数字技术有限公司Method and device for recovering data
CN104035830A (en)*2014-06-242014-09-10浙江宇视科技有限公司Method and device for recovering data
WO2016048314A1 (en)*2014-09-242016-03-31Hewlett Packard Enterprise Development LpBlock priority information
CN104536698A (en)*2014-12-102015-04-22华为技术有限公司Disk reconfiguration method based on RAID and related apparatus
CN104866433A (en)*2015-05-312015-08-26上海交通大学Multi-level caching method based on historical information
CN105045894A (en)*2015-07-312015-11-11中国科学院计算技术研究所Cache method and system oriented to distributed sequence list

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROBERT Y.HOU: "Balancing I/O Response Time and Disk Rebuild Time in a RAID5 Disk Array", 《PROCEEDINGS OF THE TWENTY-SIXTH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES》*
毛波 等: "一种提高磁盘阵列重建效率的缓存替换方法", 《华中科技大学学报(自然科学版)》*

Cited By (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN114968663A (en)*2022-05-202022-08-30成都易我科技开发有限责任公司 Database content recovery method and system
CN114968663B (en)*2022-05-202025-05-30成都易我科技开发有限责任公司 A method and system for restoring database content
CN118535375A (en)*2024-03-042024-08-23北京邮电大学 A disk reconstruction performance optimization method based on erasure coding
CN118312095A (en)*2024-03-272024-07-09北京邮电大学Distributed cluster disk reconstruction performance optimization method based on bandwidth perception

Also Published As

Publication numberPublication date
CN106294032B (en)2019-06-28

Similar Documents

PublicationPublication DateTitle
US10169383B2 (en)Method and system for scrubbing data within a data storage subsystem
US10884940B2 (en)Method and apparatus for using compression to improve performance of low voltage caches
US10599519B2 (en)Coordinating node modification of an index structure via delegates
CN110262922B (en)Erasure code updating method and system based on duplicate data log
US9851911B1 (en)Dynamic distribution of replicated data
US8627004B2 (en)Extent migration for tiered storage architecture
Yue et al.Building an efficient put-intensive key-value store with skip-tree
WO2017050014A1 (en)Data storage processing method and device
US10754735B2 (en)Distributed storage reservation for recovering distributed data
CN110427156B (en)Partition-based MBR (Membrane biological reactor) parallel reading method
US10922012B1 (en)Fair data scrubbing in a data storage system
CN106294032B (en)A kind of caching method and system of the recovery of disk array single-deck
US20190129647A1 (en)Method, device and computer program product for managing disk array
US10402423B2 (en)Sliding windows for batching index updates
US11381400B2 (en)Using double hashing schema to reduce short hash handle collisions and improve memory allocation in content-addressable storage systems
US9891992B2 (en)Information processing apparatus, information processing method, storage system and non-transitory computer readable storage media
CN106547484A (en)It is a kind of that internal storage data reliability method and system realized based on RAID5
US11797379B2 (en)Error detection and data recovery for distributed cache
CN109960588A (en) A read request scheduling method and system for heterogeneous memory clusters
JP2019128959A (en)Key-value data reliability system, data storage method therefor, and non-transitory computer-readable medium including computer code implementing that method
US9396077B2 (en)Redundant location address mapper
Ahn et al.Dynamic erasure coding decision for modern block-oriented distributed storage systems
Zhang et al.On the implementation of BRS codes in Ceph
JP2015194942A (en)Cache device, storage apparatus, cache control method and storage control program
CN107508850A (en) A lock-step distribution method based on tree network and data partitioning in big data environment

Legal Events

DateCodeTitleDescription
C06Publication
PB01Publication
C10Entry into substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp