CN104484252B


Info

Publication number: CN104484252B
Application number: CN201410829093.9A
Authority: CN
Inventors: 徐明军; 霍杰
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2014-12-26
Filing date: 2014-12-26
Publication date: 2017-11-24
Anticipated expiration: 2034-12-26
Also published as: CN104484252A

Abstract

Embodiments of the present invention provide a backup power detection method, device, and system for a solid-state drive (SSD). First, configuration information and states of N RAID groups in a storage system are obtained. The configuration information of each of the N RAID groups includes the number of SSDs and the redundancy of the RAID group, and the state of each RAID group is one of a normal state, a degraded state, and a failed state. Next, M RAID groups in the normal state are determined among the N RAID groups. Then, according to the configuration information of the M RAID groups, the M RAID groups are controlled to perform discharge tests simultaneously, where, while each of the M RAID groups performs a discharge test, the number of SSDs in that RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group. Backup power detection of SSDs can thus be achieved while ensuring performance and data security.

Description


Backup power detection method, device, and system for a solid-state drive

Technical field

The embodiments of the present invention relate to the field of storage technologies, and in particular, to a backup power detection method, device, and system for a solid-state drive.

Background

The CPU, memory, and hard disk are the three major components of a computer system. CPU and memory performance has improved rapidly with technological development, whereas the performance of traditional mechanical hard disks, limited by their mechanical parts, has improved slowly, making them a bottleneck for system performance. External storage systems face the same problem: although storage systems built on mechanical hard disks can provide fairly high performance, they still cannot meet the requirements of performance-demanding applications. As an emerging storage medium, solid-state drives (SSDs) have gradually replaced traditional mechanical hard disks as the first choice for high-performance applications because of their high performance.

However, SSDs have their own drawbacks. To improve performance, an SSD contains a large-capacity internal cache, and the data in that cache is lost on power failure. To solve this problem, SSDs include internal backup power. The mainstream backup power solution today uses supercapacitors, but supercapacitors have a high failure rate.

To detect supercapacitor faults, an SSD periodically checks its charge by running discharge tests. In particular, during precise charge measurement the capacitor is discharged so far that the remaining charge can no longer guarantee data reliability, so the SSD must be switched to write-through mode, that is, its write cache must be disabled, which severely degrades SSD performance. In a redundant array of independent disks (RAID) group, such a drive becomes a "slow disk". A RAID group is generally composed of multiple SSDs; if one or more of them become slow disks, the performance of the entire RAID group degrades, and in severe cases services are interrupted.

Summary of the invention

Embodiments of the present invention provide a backup power detection method, device, and system for a solid-state drive, which can perform backup power detection while ensuring performance and data security.

According to a first aspect, a backup power detection method for a solid-state drive is provided. The method is applied to a storage system that includes N RAID groups, where N is a positive integer, and the method includes:

obtaining configuration information of the N RAID groups and states of the N RAID groups, where the configuration information of each of the N RAID groups includes the number of SSDs and the redundancy of the RAID group, and the state of each RAID group is one of a normal state, a degraded state, and a failed state;

determining, among the N RAID groups, M RAID groups that are in the normal state; and

controlling, according to the configuration information of the M RAID groups, the M RAID groups to perform discharge tests simultaneously, where, while each of the M RAID groups performs a discharge test, the number of SSDs in that RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group.

With reference to the first aspect, in a first possible implementation, the controlling, according to the configuration information of the M RAID groups, the M RAID groups to perform discharge tests simultaneously, where the number of SSDs in each RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group, includes:

for each of the M RAID groups, obtaining an SSD discharge test order of the RAID group according to the discharge test period of each SSD in the RAID group, where the SSD discharge test order indicates the order in which each SSD in the RAID group undergoes its discharge test; and

instructing each SSD in each RAID group to perform a discharge test according to the SSD discharge test order of that RAID group, where, while each RAID group performs discharge tests, the number of SSDs in that RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group.

With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, while each of the M RAID groups performs a discharge test, the SSDs in that RAID group that are undergoing the discharge test remain in write-back mode.

With reference to the second possible implementation of the first aspect, in a third possible implementation, the method further includes:

while a first SSD in a first RAID group is undergoing a discharge test, if it is detected that the first SSD has failed, reconstructing the data of the first SSD by using the other normal SSDs in the first RAID group, where the first RAID group is any one of the M RAID groups and the first SSD is any SSD in the first RAID group.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the failure of the first SSD includes the first SSD losing power or being pulled out, and the other normal SSDs in the first RAID group include the SSDs in the first RAID group, other than the first SSD, that are not currently undergoing a discharge test.
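The rule for choosing reconstruction sources in the fourth implementation amounts to a simple set filter. The following Python sketch is illustrative only: the function name `reconstruction_sources` and the use of plain string SSD IDs are assumptions, not part of the described device.

```python
def reconstruction_sources(group_ssds, failed_ssd, under_test):
    """Hypothetical helper: SSDs usable to rebuild a failed SSD.

    Following the fourth implementation, these are the members of the same
    RAID group, other than the failed SSD, that are not currently undergoing
    a discharge test.
    """
    if failed_ssd not in group_ssds:
        raise ValueError("failed SSD is not a member of this RAID group")
    return [ssd for ssd in group_ssds
            if ssd != failed_ssd and ssd not in under_test]


# RAID6 group as in the FIG. 3 example: SSD5 and SSD6 are being tested,
# and SSD5 is pulled out during its test.
raid6 = ["SSD5", "SSD6", "SSD7", "SSD8", "SSD9", "SSD10"]
print(reconstruction_sources(raid6, "SSD5", {"SSD5", "SSD6"}))
# ['SSD7', 'SSD8', 'SSD9', 'SSD10']
```

Because at most `redundancy` SSDs are under test at once, the remaining members are always enough for the RAID level to rebuild the failed SSD.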

With reference to the third or fourth possible implementation of the first aspect, in a fifth possible implementation, the reconstructing all data of the first SSD by using the other normal SSDs in the first RAID group includes:

reconstructing, by using the other normal SSDs in the first RAID group, all the data in the memory and the storage medium of the first SSD.

With reference to the first aspect, in a sixth possible implementation:

if a RAID group contains no damaged SSD, the state of the RAID group is the normal state;

if a RAID group contains damaged SSDs and the number of damaged SSDs is less than or equal to the redundancy of the RAID group, the state of the RAID group is the degraded state; and

if a RAID group contains damaged SSDs and the number of damaged SSDs is greater than the redundancy of the RAID group, the state of the RAID group is the failed state.
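The three state rules of the sixth implementation can be written as a small classification function. This is a sketch assuming the number of damaged SSDs and the redundancy are already known as integers; the `RaidState` enum and `classify_raid_group` names are hypothetical.

```python
from enum import Enum


class RaidState(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    FAILED = "failed"


def classify_raid_group(damaged_ssds: int, redundancy: int) -> RaidState:
    """Apply the normal / degraded / failed rules to one RAID group."""
    if damaged_ssds < 0 or redundancy < 0:
        raise ValueError("counts must be non-negative")
    if damaged_ssds == 0:
        return RaidState.NORMAL        # no damaged SSD
    if damaged_ssds <= redundancy:
        return RaidState.DEGRADED      # damaged SSDs, but within redundancy
    return RaidState.FAILED            # more damaged SSDs than redundancy


# A RAID6 group (redundancy 2) with 0, 1, and 3 damaged SSDs.
print(classify_raid_group(0, 2).value)  # normal
print(classify_raid_group(1, 2).value)  # degraded
print(classify_raid_group(3, 2).value)  # failed
```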

According to a second aspect, a detection device is provided. The detection device is applied to a storage system that includes N RAID groups, where N is a positive integer, and the detection device includes:

an acquisition module, configured to obtain configuration information of the N RAID groups and states of the N RAID groups, where the configuration information of each of the N RAID groups includes the number of SSDs and the redundancy of the RAID group, and the state of each RAID group is one of a normal state, a degraded state, and a failed state;

an identification module, configured to determine, among the N RAID groups, M RAID groups that are in the normal state; and

a processing module, configured to control, according to the configuration information of the M RAID groups, the M RAID groups to perform discharge tests simultaneously, where, while each of the M RAID groups performs a discharge test, the number of SSDs in that RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group.

With reference to the second aspect, in a first possible implementation, the processing module is specifically configured to:

With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, while each of the M RAID groups performs a discharge test, the SSDs in that RAID group that are undergoing the discharge test remain in write-back mode.

With reference to the second possible implementation of the second aspect, in a third possible implementation, the detection device further includes:

a reconstruction module, configured to: while a first SSD in a first RAID group is undergoing a discharge test, if it is detected that the first SSD has failed, reconstruct all data of the first SSD by using the other normal SSDs in the first RAID group, where the first RAID group is any one of the M RAID groups and the first SSD is any SSD in the first RAID group.

With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the failure of the first SSD includes the first SSD losing power or being pulled out, and the other normal SSDs in the first RAID group include the SSDs in the first RAID group, other than the first SSD, that are not currently undergoing a discharge test.

With reference to the third or fourth possible implementation of the second aspect, in a fifth possible implementation, the reconstruction module is specifically configured to:

With reference to the second aspect, in a sixth possible implementation:

Embodiments of the present invention provide a backup power detection method and device for a solid-state drive, which can perform backup power detection while ensuring performance and data security.

According to a third aspect, a storage system is provided. The storage system includes N RAID groups, where N is a positive integer, and further includes the detection device according to the second aspect or any one of the foregoing implementations of the second aspect.

Embodiments of the present invention provide a backup power detection method, device, and system for a solid-state drive. First, the configuration information of the N RAID groups in a storage system and the states of the N RAID groups are obtained, where the configuration information of each of the N RAID groups includes the number of SSDs and the redundancy of the RAID group, and the state of each RAID group is one of a normal state, a degraded state, and a failed state. Next, M RAID groups in the normal state are determined among the N RAID groups. Then, according to the configuration information of the M RAID groups, the M RAID groups are controlled to perform discharge tests simultaneously, where, while each of the M RAID groups performs a discharge test, the number of SSDs in that RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group.

In the prior art, the discharge test of each SSD in a RAID group is controlled by the SSD itself, and no SSD can know when the other SSDs are performing their discharge tests. An SSD can therefore only switch to write-through mode during a discharge test, which degrades the performance of the RAID group. In the technical solution provided by the embodiments of the present invention, the detection device instructs the RAID groups to perform discharge tests in a unified manner and keeps the number of SSDs undergoing discharge tests in each RAID group within its redundancy. An SSD therefore stays in write-back mode during its discharge test, so the performance of the RAID group is not affected; even if the SSD fails, loses power, or is pulled out during the test, the performance of the RAID group is still not affected. Because the detection device instructs all RAID groups in a unified manner, it knows the state of every SSD in each RAID group. And because, even if all the SSDs under test lose power or are pulled out, their number cannot exceed the redundancy of the RAID group, the SSD's data can be reconstructed after a failed SSD is replaced or after a powered-off or removed SSD is brought back online. Backup power detection is thus achieved while ensuring performance and data security.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

FIG. 1 is a schematic flowchart of a backup power detection method for a solid-state drive according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another backup power detection method for a solid-state drive according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a storage system according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a detection device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another detection device according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of yet another detection device according to an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.

It should be understood that the embodiments of the present invention may be applied to a storage system that includes one or more RAID groups. RAID (redundant array of independent disks) can be understood as a storage array, built from independent disks, that has redundancy capability. RAID greatly increases the data throughput of a storage system by storing and reading data on multiple disks simultaneously. In a RAID array, many disk drives can transfer data at the same time while logically appearing as a single disk drive, so RAID can reach several times, tens of times, or even hundreds of times the speed of a single disk drive. RAID can also provide fault tolerance through data parity. Many RAID levels include fairly complete mutual verification and recovery mechanisms, or even direct mirror backups, which greatly increases the redundancy of the RAID system and improves system stability and data security. In the following embodiments of the present invention, a RAID group may be such a RAID array composed of SSDs.

An embodiment of the present invention provides a backup power detection method for a solid-state drive. As shown in FIG. 1, the method includes the following steps:

Step 101: Obtain configuration information of the N RAID groups and states of the N RAID groups, where the configuration information of each of the N RAID groups includes the number of SSDs and the redundancy of the RAID group, and the state of each RAID group is one of a normal state, a degraded state, and a failed state.

Step 102: Determine, among the N RAID groups, M RAID groups that are in the normal state.

Step 103: Control, according to the configuration information of the M RAID groups, the M RAID groups to perform discharge tests simultaneously, where, while each of the M RAID groups performs a discharge test, the number of SSDs in that RAID group undergoing a discharge test at the same time is less than or equal to the redundancy of that RAID group.
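Steps 101 to 103 can be sketched as follows, assuming the configuration information and states from step 101 are already available in memory. The `RaidGroup` fields, the `plan_discharge_tests` name, and the simple policy of testing SSDs in list order are illustrative assumptions. Each returned round lists the SSDs of one group tested at the same time, and round i of every selected group can run concurrently with round i of the others.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RaidGroup:
    name: str
    ssd_ids: List[str]   # identifiers of member SSDs (configuration information)
    redundancy: int      # maximum number of SSDs the group can lose
    damaged: int = 0     # number of damaged SSDs (used to derive the state)

    def is_normal(self) -> bool:
        return self.damaged == 0


def plan_discharge_tests(groups: List[RaidGroup]) -> Dict[str, List[List[str]]]:
    """Hypothetical planner for steps 102 and 103."""
    plan = {}
    for group in groups:
        # Step 102: only groups in the normal state are tested.
        if not group.is_normal():
            continue
        # With redundancy 0 (e.g. RAID0), the cap allows no SSD to be tested.
        if group.redundancy < 1:
            continue
        # Step 103: at most `redundancy` SSDs of a group are tested at once.
        k = group.redundancy
        plan[group.name] = [group.ssd_ids[i:i + k]
                            for i in range(0, len(group.ssd_ids), k)]
    return plan


groups = [
    RaidGroup("RAID5", [f"SSD{i}" for i in range(0, 5)], redundancy=1),
    RaidGroup("RAID6", [f"SSD{i}" for i in range(5, 11)], redundancy=2, damaged=1),
]
print(plan_discharge_tests(groups))
# {'RAID5': [['SSD0'], ['SSD1'], ['SSD2'], ['SSD3'], ['SSD4']]}
```

The degraded RAID6 group is skipped entirely, matching step 102.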

To help a person skilled in the art understand the technical solutions provided by the embodiments of the present invention more clearly, the following describes the backup power detection method in detail by using a specific embodiment. As shown in FIG. 2, the method includes the following steps:

Step 201: Obtain configuration information of the N RAID groups and states of the N RAID groups.

The configuration information of each of the N RAID groups includes the number of SSDs and the redundancy of the RAID group, and the state of each RAID group is one of a normal state, a degraded state, and a failed state.

Before discharge tests are performed on the N RAID groups, the RAID configuration (also called the RAID type) of each RAID group must first be known. Current RAID technology mainly comprises several levels, RAID0 to RAID7, each with a different emphasis. Common levels are as follows:

RAID0: RAID0 continuously splits data into units of bits or bytes and reads and writes them to multiple disks in parallel, so it has a very high data transfer rate. However, it provides no data redundancy, so it is not a true RAID structure. RAID0 only improves performance and provides no guarantee of data reliability, and the failure of any one disk affects all data. Therefore, RAID0 cannot be used where high data security is required.
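The striping idea, and why one failed disk affects the whole data set, can be illustrated with a simple address mapping. This sketch assumes a stripe unit of one block rather than the bit or byte granularity mentioned above, and the function names are hypothetical.

```python
from typing import List, Tuple


def raid0_locate(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk),
    assuming a stripe unit of one block."""
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block number")
    return logical_block % num_disks, logical_block // num_disks


def blocks_lost(failed_disk: int, num_disks: int, total_blocks: int) -> List[int]:
    """Logical blocks that become unreadable when one disk fails (no redundancy)."""
    return [b for b in range(total_blocks)
            if raid0_locate(b, num_disks)[0] == failed_disk]


print(raid0_locate(5, 4))       # (1, 1)
print(blocks_lost(1, 4, 12))    # [1, 5, 9]
```

Every fourth block lives on disk 1, so its failure punches holes throughout the logical address space.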

RAID1: RAID1 achieves data redundancy through disk mirroring, producing data on pairs of independent disks that back each other up. When the disk holding the original data is busy, data can be read directly from the mirror copy, so RAID1 can improve read performance. RAID1 has the highest cost per unit of capacity among disk arrays, but it provides high data security and availability. When a disk fails, the system can automatically switch reads and writes to the mirror disk without having to reconstruct the failed data.

RAID0+1: Also commonly referred to as RAID10 (strictly, RAID0+1 mirrors two striped sets, whereas RAID10 stripes across mirrored pairs), this level combines RAID0 and RAID1: while data is continuously split into units of bits or bytes and read and written across multiple disks in parallel, each disk is also mirrored for redundancy. It offers both the high speed of RAID0 and the high data reliability of RAID1, but CPU usage is also higher and disk utilization is relatively low.

RAID2: RAID2 stripes data across different SSDs in units of bits or bytes and uses an error-correcting code, the Hamming code, to provide error checking and recovery. This coding technique requires multiple disks to store the checking and recovery information, which makes RAID2 more complex to implement, so it is rarely used in commercial environments.

RAID3: RAID3 is very similar to RAID2 in that data is striped across different SSDs; the difference is that RAID3 uses simple parity and stores the parity information on a single disk, the parity disk. If a data disk fails, the parity disk and the remaining data disks can regenerate its data; if the parity disk fails, data access is not affected. RAID3 provides a good transfer rate for large amounts of sequential data, but for random data the parity disk becomes a bottleneck for write operations.

RAID4: RAID4 also stripes data across different disks, but the stripe unit is a block or record. RAID4 uses one disk as the parity disk, and every write operation must access it, so the parity disk becomes a write bottleneck; RAID4 is therefore also rarely used in commercial environments.

RAID5: RAID5 has no dedicated parity disk; instead, data and parity information are interleaved across all disks. In RAID5, read and write operations can act on multiple array members at the same time, providing higher data throughput. RAID5 is better suited to small data blocks and random reads and writes. The main difference between RAID3 and RAID5 is that every data transfer in RAID3 involves all disks in the array, whereas in RAID5 most data transfers involve only one disk and can proceed in parallel. RAID5 has a "write penalty": each small write operation results in four actual read/write operations, namely two reads of the old data and old parity and two writes of the new data and new parity.
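RAID5 parity is normally a byte-wise XOR across the data blocks of a stripe, which is what makes both single-disk reconstruction and the four-operation small write possible. The following Python sketch uses tiny in-memory blocks for illustration; the function names are hypothetical, and real implementations operate on full stripe units.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list) -> bytes:
    """With single parity, any one missing block (data or parity) equals the
    XOR of all surviving blocks in the stripe."""
    return xor_blocks(*surviving)


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: two reads (old data, old parity) feed this function,
    then two writes store new_data and the returned new parity."""
    return xor_blocks(old_parity, old_data, new_data)


d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p = xor_blocks(d0, d1, d2)                    # parity of the stripe
assert rebuild_missing([d0, d2, p]) == d1     # recover the lost data block

new_d1 = b"\x0a\x0b"
new_p = small_write_parity(d1, new_d1, p)
assert new_p == xor_blocks(d0, new_d1, d2)    # matches a full recomputation
```

The small write never touches d0 or d2, which is why RAID5 can serve many small writes in parallel, at the cost of the four I/Os per write.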

RAID6: Compared with RAID5, RAID6 adds a second, independent block of parity information. The two independent parity schemes use different algorithms, so data reliability is very high: even if two disks fail at the same time, data remains available. However, RAID6 must allocate more disk space to parity information and has a larger write penalty than RAID5, so its write performance is very poor. Poor performance and complex implementation mean that RAID6 is seldom used in practice.

RAID7: This is a newer RAID standard that has its own intelligent real-time operating system and storage management software tools. It can run completely independently of the host without consuming host CPU resources. RAID7 can be regarded as a storage computer and differs markedly from the other RAID levels. Besides the above levels, multiple RAID levels can be combined, as in RAID0+1, to build the required RAID array; for example, RAID5+0 (RAID50) is a widely used array form. Users can generally configure disk arrays flexibly to obtain a disk storage system that better meets their requirements.

Different RAID types differ in the number of SSDs they include and in their redundancy. Redundancy is a margin reserved for safety so that an instrument, device, or task can keep working under abnormal conditions; here it can be understood as the maximum number of failed SSDs a RAID group can tolerate without affecting its normal operation. For example, the redundancy of RAID5 is 1, the redundancy of RAID6 is 2, and the redundancy of RAID10 (n+n) is n, where n+n is the number of SSDs in the RAID10 group. Therefore, the configuration information obtained for each RAID group should include the number of SSDs in the group and the redundancy of the group; in addition, the identifier of each SSD in the group, such as its SSD ID, should also be obtained.
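The redundancy values above can be captured in a lookup used when building each group's configuration information. This is a sketch under stated assumptions: only the RAID types named in this section are covered, RAID0 is given redundancy 0 because it has no data redundancy, and the `raid_redundancy` and `RaidConfig` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def raid_redundancy(raid_type: str, ssd_count: int) -> int:
    """Maximum number of failed SSDs tolerated, using the values in the text."""
    if raid_type == "RAID0":
        return 0
    if raid_type == "RAID5":
        return 1
    if raid_type == "RAID6":
        return 2
    if raid_type == "RAID10":
        if ssd_count < 2 or ssd_count % 2:
            raise ValueError("RAID10 (n+n) needs an even number of SSDs")
        # n for an n+n layout; strictly, this holds only if the failed SSDs
        # belong to different mirror pairs.
        return ssd_count // 2
    raise ValueError(f"unsupported RAID type: {raid_type}")


@dataclass
class RaidConfig:
    raid_type: str
    ssd_ids: List[str]   # SSD identifiers reported by the storage system

    @property
    def ssd_count(self) -> int:
        return len(self.ssd_ids)

    @property
    def redundancy(self) -> int:
        return raid_redundancy(self.raid_type, self.ssd_count)


cfg = RaidConfig("RAID10", [f"SSD{i}" for i in range(11, 17)])
print(cfg.ssd_count, cfg.redundancy)   # 6 3
```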

In addition, the state of each of the N RAID groups is one of the normal state, the degraded state, and the failed state, and can be determined from the states of the SSDs in that RAID group, for example:

If a RAID group contains no damaged SSD, the state of the RAID group is the normal state.

If a RAID group contains damaged SSDs and the number of damaged SSDs is less than or equal to the redundancy of the RAID group, the state of the RAID group is the degraded state.

If a RAID group contains damaged SSDs and the number of damaged SSDs is greater than the redundancy of the RAID group, the state of the RAID group is the failed state.

After the configuration information and states of the N RAID groups are obtained, step 202 is performed.

步骤202、在所述N个RAID组中确定状态为正常状态的M个RAID组。Step 202: Determine M RAID groups whose status is normal among the N RAID groups.

具体的，根据步骤201中获取的所述N个RAID组的状态后，在其中确定状态为正常状态的M个RAID组，用于对这M个RAID组进行后续放电测试，对于状态为失效状态和降级状态的RAID组不进行放电测试。Specifically, according to the states of the N RAID groups obtained in step 201, determine the M RAID groups whose status is normal, and use it to carry out subsequent discharge tests on these M RAID groups. and degraded RAID groups do not perform discharge tests.

步骤203、在所述M个RAID组中，根据每个RAID组中的每个SSD的放电测试周期获取该RAID组的SSD放电测试顺序，所述SSD放电测试顺序用于指示每个RAID组中的每个SSD在该RAID组中的放电测试顺序。Step 203, in the M RAID groups, obtain the SSD discharge test sequence of the RAID group according to the discharge test cycle of each SSD in each RAID group, and the SSD discharge test sequence is used to indicate the SSD discharge test sequence in each RAID group The discharge test sequence of each SSD in the RAID group.

The discharge test cycle of each SSD in each RAID group may include the time of its last discharge and the cycle duration. From these two values, the discharge test order of each SSD within its disk array can be determined. Other strategies may, of course, also be used to determine the discharge test order of the SSDs in each RAID group.
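One possible way to turn "last discharge time plus cycle duration" into an order is to sort SSDs by their next due time, most overdue first. This is a hedged sketch of one such strategy, not the patent's prescribed algorithm; the field names, the use of seconds, and the tie-break on SSD ID are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SsdDischargeCycle:
    """Hypothetical per-SSD discharge test cycle."""
    ssd_id: str
    last_discharge: float   # time of the last discharge test, e.g. seconds since epoch
    period: float           # cycle duration, in the same unit

    @property
    def next_due(self) -> float:
        return self.last_discharge + self.period


def discharge_order(cycles: list[SsdDischargeCycle]) -> list[str]:
    """Order SSDs so the one whose test is due earliest comes first.

    Ties are broken by SSD ID to keep the order deterministic.
    """
    return [c.ssd_id for c in sorted(cycles, key=lambda c: (c.next_due, c.ssd_id))]
```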

Step 204: Instruct each SSD in each RAID group to perform a discharge test according to the group's SSD discharge test order, where, while a RAID group is being discharge-tested, the number of its SSDs tested simultaneously is less than or equal to the group's redundancy.

Note that RAID groups of different RAID types may have different discharge test orders and different numbers of SSDs tested simultaneously; the only requirement is that, while a RAID group is being tested, the number of its SSDs tested simultaneously does not exceed its redundancy. For example, as shown in Figure 3, a storage system contains three RAID groups, RAID5, RAID6, and RAID10 (3+3), each made up of SSDs. Their redundancies are 1, 2, and 3, so at most 1, 2, and 3 SSDs respectively can be discharge-tested at the same time. The three groups can therefore be tested in the order shown in Figure 3, where SSDs with the same sequence number are tested simultaneously: the RAID5 group tests SSD0, SSD1, SSD2, SSD3, and SSD4 one after another; the RAID6 group first tests SSD5 and SSD6, then SSD7 and SSD8, and finally SSD9 and SSD10; and the RAID10 group first tests SSD11, SSD12, and SSD13, then SSD14, SSD15, and SSD16.
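The Figure 3 schedule can be reproduced by cutting each group's ordered SSD list into consecutive batches of at most `redundancy` SSDs. A minimal sketch, assuming the test order is already known (for example from step 203); the function name is hypothetical. For RAID10 it further assumes that SSD11 to SSD13 each belong to a different mirror pair, which the figure implies but the text does not state.

```python
def split_into_batches(order: list[str], redundancy: int) -> list[list[str]]:
    """Cut an ordered SSD list into batches tested together, each no larger than redundancy."""
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    return [order[i:i + redundancy] for i in range(0, len(order), redundancy)]


# Figure 3 example: RAID5 (redundancy 1), RAID6 (2), RAID10 3+3 (3).
raid5 = split_into_batches([f"SSD{i}" for i in range(0, 5)], 1)
raid6 = split_into_batches([f"SSD{i}" for i in range(5, 11)], 2)
raid10 = split_into_batches([f"SSD{i}" for i in range(11, 17)], 3)
```

Here `raid6` is `[["SSD5", "SSD6"], ["SSD7", "SSD8"], ["SSD9", "SSD10"]]`, matching the RAID6 column of Figure 3.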

During the discharge test, the tests in each RAID group can be controlled using the SSD identifiers (such as SSD0 to SSD16 above). For example, once the discharge test order of a RAID group has been obtained, the detection device in the system obtains the identifier of the SSD that is next to be tested according to that order, sends a discharge test instruction to the SSD with that identifier, and records the identifier of the SSD currently under test. When that SSD finishes its discharge test, the detection device starts the discharge test of the next SSD or SSDs according to the group's discharge test order.
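The control flow just described (look up the next identifier, send the instruction, record who is under test, wait for completion, move on) can be sketched as a loop over batches. This is an illustrative sketch only: `start_test` and `wait_done` are hypothetical hooks standing in for however a real detection device talks to the SSDs, and the returned peak count exists just to make the redundancy limit checkable.

```python
from typing import Callable


def run_group_discharge(
    batches: list[list[str]],
    redundancy: int,
    start_test: Callable[[str], None],   # hypothetical: send the discharge-test instruction
    wait_done: Callable[[str], None],    # hypothetical: block until that SSD's test finishes
) -> tuple[list[str], int]:
    """Drive one RAID group's discharge tests batch by batch.

    Returns an event log and the peak number of SSDs under test at once.
    """
    log: list[str] = []
    in_progress: set[str] = set()        # identifiers of SSDs currently under test
    peak = 0
    for batch in batches:
        if len(batch) > redundancy:
            raise ValueError(f"batch {batch} exceeds redundancy {redundancy}")
        for ssd_id in batch:
            start_test(ssd_id)
            in_progress.add(ssd_id)
            log.append(f"start {ssd_id}")
        peak = max(peak, len(in_progress))
        for ssd_id in batch:             # next batch starts only after this one completes
            wait_done(ssd_id)
            in_progress.discard(ssd_id)
            log.append(f"done {ssd_id}")
    return log, peak
```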

Note also that, while each of the M RAID groups is being tested, the SSDs under test in that group stay in the write-back state. Write-back means that updated data is first written only to the cache and is not synchronously written to the storage medium; the cached data is written to the storage medium only when the cache needs space for new data, after which the freed cache space receives the new data. Keeping write-back enabled prevents the SSD's performance from degrading during its discharge test. However, in the write-back state, the data in the SSD's cache is lost if the SSD loses power or is removed. The detection method provided in the embodiments of the present invention may therefore further include:
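The trade-off between write-back and write-through can be illustrated with a toy model. This is a simplified sketch, not real SSD firmware: the class is hypothetical, and flushing is triggered explicitly instead of when the cache runs out of space. It shows why write-back data can be lost on power loss while write-through data reaches the medium immediately.

```python
class SsdCache:
    """Toy model of an SSD's volatile cache in front of its storage medium."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.cache: dict[int, bytes] = {}    # volatile cache: LBA -> data
        self.medium: dict[int, bytes] = {}   # non-volatile storage medium
        self.dirty: set[int] = set()         # cached LBAs not yet on the medium

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data
        if self.write_back:
            self.dirty.add(lba)              # acknowledged before reaching the medium
        else:
            self.medium[lba] = data          # write-through: medium updated synchronously

    def flush(self) -> None:
        """Write dirty cache entries to the medium (simplified trigger)."""
        for lba in self.dirty:
            self.medium[lba] = self.cache[lba]
        self.dirty.clear()

    def power_loss(self) -> list[int]:
        """Drop the volatile cache and return the LBAs whose updates were lost."""
        lost = sorted(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost
```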

Step 205: If the first SSD in a first RAID group is detected to have failed while it is undergoing a discharge test, reconstruct all data of the first SSD from the other normal SSDs in the first RAID group. The first RAID group is any one of the M RAID groups, and the first SSD is any SSD in the first RAID group.

The failure of the first SSD includes the first SSD losing power or being removed. The other normal SSDs in the first RAID group are the SSDs in that group, other than the first SSD, that are not currently undergoing a discharge test.

Specifically, all data may include, but is not limited to, the total data in the cache and the storage medium of the first SSD; reconstructing all data of the first SSD may then include reconstructing the total data in the cache and storage medium of the first SSD from the other normal SSDs in the first RAID group. This works because a RAID group stores data, together with the corresponding parity information, across the SSDs that make up the group. When data on SSDs in the group is damaged, as long as the number of affected SSDs does not exceed the group's redundancy, the damaged data can be recovered by a preset algorithm from the data and corresponding parity information stored on the remaining SSDs. Backup power detection can thus be achieved while ensuring both performance and data security.
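The two ideas in step 205, choosing the "other normal SSDs" and recovering data from parity, can be sketched for the simplest case. The sketch assumes single XOR parity as in RAID5; RAID6 adds a second, differently computed syndrome that is not shown here, and the patent does not name its "preset algorithm". Function names are hypothetical.

```python
from functools import reduce


def rebuild_peers(group_ssds: list[str], failed_ssd: str, under_test: set[str]) -> list[str]:
    """SSDs usable for reconstruction: all group members except the failed SSD
    and any SSD still undergoing a discharge test."""
    return [s for s in group_ssds if s != failed_ssd and s not in under_test]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (single-parity RAID arithmetic)."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: parity is the XOR of the data blocks, so any single missing
# block equals the XOR of all the surviving blocks (data plus parity).
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x00"
parity = xor_blocks([d0, d1, d2])
recovered_d1 = xor_blocks([d0, d2, parity])
```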

In summary, the embodiments of the present invention provide a backup power detection method for solid-state drives. First, the configuration information and the states of the N RAID groups in a storage system are obtained, where the configuration information of each RAID group includes its number of SSDs and its redundancy, and the state of each RAID group is one of a normal state, a degraded state, and a failed state. Next, the M RAID groups in the normal state are determined among the N RAID groups. The M RAID groups are then controlled, according to their configuration information, to perform discharge tests simultaneously, where, while each of the M RAID groups is being tested, the number of its SSDs tested simultaneously is less than or equal to its redundancy. In the prior art, the discharge test of each SSD in a RAID group is controlled by the SSD itself, and no SSD can tell when the other SSDs are being tested, so an SSD must switch to write-through during its discharge test, which reduces the performance of the RAID group. In the technical solution provided by the embodiments of the present invention, a detection device centrally instructs each RAID group to perform discharge tests and keeps the number of SSDs tested simultaneously in each group within its redundancy. The SSDs therefore stay in write-back mode during their discharge tests, and the performance of the RAID group is not affected, even if an SSD fails, loses power, or is removed during the test. Because the detection device coordinates all RAID groups, it knows the state of every SSD in every group, and because the number of SSDs that lose power or are removed cannot exceed the group's redundancy, the SSD's data can be reconstructed after a failed SSD is replaced or after an SSD that lost power or was removed is brought back into service. Backup power detection can thus be achieved while ensuring both performance and data security.

An embodiment of the present invention further provides a detection device 01 for use in a storage system that includes N RAID groups, where N is a positive integer. As shown in Figure 4, the detection device 01 includes:

an acquisition module 011, configured to obtain the configuration information and the states of the N RAID groups, where the configuration information of each of the N RAID groups includes its number of SSDs and its redundancy, and the state of each RAID group is one of a normal state, a degraded state, and a failed state;

an identification module 012, configured to determine, among the N RAID groups, the M RAID groups in the normal state; and

a processing module 013, configured to control the M RAID groups, according to their configuration information, to perform discharge tests simultaneously, where, while each of the M RAID groups is being tested, the number of its SSDs tested simultaneously is less than or equal to its redundancy.
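The "simultaneously" part of the processing module can be pictured as rounds in which every one of the M groups runs its next batch in parallel, as in Figure 3. This is a hedged sketch under a simplifying assumption: groups advance in lockstep rounds, whereas a real device could let each group advance independently. The function names are hypothetical.

```python
def schedule_rounds(group_batches: dict[str, list[list[str]]]) -> list[dict[str, list[str]]]:
    """Merge per-group batches into rounds; in round i every group that still has
    a batch left tests its i-th batch, in parallel with the other groups."""
    depth = max((len(b) for b in group_batches.values()), default=0)
    return [
        {gid: batches[i] for gid, batches in group_batches.items() if i < len(batches)}
        for i in range(depth)
    ]


def within_redundancy(rounds: list[dict[str, list[str]]], redundancy: dict[str, int]) -> bool:
    """Check the invariant: no group ever tests more SSDs at once than its redundancy."""
    return all(len(batch) <= redundancy[gid] for rnd in rounds for gid, batch in rnd.items())
```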

Optionally, the processing module 013 may be specifically configured to:

Optionally, while each of the M RAID groups is being tested, the SSDs under test in that group stay in the write-back state.

Optionally, as shown in Figure 5, the detection device 01 may further include:

a reconstruction module 014, configured to: if the first SSD in a first RAID group is detected to have failed while it is undergoing a discharge test, reconstruct all data of the first SSD from the other normal SSDs in the first RAID group, where the first RAID group is any one of the M RAID groups and the first SSD is any SSD in the first RAID group.

Optionally, the failure of the first SSD includes the first SSD losing power or being removed, and the other normal SSDs in the first RAID group are the SSDs in that group, other than the first SSD, that are not currently undergoing a discharge test.

Optionally, the reconstruction module 014 may be specifically configured to:

Optionally,

Optionally, the N RAID groups are redundant arrays of independent disks (RAID).

This embodiment implements the foregoing method embodiments. For the workflow and working principle of each unit in this embodiment, see the descriptions of the method embodiments; details are not repeated here.

In summary, the embodiments of the present invention provide a detection device. The detection device first obtains the configuration information and the states of the N RAID groups in a storage system, where the configuration information of each RAID group includes its number of SSDs and its redundancy, and the state of each RAID group is one of a normal state, a degraded state, and a failed state. It then determines the M RAID groups in the normal state among the N RAID groups, and controls the M RAID groups, according to their configuration information, to perform discharge tests simultaneously, where, while each of the M RAID groups is being tested, the number of its SSDs tested simultaneously is less than or equal to its redundancy. In the prior art, the discharge test of each SSD in a RAID group is controlled by the SSD itself, and no SSD can tell when the other SSDs are being tested, so an SSD must switch to write-through during its discharge test, which reduces the performance of the RAID group. In the technical solution provided by the embodiments of the present invention, the detection device centrally instructs each RAID group to perform discharge tests and keeps the number of SSDs tested simultaneously in each group within its redundancy. The SSDs therefore stay in write-back mode during their discharge tests, and the performance of the RAID group is not affected, even if an SSD fails, loses power, or is removed during the test. Because the detection device coordinates all RAID groups, it knows the state of every SSD in every group, and because the number of SSDs that lose power or are removed cannot exceed the group's redundancy, the SSD's data can be reconstructed after a failed SSD is replaced or after an SSD that lost power or was removed is brought back into service. Backup power detection can thus be achieved while ensuring both performance and data security.

An embodiment of the present invention further provides a detection device 02 for use in a storage system that includes N RAID groups, where N is a positive integer. As shown in Figure 6, the detection device 02 includes a processor 021, a memory 022, and an interface 023, which are connected by a bus 024. The interface 023 is used to communicate with other devices, the memory 022 stores a computer program 0221, and the processor 021 executes the computer program 0221 to:

Optionally, the processor 021 may execute the computer program 0221 specifically to:

Optionally, the processor 021 may further execute the computer program 0221 to:

if the first SSD in a first RAID group is detected to have failed while it is undergoing a discharge test, reconstruct all data of the first SSD from the other normal SSDs in the first RAID group, where the first RAID group is any one of the M RAID groups and the first SSD is any SSD in the first RAID group.

Optionally, if a RAID group contains no damaged SSD, the RAID group is in the normal state;

In summary, the embodiments of the present invention provide a detection device. The detection device first obtains the configuration information and the states of the N RAID groups in a storage system, where the configuration information of each RAID group includes its number of SSDs and its redundancy, and the state of each RAID group is one of a normal state, a degraded state, and a failed state. It then determines the M RAID groups in the normal state among the N RAID groups, and controls the M RAID groups, according to their configuration information, to perform discharge tests simultaneously, where, while each of the M RAID groups is being tested, the number of its SSDs tested simultaneously is less than or equal to its redundancy. In the prior art, the discharge test of each SSD in a RAID group is controlled by the SSD itself, and no SSD can tell when the other SSDs are being tested, so an SSD must switch to write-through during its discharge test, which reduces the performance of the RAID group. In the technical solution provided by the embodiments of the present invention, the detection device centrally instructs each RAID group to perform discharge tests and keeps the number of SSDs tested simultaneously in each group within its redundancy. The SSDs therefore stay in write-back mode during their discharge tests, and the performance of the RAID group is not affected, even if an SSD fails, loses power, or is removed during the test. Because the detection device coordinates all RAID groups, it knows the state of every SSD in every group, and because the number of SSDs that lose power or are removed cannot exceed the group's redundancy, the SSD's data can be reconstructed after a failed SSD is replaced or after an SSD that lost power or was removed is brought back into service. Backup power detection can thus be achieved while ensuring both performance and data security.

An embodiment of the present invention further provides a storage system that includes N RAID groups, where N is a positive integer, and further includes the detection device 01 or 02 described in the foregoing embodiments.

In the embodiments provided in the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as actually needed to achieve the objectives of the solutions in the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional units.

An integrated unit implemented as a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

A person skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practice, the functions above may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the detailed working process of the device described above, see the corresponding process in the foregoing method embodiments; details are not repeated here.