CLAIM OF PRIORITY This application is a continuation-in-part of the following application, which is hereby incorporated by reference as if set forth in full in this specification:
- U.S. patent application Ser. No. 10/937,226, entitled ‘Method for Proactive Drive Replacement for High-Availability Storage Systems’, filed on Sep. 8, 2004.
This application is related to the following application, which is hereby incorporated by reference, as if set forth in full in this specification:
- Co-pending U.S. patent application Ser. No. 10/607,932, entitled ‘Method and Apparatus for Power-Efficient High-Capacity Scalable Storage System’, filed on Sep. 12, 2002.
BACKGROUND The present invention relates generally to digital processing systems. More specifically, the present invention relates to a method of preventing failure of disk drives in high-availability storage systems.
Typically, data storage systems in computing applications include storage devices such as hard disk drives, floppy drives, tape drives, compact disks, and so forth. An increase in the amount and complexity of these applications has resulted in a proportional increase in the demand for larger storage capacities. Consequently, the production of high-capacity storage devices has increased in the past few years. However, large storage capacities demand reliable storage devices with reasonably high data-transfer rates. Moreover, the storage capacity of a single storage device cannot be increased beyond a certain limit. Therefore, various data-storage system configurations and topologies, using multiple storage devices, are commonly used to meet the growing demand for increased storage capacity.
One configuration of the data storage system that meets this growing demand involves the use of multiple disk drives. Such a configuration permits redundancy of stored data. Redundancy ensures data integrity in the case of device failures. In many such data-storage systems, recovery from common failures can be automated within the data storage system by using data redundancy, such as parity and its generation, with the help of a central controller. However, such data-redundancy schemes may impose overhead on the data storage system. These data-storage systems are typically referred to as Redundant Arrays of Inexpensive/Independent Disks (RAIDs). The 1988 publication by David A. Patterson et al., from the University of California at Berkeley, titled ‘A Case for Redundant Arrays of Inexpensive Disks (RAID)’, describes the fundamental concepts of the RAID technology.
RAID storage systems suffer from inherent drawbacks that reduce their availability. If a disk drive in a RAID storage system fails, its data can be reconstructed with the help of the redundant drives. The reconstructed data is then stored on a replacement disk drive. During reconstruction, the data on the failed drive is not available. Further, if more than one disk drive fails in a RAID system that has only single-drive redundancy, the data on the failed drives cannot be reconstructed, resulting in possible loss of data. The probability of disk drive failure increases as the number of disk drives in a RAID storage system increases. Therefore, RAID storage systems with a large number of disk drives are typically organized into several smaller RAID systems. This reduces the probability of data loss in large RAID systems. Further, the use of smaller RAID systems also reduces the time it takes to reconstruct data on a spare disk drive in the event of a disk drive failure. When a RAID system loses a critical number of disk drives, there is a period of vulnerability from the time the disk drives fail until the time data reconstruction on the spare drives is completed. During this time, the RAID system is exposed to the possibility of additional disk drives failing, which would cause an unrecoverable data loss. If the failure of one or more disk drives can be predicted, with sufficient time to replace the drive or drives before the failure occurs, the drive or drives can be replaced without sacrificing fault tolerance, and data reliability and availability can be considerably enhanced.
Various methods and systems are known that predict the impending failure of disk drives in storage systems. However, these methods and systems predict the impending failure of disk drives that are used frequently to process requests from computers. The reliability of disk drives that are not used, or used infrequently, is not predicted by known methods and systems.
SUMMARY In accordance with one embodiment of the present invention, a method for maintaining a particular disk drive that is powered off in a storage system is provided. The method includes powering on the particular disk drive and executing a test on it. The method further includes powering off the particular disk drive after executing the test.
In accordance with another embodiment of the present invention, an apparatus for maintaining a particular disk drive, which is powered off in a storage system, is provided. The apparatus includes a power controller for controlling power to the disk drives and the particular disk drive. The apparatus further includes a test moderator for executing a test on the particular disk drive, which is powered on by the power controller before the test is to be executed, and is powered off after the test is executed.
In one embodiment the invention provides a method for maintaining a particular disk drive in a storage system, wherein the storage system includes a plurality of disk drives and the particular disk drive that is powered-off, the method comprising: powering-on the particular disk drive; executing a test on the particular disk drive; and powering-off the particular disk drive.
In another embodiment the invention provides a method for maintaining data in a disk drive, the method comprising: performing a check on the disk drive; and, if a predetermined criterion is not met as a result of the check, performing a recovery action.
BRIEF DESCRIPTION OF THE DRAWINGS Various embodiments of the present invention will hereinafter be described in conjunction with the appended drawings, provided to illustrate and not to limit the present invention, wherein like designations denote like elements, and in which:
FIG. 1 is a block diagram illustrating a storage system, in accordance with an embodiment of the present invention;
FIG. 2 is a block diagram illustrating the components of a memory and a Central Processing Unit (CPU) and their interaction in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of a method for preventing the failure of disk drives in a storage system, in accordance with one embodiment of the present invention;
FIG. 4 is a graph showing an exemplary variation of mean-time-to-failure of a disk drive with temperature;
FIG. 5 is a flowchart of a method for preventing the failure of disk drives in a storage system, in accordance with another embodiment of the present invention;
FIG. 6 is a flowchart of a method for preventing the failure of disk drives in a storage system, in accordance with another embodiment of the present invention;
FIG. 7 is a block diagram illustrating the components of a memory and a Central Processing Unit (CPU), and their interaction, in accordance with another embodiment of the present invention;
FIG. 8 is a flowchart of a method for maintaining a particular disk drive in a storage system, where the particular disk drive is powered off, in accordance with an embodiment of the present invention;
FIG. 9 is a flowchart of a method for maintaining a particular disk drive in a storage system, where the particular disk drive is powered off, in accordance with another embodiment of the present invention;
FIG. 10 is a flowchart of a method for executing a test on the particular disk drive, in accordance with an embodiment of the present invention; and
FIG. 11 is a flowchart of a method for executing a test on the particular disk drive, in accordance with another embodiment of the present invention.
DESCRIPTION OF VARIOUS EMBODIMENTS Embodiments of the present invention provide a method, system and computer program product for preventing the failure of disk drives in high-availability storage systems. Failure of disk drives is predicted and an indication for their replacement is given. Failure is predicted by monitoring factors relating to the aging of disk drives and the early onset of errors in disk drives, as well as the acceleration of these factors.
FIG. 1 is a block diagram illustrating a storage system 100 in accordance with an embodiment of the invention. Storage system 100 includes disk drives 102, a Central Processing Unit (CPU) 104, a memory 106, a command router 108, environmental sensors 110 and a host adaptor 112. Storage system 100 stores data in disk drives 102. Further, disk drives 102 store parity information that is used to reconstruct data in case of disk drive failure. CPU 104 controls storage system 100. Among other operations, CPU 104 calculates parity for data stored in disk drives 102. Further, CPU 104 monitors factors of each disk drive in disk drives 102 for predicting failure.
Exemplary factors for predicting disk drive failures include power-on hours, start stops, reallocated sector count, and the like. The method of predicting disk drive failure by monitoring the various factors is explained in detail in conjunction with FIG. 3, FIG. 5 and FIG. 6. Memory 106 stores the monitored values of the factors. Further, memory 106 also stores values of the thresholds to which the factors are compared. In an embodiment of the invention, Random Access Memory (RAM) is used to store the monitored values of the factors and the threshold values. Command router 108 is an interface between CPU 104 and disk drives 102. Data to be stored in disk drives 102 is sent by CPU 104 through command router 108. Further, CPU 104 obtains values of the factors for predicting disk drive failure through command router 108. Environmental sensors 110 measure environmental factors relating to the failure of disk drives 102. Examples of environmental factors that are measured by environmental sensors 110 include the temperature of the disk drives, the speed of the cooling fans of storage system 100, and vibrations in storage system 100. Host adaptor 112 is an interface between storage system 100 and all computers wanting to store data in storage system 100. Host adaptor 112 receives data from the computers. Host adaptor 112 then sends the data to CPU 104, which calculates parity for the data and decides where the data is stored in disk drives 102.
FIG. 2 is a block diagram illustrating the components of memory 106 and CPU 104 and their interaction, in accordance with an embodiment of the invention. Memory 106 stores sensor data 202 obtained from environmental sensors 110, drive attributes 204 obtained from each of disk drives 102, failure rate profiles 206, and preset attribute thresholds 208. In order to predict the failure of each disk drive in disk drives 102, sensor data 202 and drive attributes 204 are compared with failure rate profiles 206 and preset attribute thresholds 208. This prediction is described later in conjunction with FIG. 3, FIG. 5 and FIG. 6. CPU 104 includes drive replacement logic 210 and drive control 212. The comparison of sensor data 202 and drive attributes 204 with failure rate profiles 206 and preset attribute thresholds 208 is performed by drive replacement logic 210. Once failure of a disk drive in disk drives 102 is predicted, drive control 212 indicates that the disk drive should be replaced. The indication can be external, in the form of an LED or LCD that indicates which drive is failing. Further, the indication can be in the form of a message on a monitor that is connected to CPU 104. The message can also include information regarding the location of the disk drive and the reason for the prediction of the failure. Various other ways of indicating disk drive failure are also possible. The manner in which this indication is provided does not restrict the scope of this invention. Drive control 212 further ensures that data is reconstructed or copied onto a replacement disk drive and that further data is directed to the replacement disk drive.
FIG. 3 is a flowchart of a method for preventing the failure of disk drives in storage system 100, in accordance with one embodiment of the present invention. At step 302, factors relating to the aging of each of disk drives 102 are monitored. At step 304, it is determined whether any of the factors exceed a first set of thresholds. If the thresholds are not exceeded, the method returns to step 302 and this process is repeated. If a threshold is exceeded, an indication for the replacement of the disk drive for which the factor has exceeded the threshold is given at step 306. Factors that are related to aging include power-on hours (POH) and start stops (SS). POH is the total number of hours for which a particular disk drive has been powered on. To predict disk drive failure, POH is compared to a preset percentage of the mean-time-to-failure (MTTF) of disk drives 102. The MTTF can be calculated by storage system 100 as disk drives fail. In another embodiment of the present invention, MTTF is calculated based on the temperature of disk drives 102. MTTF depends on the temperature at which a disk drive operates. MTTF-versus-temperature graphs can be obtained from manufacturers of disk drives.
FIG. 4 is a graph showing an exemplary variation of MTTF with temperature. The graph shown is applicable to disk drives manufactured by one specific disk vendor. Similar graphs are provided by other disk drive manufacturers. These graphs can be piecewise graphs, as shown in FIG. 4, or linear graphs, depending on the experimentation conducted by the disk drive manufacturer. In accordance with another embodiment of the present invention, MTTF-versus-temperature graphs are stored as vector pairs of MTTF values and temperatures. These vector pairs are stored as failure rate profiles 206 in memory 106. For temperatures between the values stored in the vector pairs, MTTF values are calculated by interpolation between consecutive vector pairs. The preset percentage for comparing the MTTF with the power-on hours of each of disk drives 102 can be chosen between 0 and 0.75 (exclusive), for example. Other percentages can be used; for example, one basis for choosing a percentage is studies that have shown that useful life is shorter than that indicated by manufacturers' MTTF.
Therefore, an indication for replacement is given when:
POH > p * MTTF(T)
- where p = preset percentage for POH, 0 < p < 0.75, and
- MTTF(T) = mean-time-to-failure calculated on the basis of temperature.
Start stops (SS) is the total number of times a disk drive completes a cycle of power on, disk drive usage and power off. To predict disk drive failure, SS is compared to a preset percentage of the maximum allowable value for SS. This value is specified by drive manufacturers. Most drive manufacturers recommend the maximum allowable value for SS to be 50,000. The preset percentage for comparing the maximum allowable value of SS with the measured SS of each of disk drives 102 can be chosen between 0 and 0.9 (exclusive). Therefore, an indication for replacement of a disk drive is given when:
SS > c * SSmax
- where c = preset percentage for SS, 0 < c < 0.9, and
- SSmax = maximum allowable value for SS, typically 50,000 per current disk drive specifications.
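By way of illustration only, the following Python sketch shows one way the aging-factor comparison described above could be implemented. The vector pairs, the interpolation helper, and the example values of p and c are assumptions introduced for this sketch, not values taken from the specification.

```python
from bisect import bisect_left

# Example MTTF-versus-temperature vector pairs (temperature in deg C, MTTF in hours).
# The actual values would come from the drive manufacturer and be stored in
# failure rate profiles 206; the numbers below are illustrative only.
MTTF_PROFILE = [(25, 1_000_000), (35, 800_000), (45, 550_000), (55, 350_000)]

def mttf_at(temp_c, profile=MTTF_PROFILE):
    """Linearly interpolate MTTF between consecutive (temperature, MTTF) vector pairs."""
    temps = [t for t, _ in profile]
    if temp_c <= temps[0]:
        return profile[0][1]
    if temp_c >= temps[-1]:
        return profile[-1][1]
    i = bisect_left(temps, temp_c)
    (t0, m0), (t1, m1) = profile[i - 1], profile[i]
    return m0 + (m1 - m0) * (temp_c - t0) / (t1 - t0)

def aging_replacement_indicated(poh_hours, start_stops, temp_c,
                                p=0.5, c=0.6, ss_max=50_000):
    """Return True when POH > p * MTTF(T) or SS > c * SSmax (0 < p < 0.75, 0 < c < 0.9)."""
    return poh_hours > p * mttf_at(temp_c) or start_stops > c * ss_max

# Example: a drive with 30,000 power-on hours and 12,000 start stops, running at 40 deg C.
print(aging_replacement_indicated(poh_hours=30_000, start_stops=12_000, temp_c=40))
```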
FIG. 5 is a flowchart of a method for preventing the failure of disk drives in storage system 100, in accordance with another embodiment of the present invention. At step 502, factors relating to the early onset of errors in each of disk drives 102 are monitored. At step 504, it is determined whether any of the factors exceed a first set of thresholds. If the thresholds are not exceeded, the method returns to step 502 and this process is repeated. If any threshold of the set is exceeded, an indication for the replacement of the disk drive is given at step 506. Factors that are related to the early onset of errors include reallocated sector count (RSC), read error rate (RSE), seek error rate (SKE) and spin retry count (SRC). RSC is defined as the number of spare sectors that have been reallocated. Data is stored in disk drives 102 in sectors. Disk drives 102 also include spare sectors to which data is not normally written. When a sector goes bad, i.e., data cannot be read from or written to the sector, disk drives 102 reallocate spare sectors to store further data. In order to predict disk drive failure, RSC is compared to a preset percentage of the maximum allowable value for RSC. This value is specified by the disk drive manufacturers. Most disk drive manufacturers recommend the maximum allowable value for RSC to be 1,500. The preset percentage for comparing the maximum allowable value of RSC with the measured RSC can be chosen between 0 and 0.7 (exclusive). Therefore, an indication for replacement is given when:
RSC > r * RSCmax
- where r = preset percentage for RSC, 0 < r < 0.7, and
- RSCmax = maximum allowable value for RSC ≈ 1,500
Read error rate (RSE) is the rate at which errors occur in reading data from disk drives. Read errors occur when a disk drive is unable to read data from a sector on the disk drive. In order to predict disk drive failure, RSE is compared to a preset percentage of the maximum allowable value for RSE. This value is specified by disk drive manufacturers. Most disk drive manufacturers recommend the maximum allowable value for RSE to be one error in every 1024 sector read attempts. The preset percentage for comparing the maximum allowable value of RSE with the measured RSE of each of disk drives 102 can be chosen between 0 and 0.9 (exclusive). Therefore, an indication for replacement is given when:
RSE > m * RSEmax
- where m = preset percentage for RSE, 0 < m < 0.9, and
- RSEmax = maximum allowable value for RSE ≈ 1 read error/1024 sector read attempts
Seek error rate (SKE) is the rate at which errors occur in seeking data from disk drives 102. Seek errors occur when a disk drive is not able to locate where particular data is stored on the disk drive. To predict disk drive failure, SKE is compared to a preset percentage of the maximum allowable value for SKE. This value is specified by disk drive manufacturers. Most disk drive manufacturers recommend the maximum allowable value for SKE to be one seek error in every 256 sector seek attempts. The preset percentage for comparing the maximum allowable value of SKE with the measured SKE of each of disk drives 102 can be chosen between 0 and 0.9 (exclusive). Therefore, an indication for replacement is given when:
SKE > s * SKEmax
- where s = preset percentage for SKE, 0 < s < 0.9, and
- SKEmax = maximum allowable value for SKE ≈ 1 seek error/256 sector seek attempts
Spin retry count (SRC) is defined as the number of attempts it takes to start the spinning of a disk drive. To predict disk drive failure, SRC is compared to a preset percentage of the maximum allowable value for SRC. This value is specified by disk drive manufacturers. Most disk drive manufacturers recommend the maximum allowable value for SRC to be one spin failure in every 100 attempts. The preset percentage for comparing the maximum allowable value of SRC with the measured SRC of each of disk drives 102 can be chosen between 0 and 0.3 (exclusive). Therefore, an indication for replacement is given when:
SRC > t * SRCmax
- where t = preset percentage for SRC, 0 < t < 0.3, and
- SRCmax = maximum allowable value for SRC ≈ 1 spin failure/100 attempts.
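The early-onset checks above all reduce to the same comparison of a measured value against a preset fraction of a manufacturer-specified maximum. The following Python sketch is a minimal illustration under that reading; the dictionary layout, the example fractions, and the function name are assumptions.

```python
# Illustrative only: the maxima follow the values described above, and the preset
# fractions are example choices within the stated ranges.
ERROR_ONSET_LIMITS = {
    # factor: (maximum allowable value, preset fraction of that maximum)
    "RSC": (1_500, 0.5),      # reallocated sector count, 0 < r < 0.7
    "RSE": (1 / 1024, 0.5),   # read errors per sector read attempt, 0 < m < 0.9
    "SKE": (1 / 256, 0.5),    # seek errors per sector seek attempt, 0 < s < 0.9
    "SRC": (1 / 100, 0.2),    # spin failures per spin-up attempt, 0 < t < 0.3
}

def error_onset_factors_exceeded(measured):
    """Return the factors whose measured value exceeds the preset fraction of its maximum."""
    return [name for name, value in measured.items()
            if value > ERROR_ONSET_LIMITS[name][0] * ERROR_ONSET_LIMITS[name][1]]

# Example: a drive whose reallocated sector count has climbed to 900 sectors.
print(error_onset_factors_exceeded({"RSC": 900, "RSE": 1 / 4096, "SKE": 0.0, "SRC": 0.0}))
# ['RSC']
```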
FIG. 6 is a flowchart of a method for preventing the failure of disk drives in storage system 100, in accordance with another embodiment of the present invention. At step 602, a factor relating to the onset of errors in each of disk drives 102 is measured. At step 604, changes in the value of the factor are calculated. At step 606, it is determined whether the changes in the factor increase in consecutive calculations. If the changes do not increase, the method returns to step 602 and the process is repeated. If the change increases, an indication that the disk drive should be replaced is given at step 608. An increase in the change over two consecutive calculations indicates that errors within the disk drive are accelerating and could lead to failure of the disk drive. In one embodiment of the present invention, reallocated sector count (RSC) is used as the factor relating to the onset of errors. Therefore, an indication for drive replacement is given when:
RSC(i+2) − RSC(i+1) > RSC(i+1) − RSC(i) AND RSC(i+3) − RSC(i+2) > RSC(i+2) − RSC(i+1), for any i
- where i = a serial number representing successive measurements
Other factors can be used. For example, spin retry count (SRC), seek errors (SKE), read soft error (RSE), recalibrate retry (RRT), read channel errors such as a Viterbi detector mean-square error (MSE), etc., can be used. As future factors become known they can be similarly included.
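The acceleration criterion above amounts to requiring the second difference of the monitored factor to be positive in two consecutive calculations. A minimal Python sketch of that test, using hypothetical sample values, follows.

```python
def error_rate_accelerating(samples):
    """True when the change in a factor (e.g. RSC) increases over two consecutive
    calculations, i.e. the difference between successive samples grows twice in a row."""
    for i in range(len(samples) - 3):
        d1 = samples[i + 1] - samples[i]
        d2 = samples[i + 2] - samples[i + 1]
        d3 = samples[i + 3] - samples[i + 2]
        if d2 > d1 and d3 > d2:
            return True
    return False

# Example: reallocated sector counts sampled at four successive measurement intervals.
print(error_rate_accelerating([10, 12, 20, 45]))   # True: changes of 2, 8, 25 are accelerating
print(error_rate_accelerating([10, 15, 20, 25]))   # False: the change is constant
```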
Thresholds for comparing the factors are obtained from manufacturers of disk drives. In one embodiment of the present invention, memory 106 stores thresholds specific to disk drive manufacturers. These thresholds and their corresponding threshold percentages are stored in memory 106 as preset attribute thresholds 208. This is useful in case the plurality of disk drives 102 comprises disk drives obtained from different disk drive manufacturers. In this embodiment, factors obtained from a particular disk drive are compared with thresholds recommended by the manufacturer of that disk drive, as well as with empirical evidence gathered during testing of the drives.
Combinations of the factors discussed above can also be used for predicting the failure of disk drives. When combinations of factors are monitored, they are compared with the corresponding thresholds that are stored in memory 106. Further, environmental data obtained from environmental sensors 110 can also be used, in combination with the described factors, to predict the failure of disk drives. For example, in case the temperature of a disk drive exceeds a threshold value, an indication for replacement of the disk drive can be given.
The invention, as described above, can also be used to prevent the failure of disk drives in power-managed RAID systems, where not all disk drives need to be powered on simultaneously. The power-managed scheme has been described in the co-pending US Patent Application ‘Method and Apparatus for Power-Efficient High-Capacity Scalable Storage System’ referenced above. In this scheme, sequential writing onto disk drives is implemented, unlike the simultaneous writing performed in a RAID 5 scheme. Sequential writing onto disk drives saves power because it requires powering up only one disk drive at a time.
Embodiments of the present invention also provide a method and apparatus for maintaining a particular disk drive in a storage system, where the particular disk drive is powered off. A power controller controls the power supplied to disk drives in the storage system. Further, a test-moderator executes a test on the particular disk drive. The power controller powers on the particular disk drive when the test is to be executed, and powers off the particular disk drive after the execution of the test.
Disk drives 102 include at least one particular disk drive that is powered off during an operation of storage system 100. In an embodiment of the present invention, the particular disk drive is powered off since it is not used to process requests from a computer. In another embodiment of the present invention, the particular disk drive is powered off since it is used as a replacement disk drive in storage system 100. In yet another embodiment of the present invention, the particular disk drive is powered off since it is used only infrequently for processing requests from a computer.
FIG. 7 is a block diagram illustrating the components of CPU 104 and memory 106 and their interaction, in accordance with another embodiment of the present invention. Disk drives 102 include at least one particular disk drive, for example, a disk drive 702 that is powered off. CPU 104 also includes a power controller 704 and a test-moderator 706. Memory 106 stores test results 708 obtained from test-moderator 706.
Power controller 704 controls the power to disk drives 102, based on the power budget of storage system 100. The power budget determines the number of disk drives that can be powered on in storage system 100. In an embodiment of the present invention, power controller 704 powers on a limited number of disk drives because of the constraint of the power budget during the operation of storage system 100. Other disk drives in storage system 100 are powered on only when required for operations such as reading or writing data in response to a request from a computer. This kind of storage system is referred to as a power-managed RAID system. Further information pertaining to the power-managed RAID system can be obtained from the co-pending US Patent Application, ‘Method and Apparatus for Power-Efficient High-Capacity Scalable Storage System’, referenced above. However, the invention can also be practiced in conventional array storage systems. The reliability of any disk drive that is not powered on can be checked.
Test-moderator 706 executes a test on disk drive 702, to maintain it. Power controller 704 powers on disk drive 702 in response to an input from test-moderator 706 when the test is to be executed. Power controller 704 powers off disk drive 702 after the test is executed.
In an embodiment of the present invention, test-moderator 706 executes a buffer test on disk drive 702. As a part of the test, random data is written to the buffer of disk drive 702. This data is then read and compared to the data that was written, which is referred to as a write/read/compare test of disk drive 702. The buffer test fails when, on comparing, there is a mismatch between the written and read data. This ensures that the disk drives are operating correctly and not introducing any errors. In an exemplary embodiment of the present invention, a hex ‘00’ and hex ‘FF’ pattern is written for each sector of the buffer in disk drive 702. In another exemplary embodiment of the present invention, a write/read/compare of hex ‘00’ and hex ‘FF’ patterns is performed for each sector of the buffer RAM of disk drive 702.
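A minimal Python sketch of such a buffer write/read/compare test is shown below. The drive interface (write_buffer, read_buffer) and the in-memory stand-in are hypothetical; a real implementation would issue the corresponding drive commands.

```python
import os

SECTOR_SIZE = 512  # assumed buffer sector size in bytes

class FakeDrive:
    """Stand-in for a real drive interface; write_buffer/read_buffer are hypothetical names."""
    def __init__(self, sectors):
        self.buffer = [b""] * sectors

    def write_buffer(self, sector, data):
        self.buffer[sector] = data

    def read_buffer(self, sector):
        return self.buffer[sector]

def buffer_test(drive, sector_count):
    """Write/read/compare test of the drive's buffer: hex 00, hex FF and random patterns."""
    patterns = (b"\x00" * SECTOR_SIZE, b"\xff" * SECTOR_SIZE, os.urandom(SECTOR_SIZE))
    for sector in range(sector_count):
        for pattern in patterns:
            drive.write_buffer(sector, pattern)
            if drive.read_buffer(sector) != pattern:
                return False  # mismatch between written and read data: the test fails
    return True

print(buffer_test(FakeDrive(sectors=16), sector_count=16))  # True for the fault-free stand-in
```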
In another embodiment of the present invention, test-moderator 706 executes a write test on a plurality of heads in disk drive 702. Heads in disk drives refer to the magnetic heads that read data from and write data to the disk drives. The write test includes a write/read/compare operation on each head of disk drive 702. The write test fails when, on comparing, there is a mismatch between the written and read data. In an exemplary embodiment of the present invention, the write test is performed by accessing sectors on disk drive 702 that are not user accessible. These sectors are provided for the purpose of self-testing and are not used for storing data. Data can also be written to any other sectors of the disk drives.
In yet another embodiment of the present invention, test-moderator 706 executes a random read test on disk drive 702. The random read test includes a read operation on a plurality of randomly selected Logical Block Addresses (LBAs). LBA refers to a hard-disk sector-addressing scheme used on Small Computer System Interface (SCSI) hard disks and on Advanced Technology Attachment (ATA), or Integrated Drive Electronics (IDE), hard disks. The random read test fails when the read operation on at least one selected LBA fails. In an exemplary embodiment of the present invention, the random read test is performed on 1000 randomly selected LBAs. In an embodiment of the present invention, the random read test on disk drive 702 is performed with auto defect reallocation. Auto defect reallocation refers to the reallocation of spare sectors on the disk drives to store data when a sector is corrupted, i.e., data cannot be read from or written to the sector. The random read test performed with auto defect reallocation fails when the read operation on at least one selected LBA fails.
In another embodiment of the present invention, test-moderator 706 executes a read scan test on disk drive 702. The read scan test includes a read operation on the entire surface of each sector of disk drive 702 and fails when the read operation on at least one sector of disk drive 702 fails. In an embodiment of the present invention, the read scan test on disk drive 702 is performed with auto defect reallocation. The read scan test performed with auto defect reallocation fails when the read operation on at least one sector of disk drive 702 fails.
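The random read and read scan tests can be sketched as follows. The read_lba call is a hypothetical stand-in for the drive's read command and is assumed to return False on a read error.

```python
import random

def random_read_test(drive, total_lbas, sample_size=1000):
    """Read a plurality of randomly selected LBAs; the test fails if any single read fails."""
    for lba in random.sample(range(total_lbas), min(sample_size, total_lbas)):
        if not drive.read_lba(lba):       # hypothetical drive interface
            return False
    return True

def read_scan_test(drive, total_lbas):
    """Read every sector on the drive surface; the test fails on the first unreadable sector."""
    return all(drive.read_lba(lba) for lba in range(total_lbas))
```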
In yet another embodiment of the present invention, combinations of the above-mentioned tests can also be performed on disk drive 702. Further, in various embodiments of the invention, the test is performed serially on each particular disk drive if there is a plurality of particular disk drives in storage system 100.
In various embodiments of the present invention, the results of the test performed on disk drive 702 are stored in memory 106 as test results 708, which include a failure checkpoint byte. The value of the failure checkpoint byte is set according to the results of the test performed. For example, if the buffer test fails on disk drive 702, the value of the failure checkpoint byte is set to one. Further, if the write test fails on disk drive 702, the value of the failure checkpoint byte is set to two, and so on. However, if the test is in progress, has not started, or has been completed without error, the value of the failure checkpoint byte is set to zero.
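A possible encoding of the failure checkpoint byte is sketched below. Only the zero value and the values for the buffer test (one) and write test (two) are given above; the remaining values and all names are assumptions.

```python
from enum import IntEnum

class FailureCheckpoint(IntEnum):
    """Illustrative encoding of the failure checkpoint byte."""
    NONE = 0          # test in progress, not started, or completed without error
    BUFFER_TEST = 1   # buffer write/read/compare failed
    WRITE_TEST = 2    # head write/read/compare failed
    RANDOM_READ = 3   # assumed value for a failed random read test
    READ_SCAN = 4     # assumed value for a failed read scan test

def record_result(test_results, drive_id, checkpoint):
    """Store the checkpoint byte for a drive, as test results 708 would be kept in memory 106."""
    test_results[drive_id] = FailureCheckpoint(checkpoint)

results = {}
record_result(results, "drive702", FailureCheckpoint.BUFFER_TEST)
print(results)  # {'drive702': <FailureCheckpoint.BUFFER_TEST: 1>}
```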
In various embodiments of the present invention, drive replacement logic 210 also predicts the failure of disk drive 702, based on test results 708. In an exemplary embodiment of the present invention, if the failure checkpoint byte is set to a non-zero value, i.e., the test executed on disk drive 702 by test-moderator 706 has failed, drive replacement logic 210 predicts the failure of disk drive 702. Once the failure of disk drive 702 is predicted, drive control 212 indicates that disk drive 702 should be replaced. This indication can be external to storage system 100, in the form of an LED or LCD that indicates which drive is failing. Further, the indication can be in the form of a message on a monitor that is connected to CPU 104; it can also include information pertaining to the location of disk drive 702 and the reason for the prediction of the failure. Various other ways of indicating disk drive failure are also possible. The manner in which this indication is provided does not restrict the scope of this invention. In an embodiment of the present invention, drive control 212 further ensures that data is reconstructed or copied onto a replacement disk drive and that further data is directed to the replacement disk drive.
FIG. 8 is a flowchart of a method for maintaining disk drive 702 in storage system 100, in accordance with an embodiment of the present invention. At step 802, disk drive 702 is powered on. The step of powering on is performed by power controller 704. At step 804, a test is executed on disk drive 702. The step of executing the test is performed by test-moderator 706. The result of the test is then saved in test results 708 by test-moderator 706. Thereafter, disk drive 702 is powered off at step 806. The step of powering off is performed by power controller 704.
In an embodiment of the present invention, storage system 100 may not be a power-managed storage system. In this embodiment, all the disk drives in storage system 100 are powered on for the purpose of executing the tests and are powered off after the execution of the tests.
FIG. 9 is a flowchart of a method for maintaining disk drive 702 in storage system 100, in accordance with another embodiment of the present invention. A request for powering on disk drive 702 is received at step 902 by power controller 704. In an exemplary embodiment of the present invention, the request is sent by test-moderator 706. At step 904, it is then determined whether powering on disk drive 702 results in the power budget being exceeded. The step of determining whether the power budget is exceeded is performed by power controller 704. If the power budget would be exceeded, powering on disk drive 702 is postponed at step 906. In an embodiment of the present invention, a request for powering on disk drive 702 is then sent by test-moderator 706 at predefined intervals to power controller 704, until power is available, i.e., the power budget would not be exceeded. In another embodiment of the present invention, power controller 704 checks power availability at predefined intervals if powering on is postponed. In an exemplary embodiment, the predefined interval is five minutes.
However, if the power budget would not be exceeded, i.e., power is available, disk drive 702 is powered on at step 908. Thereafter, a test is executed on disk drive 702 at step 910. This is further explained in conjunction with FIG. 10 and FIG. 11. The test performed at step 910 can be, for example, a buffer test, a write test, a random read test, a read scan test, or a combination thereof. After the test is executed, disk drive 702 is powered off at step 912. At step 914, it is then determined whether the test has failed. If the test has not failed, the method returns to step 902 and is repeated. In an embodiment of the present invention, the method is repeated at predetermined intervals. In an exemplary embodiment of the present invention, the predetermined interval is 30 days. However, if it is determined at step 914 that the test has failed, an indication that disk drive 702 should be replaced is given at step 916.
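One way to express the FIG. 9 flow in code is sketched below. The power_controller and test_moderator objects, their method names, and the return convention are hypothetical stand-ins for power controller 704 and test-moderator 706.

```python
import time

RETRY_INTERVAL_S = 5 * 60                  # re-check power availability every five minutes
MAINTENANCE_INTERVAL_S = 30 * 24 * 3600    # repeat the whole check roughly every 30 days

def maintain_drive(power_controller, test_moderator, drive):
    """One pass of the FIG. 9 flow: wait for power budget headroom, power on the drive,
    run the test, power the drive off, and report whether replacement should be indicated."""
    while not power_controller.can_power_on(drive):
        time.sleep(RETRY_INTERVAL_S)        # postpone powering on until the budget allows it
    power_controller.power_on(drive)
    try:
        passed = test_moderator.run_tests(drive)
    finally:
        power_controller.power_off(drive)   # always power the drive back off after the test
    return not passed                        # True means "indicate that the drive be replaced"
```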
FIG. 10 is a flowchart of a method for executing a test on disk drive 702, in accordance with an embodiment of the present invention. While test-moderator 706 is executing the test on disk drive 702, it is determined at step 1002 whether a request from a computer to access disk drive 702 is received. This step is performed by test-moderator 706. If a request to access disk drive 702 is received from a computer, the test is suspended at step 1004 to fulfill the request. Once the request is fulfilled, the test is resumed at the point where it was suspended, at step 1006. This means that a request from a computer is given higher priority than executing a test on disk drive 702. However, if a request from a computer to access disk drive 702 is not received, the test is executed until completion.
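A sketch of the FIG. 10 behaviour, in which a host request pre-empts the maintenance test, might look as follows. The step, suspend and resume methods and the queue of host-request callables are assumptions.

```python
def run_test_with_host_priority(test_moderator, drive, host_requests):
    """Run the maintenance test, pausing whenever a host request for the drive arrives.

    test_moderator.step(drive) is assumed to run one increment of the test and return
    False once the test is complete; host_requests is any queue-like object of callables."""
    while test_moderator.step(drive):
        if not host_requests.empty():
            test_moderator.suspend(drive)    # pause the test
            host_requests.get()(drive)       # service the computer's request first
            test_moderator.resume(drive)     # continue from the point of suspension
```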
FIG. 11 is a flowchart of a method for executing a test on disk drive 702, in accordance with another embodiment of the present invention. While test-moderator 706 is executing the test on disk drive 702, it is determined at step 1102 whether a request to power on an additional disk drive in storage system 100 has been received. Power controller 704 performs this step. CPU 104 sends a request to power on the additional disk drive in response to a request from a computer to access the additional drive. If a request to power on an additional disk drive in storage system 100 is received, it is then determined at step 1104 whether powering on the additional disk drive will result in the power budget being exceeded. However, if a request to power on an additional disk drive in storage system 100 is not received, the test is executed until completion.
If it is determined at step 1104 that the power budget would be exceeded, the test on disk drive 702 is suspended at step 1106. Disk drive 702 is then powered off at step 1108. Thereafter, the additional disk drive is powered on. In an embodiment of the present invention, if disk drive 702 is powered off, the request for powering on disk drive 702 is sent by test-moderator 706 at preset intervals to power controller 704, until power is available. In another embodiment of the present invention, if powering on is postponed, power controller 704 checks power availability at preset intervals. In an exemplary embodiment of the present invention, the preset interval is five minutes. This means that a request for powering on an additional disk drive is given higher priority than executing the test on disk drive 702. However, if it is determined at step 1104 that the power budget would not be exceeded, the test is executed until completion and the additional disk drive is also powered on.
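The FIG. 11 behaviour can be sketched as below. All object and method names are hypothetical; the sketch only captures the priority given to the host-driven power-on request over the maintenance test.

```python
def handle_power_on_request(power_controller, test_moderator, test_drive, extra_drive):
    """If powering on the additional drive would exceed the power budget, suspend the
    maintenance test and power off the drive under test to make room; otherwise both
    drives stay powered and the test runs to completion."""
    if power_controller.budget_exceeded_with(extra_drive):
        test_moderator.suspend(test_drive)        # pause the test on disk drive 702
        power_controller.power_off(test_drive)    # free up budget for the additional drive
    power_controller.power_on(extra_drive)        # the host-driven request proceeds either way
```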
Embodiments of the present invention provide a method and apparatus for maintaining a particular disk drive in a storage system, where the particular disk drive is powered off. The method and apparatus predict the impending failures of disk drives that are not used, or are used infrequently. This further improves the reliability of the storage system.
One embodiment of the present invention uses disk drive checking to proactively perform data restore operations. For example, error detection tests such as raw read error rate, seek error rate, RSC count or its rate of change, number and frequency of timeout errors, etc., can be performed at intervals as described herein, or at other times. In another example, error detection tests such as the buffer test, the write test on a plurality of heads in the disk drive, the random read test, the random read test with auto defect reallocation, the read scan test and the read scan test with auto defect reallocation can be performed at intervals as described herein, or at other times. If a disk drive is checked and the results of a test or check indicate early onset of failure, then recovery action steps, such as reconstructing or copying data onto a replacement disk drive, can be taken. In an embodiment of the present invention, drive control 212 further ensures that data is reconstructed or copied onto a replacement disk drive and that further data is directed to the replacement disk drive. In another embodiment of the present invention, if a disk drive is checked and the results of a test or check indicate early onset of failure, then recovery action steps, such as powering up additional drives, backing up data, performing more frequent monitoring, etc., can be taken.
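An illustrative mapping from a failed check to the recovery actions named above is sketched below. The spare_pool and drive_control interfaces are assumptions standing in for the roles described for drive control 212.

```python
def recovery_action(drive, test_failed, spare_pool, drive_control):
    """Take recovery steps when a check indicates early onset of failure."""
    if not test_failed:
        return
    spare = spare_pool.pop()                          # pick a replacement disk drive
    drive_control.reconstruct_or_copy(drive, spare)   # rebuild or copy the data onto it
    drive_control.redirect_io(drive, spare)           # direct further data to the replacement
    drive_control.indicate_replacement(drive)         # LED/LCD indication or console message
```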
Although the present invention has been described with respect to specific embodiments thereof, these embodiments are descriptive, and not restrictive, of the present invention. For example, it is apparent that specific values and ranges of parameters can vary from those described herein. The values of the threshold parameters p, c, r, m, s, t, etc., can change as new experimental data become known, as preferences or overall system characteristics change, or to achieve improved or desirable performance.
Although terms such as “storage device,” “disk drive,” etc., are used, any type of storage unit can be adaptable to work with the present invention. For example, disk drives, tape drives, random access memory (RAM), etc., can be used. Different present and future storage technologies can be used such as those created with magnetic, solid-state, optical, bioelectric, nano-engineered, or other techniques.
Storage units can be located either internally inside a computer or outside a computer in a separate housing that is connected to the computer. Storage units, controllers and other components of systems discussed herein can be included at a single location or separated at different locations. Such components can be interconnected by any suitable means such as with networks, communication links or other technology. Although specific functionality may be discussed as operating at, or residing in or with, specific places and times, in general the functionality can be provided at different locations and times. For example, functionality such as data protection steps can be provided at different tiers of a hierarchical controller. Any type of RAID or RAIV arrangement or configuration can be used.
In the description herein, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the present invention. One skilled in the relevant art will recognize, however, that an embodiment of the present invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the present invention.
A “processor” or “process” includes any human, hardware and/or software system, mechanism, or component that processes data, signals, or other information. A processor can include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor can perform its functions in “real time,” “offline,” in a “batch mode,” etc. Moreover, certain portions of processing can be performed at different times and at different locations, by different (or the same) processing systems.
Reference throughout this specification to “one embodiment”, “an embodiment”, or “a specific embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention and not necessarily in all embodiments. Thus, respective appearances of the phrases “in one embodiment”, “in an embodiment”, or “in a specific embodiment” in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any specific embodiment of the present invention may be combined in any suitable manner with one or more other embodiments. It is to be understood that other variations and modifications of the embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered as part of the spirit and scope of the present invention.
It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope of the present invention to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
Additionally, any signal arrows in the drawings/figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted. Furthermore, the term “or” as used herein is generally intended to mean “and/or” unless otherwise indicated. Combinations of components or steps will also be considered as being noted, where terminology is foreseen as rendering the ability to separate or combine unclear.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” includes plural references unless the context clearly dictates otherwise. In addition, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The foregoing description of illustrated embodiments of the present invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the present invention to the precise forms disclosed herein. While specific embodiments of, and examples for, the present invention are described herein for illustrative purposes only, various equivalent modifications are possible within the spirit and scope of the present invention, as those skilled in the relevant art will recognize and appreciate. As indicated, these modifications may be made to the present invention in light of the foregoing description of illustrated embodiments of the present invention and are to be included within the spirit and scope of the present invention.
Thus, while the present invention has been described herein with reference to particular embodiments thereof, a latitude of modification, various changes, and substitutions are intended in the foregoing disclosures. It will be appreciated that in some instances some features of embodiments of the present invention will be employed without a corresponding use of other features without departing from the scope and spirit of the present invention as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit of the present invention. It is intended that the present invention not be limited to the particular terms used in following claims and/or to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the present invention will include any and all embodiments and equivalents falling within the scope of the appended claims.