Technical Field
The present invention relates to the field of computer technology, and specifically to a method for fast and effective failure analysis and repair of system hard disks.
Background Art
Storage subsystem failure has long been a major problem for system vendors. Feedback data from component vendors indicates that more than 60% of failures are avoidable, or could be reduced through improvement. Against this background, a fast and effective method for reducing hard disk failures and improving hard disk utilization is urgently needed, both to raise utilization and to reduce data loss.
As is well known, a hard disk, as a component, exposes a SMART information interface to the user, and its protocol complies with either the SATA specification or the SAS specification. The technical problems that urgently need solving are: how to read out, through that protocol, what abnormality has occurred on the hard disk and where it occurred; how to check the health of the hard disk after an abnormality or after a SMART check; and how, after acting on the SMART results, to determine whether a failure is a real failure and to repair the subsystem.
Summary of the Invention
The technical task of the present invention is to address the above deficiencies by providing a method for fast and effective failure analysis and repair of system hard disks that is simple in structure, low in production cost, and easy to carry out, that determines whether a hard disk has really failed, and that repairs the hard disk or storage subsystem.
The technical solution adopted by the present invention to solve this technical problem is a method for fast and effective failure analysis and repair of system hard disks, with the following steps:
(1) Under Linux, check whether the hard disk supports SMART;
(2) Depending on the result of step (1): if SMART is supported and enabled, go directly to step (4); if SMART is not enabled, run the command: smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdx;
(3) Step (2) turns SMART support on;
(4) To detect whether the hard disk has failed, run the command: smartctl -A /dev/sdx;
(5) Step (4) displays all SMART information for the hard disk. SMART attributes 5 and 197 correspond to the G-List and the Pending List respectively, and these two items are used to judge the hard disk status: if both pass, the hard disk is healthy and can remain in use; if the Pending List has entries, they must be processed to repair the bad sectors;
(6) If attribute 197 (Pending List) shows entries in step (5), repair the hard disk with one of the following commands: run the sd_rawID command, where ID is the address of the pending sector reported in the SMART information; or use the repair command: hdparm --yes-i-know-what-i-am-doing --write-sector 123456654 /dev/sdx.
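Steps (1) to (3) above can be sketched as a small shell helper. This is only an illustration: the function names `smart_is_enabled` and `ensure_smart` are hypothetical, and the matched text assumes the "SMART support is:" line that smartctl prints in its `-i` output.

```shell
#!/usr/bin/env bash

# Reads `smartctl -i` output on stdin; succeeds only if SMART is reported
# as enabled. (Hypothetical helper; assumes the "SMART support is:" line.)
smart_is_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}

# Hypothetical wrapper for steps (1)-(3): enable SMART only when it is off.
ensure_smart() {
    local dev="$1"
    if smartctl -i "$dev" | smart_is_enabled; then
        echo "SMART already enabled on $dev"
    else
        smartctl --smart=on --offlineauto=on --saveauto=on "$dev"
    fi
}

# Usage (requires root and a real device):
#   ensure_smart /dev/sdx
```

Keeping the text check separate from the smartctl call lets the parsing be exercised against saved output without touching a drive.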
Compared with the prior art, the method of the present invention for fast and effective failure analysis and repair of system hard disks has the following beneficial effects: through the SATA or SAS specification, it reads out what abnormality has occurred on the hard disk and where it occurred; the health of the hard disk can be checked after an abnormality or after a SMART check; and, after acting on the SMART results, it determines whether the failure is a real failure and repairs the subsystem. The method therefore has good value for wider adoption.
Detailed Description
The present invention is further described below with reference to specific embodiments.
Embodiment 1
A method for fast and effective failure analysis and repair of system hard disks, with the following steps:
(1) Under Linux, check whether the hard disk supports SMART;
smartctl -i /dev/sdx, where x is the specific drive letter; output reporting SMART as enabled confirms that SMART is supported;
(2) Depending on the result of step (1): if SMART is supported and enabled, go directly to step (4); if SMART is not enabled, run the command: smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdx;
(3) Step (2) turns SMART support on;
(4) To detect whether the hard disk has failed, run the command: smartctl -A /dev/sdx;
(5) Step (4) displays all SMART information for the hard disk. SMART attributes 5 and 197 correspond to the G-List and the Pending List respectively, and these two items are used to judge the hard disk status: if both pass, the hard disk is healthy and can remain in use; if the Pending List has entries, they must be processed to repair the bad sectors;
(6) If attribute 197 (Pending List) shows entries in step (5), repair the hard disk with the following command: run the sd_rawID command, where ID is the address of the pending sector reported in the SMART information.
Note: 123456654 is the specific pending sector. On a drive with 4K sectors, 8 sectors are printed and all 8 need to be repaired. Replace the x in sdx with the drive letter of the disk actually under test.
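The decision in step (5) can be sketched as a filter over `smartctl -A` output. This is a minimal sketch under stated assumptions: the function name `smart_sector_status` is hypothetical, the raw value is taken from the tenth column of the standard ATA attribute table, and treating any non-zero raw value as "not passing" is an interpretation of the text rather than something it defines. The REVIEW label for a non-zero G-List with an empty Pending List is likewise an assumption.

```shell
#!/usr/bin/env bash

# Reads `smartctl -A` output on stdin and classifies the drive from
# attribute 5 (G-List / reallocated sectors) and 197 (Pending List).
# Assumption: RAW_VALUE is column 10 and a non-zero value means "not PASS".
smart_sector_status() {
    awk '
        $1 == "5"   { realloc = $10 }
        $1 == "197" { pending = $10 }
        END {
            if (realloc == "" || pending == "") { print "UNKNOWN"; exit 2 }
            printf "reallocated=%s pending=%s ", realloc, pending
            if (pending + 0 > 0)      print "REPAIR"   # step (6) applies
            else if (realloc + 0 > 0) print "REVIEW"   # assumption: not covered by the text
            else                      print "PASS"     # healthy, keep using
        }'
}

# Usage (requires root and a real device):
#   smartctl -A /dev/sdx | smart_sector_status
```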
Step (5), viewing the SMART information, can also be implemented with the following script:

    # (fragment: the enclosing function and loop begin before this excerpt)
        if [ "$type" == "SATA" ]; then      # SATA hard disk
            smartctl --all --device=sat+megaraid,$id /dev/$sd >> $HDD_Dir/raid/raid_log
        elif [ "$type" == "SAS" ]; then     # SAS hard disk
            smartctl --all --device=megaraid,$id /dev/$sd >> $HDD_Dir/raid/raid_log
        else
            echo -e "wrong PD Type\n"       # neither SATA nor SAS
        fi
    done
    }

    function hbainfo()
    {
        # Keep the previous run's output before creating a fresh directory.
        if [ -d $HDD_Dir/hba ]; then
            mv $HDD_Dir/hba $HDD_Dir/hba_before
        fi
        mkdir -p $HDD_Dir/hba
        # Timestamp both the summary and the full log.
        date "+%m-%d %H:%M:%S" >> $HDD_Dir/hba/hbaresult
        date "+%m-%d %H:%M:%S" >> $HDD_Dir/hba/hbalog
        # Iterate over every sd* disk, with partition digits stripped.
        for i in `ls /dev | grep "sd" | sed 's/[0-9]//g' | uniq`
        do
            smartctl --all /dev/$i >> $HDD_Dir/hba/hbalog
            echo $i >> $HDD_Dir/hba/hbaresult
            # Output the error log header and the line that follows it.
            smartctl --all /dev/$i | grep "SMART Error Log Version" -A 1 >> $HDD_Dir/hba/hbaresult
        done
    }
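The device enumeration used by the loop in the script above can be isolated for inspection. The sketch below is a stricter variant, not the script's own pipeline: `whole_disks` is a hypothetical name, the pattern is anchored so that only names of the form sdX or sdXN match (the original `grep "sd"` matches any name containing "sd"), and `sort -u` replaces `uniq` so duplicates are removed even when they are not adjacent.

```shell
#!/usr/bin/env bash

# Reads device names (one per line) on stdin and prints each whole-disk
# name once, so that sda, sda1 and sda2 all collapse to sda.
whole_disks() {
    grep -E '^sd[a-z]+[0-9]*$' | sed 's/[0-9]*$//' | sort -u
}

# Usage on a live system:
#   ls /dev | whole_disks
```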
Embodiment 2
A method for fast and effective failure analysis and repair of system hard disks, with the following steps:
(1) Under Linux, check whether the hard disk supports SMART;
smartctl -i /dev/sdx, where x is the specific drive letter; output reporting SMART as enabled confirms that SMART is supported;
(2) Depending on the result of step (1): if SMART is supported and enabled, go directly to step (4); if SMART is not enabled, run the command: smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdx;
(3) Step (2) turns SMART support on;
(4) To detect whether the hard disk has failed, run the command: smartctl -A /dev/sdx;
(5) Step (4) displays all SMART information for the hard disk. SMART attributes 5 and 197 correspond to the G-List and the Pending List respectively, and these two items are used to judge the hard disk status: if both pass, the hard disk is healthy and can remain in use; if the Pending List has entries, they must be processed to repair the bad sectors;
(6) If attribute 197 (Pending List) shows entries in step (5), use the repair command: hdparm --yes-i-know-what-i-am-doing --write-sector 123456654 /dev/sdx.
Note: 123456654 is the specific pending sector. On a drive with 4K sectors, 8 sectors are printed and all 8 need to be repaired. Replace the x in sdx with the drive letter of the disk actually under test.
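The note on 4K sectors can be made concrete with a little arithmetic. The sketch below assumes 512-byte logical sectors and 4K physical sectors aligned to multiples of 8, which is one reading of the note rather than something the text specifies; both function names are hypothetical. It only prints the repair commands (a dry run), because `hdparm --write-sector` overwrites the target sector with zeros.

```shell
#!/usr/bin/env bash

# Prints the eight 512-byte logical sectors that share a 4K physical
# sector with the given LBA (assumption: 4K blocks start at multiples of 8).
sectors_in_4k_block() {
    local lba="$1"
    local start=$(( lba - lba % 8 ))
    seq "$start" $(( start + 7 ))
}

# Dry run: prints one repair command per sector instead of executing it.
print_repair_commands() {
    local lba="$1" dev="$2" s
    for s in $(sectors_in_4k_block "$lba"); do
        echo "hdparm --yes-i-know-what-i-am-doing --write-sector $s $dev"
    done
}

# Usage:
#   print_repair_commands 123456654 /dev/sdx
```

For the example sector 123456654, the remainder modulo 8 is 6, so the printed commands cover sectors 123456648 through 123456655.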
Step (5), viewing the SMART information, can also be implemented with the following script:

    # (fragment: the enclosing function and loop begin before this excerpt)
        if [ "$type" == "SATA" ]; then      # SATA hard disk
            smartctl --all --device=sat+megaraid,$id /dev/$sd >> $HDD_Dir/raid/raid_log
        elif [ "$type" == "SAS" ]; then     # SAS hard disk
            smartctl --all --device=megaraid,$id /dev/$sd >> $HDD_Dir/raid/raid_log
        else
            echo -e "wrong PD Type\n"       # neither SATA nor SAS
        fi
    done
    }

    function hbainfo()
    {
        # Keep the previous run's output before creating a fresh directory.
        if [ -d $HDD_Dir/hba ]; then
            mv $HDD_Dir/hba $HDD_Dir/hba_before
        fi
        mkdir -p $HDD_Dir/hba
        # Timestamp both the summary and the full log.
        date "+%m-%d %H:%M:%S" >> $HDD_Dir/hba/hbaresult
        date "+%m-%d %H:%M:%S" >> $HDD_Dir/hba/hbalog
        # Iterate over every sd* disk, with partition digits stripped.
        for i in `ls /dev | grep "sd" | sed 's/[0-9]//g' | uniq`
        do
            smartctl --all /dev/$i >> $HDD_Dir/hba/hbalog
            echo $i >> $HDD_Dir/hba/hbaresult
            # Output the error log header and the line that follows it.
            smartctl --all /dev/$i | grep "SMART Error Log Version" -A 1 >> $HDD_Dir/hba/hbaresult
        done
    }
The specific embodiments above are only particular cases of the present invention. The scope of patent protection of the present invention includes but is not limited to these embodiments; any appropriate change or substitution that is consistent with the claims of the present invention and is made by a person of ordinary skill in the art shall fall within the scope of patent protection of the present invention.
Apart from the technical features described in the specification, everything else is known to those skilled in the art.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510535152.6A CN105183597A (en) | 2015-08-27 | 2015-08-27 | Method for rapidly and effectively analyzing and repairing system hard disk failure |
| Publication Number | Publication Date |
|---|---|
| CN105183597A (en) | 2015-12-23 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510535152.6A, Pending, CN105183597A (en) | Method for rapidly and effectively analyzing and repairing system hard disk failure | 2015-08-27 | 2015-08-27 |
| Country | Link |
|---|---|
| CN (1) | CN105183597A (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105740110A (en)* | 2016-01-29 | 2016-07-06 | 浪潮电子信息产业股份有限公司 | Detection method for smart information of hard disk in linux system |
| CN106095647A (en)* | 2016-06-29 | 2016-11-09 | 浪潮电子信息产业股份有限公司 | Method for monitoring voltage of Seagate hard disk in real time |
| CN106886471A (en)* | 2017-02-22 | 2017-06-23 | 郑州云海信息技术有限公司 | A kind of read-write fault detection method and system based on disk in linux |
| CN107832164A (en)* | 2017-11-20 | 2018-03-23 | 郑州云海信息技术有限公司 | A kind of method and device of the faulty hard disk processing based on Ceph |
| CN112380043A (en)* | 2020-11-27 | 2021-02-19 | 深圳忆联信息系统有限公司 | Method and device for analyzing SMART log of hard disk, computer equipment and storage medium |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102141949A (en)* | 2011-03-18 | 2011-08-03 | 浪潮电子信息产业股份有限公司 | Method for quickly detecting bad sector of SATA hard disk |
| Publication | Publication Date | Title |
|---|---|---|
| CN105183597A (en) | Method for rapidly and effectively analyzing and repairing system hard disk failure | |
| CN102279775B (en) | Method for processing failure of hard disk under Linux system | |
| CN102147708B (en) | Method and device for detecting discs | |
| CN113918375B (en) | Fault processing method and device, electronic equipment and storage medium | |
| JP4355012B2 (en) | Method, system, and program storage for protecting hard drive data through preliminary detection of adjacent track interference possibility (protection of hard drive data through preliminary detection of adjacent track interference possibility) | |
| CN102356384B (en) | Method and device for data reliability detection | |
| US7661044B2 (en) | Method, apparatus and program product to concurrently detect, repair, verify and isolate memory failures | |
| US20170147425A1 (en) | System and method for monitoring and detecting faulty storage devices | |
| CN106126368A (en) | Method for analyzing memory fault address under LINUX | |
| CN105740110A (en) | Detection method for smart information of hard disk in linux system | |
| CN112506744B (en) | Method, device and equipment for monitoring running state of NVMe hard disk | |
| CN103473158A (en) | Disk pressure testing method for Linux server | |
| CN103984627A (en) | Test method for memory pressure of Linux server | |
| US20150019808A1 (en) | Hybrid storage control system and method | |
| CN103645963B (en) | A kind of storage system and data consistency verification method thereof | |
| CN107729199A (en) | The hard disk detection method and system of a kind of storage device | |
| US11437071B2 (en) | Multi-session concurrent testing for multi-actuator drive | |
| CN105183583A (en) | Method for data reconstruction of disk array, and disk array system | |
| CN104657237A (en) | Method for detecting disk array | |
| CN110413463A (en) | A method for checking SMART information of hard disk | |
| CN105700982A (en) | Memory pressure and stability testing method based on high-performance linpack | |
| KR101505258B1 (en) | Replaying architectural execution with a probeless trace capture | |
| CN105302677A (en) | Information-processing device and method | |
| CN107203454A (en) | A kind of kernel internal memory monitoring method of power & environment supervision main frame | |
| TW201541242A (en) | Checking system and method for physical machine |
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20151223 |