CN116302811A

Info

Publication number: CN116302811A
Application number: CN202310206584.7A
Authority: CN
Inventors: 于超; 黄泉龙
Original assignee: Shenzhen Huawei Cloud Computing Technology Co ltd
Current assignee: Shenzhen Huawei Cloud Computing Technology Co ltd
Priority date: 2023-02-24
Filing date: 2023-02-24
Publication date: 2023-06-23

Abstract

This application discloses a method, apparatus, and device for detecting a disk state, and a computer-readable storage medium, and belongs to the technical field of cloud computing. The method includes: obtaining, multiple times within a detection period, the state of a disk reading process that reads data from a disk; obtaining, multiple times within the detection period, update information of the disk reading process, where the update information includes a read time indicating when the disk reading process read data from the disk, and the two pieces of data read at two adjacent read times are separated by an interval space on the disk; and determining the disk state of the disk according to whether the obtained pieces of update information are identical and according to the obtained states of the disk reading process. Because detection relies only on comparing the obtained pieces of update information and analyzing the obtained states of the disk reading process, the detection cost is low, the detection process is simple, and the method is broadly applicable and accurate.

Description

Translated from Chinese

Disk state detection method, apparatus, device, and computer-readable storage medium

Technical field

This application relates to the technical field of cloud computing, and in particular to a disk state detection method, apparatus, and device, and a computer-readable storage medium.

Background

As cloud computing technology develops, its capacity to process data keeps growing and more data can be processed in parallel; the data to be processed can be stored on the disks of a cloud database. While a disk is running, faults such as the disk becoming unresponsive can occur, preventing the disk from operating normally and disrupting cloud computing services. A disk state detection method is therefore urgently needed that detects the state of a disk promptly and accurately, so that cloud computing services can run normally.

Summary of the invention

This application provides a disk state detection method, apparatus, and device, and a computer-readable storage medium. The technical solution is as follows:

According to a first aspect, a disk state detection method is provided. The method is used by a device that provides cloud computing services, and includes: obtaining the state of a disk reading process multiple times within a detection period, where the disk reading process is used to read data from a disk; obtaining update information of the disk reading process multiple times within the detection period, where the update information includes a data read time that indicates when the disk reading process read data from the disk, and the two pieces of data read at two adjacent read times are separated by an interval space on the disk; and determining the disk state of the disk according to whether the obtained pieces of update information are identical and according to the multiple states of the disk reading process.

This application performs detection by comparing whether the obtained pieces of update information of the disk reading process are identical and by analyzing the multiple states of the disk reading process. The detection cost is low, the detection process is simple, the method is broadly applicable, and the determined disk state is highly accurate.

In a possible implementation, the update information further includes a process identifier of the disk reading process, where the process identifier identifies the disk reading process that reads the disk data. Before the disk state of the disk is determined according to whether the obtained pieces of update information are identical and according to the multiple states of the disk reading process, the method further includes: determining, based on the obtained read times and process identifiers, whether the pieces of update information are identical. Besides the read time, the update information of the disk reading process may also include the state of the disk reading process; determining whether the pieces of update information are identical from both the multiple states and the multiple read times of the disk reading process further ensures the accuracy of the disk state determined from the update information and the states of the disk reading process.

In a possible implementation, determining whether the pieces of update information are identical based on the obtained read times and process identifiers includes: determining that the pieces of update information are identical when the read times are all the same and the process identifiers are all the same; or determining that the pieces of update information are not all identical when the read times are not all the same and the process identifiers are not all the same. If the read times are all the same and the process identifiers are all the same, the update information, which consists of the read time and the process identifier, can be considered unchanged, so the pieces of update information are identical. If the read times are not all the same and the process identifiers are not all the same, the update information can be considered to have changed, so the pieces of update information are not all identical. Determining the disk state according to whether the pieces of update information are identical ensures the accuracy of the detection result.
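The comparison described above can be illustrated with a minimal Python sketch. The `UpdateInfo` type and the function name are hypothetical, introduced only for illustration. The text defines the outcome only for the two cases "all the same" and "both not all the same"; treating every other combination as "not all identical" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UpdateInfo:
    """One sample of update (heartbeat) information; a hypothetical type."""
    read_time: float  # read time (heartbeat timestamp) of the last read
    pid: int          # process identifier of the disk reading process


def update_infos_identical(infos: Sequence[UpdateInfo]) -> bool:
    """Return True when all read times match and all process identifiers match.

    Assumption: any other combination is reported as "not all identical".
    """
    if not infos:
        raise ValueError("at least one sample is required")
    read_times = {info.read_time for info in infos}
    pids = {info.pid for info in infos}
    return len(read_times) == 1 and len(pids) == 1
```

For example, three samples that all carry the same timestamp and the same process identifier suggest that the disk reading process has not completed a new read during the detection period.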

In a possible implementation, determining the disk state of the disk according to whether the obtained pieces of update information are identical and according to the multiple states of the disk reading process includes: determining the disk state to be a fault state when the pieces of update information are identical and the multiple states of the disk reading process are all the unresponsive state. Combining the update information with the states of the disk reading process, and determining a fault state only when the pieces of update information are identical and every obtained state is the unresponsive state, avoids determining a wrong disk state from a single kind of information and improves detection accuracy.

In a possible implementation, determining the disk state of the disk according to whether the obtained pieces of update information are identical and according to the multiple states of the disk reading process includes: determining the disk state to be a normal state when the pieces of update information are not all identical and a running state is present among the multiple states of the disk reading process. By combining the update information with the states of the disk reading process, the disk is considered able to run normally when the pieces of update information are not all identical and at least one obtained state is the running state. Determining the normal state in this way increases tolerance for brief unresponsive states of the disk reading process and improves detection accuracy.
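The two decision rules above can be sketched as a single function. This is an illustrative sketch only: the function name and the return labels are hypothetical, the state codes "R" (running) and "D" (unresponsive) follow the Linux-style letters used later in the description, and returning "undetermined" for combinations the text does not cover is an assumption.

```python
from typing import Sequence


def classify_disk(infos_identical: bool, states: Sequence[str]) -> str:
    """Classify a disk from the update-information comparison and process states.

    infos_identical: whether all update information samples were identical.
    states: process states sampled within one detection period, e.g. "R", "S", "D".
    """
    if not states:
        raise ValueError("at least one state sample is required")
    # Fault: nothing was updated and every sample shows the unresponsive state.
    if infos_identical and all(state == "D" for state in states):
        return "fault"
    # Normal: the update information changed and at least one sample is running.
    if not infos_identical and any(state == "R" for state in states):
        return "normal"
    # Assumption: combinations not covered by the two rules stay undetermined.
    return "undetermined"
```

Requiring both conditions for a fault means that a single "D" sample between successful reads does not mark the disk as faulty.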

In a possible implementation, the time interval between two adjacent read times is a heartbeat period, and the read time is updated based on the times at which the disk reading process reads data according to the heartbeat period. Because the disk reading process reads data from the disk once per heartbeat period and updates the read time on each read, the read time is kept up to date.

According to a second aspect, a disk state detection apparatus is provided. The apparatus is used in a device that provides cloud computing services, and includes: an obtaining module, configured to obtain the state of a disk reading process multiple times within a detection period, where the disk reading process is used to read data from a disk; the obtaining module being further configured to obtain update information of the disk reading process multiple times within the detection period, where the update information includes a data read time that indicates when the disk reading process read data from the disk, and the two pieces of data read at two adjacent read times are separated by an interval space on the disk; and a determining module, configured to determine the disk state of the disk according to whether the obtained pieces of update information are identical and according to the multiple states of the disk reading process.

In a possible implementation, the update information further includes a process identifier of the disk reading process, where the process identifier identifies the disk reading process that reads the disk data; and the determining module is further configured to determine, based on the obtained read times and process identifiers, whether the pieces of update information are identical.

In a possible implementation, the determining module is configured to determine that the pieces of update information are identical when the read times are all the same and the process identifiers are all the same; or to determine that the pieces of update information are not all identical when the read times are not all the same and the process identifiers are not all the same.

In a possible implementation, the determining module is configured to determine the disk state of the disk to be a fault state when the pieces of update information are identical and the multiple states of the disk reading process are all the unresponsive state.

In a possible implementation, the determining module is configured to determine the disk state of the disk to be a normal state when the pieces of update information are not all identical and a running state is present among the multiple states of the disk reading process.

In a possible implementation, the time interval between two adjacent read times is a heartbeat period, and the read time is updated based on the times at which the disk reading process reads data according to the heartbeat period.

According to a third aspect, a computing device cluster is provided, including at least one computing device. Each computing device includes a processor coupled to a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the disk state detection method provided in the first aspect or in any possible implementation of the first aspect.

According to a fourth aspect, a computer program product containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster performs the disk state detection method provided in the first aspect or in any possible implementation of the first aspect. The computer program product may be a software installation package; when the functions of the foregoing computing device cluster are needed, the computer program product can be downloaded and executed on the computing device cluster.

According to a fifth aspect, a computer-readable storage medium is provided, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the disk state detection method provided in the first aspect or in any possible implementation of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

It should be understood that, for the beneficial effects of the technical solutions of the second to fifth aspects of this application and of their possible implementations, reference may be made to the technical effects of the first aspect and its possible implementations described above. Details are not repeated here.

Description of drawings

FIG. 1 is a schematic diagram of an implementation scenario provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of a disk state detection method provided by an embodiment of this application;

FIG. 3 is a schematic diagram of a detection procedure for detecting a disk state provided by an embodiment of this application;

FIG. 4 is a schematic diagram of an exemplary implementation scenario provided by an embodiment of this application;

FIG. 5 is a schematic structural diagram of a disk state detection apparatus provided by an embodiment of this application;

FIG. 6 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of this application;

FIG. 7 is a schematic structural diagram of a computing device cluster provided by an embodiment of this application;

FIG. 8 is a schematic diagram of a connection manner of a computing device cluster provided by an embodiment of this application.

Detailed description

The terms used in the embodiments section of this application are used only to explain specific embodiments of this application and are not intended to limit this application.

In cloud scenarios where cloud computing technology is developing rapidly, the capacity of cloud computing technology to process data keeps growing and more data can be processed in parallel; the data to be processed can be stored in a cloud database. The cloud database may be, for example, a relational database service (RDS), RDS for MySQL (a relational database supporting the structured query language), or RDS for PostgreSQL (a high-performance relational database supporting the structured query language). Because a cloud database is large, in cloud applications it is usually divided into multiple cloud database instances, each serving as a small management unit of the cloud database. Managing stored data through cloud database instances allows the data stored in the cloud database to be managed more precisely. A cloud database instance can contain multiple disks, whose storage space is used to store data. A disk may also be called a data disk, and in cloud scenarios it may also be called a cloud disk.

While a disk is running, faults such as the disk becoming unresponsive (hang) can occur, preventing the disk from operating normally and disrupting cloud computing services. For example, if a storage pool made up of a group of disks that store user data loses power, every disk in the storage pool may enter an unresponsive fault state, so that the user's cloud computing services are impaired and cannot run normally. As the number and scale of cloud database instances grow, disk failures in cloud database instances are becoming routine. The disk state of a disk therefore needs to be sensed promptly, so that decisions about and repairs to the disk or the cloud database can be determined and carried out in time.

The embodiments of this application provide a disk state detection method that can detect the disk state of a disk promptly and accurately, so that cloud computing services based on the data stored on the disk can run normally. FIG. 1 shows a schematic diagram of an implementation scenario. The implementation scenario includes a reading module 11, a detection module 12, and a disk 13. The reading module 11 can be connected to the detection module 12 and to the disk 13. The reading module 11 and the detection module 12 can be integrated in the same computing device, in which they may be two logically separate modules or two physically separate modules; this is not limited in the embodiments of this application.

The reading module 11 can start different processes by running different code; a process may be a disk reading process or another process. The detection module 12 can work together with the reading module 11 and detect the disk state of the disk 13 according to the running results of the different processes in the reading module 11. The disk 13 may be a physical disk or a virtual disk (VD), and there may be one or more disks 13. If there are multiple disks 13, the disk state detection method provided by the embodiments of this application may check the disks 13 serially, one by one, or check multiple disks 13 in parallel.

Referring to FIG. 2, an embodiment of this application provides a disk state detection method. The method can be applied to the implementation scenario shown in FIG. 1 and can include the following steps S201 to S203.

S201: Obtain the state of a disk reading process multiple times within a detection period, where the disk reading process is used to read data from a disk.

The detection period is the time interval within which one disk state detection can be completed. It can be set according to experience and the application scenario; it can also be a duration long enough for the disk reading process to complete multiple data reads when the disk is normal, for example 1 minute or 85 seconds. If the detection period is too long, a disk fault state may not be discovered and reported in time. If the detection period is too short, the disk reading process may read data too few times within the period, so that the disk state determined from those reads is wrong. Detecting the disk state of one disk may involve one or more detection periods, and the number of times the disk reading process reads data from the disk can also be set according to experience or the application scenario. In addition, when the disk states of different disks are detected, the detection periods may be the same or different.

The disk reading process is a process used to read data from a disk; for example, it may be a DD read process (dd_read process) based on the device driver (DD) command or the DD tool. The disk reading process may be a child process of another process, and that other process may be a control process that can direct the disk reading process to perform reads, for example a disk input/output (IO) data read process (disk_io_read process). The disk reading process and the other process may be under the same root directory or run as the same root user, and each of them may contain multiple threads.

The states of the disk reading process include, but are not limited to, the running (R) state, the sleep (S) state, and the unresponsive (disk sleep, D) state. The running state may also be called the executable state, the sleep state may also be called the interruptible sleep state, and the unresponsive state may also be called the uninterruptible state or the uninterruptible sleep state. The state of the disk reading process can be obtained by querying the process information file of the operating system. The operating system is one that the device in the embodiments of this application can support, for example Linux or Windows. The process information file of the operating system stores the running state and other information of each process in the operating system. By querying the state of the disk reading process stored in the process information file, the detection module obtains the state of the disk reading process. For example, if the detection module finds that the state of the disk reading process in the process information file is the running state, it can determine that the disk reading process is running normally. If the detection module finds that the state of the disk reading process in the process information file is the unresponsive state, it can determine that the disk reading process is unresponsive.
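As an illustration of querying a process information file, the following Python sketch reads the one-letter state code of a process. It assumes a Linux host, where `/proc/<pid>/status` contains a line such as `State:  D (disk sleep)`; the patent text does not name a specific file, so this choice and the function names are assumptions.

```python
from pathlib import Path


def parse_state(status_text: str) -> str:
    """Extract the one-letter state code (e.g. "R", "S", "D") from status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)"; the code is field 2.
            return line.split()[1]
    raise ValueError("no State field found")


def read_process_state(pid: int) -> str:
    """Read the state of a process from /proc/<pid>/status (Linux only)."""
    return parse_state(Path(f"/proc/{pid}/status").read_text())
```

A detection module could call `read_process_state` several times within a detection period and keep the returned codes for the later state analysis.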

After the disk reading process starts, it can read data from the disk through the input/output interface of the disk to be checked. The data read may be the input/output data of the disk or other data stored on the disk. After the disk reading process completes one data read, it shuts down. The state of the disk reading process obtained in the embodiments of this application is its state between starting and shutting down. Because the disk reading process starts and shuts down multiple times within the detection period, that is, performs multiple separate reads, the states obtained multiple times within the detection period are the states of the disk reading process during different reads.

S202: Obtain update information of the disk reading process multiple times within the detection period.

The update information is the information updated after the disk reading process completes one data read, or the information updated after the disk reading process shuts down once; it may also be called heartbeat information. The update information includes a data read time, which indicates when the disk reading process read data from the disk, and the two pieces of data read at two adjacent read times are separated by an interval space on the disk. In some embodiments, the read time may also be called a timestamp or a heartbeat timestamp.

The read time of a piece of data may be the time point at which the disk reading process starts one read of disk data, the time point at which it finishes one read, or a time span covering both the start time point and the end time point of one read.

Two adjacent read times both fall within the detection period, and the time interval between them is the heartbeat period; the read time is updated based on the times at which the disk reading process reads data according to the heartbeat period. The two pieces of data read at two adjacent read times can be located at different positions on the disk, with an interval space between them. Because of this interval space, the repeated reading performed by the disk reading process can be called skip reading. For example, the disk reading process can start reading first data at a first read time within the detection period and, after the first data has been read, determine the first read time as the read time of one piece of data. The size of the first data can be determined according to experience or work requirements and can be a size that consumes few input/output resources, for example 4 kilobits (Kbit). The disk reading process can start reading second data at a second read time and, after the second data has been read, determine the second read time as the read time of another piece of data. The second read time is later than the completion of the first read and is adjacent to the first read time, and the time interval between the second read time and the first read time is the heartbeat period. The heartbeat period can be set according to experience or work requirements, for example 10 milliseconds (ms). When the disk reading process reads the second data, it can determine, from the position of the first data on the disk, a position separated from it by the interval space, use that position as the position of the second data, and read the second data from there.
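The skip-reading pattern can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a single loop stands in for the repeatedly started and closed disk reading process, sizes are expressed in bytes, the start of each read is used as the read time, and the function and parameter names are hypothetical.

```python
import time


def skip_read(path, block_size, gap, reads, heartbeat_s=0.01):
    """Read `reads` blocks of `block_size` bytes, leaving `gap` bytes between them.

    Returns a list of (read_time, offset) records, one per completed read.
    """
    records = []
    with open(path, "rb", buffering=0) as source:
        for index in range(reads):
            # Each block starts one block plus one interval space after the last.
            offset = index * (block_size + gap)
            read_time = time.time()  # start of this read, used as the read time
            source.seek(offset)
            data = source.read(block_size)
            if not data:
                break  # reached the end of the device or file
            records.append((read_time, offset))
            if index + 1 < reads:
                time.sleep(heartbeat_s)  # wait roughly one heartbeat period
    return records
```

Calling `skip_read("/path/to/device", 512, 1024, 20)` would attempt 20 reads of 512 bytes spaced 1024 bytes apart; the path and the sizes here are placeholders rather than values from the source.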

本申请实施例对间隔空间的大小不做限定，可以是根据所检测的磁盘的存储空间确定，也可以是根据所检测的磁盘的存储空间和设定的读取次数确定。示例性地，根据所检测的磁盘的存储空间确定，可以是计算出所检测的磁盘的存储空间的比例值所对应的存储空间，将计算出的比例值所对应的存储空间确定为间隔空间。例如，所检测的磁盘的存储空间大小为10兆比特(million bit，Mbit)，比例值为10％，则可以计算出间隔空间的大小为1Mbit。The embodiment of the present application does not limit the size of the interval space, which may be determined according to the detected storage space of the disk, or may be determined according to the detected storage space of the disk and the set number of read times. Exemplarily, the determination according to the detected storage space of the disk may be to calculate the storage space corresponding to the ratio of the detected storage space of the disk, and determine the storage space corresponding to the calculated ratio as the interval space. For example, if the detected storage space of the disk is 10 megabits (million bits, Mbit), and the proportion value is 10%, then the size of the interval space can be calculated as 1 Mbit.

For example, when the size of the interval space is determined from the storage space of the detected disk and the set number of reads, the ratio of the storage space to the set number of reads may be used as the interval space. If the storage space of the detected disk is 20 Mbit and the set number of reads is 20, the interval space is 1 Mbit.

Alternatively, the size of the interval space may be determined from experience or from the detection granularity. The interval space is positively correlated with the detection granularity: the finer the required detection granularity, the smaller the interval space. In the embodiments of this application, the size of the interval space may also be determined first, and the number of reads of the detected disk determined afterwards. For example, if the storage space of the detected disk is 50 Mbit, the set interval space is 800 Kbit, and the size of the data read each time is 200 Kbit, then each read together with its interval space spans 1 Mbit (taking 1 Mbit as 1000 Kbit), and the number of reads is 50.
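The three ways of sizing the interval space described above reduce to simple arithmetic. The sketch below reproduces the numeric examples from the text; the function names are hypothetical, sizes are expressed in Kbit, and 1 Mbit is assumed to equal 1000 Kbit.

```python
def gap_from_ratio(capacity_kbit, percent):
    """Interval space as a percentage of the disk's storage space."""
    return capacity_kbit * percent // 100


def gap_from_read_count(capacity_kbit, reads):
    """Interval space as the storage space divided by the set number of reads."""
    return capacity_kbit // reads


def read_count(capacity_kbit, gap_kbit, read_size_kbit):
    """Number of reads when the interval space is fixed first.

    Each read plus its interval space spans gap_kbit + read_size_kbit.
    """
    return capacity_kbit // (gap_kbit + read_size_kbit)


# Examples from the text (1 Mbit taken as 1000 Kbit):
print(gap_from_ratio(10_000, 10))       # 10 Mbit disk, 10%  -> 1000 Kbit
print(gap_from_read_count(20_000, 20))  # 20 Mbit disk, 20 reads -> 1000 Kbit
print(read_count(50_000, 800, 200))     # 50 Mbit disk -> 50 reads
```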

After the interval space, the number of reads, and the heartbeat period have been determined, the disk reading process reads data from the detected disk multiple times according to these three parameters and, after completing each read, updates its update information. The reading of disk data by the disk reading process simulates the way other devices read data stored on the disk through the input/output interface. However, because the disk reading process in the embodiments of this application may read data by skip reading, the embodiments consume fewer input/output resources, improve the efficiency of reading data, and reduce the time required for each read.

For example, the update information in the embodiments of this application may be stored on a disk other than the detected disk, such as the disk on which the reading module resides, so that the update information does not depend on the detected disk. If the update information were stored on the detected disk and that disk were in a fault state, the detection module could not obtain the update information through the input/output interface of the detected disk, and detection could not be completed. If the update information is stored on a different disk, then even when the detected disk is in a fault state, the detection module can still obtain the update information through the input/output interface of the disk that stores it and thereby detect the state of the detected disk.
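One way to keep the update information independent of the detected disk is to write it to a small file on another disk. The sketch below is an assumption-laden illustration: the JSON layout, the field names `read_time` and `pid`, and the write-then-rename step (used so that a reader never sees a half-written record) are choices made here and are not specified by the application.

```python
import json
import os
import tempfile


def write_update_info(path, read_time, pid):
    """Store the latest update information at `path`.

    `path` is assumed to be on a disk other than the detected disk.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"read_time": read_time, "pid": pid}, f)
    # Replace the old record in one step so readers see old or new, never partial.
    os.replace(tmp_path, path)


def read_update_info(path):
    """Load the most recently stored update information."""
    with open(path) as f:
        return json.load(f)
```

The detection module would call `read_update_info` several times within a detection period and compare the results.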

S203: Determine the disk state of the disk according to whether the multiple pieces of acquired update information are the same and according to the multiple states of the disk reading process.

In a possible implementation of this application, in addition to the read time of the data, the update information may include the process identifier (PID) of the disk reading process. The process identifier identifies the disk reading process that reads the disk data and may also be called a process control symbol or a process number. If the disk reading process is a DD process, the process identifier may be the DD PID.

After the disk reading process starts, the system may assign it a process identifier, and during one run of the disk reading process that identifier is unique and does not change. After the disk reading process is closed, it may release the process identifier so that the system reclaims it. For example, if the system assigns the process identifier 6 when the disk reading process starts, the identifier remains 6 for as long as that run lasts. When the disk reading process is closed, it may release process identifier 6; the system reclaims it and may assign it to another process started in the system. When the disk reading process starts again, the system assigns it a process identifier again, which may be the same as or different from the one assigned last time. For example, the newly assigned identifier may still be 6, or it may be another identifier such as 15.

By acquiring the read time and the process identifier of the disk reading process multiple times, multiple read times and multiple process identifiers within the detection period can be obtained. Before the disk state is determined according to whether the multiple pieces of acquired update information are the same and according to the multiple states of the disk reading process, the method further includes: determining, based on the acquired read times and process identifiers, whether the multiple pieces of update information are the same.

In a possible implementation, determining whether the multiple pieces of update information are the same based on the acquired read times and process identifiers includes: determining that the update information is the same based on the read times all being the same and the process identifiers all being the same; or determining that the update information is not all the same based on the read times not all being the same and the process identifiers not all being the same. Determining whether the update information is the same from the read times and the process identifiers includes the following case A1.

Case A1: whether the multiple pieces of update information are the same is determined according to whether the read times are the same and whether the process identifiers are the same. If the read times are all the same and the process identifiers are all the same, the update information can be determined to be the same. If the read times are not all the same while the process identifiers are the same or not all the same, it can be considered that the disk reading process completed multiple reads but that the system happened to assign the same process identifier to the disk reading process more than once, so that identical identifiers appear among those acquired; the update information can therefore be determined to be not all the same, that is, to have changed. If the read times all differ and the process identifiers all differ, the update information can be determined to be different. If the read times are all the same while the process identifiers differ, the acquired read times or process identifiers can be considered erroneous, and whether the update information is the same cannot be determined from them.

For example, suppose the detection module acquires 3 read times and 3 process identifiers within the detection period. If the 3 read times are the same and the 3 process identifiers are the same, the 3 pieces of update information can be considered the same. If the 3 read times are not all the same and the 3 process identifiers are not all the same, the 3 pieces of update information can be considered not all the same, that is, the update information has changed.
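The comparison in case A1 can be expressed compactly. The following is a minimal sketch under stated assumptions: the enum `Comparison`, its member names, and the function `compare_update_info` are hypothetical, and "not all the same" and "all different" are both folded into a single `CHANGED` result.

```python
from enum import Enum


class Comparison(Enum):
    SAME = "same"                  # read times and PIDs are all identical
    CHANGED = "changed"            # read times differ, whatever the PIDs do
    INCONSISTENT = "inconsistent"  # same read times but differing PIDs


def compare_update_info(read_times, pids):
    """Apply case A1 to the samples acquired in one detection period."""
    if not read_times or not pids:
        raise ValueError("at least one sample is required")
    times_same = len(set(read_times)) == 1
    pids_same = len(set(pids)) == 1
    if times_same and pids_same:
        return Comparison.SAME
    if times_same:
        # The acquired data is considered erroneous; no conclusion is drawn.
        return Comparison.INCONSISTENT
    return Comparison.CHANGED
```

For instance, read times `[100, 110, 120]` with PIDs `[6, 6, 7]` yield `Comparison.CHANGED`, because a repeated PID is explained by the system reusing an identifier.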

In a possible implementation, whether the multiple pieces of update information of the disk reading process are the same may also be determined according to the following case A2.

Case A2: whether the multiple pieces of update information are the same is determined according to whether the read times are the same. If the read times are all the same, the update information can be considered the same; if the read times are not all the same, the update information can be considered not all the same, that is, to have changed.

For example, if 3 read times are acquired, 2 of which are the same and the third of which differs from those 2, the 3 pieces of update information can be considered not all the same, that is, the update information has changed.

After it has been determined according to case A1 or case A2 whether the multiple pieces of update information are the same, the disk state can be determined according to whether the update information is the same and according to the multiple states of the disk reading process. This includes, but is not limited to, the following cases B1 and B2.

Case B1: based on the multiple pieces of update information being the same and the multiple states of the disk reading process all being the non-responsive state, the disk state of the disk is determined to be the fault state.

When the update information is all the same and the states of the disk reading process are all non-responsive, it can be considered that the disk reading process was not started and closed multiple times within the detection period and remained non-responsive throughout it. The disk reading process therefore did not read the disk's data multiple times through the disk's input/output interface, which indicates that the input/output interface of the disk is faulty, and the state of the disk can be determined to be the fault state.

If the disk reading process can run successfully, it can read data through the disk's input/output interface, and the update information changes as it reads data repeatedly. Therefore, even if the disk reading process was at some point within the detection period in a running state that the detection module did not capture, the fact that the update information did not change still means that the disk reading process did not complete a single read, and the disk can be determined to be in the fault state during the detection period.

Case B2: based on the multiple pieces of update information not all being the same and a running state being present among the multiple states of the disk reading process, the disk state of the disk is determined to be the normal state.

When the update information is not all the same and a running state is present among the acquired states of the disk reading process, it can be considered that the disk reading process completed at least one read within the detection period, so the disk state can be determined to be the normal state.

Alternatively, when the detection module acquires the state of the disk reading process, the input/output interface of the disk may happen to be briefly asleep or briefly non-responsive, so that the disk reading process is also briefly in the sleep state or the non-responsive state; after the acquisition, the input/output interface returns to normal and the disk reading process can again read the disk's data normally. In this situation the states acquired by the detection module are all the sleep state or the non-responsive state while the update information is not all the same, and the disk state can likewise be determined to be the normal state.
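Cases B1 and B2, together with the transient sleep or non-responsive situation just described, can be combined into one decision function. This is a sketch under stated assumptions: the state strings, the result strings, and the `"undetermined"` fallback for combinations the text does not cover are hypothetical. Each update-information sample is taken to be a `(read_time, pid)` tuple.

```python
def determine_disk_state(update_infos, states):
    """Decide the disk state from one detection period's samples.

    update_infos: list of (read_time, pid) tuples.
    states: list of process states, e.g. "running", "sleeping", "unresponsive".
    """
    if not update_infos or not states:
        raise ValueError("at least one sample of each kind is required")
    unchanged = len(set(update_infos)) == 1
    if unchanged:
        # Case B1: nothing was updated and the process never responded.
        if all(s == "unresponsive" for s in states):
            return "fault"
        return "undetermined"
    # Case B2: the update information changed and a running state was seen.
    if any(s == "running" for s in states):
        return "normal"
    # Transient case: only sleeping/unresponsive states were sampled,
    # but the update information still changed, so reads did complete.
    if all(s in ("sleeping", "unresponsive") for s in states):
        return "normal"
    return "undetermined"
```

The key design point is that a non-responsive process alone never yields `"fault"`; the update information must also have stopped changing.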

FIG. 3 is a schematic diagram of a detection flow for detecting the disk state. After the control process in the reading module starts, the system assigns the control process its process identifier. While the control process runs, the disk reading process is started multiple times as a child process of the control process. Before each start of the disk reading process, the system assigns it a process identifier through the control process, and the control process may record the current process identifier of the disk reading process. While running, the disk reading process may read data from the disks to be detected, which may be disk 1, disk 2, and disk 3. Disk 1, disk 2, and disk 3 may be adjacent physical disks in a cloud database storage device, or virtual disks into which one physical disk in the cloud database storage device is divided according to storage areas in its storage space. For example, disk 1, disk 2, and disk 3 may be, respectively, the third virtual disk (/dev/vdc), the fourth virtual disk (/dev/vdd), and the fifth virtual disk (/dev/vde) into which one physical disk in the cloud database storage device is divided.

Each time the disk reading process has read data from the disk to be detected, it reports to the control process that one read has been completed, after which the disk reading process is closed and its process identifier for that run is released.

After receiving the report from the disk reading process, the control process updates the read time. Within the detection period, the detection module repeatedly acquires the process identifier of the control process, the read time, the process identifier of the disk reading process, and the state of the disk reading process. If the detection module can acquire the process identifier of the control process, the control process is running. If, while the control process is running, the read times and the process identifiers of the disk reading process do not change within the detection period and the disk reading process remains in the sleep state or the non-responsive state, it can be determined that the disk to be detected can no longer operate normally and that its disk state is the fault state.
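On Linux, one way a detection module might sample the state of a process is to read the state field of `/proc/<pid>/stat`. The text does not say how the state is obtained, so the sketch below is purely an assumption, as are the function names and the mapping of the kernel's `D` (uninterruptible sleep) code to the non-responsive state.

```python
# Assumed mapping from Linux state codes to the states named in the text.
STATE_NAMES = {"R": "running", "S": "sleeping", "D": "unresponsive"}


def parse_state(stat_line):
    """Extract the one-letter state code from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the state is the first field after the
    last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def process_state(pid):
    """Return the named state of `pid`, or "absent" if it no longer exists."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            code = parse_state(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return "absent"
    return STATE_NAMES.get(code, "other")
```

Sampling `process_state` for the recorded PID of the disk reading process several times per detection period would give the list of states used in the decision.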

FIG. 4 is a schematic diagram of an exemplary implementation scenario. The scenario includes a dual-machine cluster (high availability, HA) server 41, a primary (master) instance node 42, and a standby (slave) instance node 43. The HA server can be connected to the primary instance node 42 and to the standby instance node 43.

The primary instance node 42 and the standby instance node 43 are located in the same cloud database instance. Each contains at least one disk, and an agent service module is deployed on each. The primary instance node 42 may also be called a primary node instance or a primary node, the standby instance node 43 may also be called a standby node instance or a standby node, and the agent service module may also be called an agent module.

The agent service module may integrate the detection module and the reading module. The detection module and the reading module are responsible for detecting the disk state and the operating system (OS) state of the instance node on which the agent service module currently resides, and the agent service module is responsible for reporting the detection result to the HA server. The HA server and technicians use the detection result to monitor and repair the instance node. Because every instance node exists within the cloud database instance, the agent service module can report the disk state of an instance node by sending the HA server a database (DB) status message for that node.

When the detection module detects that a disk in the database is abnormal or that the state of a disk is the fault state, the agent service module reports this information to the HA server. The HA server makes a decision based on the reported information and on decision conditions. The decision may be, for example, to execute switching logic; executing the decision ensures that the cloud database instance is not affected by the disk fault. The decision conditions for executing the switching logic include, but are not limited to, the following two:

First: a disk fault of the primary instance node is detected.

Second: an available standby instance node exists in the cloud database instance in which the primary instance node is located, and the disk state of that standby instance node is the normal state.
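The two decision conditions above can be checked with a short predicate. This is a minimal sketch: the function name `should_switch`, the dictionary keys `available` and `disk_states`, and the state strings are hypothetical, and any further conditions the HA server may apply are not modelled.

```python
def should_switch(primary_disk_states, standby_nodes):
    """Return True when both decision conditions for switching hold.

    primary_disk_states: disk states of the primary node, e.g. ["normal", "fault"].
    standby_nodes: list of dicts with "available" (bool) and "disk_states" (list).
    """
    # Condition 1: a disk fault is detected on the primary instance node.
    primary_faulty = any(s == "fault" for s in primary_disk_states)
    # Condition 2: some available standby node has all of its disks normal.
    standby_ready = any(
        node["available"] and all(s == "normal" for s in node["disk_states"])
        for node in standby_nodes
    )
    return primary_faulty and standby_ready
```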

The database status message that the agent service module reports to the HA server may contain a status message for each disk. The status message of each disk includes, but is not limited to, the identifier (ID) of the disk, the type of the disk, and the disk state of the disk. For example, the status message of each disk may be obtained through a function for obtaining the disk status (diskStatus). The status message of one disk in the primary instance node 42 can be obtained through the following code of the disk status function.

As the above code shows, one disk of the primary instance node 42 has the ID 17645, its type is data disk, and its state is the fault (breakdown) state. It can therefore be determined that a disk of the primary instance node 42 is faulty, which satisfies the first decision condition for executing the switching logic. Because a standby instance node 43 exists in the cloud database instance in which the primary instance node 42 is located, the second decision condition is also satisfied if the disk state of the standby instance node 43 is the normal state. The switching logic can then be executed on the primary instance node 42 and the standby instance node 43: the primary instance node 42 is demoted and becomes a standby instance node, and the standby instance node 43 is promoted and becomes the primary instance node. This keeps the operation of the cloud database instance from being affected by the original primary instance node 42, whose disk is faulty, and improves the availability of the cloud database instance.
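The disk status message and the role switch described above might be represented as follows. This sketch is illustrative only: the dictionary layout, the `InstanceNode` class, and `switch_roles` are hypothetical and do not reproduce the original listing; only the values 17645, data, and breakdown come from the text. The second decision condition is assumed to be already satisfied.

```python
from dataclasses import dataclass


@dataclass
class InstanceNode:
    name: str
    role: str  # "primary" or "standby"


# Hypothetical shape of one disk's status message (values from the text).
disk_status = {"id": 17645, "type": "data", "state": "breakdown"}


def switch_roles(primary, standby):
    """Demote the primary node and promote the standby node."""
    primary.role, standby.role = "standby", "primary"


node_42 = InstanceNode("instance-node-42", "primary")
node_43 = InstanceNode("instance-node-43", "standby")

# Assumption: node 43 is available and its disks are in the normal state.
if disk_status["state"] == "breakdown":
    switch_roles(node_42, node_43)
```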

In the related art, whether a detected disk is faulty is judged by repeatedly checking whether all processes or threads of each disk in the cloud database are in a non-responsive state. However, the processes or threads of a normal disk may also enter the uninterruptible state briefly, several times, and then recover. Detecting disks in this way may therefore classify a normal disk whose processes or threads are briefly uninterruptible as a faulty disk, which causes misjudgment and makes it hard to guarantee the accuracy of the detection result.

The embodiments of this application judge the state of the disk by combining the read time, the state of the disk reading process, and the process identifier of the disk reading process. If the disk reading process briefly becomes non-responsive several times and then recovers, no misjudgment occurs: after recovery, the read time and the process identifier of the disk reading process are updated normally, so the condition for determining that the disk is in the fault state is not met, and a normal disk is not misjudged as faulty. The embodiments therefore avoid classifying a normal disk whose process is briefly non-responsive as a faulty disk, which reduces the likelihood of misjudgment and improves detection accuracy.

In another related technology, detection code is added to the code run by the disk's processes, which is an intrusive modification of the cloud database kernel, in order to learn whether a disk process is in the uninterruptible state and thus determine whether the disk is faulty. However, intrusively modifying the cloud database makes detection costly and complicated, and the detection method is not general across disks of different structures or specifications.

In the embodiments of this application, the disk reading process periodically reads the disk to be detected, and skip reading reduces the amount of input/output data read and the time spent reading it, which lowers the consumption of the disk's input/output resources and makes detection efficient. No intrusive modification of the cloud database or the cloud computing service is required, and the detection method is general across disks of different structures or specifications.

In summary, this application performs detection by comparing whether the multiple pieces of acquired update information are the same and by analyzing the multiple states of the disk reading process. The detection cost is low, the detection process is simple, the detection method is general, and the disk state determined from the update information and the states of the disk reading process is accurate.

The method for detecting the disk state provided by the embodiments of this application has been described above. Corresponding to that method, the embodiments of this application also provide an apparatus for detecting the disk state. The apparatus is applied to a device that provides cloud computing services and executes the method shown in FIG. 2 through the modules shown in FIG. 5. As shown in FIG. 5, the apparatus for detecting the disk state provided by the embodiments of this application includes the following modules.

An acquisition module 501, configured to acquire the state of the disk reading process multiple times within the detection period, where the disk reading process is used to read data from the disk. The acquisition module 501 is further configured to acquire the update information of the disk reading process multiple times within the detection period, where the update information includes the read time of the data, the read time indicates the time at which the disk reading process read data from the disk, and the two pieces of data read at two adjacent read times have an interval space between them on the disk. A determination module 502, configured to determine the disk state of the disk according to whether the multiple pieces of acquired update information are the same and according to the multiple states of the disk reading process.

In a possible implementation, the update information further includes the process identifier of the disk reading process, which identifies the disk reading process that reads the disk data. The determination module 502 is further configured to determine, based on the acquired read times and process identifiers, whether the multiple pieces of update information are the same.

In a possible implementation, the determination module 502 is configured to determine that the multiple pieces of update information are the same based on the read times all being the same and the process identifiers all being the same, or to determine that the update information is not all the same based on the read times not all being the same and the process identifiers not all being the same.

In a possible implementation, the determination module 502 is configured to determine the disk state of the disk to be the fault state based on the multiple pieces of update information being the same and the multiple states of the disk reading process all being the non-responsive state.

In a possible implementation, the determination module 502 is configured to determine the disk state of the disk to be the normal state based on the multiple pieces of update information not all being the same and a running state being present among the multiple states of the disk reading process.

It should be understood that, when the apparatus provided in FIG. 5 implements its functions, it has the same beneficial effects as the method for detecting the disk state provided in FIG. 2, which are not repeated here. In addition, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. The apparatus provided by the above embodiment and the method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.

In addition, in the disk state detection apparatus described above, both the acquisition module 501 and the determination module 502 may be implemented in software or in hardware. By way of example, the implementation of the acquisition module 501 is described next. Similarly, for the implementation of the determination module 502 and other modules, refer to the implementation of the acquisition module 501.

As an example of a module serving as a software functional unit, the acquisition module 501 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances. For example, the acquisition module 501 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. Typically, one region may include multiple AZs.

Likewise, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Typically, one VPC is set up within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway needs to be set up in each VPC, and the VPCs are interconnected through the communication gateways.

As an example of a module serving as a hardware functional unit, the acquisition module 501 may include at least one computing device. Alternatively, the acquisition module 501 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The multiple computing devices included in the acquisition module 501 may be distributed in the same region or in different regions. They may be distributed in the same AZ or in different AZs. Likewise, they may be distributed in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

It should be noted that, in other embodiments, the acquisition module 501 may be used to perform any step of the disk state detection method. That is, the steps implemented by the acquisition module 501 and the determination module 502 may be assigned as required, with the two modules each implementing different steps of the method so as to realize all functions of the disk state detection apparatus. In addition, the disk state detection apparatus provided by the above embodiment and the disk state detection method embodiment belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.

The present application also provides a computing device that can be configured as the device in the above implementation scenario. FIG. 6 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present application. As shown in FIG. 6, the computing device 600 includes a bus 602, a processor 604, a memory 606, and a communication interface 608. The processor 604, the memory 606, and the communication interface 608 communicate with one another through the bus 602. It should be understood that the present application does not limit the number of processors or memories in the computing device 600.

The bus 602 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one line is drawn in FIG. 6, but this does not mean that there is only one bus or only one type of bus. The bus 602 may include a path for transferring information between the components of the computing device 600 (for example, the memory 606, the processor 604, and the communication interface 608).

The processor 604 may include any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).

The memory 606 may include volatile memory, such as random access memory (RAM). The memory 606 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

The memory 606 stores executable program code, and the processor 604 executes the executable program code to implement the functions of the acquisition module 501 and the determination module 502 respectively, thereby implementing the disk state detection method. In other words, the memory 606 stores instructions for performing the disk state detection method.

The communication interface 608 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 600 and other devices or communication networks.

An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device, and the computing device may be configured as the device in the above implementation scenario.

FIG. 7 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application. As shown in FIG. 7, the computing device cluster includes at least one computing device 600. The memories 606 of one or more computing devices 600 in the computing device cluster may store the same instructions for performing the disk state detection method.

In some possible implementations, the memories 606 of one or more computing devices 600 in the computing device cluster may each store part of the instructions for performing the disk state detection method. In other words, a combination of one or more computing devices 600 may jointly execute the instructions for performing the disk state detection method.

It should be noted that the memories 606 of different computing devices 600 in the computing device cluster may store different instructions, each used to perform part of the functions of the disk state detection apparatus. That is, the instructions stored in the memories 606 of different computing devices 600 may implement the functions of one or more of the acquisition module 501 and the determination module 502.

In some embodiments, one or more computing devices in the computing device cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 8 is a schematic diagram of a connection mode of a computing device cluster provided by an embodiment of the present application. As shown in FIG. 8, two computing devices 600 are connected through a network. Specifically, each computing device connects to the network through its communication interface.

It should be understood that the functions of a computing device 600 shown in FIG. 8 may also be performed by multiple computing devices 600.

In an exemplary embodiment, a computer program product containing instructions is provided. When the instructions are run by a computing device cluster, the computing device cluster performs the disk state detection method shown in FIG. 2. The computer program product may be a software installation package; when the functions of the aforementioned computing device cluster need to be implemented, the computer program product may be downloaded and executed on the computing device cluster.

In an exemplary embodiment, a computer-readable storage medium is provided that includes computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the disk state detection method shown in FIG. 2. The storage medium includes but is not limited to volatile memory, such as random access memory, and non-volatile memory, such as flash memory, a hard disk, or a solid state drive.

In an exemplary embodiment, a computing device cluster is provided that includes at least one computing device. Each computing device includes a processor coupled to a memory. The processor of the at least one computing device is configured to execute the instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the disk state detection method shown in FIG. 2.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).

In the present application, terms such as "first" and "second" are used to distinguish between identical or similar items that have substantially the same role and function. It should be understood that "first", "second", and "nth" have no logical or temporal dependency on one another and do not limit quantity or order of execution. It should also be understood that although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. The terms are used only to distinguish one element from another.

It should also be understood that, in the embodiments of the present application, the magnitude of the sequence number of each process does not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.

In the present application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, multiple second devices means two or more second devices. The terms "system" and "network" are often used interchangeably herein.

It should be understood that the terminology used in the description of the various examples herein is intended only to describe particular examples and is not intended to be limiting. As used in the description of the various examples and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean three cases: A alone, both A and B, or B alone. In addition, the character "/" in the present application generally indicates an "or" relationship between the objects before and after it.

It should be noted that the information, data, and signals involved in the present application are all authorized by the user or fully authorized by all parties, and that the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the update information involved in the present application is obtained with full authorization.

It should also be understood that the term "if" may be interpreted to mean "when" or "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined..." or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining...", "in response to determining...", "upon detecting [the stated condition or event]", or "in response to detecting [the stated condition or event]".

The above are merely embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the scope of protection of the present application.