CN111124724A - A node fault testing method and device for a distributed block storage system - Google Patents

A node fault testing method and device for a distributed block storage system

Info

Publication number
CN111124724A
Authority
CN
China
Prior art keywords
execution
node
test
storage system
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911120925.9A
Other languages
Chinese (zh)
Other versions
CN111124724B (en)
Inventor
李军站
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-15
Filing date
2019-11-15
Publication date
2020-05-08
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911120925.9A
Publication of CN111124724A
Application granted
Publication of CN111124724B
Active
Anticipated expiration

Abstract

The invention provides a node fault testing method and device for a distributed block storage system. The method comprises: setting a test script according to a preset client service model; selecting test nodes according to the node information of the storage system; sending the test script to the test nodes; executing the test script according to a preset execution mode and generating an execution log; and monitoring the node state of the storage system and the client service execution result while the test script executes. The method and device reduce testing cost and improve testing efficiency, testing flexibility and software reliability.

Description

Node fault testing method and device of distributed block storage system
Technical Field
The invention relates to the technical field of storage system testing, in particular to a node fault testing method and device of a distributed block storage system.
Background
Distributed block storage systems are widely used in IT enterprises, cloud computing, big data, virtualization and similar fields, and these fields place ever higher reliability requirements on the storage system. A reliability defect in a storage system can be fatal, so such defects must be found and fixed as early as possible in the project development stage. In real customer deployments, node failures (such as a node unexpectedly hanging, shutting down, restarting or losing power) are common, and the storage system must keep operating normally within the limits of its redundancy rule, so node fault testing during the development stage is very important. Because customers use many different service models, each node of the storage system carries a different service, and redundancy rules differ between storage systems, the number of node fault test cases is large. Testing node faults purely by hand involves a heavy workload, a long test cycle, low test coverage and low test efficiency.
Disclosure of Invention
In view of the above disadvantages of the prior art, the present invention provides a node fault testing method and apparatus for a distributed block storage system.
In a first aspect, the present invention provides a node fault testing method for a distributed block storage system, including:
setting a test script according to a preset client service model;
selecting a test node according to the node information of the storage system;
sending the test script to a test node;
executing the test script according to a preset execution mode and generating an execution log;
and monitoring the node state and the client service execution result of the storage system in the test script execution process.
Further, before the executing the test script according to the preset execution mode and generating the execution log, the method further includes:
calling a volume mounting program according to a preset client service model;
performing clock synchronization with a storage system;
and issuing a test script execution command and synchronously starting monitoring of the storage system.
Further, the setting of the test script according to the preset client service model includes:
setting test script execution content according to the client service model; the client service model comprises an 8k random read-write model, a 1024k sequential read-write model, an 8k and 1024k mixed read-write model, an OLTP service model and an OLAP service model; the test script execution content comprises reboot, and power off followed by power on.
Further, the selecting a test node according to the node information of the storage system includes:
setting the number of fault nodes according to a redundancy rule of a storage system;
and selecting a test node according to the service type of each node.
Further, the executing the test script and generating the execution log according to the preset execution mode includes:
setting an execution sequence and an execution time of the test script according to the IP of the test node where the test script is located, wherein the execution sequence and the execution time ensure that at least one normal node exists in the same group of redundant nodes executing the same subtask;
sequentially executing test scripts under the corresponding test node IP according to the script execution sequence and the execution time;
and screening and outputting error information in the execution log.
Further, the monitoring of the node state and the client service execution result of the storage system during the execution of the test script includes:
monitoring whether the state of the test node is consistent with the execution condition of the test script on the test node, and if not, outputting the execution failure information of the test script under the test node;
monitoring the execution of the client service, generating a client service operation record file, and screening the operation record file for operation error information and outputting any that is found, wherein the operation error information comprises data inconsistency errors and I/O interruption (flow break) errors.
In a second aspect, the present invention provides a node fault testing apparatus for a distributed block storage system, including:
the script setting unit is configured for setting a test script according to a preset client service model;
the node selection unit is configured to select a test node according to the node information of the storage system;
the script issuing unit is configured to issue the test script to the test node;
the script execution unit is configured for executing the test script according to a preset execution mode and generating an execution log;
and the result monitoring unit is configured for monitoring the node state and the client service execution result of the storage system in the test script execution process.
Further, the node selecting unit includes:
the quantity determining module is configured for setting the quantity of the fault nodes according to the redundancy rule of the storage system;
and the position determining module is configured to select the test node according to the service type of each node.
Further, the script execution unit includes:
the execution setting module is configured to set an execution sequence and an execution time of the test script according to the IP of the test node where the test script is located, wherein the execution sequence and the execution time ensure that at least one normal node exists in a same group of redundant nodes executing the same subtask;
the script execution module is configured for sequentially executing the test scripts under the corresponding test node IP according to the script execution sequence and the execution time;
and the log output module is configured for screening and outputting the error information in the execution log.
Further, the result monitoring unit includes:
the execution monitoring module is configured to monitor whether the state of the test node is consistent with the execution condition of the test script on the test node or not, and if not, the execution monitoring module outputs the execution failure information of the test script under the test node;
and the task monitoring module is configured for monitoring the execution of the client service, generating a client service operation record file, and screening the operation record file for operation error information and outputting any that is found, wherein the operation error information comprises data inconsistency errors and I/O interruption (flow break) errors.
The beneficial effects of the invention are as follows.
With the node fault testing method and device of the distributed block storage system, test scripts that simulate faults on the test nodes are produced according to the client service model, and the execution sequence and execution time of the test scripts are preset. Test nodes are selected according to the node information of the storage system, the test scripts are issued to the test nodes, and the scripts are then executed in turn according to the preset execution sequence and execution time. Each test script generates an execution log as it runs on its test node, and this log is part of the test result data. In addition, the node state of the storage system and the execution result of the client service are monitored throughout the execution of all test scripts, and both monitoring results are also test result data. From this test result data the tester can more intuitively analyze how stable the storage system is under faults. The method and device thereby reduce testing cost and improve testing efficiency, testing flexibility and software reliability.
In addition, the invention is reliable in design principle, simple in structure, and has very broad application prospects.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of an apparatus of one embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject in fig. 1 may be a node failure testing system of a distributed block storage system.
As shown in fig. 1, the method 100 includes:
step 110, setting a test script according to a preset client service model;
step 120, selecting a test node according to the node information of the storage system;
step 130, sending the test script to a test node;
step 140, executing the test script according to a preset execution mode and generating an execution log;
step 150, monitoring the node state and the client service execution result of the storage system in the test script execution process.
To facilitate understanding of the present invention, the node fault testing method is further described below by walking through the process of performing a node fault test on a distributed block storage system in an embodiment.
Specifically, the node fault testing method of the distributed block storage system includes:
First, a communication connection is established between the client and the distributed storage system server. The connection uses a gigabit network, and multiple clients can be set up for redundancy to guard against client downtime during the test.
S1. Set a test script according to the preset client service model.
Determine the number of volumes to create according to the state and configuration of the storage system (generally, the number of 4T volumes into which 90% of the total capacity of the storage system can be divided). Then set the client service model used during the node fault test, such as an 8k random read-write model, a 1024k sequential read-write model, an 8k & 1024k mixed read-write model, an OLTP service model or an OLAP service model; the models can be supplemented or modified according to project requirements.
After the client service model is determined, set the execution content of the test script (reboot, or power off followed by power on). Different client service models are matched with different execution contents, configured according to the test requirements.
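The sizing rule and the pairing of service models with fault actions in S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes "4T" means 4 TiB, and the model labels, the action labels and the function names `volume_count` and `build_matrix` are hypothetical.

```python
from itertools import product

TIB = 1024 ** 4  # assumption: "4T" is read as 4 TiB

# Illustrative labels only; they are not identifiers from the patent.
SERVICE_MODELS = (
    "8k_random_rw",
    "1024k_sequential_rw",
    "8k_1024k_mixed_rw",
    "oltp",
    "olap",
)
FAULT_ACTIONS = ("reboot", "power_cycle")


def volume_count(total_capacity_bytes, volume_size_bytes=4 * TIB, fill_percent=90):
    """Return how many whole volumes fit in fill_percent of the capacity."""
    if total_capacity_bytes < 0 or volume_size_bytes <= 0:
        raise ValueError("capacity must be >= 0 and volume size must be > 0")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return (total_capacity_bytes * fill_percent) // (100 * volume_size_bytes)


def build_matrix(models=SERVICE_MODELS, actions=FAULT_ACTIONS):
    """Pair every service model with every fault action.

    A real test plan may select only a subset of these pairs.
    """
    return list(product(models, actions))
```

For a 100 TiB system this yields 22 volumes, since 90 TiB holds 22 whole 4 TiB volumes with 2 TiB left over.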
S2. Select test nodes according to the node information of the storage system.
Determine the number of test nodes according to the redundancy rule of the storage system (2 copies or 3 copies): the more copies, the larger the number of test nodes. Then determine which nodes (such as a main node, a secondary main node or a common node) serve as test nodes according to the service each node carries.
If all nodes of the storage system are initially running normally, any node can be used as a test node. If some node has just restarted and is still in its data recovery period, then, taking a 2-copy redundant system as an example, where the other node in the same redundancy group is working normally, the node in the recovery period is used as the test node. The purpose is to ensure that, among the nodes processing the same subtask, at least one normal node can still process the task.
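One way to express the selection rule in S2 is sketched below. It rests on an assumption the text implies but does not state as a formula: with N copies, at most N - 1 nodes per redundancy group may be faulted, so that one healthy node always remains. The `Node` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    ip: str
    role: str               # e.g. "main", "secondary_main", "common"
    group: str              # hypothetical redundancy group identifier
    recovering: bool = False


def max_fault_nodes(copies):
    """Assumed rule: keep one healthy copy, so fault at most copies - 1."""
    if copies < 2:
        raise ValueError("a redundancy rule needs at least 2 copies")
    return copies - 1


def select_test_nodes(nodes, copies):
    """Choose test nodes per redundancy group, recovering nodes first."""
    limit = max_fault_nodes(copies)
    by_group = {}
    for node in nodes:
        by_group.setdefault(node.group, []).append(node)

    selected = []
    for members in by_group.values():
        # A recovering node is already degraded, so it is chosen first and
        # its healthy peer is left untouched.
        ordered = sorted(members, key=lambda n: (not n.recovering, n.ip))
        budget = min(limit, len(members) - 1)  # always leave one node alone
        selected.extend(ordered[:budget])
    return selected
```

In a 2-copy group where one node is recovering, the sketch picks that node and leaves its healthy peer running, which matches the example in the text.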
S3. Send the test script to the test nodes.
Issue the test script to the corresponding test nodes, ensuring that each test node receives exactly one test script. Each test node returns a reception-success prompt after receiving the script, which prevents duplicate or missed deliveries.
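The delivery step in S3 can be sketched as a loop that tracks acknowledgements per node. The transport is deliberately left abstract: `send` is a hypothetical callable that returns True when the node confirms receipt, and the retry limit is an assumption, since the patent does not say how a missing acknowledgement is handled.

```python
from typing import Callable, Dict, Iterable


def dispatch_scripts(
    node_ips: Iterable[str],
    script: str,
    send: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Dict[str, bool]:
    """Deliver the script to each node once and record which nodes acked.

    Duplicate IPs are collapsed so no node is sent the script twice, and a
    node is retried only while it has not acknowledged receipt.
    """
    acked: Dict[str, bool] = {}
    for ip in dict.fromkeys(node_ips):  # de-duplicate, keep input order
        acked[ip] = False
        for _ in range(max_attempts):
            if send(ip, script):
                acked[ip] = True
                break
    return acked
```

Any node still mapped to False after the call has not confirmed receipt and should be excluded from the run or investigated before step S4.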
S4. Execute the test script according to the preset execution mode and generate an execution log.
When the client starts the client service, it first calls a volume mounting program (mounting, for example, 64 volumes or 32 volumes), then synchronizes its clock with the storage system, and then starts the test script. At the same time as it starts the test script, it sends a monitoring start command to the storage system monitoring device, which monitors the state of each node and the execution result of the client service on the storage system.
The execution sequence and execution time of the test scripts are preset at the client and are determined by the service (the subtask executed) of the node on which each test script runs. Each test script is identified by the IP of the test node where it is located, so each test script corresponds to one test node IP. When setting the execution time, the restart time of the test node and the data recovery time after the restart must be taken into account. If two nodes execute the same subtask under the redundancy rule, then after the test script has been executed on node 1, the waiting time must exceed the sum of the restart time and the data recovery time, and only after that wait may the test script of node 2 be executed. There is no time constraint between nodes executing different subtasks.
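The timing rule described above can be sketched as a small scheduler. It is an illustration under stated assumptions: the `ScriptTask` type, the function name and the `margin_s` safety margin are hypothetical, and the restart and recovery durations are treated as known constants, whereas in practice they would have to be measured.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ScriptTask:
    node_ip: str   # the test script is identified by its node IP
    subtask: str   # nodes sharing a subtask back each other up


def schedule(
    tasks: Iterable[ScriptTask],
    restart_s: float,
    recovery_s: float,
    margin_s: float = 60.0,
) -> Dict[str, float]:
    """Return a start offset in seconds for each node's test script.

    Scripts on nodes that share a subtask are spaced by more than
    restart_s + recovery_s; nodes on different subtasks are unconstrained.
    """
    if margin_s <= 0:
        raise ValueError("margin_s must be positive so the wait exceeds the sum")
    gap = restart_s + recovery_s + margin_s
    next_free: Dict[str, float] = {}
    starts: Dict[str, float] = {}
    for task in tasks:
        begin = next_free.get(task.subtask, 0.0)
        starts[task.node_ip] = begin
        next_free[task.subtask] = begin + gap
    return starts
```

With a 120 s restart, a 600 s recovery and the default margin, two nodes on the same subtask start 780 s apart, while a node on another subtask can start at offset 0.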
Each test node reboots, or is powered off and restarted, once its test script has been executed. An execution log is generated while the test script runs; it records the restart time of the test node, system error information, any loss of test node data across the restart, and the like.
S5. Monitor the node state of the storage system and the client service execution result during test script execution.
Collect the node states recorded by the storage system monitoring device throughout the test. The node state includes information such as whether the node is running and the time of each state change. Compare the node states with the execution times of the test scripts: if they match, the test script executed successfully; if they do not, execution failed and the execution failure information is output.
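The comparison between monitored node states and script execution times might look like the sketch below. The data shapes are assumptions: `executions` maps a node IP to the time its script ran, `down_events` maps a node IP to the times the monitor saw it go down, and `tolerance_s` is a hypothetical allowance for detection delay.

```python
from typing import Dict, List


def verify_execution(
    executions: Dict[str, float],
    down_events: Dict[str, List[float]],
    tolerance_s: float = 30.0,
) -> List[str]:
    """Return the IPs whose observed state does not match the script run.

    A script counts as executed when the monitor saw its node go down
    within tolerance_s seconds after the script's execution time.
    """
    failed = []
    for ip, run_at in executions.items():
        observed = down_events.get(ip, [])
        if not any(0 <= seen - run_at <= tolerance_s for seen in observed):
            failed.append(ip)
    return failed
```

The returned list corresponds to the execution failure information: an empty list means every node went down when its script said it should.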
Collect the monitored client service execution results and check the recorded errors to determine whether data inconsistency or EIO (I/O error) problems occurred. Check the logfile file to determine whether an I/O interruption (flow break) occurred and, if so, when.
If neither the execution log from step S4 nor the monitoring information collected in this step contains any error or failure information, this node fault test passes; otherwise, the test fails.
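The final screening and pass/fail decision can be sketched as follows. The log formats are not given in the patent, so everything concrete here is an assumption: the error markers in `ERROR_PATTERN` are hypothetical, and an I/O interruption is modelled as a run of samples with zero IOPS in a list of (timestamp, IOPS) pairs.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Hypothetical markers; real tools report these errors in their own wording.
ERROR_PATTERN = re.compile(
    r"data inconsisten|\bEIO\b|Input/output error", re.IGNORECASE
)


def screen_errors(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that report an error of interest."""
    return [line for line in lines if ERROR_PATTERN.search(line)]


def find_interruptions(
    samples: Sequence[Tuple[float, int]]
) -> List[Tuple[float, float]]:
    """Return (start, end) intervals during which IOPS stayed at zero."""
    intervals = []
    start = None
    for ts, iops in samples:
        if iops == 0 and start is None:
            start = ts                      # interruption begins
        elif iops > 0 and start is not None:
            intervals.append((start, ts))   # traffic resumed
            start = None
    if start is not None:                   # still interrupted at the end
        intervals.append((start, samples[-1][0]))
    return intervals


def node_fault_test_passed(execution_log, service_log, samples, failed_nodes):
    """Pass only if no error, interruption or failed node was found."""
    return not (
        screen_errors(execution_log)
        or screen_errors(service_log)
        or find_interruptions(samples)
        or failed_nodes
    )
```

Keeping the three checks separate means a failing run can report which kind of evidence caused the failure, not just the verdict.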
As shown in fig. 2, the system 200 includes:
a script setting unit 210 configured to set a test script according to a preset customer service model;
a node selecting unit 220 configured to select a test node according to the node information of the storage system;
the script issuing unit 230 is configured to issue the test script to the test node;
the script execution unit 240 is configured to execute the test script according to a preset execution mode and generate an execution log;
and the result monitoring unit 250 is configured to monitor the node state and the client service execution result of the storage system during the execution process of the test script.
Optionally, as an embodiment of the present invention, the node selecting unit includes:
the quantity determining module is configured for setting the quantity of the fault nodes according to the redundancy rule of the storage system;
and the position determining module is configured to select the test node according to the service type of each node.
Optionally, as an embodiment of the present invention, the script execution unit includes:
the execution setting module is configured to set an execution sequence and an execution time of the test script according to the IP of the test node where the test script is located, wherein the execution sequence and the execution time ensure that at least one normal node exists in a same group of redundant nodes executing the same subtask;
the script execution module is configured for sequentially executing the test scripts under the corresponding test node IP according to the script execution sequence and the execution time;
and the log output module is configured for screening and outputting the error information in the execution log.
Optionally, as an embodiment of the present invention, the result monitoring unit includes:
the execution monitoring module is configured to monitor whether the state of the test node is consistent with the execution condition of the test script on the test node or not, and if not, the execution monitoring module outputs the execution failure information of the test script under the test node;
and the task monitoring module is configured for monitoring the execution of the client service, generating a client service operation record file, and screening the operation record file for operation error information and outputting any that is found, wherein the operation error information comprises data inconsistency errors and I/O interruption (flow break) errors.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

CN201911120925.9A | 2019-11-15 | 2019-11-15 | A node failure testing method and device for a distributed block storage system | Active | CN111124724B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911120925.9A, CN111124724B (en) | 2019-11-15 | 2019-11-15 | A node failure testing method and device for a distributed block storage system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911120925.9A, CN111124724B (en) | 2019-11-15 | 2019-11-15 | A node failure testing method and device for a distributed block storage system

Publications (2)

Publication Number | Publication Date
CN111124724A (en) | 2020-05-08
CN111124724B (en) | 2023-01-10

Family

ID=70495956

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911120925.9A, Active, CN111124724B (en) | A node failure testing method and device for a distributed block storage system | 2019-11-15 | 2019-11-15

Country Status (1)

Country | Link
CN (1) | CN111124724B (en)

Also Published As

Publication number | Publication date
CN111124724B (en) | 2023-01-10


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CP03 | Change of name, title or address

Address after:Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee after:Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after:China

Address before:Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee before:SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before:China

