Disclosure of Invention
In view of the foregoing, the present invention provides a fault simulation method and system and a distributed system testing method that overcome, or at least partially solve, the foregoing problems.
An embodiment of the invention provides a fault simulation method comprising the following steps:
acquiring complex fault configuration information, wherein the complex fault configuration information comprises configuration information of complex fault components and the logical relationships among them;
creating a complex fault injection task according to the complex fault component configuration information and logical relationships included in the complex fault configuration information, wherein the complex fault injection task comprises fault injection subtasks and their execution timing;
invoking the fault injection subtasks according to the execution timing, and creating an executable workflow object comprising the fault injection subtasks;
and executing the workflow object, invoking the execution components required by the fault injection subtasks to perform the corresponding operations of the complex fault components, thereby injecting the complex fault into the system under test.
In some optional embodiments, acquiring the complex fault configuration information includes:
querying a database, according to a fault identifier included in an input fault injection instruction, for the stored complex fault configuration information that matches the fault identifier.
In some optional embodiments, creating the complex fault injection task according to the complex fault component configuration information and logical relationships included in the complex fault configuration information includes:
generating the task name of the complex fault injection task from the complex task name included in the complex fault configuration information;
generating the task description of the complex fault injection task from the complex task description included in the complex fault configuration information;
and generating, from the complex fault component configuration information and logical relationships, the subtask name, subtask description, subtask command parameters and subtask execution time of each fault injection subtask in the complex fault injection task, wherein the complex fault components comprise simple faults, pre-fault preparation and post-fault processing.
In some optional embodiments, invoking the fault injection subtasks according to the execution timing and creating an executable workflow object comprising the fault injection subtasks includes:
periodically scanning the created complex fault injection task for fault injection subtasks whose state is callable, and, when a callable fault injection subtask is determined to be executable according to its execution time, marking that subtask as executable;
and invoking a workflow engine that periodically scans the complex fault injection task for executable fault injection subtasks and adds them to the executable workflow object in accordance with their execution timing.
In some optional embodiments, executing the workflow object and invoking the execution components required by the fault injection subtasks to perform the corresponding operations of the complex fault components, so as to inject the complex fault into the system under test, includes:
sequentially acquiring from the workflow object the fault injection subtask currently to be executed and, for that subtask, invoking the corresponding execution component to perform the operation required by the corresponding complex fault component according to the subtask's command parameters, until all fault injection subtasks in the workflow object have been executed.
In some optional embodiments, sequentially acquiring from the workflow object the fault injection subtask currently to be executed includes:
acquiring from the workflow object a fault injection subtask whose predecessor subtasks have been executed successfully and whose execution time has been reached, and taking it as the fault injection subtask currently to be executed.
In some optional embodiments, invoking the corresponding execution component to perform the operation required by the corresponding complex fault component according to the command parameters of the fault injection subtask includes:
when the currently executed fault injection subtask is a simple fault injection task, invoking, according to the subtask command parameters, a fault simulation plug-in local to the target device to simulate the corresponding simple fault, and/or invoking an external system device to inject the corresponding simple fault on the target device;
when the currently executed fault injection subtask is a pre-fault preparation task and/or a post-fault processing task for a simple fault simulated by a local fault simulation plug-in, invoking, according to the subtask command parameters, the pre-fault preparation action component and/or post-fault processing action component local to the target device to perform the corresponding pre-fault preparation and/or post-fault processing operation;
when the currently executed fault injection subtask is a pre-fault preparation task and/or a post-fault processing task for a simple fault injected by an external system device, sending, according to the subtask command parameters, a pre-fault preparation operation instruction and/or a post-fault processing operation instruction to the corresponding external system device.
In some optional embodiments, the method further comprises:
acquiring input configuration information of the complex fault components, and generating complex fault configuration information comprising the complex fault component configuration information and the logical relationships among the components, wherein the complex fault components comprise simple faults, pre-fault preparation and post-fault processing.
In some optional embodiments, acquiring the input configuration information of the complex fault components includes:
acquiring the name, description, command parameters and execution time of each complex fault component;
the execution time is either an absolute time point for the complex fault component or a delay relative to its predecessor component;
the command parameters include at least one of the target device, action content, duration and build action index of the complex fault component.
An embodiment of the invention further provides a fault simulation system, comprising:
a fault injection subsystem configured to:
acquire complex fault configuration information, wherein the complex fault configuration information comprises configuration information of complex fault components and the logical relationships among them;
create a complex fault injection task according to the complex fault component configuration information and logical relationships included in the complex fault configuration information, wherein the complex fault injection task comprises fault injection subtasks and their execution timing;
invoke the fault injection subtasks according to the execution timing, and create an executable workflow object comprising the fault injection subtasks;
execute the workflow object, invoking the execution components required by the fault injection subtasks to perform the corresponding operations of the complex fault components, thereby injecting the complex fault into the system under test;
and a workflow engine configured to control execution of the workflow object.
In some optional embodiments, the system further comprises:
a fault orchestration subsystem configured to acquire the complex fault component configuration information and to orchestrate the complex fault configuration information according to the complex fault component configuration information and the logical relationships among the components.
In some optional embodiments, the fault orchestration subsystem comprises:
a complex fault scenario orchestration interface configured to acquire the complex fault component configuration information input by a user;
and a fault description management module configured to generate, from the input complex fault component configuration information, complex fault configuration information comprising the component configuration information and logical relationships, and to invoke a basic data engine to store the orchestrated complex fault configuration information in a database, wherein the complex fault components comprise simple faults, pre-fault preparation and post-fault processing.
In some optional embodiments, the fault injection subsystem comprises:
a fault injection interface configured to acquire a fault injection instruction input by a user;
a fault injection engine configured to: acquire the orchestrated complex fault configuration information according to the input fault injection instruction; create, from the complex fault component configuration information and logical relationships included in it, a complex fault injection task comprising fault injection subtasks and their execution timing; invoke the executable fault injection subtasks by polling according to the execution timing, so as to create an executable workflow object comprising those subtasks; and execute the workflow object, invoking the execution components required by the fault injection subtasks to perform the corresponding operations of the complex fault components, thereby injecting the complex fault into the system under test;
and execution components configured to perform the corresponding operations of the fault components.
In some optional embodiments, the fault injection engine includes a fault injection task creation component configured to:
invoke, according to the fault identifier included in the fault injection instruction, a basic data engine of a basic framework layer to query the database for the stored complex fault configuration information that matches the fault identifier;
generate the task name of the complex fault injection task from the complex task name included in the complex fault configuration information;
generate the task description of the complex fault injection task from the complex task description included in the complex fault configuration information;
and generate, from the complex fault component configuration information and logical relationships, the subtask name, subtask description, subtask command parameters and subtask execution time of each fault injection subtask in the complex fault injection task, wherein the complex fault components comprise simple faults, pre-fault preparation and post-fault processing.
In some optional embodiments, the fault injection engine includes a fault injection task scheduling component configured to:
periodically scan the created complex fault injection task for fault injection subtasks whose state is callable, and, when a callable fault injection subtask is determined to be executable according to its execution time, mark that subtask as executable;
and invoke the workflow engine, which periodically scans the complex fault injection task for executable fault injection subtasks and adds them to the executable workflow object in accordance with their execution timing.
In some optional embodiments, the fault injection engine includes a fault injection action component configured to:
sequentially acquire, under control of the workflow engine, the fault injection subtask currently to be executed from the workflow object and, for that subtask, invoke the corresponding execution component to perform the operation required by the corresponding complex fault component according to the subtask command parameters, until all fault injection subtasks in the workflow object have been executed.
An embodiment of the invention further provides a distributed system testing method, comprising the following steps:
acquiring a system test instruction through a test instruction input interface, and presenting the corresponding fault injection interface to the user according to the test requirement included in the system test instruction;
acquiring a fault injection instruction through the fault injection interface, acquiring the orchestrated complex fault configuration information according to the fault injection instruction, and injecting the complex fault into the system under test using the fault simulation method described above;
and acquiring performance data of the system under test after the complex fault has been injected.
In some optional embodiments, the method further comprises:
acquiring complex fault component configuration information through a complex fault scenario orchestration interface, and orchestrating the complex fault configuration information in advance according to the acquired component configuration information and logical relationships.
In some optional embodiments, the method further comprises:
providing the test requirement and the acquired performance data of the system under test to a test terminal, and displaying them to the user through a test result display interface.
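The testing method above reduces to a short control flow: look up the orchestrated complex fault by its identifier, inject it, then collect performance data for display. The Python sketch below is illustrative only; the function and class names, the dictionary standing in for the database, and the `inject` and `collect_performance` callables are hypothetical stand-ins for the fault simulation method and the monitoring of the system under test.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SystemTestResult:
    """Hypothetical record handed to the test result display interface."""
    requirement: str
    fault_id: str
    performance: Dict[str, float] = field(default_factory=dict)


def run_distributed_system_test(
    requirement: str,
    fault_id: str,
    fault_db: Dict[str, dict],
    inject: Callable[[dict], None],
    collect_performance: Callable[[], Dict[str, float]],
) -> SystemTestResult:
    """Look up the orchestrated complex fault, inject it, then measure."""
    config = fault_db.get(fault_id)
    if config is None:
        raise KeyError(f"no orchestrated complex fault with id {fault_id!r}")
    inject(config)                        # stands in for the fault simulation method
    performance = collect_performance()   # performance data after injection
    return SystemTestResult(requirement, fault_id, performance)
```

Passing the injection and measurement steps in as callables keeps the sketch independent of any particular fault simulation backend or metrics source.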
An embodiment of the invention further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the above fault simulation method or distributed system testing method.
An embodiment of the invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above fault simulation method or distributed system testing method when executing the program.
The technical solutions provided by the embodiments of the invention achieve at least the following beneficial effects:
For a complex fault to be injected, a complex fault injection task comprising fault injection subtasks and their execution timing is created from the complex fault configuration information. The currently executable fault injection subtasks are invoked according to the execution timing and added to a workflow object, and a workflow engine executes each fault injection subtask of the complex fault injection task in that timing relationship. This enables system tests that inject complex faults into the target devices of a distributed system.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
A distributed system is a software system built on a network, in which a group of computer devices presents itself to the user as a unified whole.
Chaos engineering is an approach to system robustness testing in which controllable faults are actively introduced into the production environment of a distributed system, in order to verify whether the system's fault tolerance meets requirements or to gain previously unknown knowledge about the robustness of the distributed system.
Prior-art automated test systems based on chaos engineering cannot simulate complex faults and therefore cannot meet the testing requirements of distributed systems. To solve this problem, an embodiment of the invention provides a fault simulation method in which complex faults are orchestrated according to logical relationships, complex fault injection tasks are formed according to timing relationships, tasks are scheduled by polling, and tasks are executed using workflow technology. In this way, complex faults in distributed system scenarios can be simulated effectively and the testing requirements of distributed systems are met.
Example 1
The first embodiment of the invention provides a fault simulation method that can be used to test a distributed system. By simulating faults with various causes on the various devices of the distributed system, and replaying the conditions under which those faults occur, the method injects complex faults into the distributed system. As shown in Fig. 1, the method comprises the following steps:
Step S101: acquire complex fault configuration information.
In this step, when a complex fault is to be injected into the system under test, the orchestrated complex fault configuration information is acquired according to an input fault injection instruction. The acquired complex fault configuration information comprises the configuration information of the complex fault components and the logical relationships among them.
The system under test may be a distributed system. A fault simulation system acquires the complex fault configuration information and performs the complex fault injection. The fault simulation system may provide human-machine interfaces through a test terminal, such as a fault injection interface and a complex fault scenario orchestration interface, through which the fault injection instruction, the complex fault configuration information and the like are acquired.
Complex faults may be orchestrated in advance and the resulting complex fault configuration information stored in a database. When a fault injection instruction input by a user is received, the database is queried, according to the fault identifier included in the instruction, for the stored complex fault configuration information that matches that identifier.
Typically, a complex fault is a fault with multiple causes and comprises multiple simple faults, which may be related in time, for example occurring simultaneously or in predecessor/successor order. To simulate a complex fault, pre-fault preparation and post-fault processing are also performed for the simple faults it contains. The injection of a complex fault therefore typically comprises multiple fault injection subtasks, each of which performs the injection of a different simple fault, a pre-fault preparation step, a post-fault processing step, and so on.
The complex fault configuration information is orchestrated in advance for complex fault injection. Because a complex fault comprises multiple fault injection subtasks, each fault injection subtask may correspond to one complex fault component, where a complex fault component is a simple fault, a pre-fault preparation step or a post-fault processing step. Accordingly, the complex fault configuration information may include configuration information for multiple complex fault components and, so that each fault injection subtask is executed in the correct order, also includes the logical relationships among the complex fault components, such as execution time and execution order.
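The configuration information described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, the `ComponentKind` values, the `predecessors` and `delay_s` fields used to encode the logical relationships, and the in-memory `FaultConfigStore` standing in for the database and basic data engine are all assumptions, not structures defined by this embodiment.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ComponentKind(Enum):
    """The three kinds of complex fault component named in the text."""
    SIMPLE_FAULT = "simple_fault"
    PRE_FAULT_PREPARATION = "pre_fault_preparation"
    POST_FAULT_PROCESSING = "post_fault_processing"


@dataclass
class ComplexFaultComponent:
    name: str
    description: str
    kind: ComponentKind
    command_params: Dict[str, object]
    # Hypothetical encoding of the logical relationship: names of components
    # that must finish first, plus a delay in seconds.
    predecessors: List[str] = field(default_factory=list)
    delay_s: float = 0.0


@dataclass
class ComplexFaultConfig:
    fault_id: str          # fault identifier carried by the injection instruction
    name: str              # complex task name
    description: str       # complex task description
    components: List[ComplexFaultComponent]


class FaultConfigStore:
    """In-memory stand-in (assumption) for the database and basic data engine."""

    def __init__(self) -> None:
        self._by_id: Dict[str, ComplexFaultConfig] = {}

    def save(self, config: ComplexFaultConfig) -> None:
        self._by_id[config.fault_id] = config

    def find(self, fault_id: str) -> Optional[ComplexFaultConfig]:
        # Query by fault identifier; None when nothing has been orchestrated.
        return self._by_id.get(fault_id)
```

A real deployment would persist this data in whatever database the basic data engine wraps; only the lookup-by-identifier behaviour matters for Step S101.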
Step S102: create a complex fault injection task according to the complex fault component configuration information and logical relationships included in the complex fault configuration information. The complex fault injection task comprises fault injection subtasks and their execution timing.
A complex fault injection task is created from the complex fault configuration information in order to inject the complex fault; this may be performed by the fault simulation system. The created complex fault injection task may include a task name, a task description, and the configuration information and logical relationships of each complex fault component. Specifically, creating the task may include:
generating the task name of the complex fault injection task from the complex task name included in the complex fault configuration information;
generating the task description of the complex fault injection task from the complex task description included in the complex fault configuration information;
and generating, from the complex fault component configuration information and logical relationships, the subtask name, subtask description, subtask command parameters and subtask execution time of each fault injection subtask in the complex fault injection task, wherein the complex fault components comprise simple faults, pre-fault preparation and post-fault processing. One fault injection subtask may be created for each complex fault component.
For example, the task name of the complex fault injection task may be "complex fault 1", and its task description may describe complex fault 1, for instance testing system performance under network card delay and CPU exhaustion. Complex fault component 1 may yield fault injection subtask 1, with subtask name "subtask 1", subtask description "network card delay", subtask command parameters specifying the network card and delay time of device A and the network card and delay time of device B, and a subtask execution time of either immediately or after a given delay. Complex fault component 2 may yield fault injection subtask 2, with subtask name "subtask 2", subtask description "CPU exhaustion", subtask command parameters specifying CPU exhaustion on devices A and B, and likewise an execution time of either immediately or after a given delay. The subtask command parameters differ between fault injection subtasks and are set according to each specific case.
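Step S102 can be sketched as a direct mapping from configuration to task, creating one subtask per component. In this illustrative Python version the dictionary layout of the configuration, the field names, the initial state string `"callable"` and the device, NIC and delay values in the "complex fault 1" example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FaultInjectionSubtask:
    name: str
    description: str
    command_params: Dict[str, object]
    delay_s: float                 # execution time (hypothetical: seconds of delay)
    predecessors: List[str]        # logical relationship to other subtasks
    state: str = "callable"        # initial scheduling state


@dataclass
class ComplexFaultInjectionTask:
    name: str
    description: str
    subtasks: List[FaultInjectionSubtask] = field(default_factory=list)


def create_task(config: dict) -> ComplexFaultInjectionTask:
    """Derive task name/description and one subtask per fault component."""
    task = ComplexFaultInjectionTask(config["name"], config["description"])
    for comp in config["components"]:
        task.subtasks.append(FaultInjectionSubtask(
            name=comp["name"],
            description=comp["description"],
            command_params=dict(comp["command_params"]),
            delay_s=float(comp.get("delay_s", 0.0)),
            predecessors=list(comp.get("predecessors", [])),
        ))
    return task


# The "complex fault 1" example; device, NIC and delay values are made up.
config = {
    "name": "complex fault 1",
    "description": "system performance under NIC delay and CPU exhaustion",
    "components": [
        {"name": "subtask 1", "description": "network card delay",
         "command_params": {"A": {"nic": "eth0", "delay_ms": 100},
                            "B": {"nic": "eth0", "delay_ms": 100}}},
        {"name": "subtask 2", "description": "CPU exhaustion",
         "command_params": {"cpu_exhaustion": ["A", "B"]},
         "predecessors": ["subtask 1"]},
    ],
}
task = create_task(config)
```

Copying the command parameters and predecessor lists keeps the created task independent of later edits to the stored configuration.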
Step S103: invoke the fault injection subtasks according to the execution timing, and create an executable workflow object comprising the fault injection subtasks.
The created complex fault injection task is scanned periodically for fault injection subtasks whose state is callable. When a callable fault injection subtask is determined to be executable according to its execution time, it is marked as executable. A workflow engine is then invoked, which periodically scans for executable fault injection subtasks and adds them to the executable workflow object according to their execution timing. A fault injection subtask in the workflow object may also be referred to as a subtask node.
In other words, polling is used to check whether the execution time of each fault injection subtask in the complex fault injection task has been satisfied, and the subtasks are scheduled on that basis to create the executable workflow object. Whether the execution time is satisfied is determined from the execution timing. The execution timing may specify an execution time point for the fault injection subtask, in which case the execution time is satisfied once that time point has been reached. The execution timing may also specify the predecessor or successor subtasks of the fault injection subtask, in which case the execution time is satisfied once all of its predecessor subtasks have finished executing.
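One polling pass of the scheduling described above might look like the following Python sketch. It assumes, for illustration only, that a subtask carries an optional absolute time point `at`, a list of predecessor names, and a string state that moves from `"callable"` to `"executable"` and later to `"succeeded"` or `"failed"`; these names are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subtask:
    name: str
    at: Optional[float] = None                 # execution time point, if any
    predecessors: List[str] = field(default_factory=list)
    state: str = "callable"                    # callable -> executable -> succeeded/failed


def execution_time_satisfied(sub: Subtask, now: float, states: Dict[str, str]) -> bool:
    """Time point reached (if set) and every predecessor finished successfully."""
    if sub.at is not None and now < sub.at:
        return False
    # A subtask without predecessors passes trivially: all([]) is True.
    return all(states.get(p) == "succeeded" for p in sub.predecessors)


def scan_once(subtasks: List[Subtask], now: float) -> List[str]:
    """One polling pass: mark due 'callable' subtasks 'executable'."""
    states = {s.name: s.state for s in subtasks}   # snapshot taken before the pass
    marked = []
    for sub in subtasks:
        if sub.state == "callable" and execution_time_satisfied(sub, now, states):
            sub.state = "executable"
            marked.append(sub.name)
    return marked
```

A scheduler would call `scan_once` once per polling interval. Treating an empty predecessor list as satisfied matches the special case in which a subtask without a predecessor is handled as if its predecessor had succeeded.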
In the above example, if the CPU exhaustion subtask is to be executed after the network card delay subtask has finished, polling according to the execution times schedules the network card delay subtask first and the CPU exhaustion subtask second, and an executable workflow object encoding this execution order of the two subtasks is created.
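Arranging subtask nodes so that each one follows all of its predecessors is a topological sort. The Python sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) to build an ordered list that plays the role of the workflow object. This is a swapped-in technique shown for illustration; the embodiment does not specify how its workflow engine orders nodes, and the function name is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def build_workflow(predecessors: Dict[str, Set[str]]) -> List[str]:
    """Order subtask nodes so that each node comes after all of its predecessors.

    `predecessors` maps each subtask name to the names it must wait for.
    """
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"cyclic logical relationship: {exc.args[1]}") from exc


workflow = build_workflow({"nic_delay": set(), "cpu_exhaustion": {"nic_delay"}})
```

Rejecting cycles at this point surfaces an inconsistent orchestration before any fault is injected. Subtasks with no mutual ordering may appear in any relative order.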
Step S104: execute the created executable workflow object, invoking the execution components required by the fault injection subtasks to perform the corresponding operations of the complex fault components, thereby injecting the complex fault into the system under test.
The fault injection subtask currently to be executed is acquired from the workflow object in turn and, for that subtask, the corresponding execution component is invoked to perform the operation required by the corresponding fault component according to the subtask's command parameters, until all fault injection subtasks in the workflow object have been executed and the complex fault injection into the system under test is complete. The fault injection subtasks in the workflow object are executed in the order given by the predecessor and successor relationships among them. Once all fault injection subtasks of the complex fault injection task have been executed, the complex fault is considered injected; that is, the complex fault is injected by injecting each of its fault components in timing order.
When acquiring the fault injection subtask currently to be executed, a subtask whose predecessor subtasks have been executed successfully and whose execution time has been reached is taken from the workflow object.
If the current fault injection subtask fails or raises an exception, execution of all subsequent fault injection subtasks is stopped; that is, the fault injection subtasks in the workflow object are executed only when the above conditions are met. When a failure or exception occurs, its cause can be investigated, and if the current fault injection subtask then executes successfully, the remaining fault injection subtasks in the workflow object can continue to be executed.
In practice, a fault injection subtask with no predecessor subtask can be treated as the special case in which its predecessor has been executed successfully.
Continuing the above example, the network card delay subtask in the workflow object is executed first, the CPU exhaustion subtask is executed after it finishes, and the injection of complex fault 1 is complete once the CPU exhaustion subtask has finished.
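The stop-on-failure rule can be sketched as a sequential loop over the ordered workflow. In this illustrative Python version each step is a hypothetical `(name, action)` pair whose action returns `True` on success; waiting for execution time points is omitted, and the `start` index lets execution resume at the failed step after its cause has been investigated.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], bool]]   # hypothetical (subtask name, action) pair


def execute_workflow(steps: List[Step], start: int = 0) -> Tuple[Dict[str, str], int]:
    """Run steps in order from `start`, halting on the first failure or exception.

    Returns each executed step's status and the index to resume from
    (len(steps) once every step has succeeded).
    """
    status: Dict[str, str] = {}
    for i in range(start, len(steps)):
        name, action = steps[i]
        try:
            ok = action()
        except Exception as exc:  # an exception counts as an abnormal result
            status[name] = f"exception: {exc}"
            return status, i
        if not ok:
            status[name] = "failed"
            return status, i
        status[name] = "succeeded"
    return status, len(steps)
```

Returning the resume index rather than raising keeps the decision to retry with the operator, which mirrors the investigate-then-continue behaviour described above.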
Invoking the corresponding execution component to perform the operation required by the corresponding fault component according to the command parameters of the fault injection subtask involves invoking a component local to the target device or an external system device, as the subtask requires, and includes:
when the currently executed fault injection subtask is a simple fault injection task, invoking, according to the subtask's command parameters, a fault simulation plug-in local to the target device to simulate the corresponding simple fault, and/or invoking an external system device to inject the corresponding simple fault on the target device;
when the currently executed fault injection subtask is a pre-fault preparation task and/or a post-fault processing task for a simple fault simulated by a local fault simulation plug-in, invoking, according to the subtask's command parameters, the pre-fault preparation action component and/or post-fault processing action component local to the target device to perform the corresponding pre-fault preparation and/or post-fault processing operation;
when the currently executed fault injection subtask is a pre-fault preparation task and/or a post-fault processing task for a simple fault injected by an external system device, sending, according to the subtask's command parameters, a pre-fault preparation operation instruction and/or a post-fault processing operation instruction to the corresponding external system device.
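The three dispatch rules above amount to routing on two attributes: the kind of subtask and whether the fault is handled locally or through an external system device. The Python sketch below is a hypothetical illustration; the `kind` strings, the `via` field naming the external device, and the callable signatures are assumptions. For simplicity each subtask takes exactly one route, whereas the text also allows a simple fault to use both routes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

LocalComponent = Callable[[str, Dict[str, object]], None]
ExternalSender = Callable[[str, str, str, Dict[str, object]], None]


@dataclass
class Subtask:
    kind: str                   # "simple_fault" | "pre_fault" | "post_fault" (assumed)
    target: str                 # target device
    params: Dict[str, object]   # subtask command parameters
    via: Optional[str] = None   # external system device; None means local handling


class Dispatcher:
    """Routes a subtask to a local component or an external device (sketch)."""

    def __init__(self, local_components: Dict[str, LocalComponent],
                 send_to_external: ExternalSender) -> None:
        # local_components: fault simulation plug-in plus pre/post action components
        self.local = local_components
        self.send = send_to_external

    def run(self, sub: Subtask) -> None:
        if sub.kind not in ("simple_fault", "pre_fault", "post_fault"):
            raise ValueError(f"unknown subtask kind {sub.kind!r}")
        if sub.via is None:
            # Local plug-in or local action component on the target device.
            self.local[sub.kind](sub.target, sub.params)
        else:
            # External device receives an injection or operation instruction.
            self.send(sub.via, sub.kind, sub.target, sub.params)
```

Keeping the execution components behind plain callables lets new plug-ins or external devices be registered without changing the dispatch logic.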
The fault simulation method further comprises orchestrating the complex fault configuration information. This process comprises acquiring the input configuration information of each complex fault component and generating complex fault configuration information comprising the complex fault component configuration information and logical relationships, wherein the complex fault components comprise simple faults, pre-fault preparation and post-fault processing.
Acquiring the input configuration information comprises acquiring the name, description, command parameters and execution time of each complex fault component;
the execution time is either an absolute time point for the complex fault component or a delay relative to its predecessor component;
the command parameters include at least one of the target device, action content, duration and build action index of the complex fault component.
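Because an execution time may be either an absolute time point or a delay relative to a predecessor, an orchestration tool could resolve both forms into planned start offsets. The Python sketch below assumes, for illustration only, that a predecessor is finished once its `duration` command parameter has elapsed, that times are seconds from task start, and that the field names `at`, `after`, `delay` and `duration` are hypothetical.

```python
from typing import Dict, FrozenSet, TypedDict


class Timing(TypedDict, total=False):
    at: float         # absolute time point (seconds from task start)
    after: str        # predecessor component name
    delay: float      # delay after the predecessor finishes
    duration: float   # taken from the command parameters


def plan_start_times(components: Dict[str, Timing]) -> Dict[str, float]:
    """Resolve absolute points and predecessor-relative delays into start offsets."""
    starts: Dict[str, float] = {}

    def start_of(name: str, visiting: FrozenSet[str]) -> float:
        if name in starts:
            return starts[name]
        if name in visiting:
            raise ValueError(f"cyclic timing through {name!r}")
        t = components[name]
        if "after" in t:
            pred = t["after"]
            # Assumption: the predecessor ends when its duration has elapsed.
            pred_end = start_of(pred, visiting | {name}) + components[pred].get("duration", 0.0)
            value = pred_end + t.get("delay", 0.0)
        else:
            value = t.get("at", 0.0)
        starts[name] = value
        return value

    for n in components:
        start_of(n, frozenset())
    return starts
```

Such a plan is only an estimate: at run time the polling scheduler still waits for predecessors to actually succeed before releasing a subtask.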
Embodiments of the present invention also provide a fault simulation system based on the same inventive concept, the system structure is shown in fig. 2, and the system structure includes a fault injection subsystem 22 and a workflow engine 23;
The fault injection subsystem 22 is used for acquiring complex fault configuration information, wherein the complex fault configuration information comprises complex fault component configuration information and logic relation thereof, creating complex fault injection tasks according to the complex fault component configuration information and logic relation thereof, wherein the complex fault injection tasks comprise fault injection subtasks and execution time sequences thereof;
A workflow engine 23 for controlling the execution of the workflow objects.
The system further comprises a fault arrangement subsystem 21 for obtaining complex fault component configuration information and arranging the complex fault configuration information according to the complex fault component configuration information and the logic relationship thereof.
The fault arrangement subsystem 21 is specifically configured to obtain the configuration information of each complex fault component input through the complex fault scenario orchestration interface, generate complex fault configuration information including the component configuration information and the logic relationships among the components, and invoke the basic data engine to store the arranged complex fault configuration information in the database. The complex fault components include simple faults, pre-fault preparation and post-fault processing.
The fault injection subsystem 22 is configured to obtain the arranged complex fault configuration information according to an input fault injection instruction. Specifically, the fault injection engine in the fault injection subsystem receives the fault injection instruction through the fault injection interface and, according to the fault identifier included in the instruction, invokes the basic data engine of the basic framework layer to query the database for the stored complex fault configuration information that matches the fault identifier.
The fault injection subsystem 22 is configured to create, according to the complex fault component configuration information and logic relationships included in the complex fault configuration information, a complex fault injection task including fault injection subtasks and their execution timing. Specifically, the fault injection engine in the fault injection subsystem generates the task name of the complex fault injection task from the complex task name in the complex fault configuration information, generates the task description from the complex task description in that information, and generates the subtask name, subtask description, subtask command parameters and subtask execution time of each fault injection subtask from the complex fault component configuration information and its logic relationships.
The fault injection subsystem 22 is configured to call the executable fault injection subtasks in a polling manner according to the execution timing and to establish an executable workflow object including those subtasks, as follows:
the fault injection engine in the fault injection subsystem periodically scans the created complex fault injection tasks whose state is callable and, when it determines from the execution time that a task can be executed, marks that complex fault injection task as executable;
the workflow engine is called to periodically scan the complex fault injection tasks for executable fault injection subtasks and to add them to an executable workflow object according to their execution timing.
The fault injection subsystem 22 is configured to execute the workflow object and call the action execution component required by each fault injection subtask to perform the corresponding operation of each fault component, thereby injecting the complex fault into the tested system, as follows:
under the control of the workflow engine, the fault injection engine in the fault injection subsystem sequentially obtains the fault injection subtask to be executed next from the workflow object and, according to that subtask's command parameters, calls the corresponding execution component to perform the operation required by the corresponding fault component. This continues until all fault injection subtasks in the workflow object have been executed, completing the complex fault injection into the tested system.
Example II
A specific implementation architecture of the fault simulation system is shown in FIG. 3, and the overall structure of the system is divided into three layers, namely a user interface layer, a working logic layer and a basic framework layer.
The user interface layer provides the operation interface for users of the system and interacts with the working logic layer through APIs: it acquires the information a user enters through the operation interface, calls the corresponding functional module of the working logic layer through an API, and passes the acquired information to that module.
The user interface layer may also provide the user with a complex fault scenario orchestration interface and a fault injection interface. The orchestration interface obtains the configuration information of the complex fault edited by the user, which may include the configuration information of each simple fault and the logic relationships among them. The fault injection interface receives a fault injection instruction input by the user, for example specifying which complex fault is to be injected; the user selects the complex fault by entering or selecting its fault identifier.
The working logic layer comprises a fault description management module that implements complex fault orchestration, a fault injection engine that implements complex fault injection, and a remote agent with fault simulation plug-ins. It also comprises the APIs between the user interface layer and each of the fault description management module and the fault injection engine, the remote or local call interfaces between the basic framework layer and each of those two modules, and the remote call interfaces between the fault injection engine and the remote agent and fault simulation plug-ins.
The fault description management module converts the format of the complex fault configuration information obtained through the operation interface and passes the arranged complex fault configuration information to the basic framework layer for storage.
The fault injection engine comprises functional modules such as a fault injection task creation module, a fault injection task scheduling module and a fault injection action module. The task creation module creates, according to the simple fault configuration information and its logic relationships, a complex fault injection task comprising simple fault injection subtasks and their execution timing. The task scheduling module calls the currently executable simple fault injection subtasks in a polling manner according to the execution timing. The fault injection action module can call the workflow engine, the remote agent, the fault simulation plug-ins and so on: the workflow engine establishes, based on the execution timing, an executable workflow object comprising the executable simple fault injection subtasks and controls its execution, and the action module executes the workflow object, calling the fault simulation device corresponding to each simple fault injection subtask to inject each simple fault of the complex fault into the tested system. The fault simulation device may be the target machine or another external system device, on which a remote agent and fault simulation plug-ins may be deployed so that they can be called during fault injection.
The basic framework layer comprises functional modules such as a basic data engine, external system calls and a workflow engine, together with a call interface to the database and a remote or local call interface to the working logic layer.
The basic data engine accesses the database, storing the arranged complex fault configuration information in it and retrieving that information from it.
The external system call module implements calls to external systems; for example, when a device in an external system must be called to perform a fault action, an external system call is made.
The workflow engine enables creation of workflow objects and controls execution of the workflow objects.
The above describes the fault simulation system provided by the embodiment of the invention in terms of its system architecture. Divided by function, the system can comprise three subsystems: a fault arrangement subsystem, a fault injection subsystem and a workflow engine.
The fault arrangement subsystem comprises the complex fault scenario orchestration interface of the user interface layer, the fault description management module of the working logic layer, and the basic framework layer modules on which that module depends, such as the basic data engine. The fault injection subsystem comprises the fault injection interface of the user interface layer, the fault injection engine of the working logic layer, and the basic framework layer modules on which the engine depends, such as the workflow engine, the basic data engine and external system calls; it may also include the remote agent and the fault simulation plug-ins. The fault injection subsystem injects complex faults in cooperation with the workflow engine.
As shown in fig. 3, the fault arrangement subsystem includes the complex fault scenario orchestration interface and the fault description management module.
The complex fault scenario orchestration interface is used for acquiring the configuration information of the complex fault components input by the user.
The fault description management module is used for generating, from the input complex fault component configuration information, complex fault configuration information comprising the component configuration information and the logic relationships among the components, and for calling the basic data engine to store the arranged complex fault configuration information in the database. The complex fault components include simple faults, pre-fault preparation and post-fault processing.
The fault injection subsystem comprises a fault injection interface, a fault injection engine and an execution component. Wherein:
The fault injection interface is used for acquiring a fault injection instruction input by a user.
The fault injection engine is used for: acquiring the arranged complex fault configuration information according to an input fault injection instruction; creating, according to the complex fault component configuration information and logic relationships included in that information, a complex fault injection task comprising fault injection subtasks and their execution timing; calling the executable fault injection subtasks in a polling manner according to the execution timing to establish an executable workflow object comprising them; and executing the workflow object, calling the execution components required by the fault injection subtasks to perform the corresponding operations of the complex fault components, thereby injecting the complex fault into the tested system.
The execution component is used for performing the corresponding operation of a complex fault component. It may be a fault simulation plug-in or another action execution component local to the target device, or it may be an external system device.
The fault injection engine may include a fault injection task creation component, a fault injection task scheduling component, and a fault injection action component. Wherein:
the fault injection task creation component is used for acquiring the arranged complex fault configuration information according to the input fault injection instruction, and for creating, according to the complex fault component configuration information and logic relationships included in that information, a complex fault injection task comprising fault injection subtasks and their execution timing.
The fault injection task creation component is specifically configured to:
call the basic data engine of the basic framework layer to query the database for the stored complex fault configuration information that matches the fault identifier included in the fault injection instruction;
generate the task name of the complex fault injection task from the complex task name included in the complex fault configuration information;
generate the task description of the complex fault injection task from the complex task description included in the complex fault configuration information;
generate the subtask name, subtask description, subtask command parameters and subtask execution time of each fault injection subtask in the complex fault injection task from the complex fault component configuration information and its logic relationships, where the complex fault components include simple faults, pre-fault preparation and post-fault processing.
The fault injection task scheduling component is used for calling the executable fault injection subtasks in a polling mode according to the execution time sequence and establishing an executable workflow object comprising the executable fault injection subtasks.
The fault injection task scheduling component is specifically used for periodically scanning the callable fault injection subtasks in the complex fault injection tasks, marking a subtask as executable when its execution time shows that it can be executed, and calling the workflow engine to add the executable fault injection subtasks to the executable workflow object based on their execution timing.
The fault injection action component is used for executing the workflow object, calling the action execution component required by the fault injection subtask to execute the corresponding operation of the complex fault component, and realizing the injection of the complex fault into the tested system.
The fault injection action component is specifically used for sequentially obtaining, under the control of the workflow engine, the fault injection subtask to be executed next from the workflow object, and calling the corresponding execution component to perform the operation required by the corresponding fault component according to that subtask's command parameters, until all fault injection subtasks in the workflow object have been executed and the complex fault injection into the tested system is complete.
The specific implementation process of the fault simulation system for implementing the complex fault arrangement and the complex fault injection is described in detail below.
The fault arrangement subsystem implements complex fault orchestration: it acquires the input configuration information of each simple fault and generates complex fault configuration information comprising the simple fault configuration information and the logic relationships among the simple faults.
The user can edit the parts that make up a complex fault scenario and the relationships among them. These parts are called the components of the fault scenario and include the fault causes that make up the complex fault (each referred to as a simple fault) as well as other components (pre-fault preparation and post-fault cleaning). The relationships between the components of a complex fault scenario include parallel, predecessor and successor relationships. A simple fault can be regarded as a special complex fault with only one fault cause. The fault description management module adds, deletes and modifies complex fault description data.
When complex faults are orchestrated, the dependencies among complex fault components can be described by a directed acyclic graph (DAG). A DAG consists of nodes and edges; every edge is directed and the graph contains no cycles. If an edge points from node A to node B, A is called a predecessor of B and B a successor of A. An example DAG is shown in fig. 4. It has four nodes: pre-injection preparation, simple fault 1, simple fault 2 and post-injection cleaning. Simple fault 1 and simple fault 2 both have pre-injection preparation as their predecessor and post-injection cleaning as their successor.
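Because the graph must be acyclic, an orchestration tool would typically validate it and derive an execution order. The following is a sketch using Kahn's topological sort (a standard algorithm chosen here for illustration; the document does not name one), applied to the four-node example of fig. 4. The node labels follow the figure description; the function name is hypothetical.

```python
from collections import deque


def topological_order(edges: dict) -> list:
    """Return nodes in dependency order (Kahn's algorithm); raise on a cycle.

    edges maps each node to its successors: an edge A -> B means A is a
    predecessor of B.
    """
    nodes = set(edges) | {s for succs in edges.values() for s in succs}
    indegree = {n: 0 for n in nodes}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    # Start from nodes without predecessors; sort for a deterministic result.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for s in sorted(edges.get(node, [])):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(nodes):
        raise ValueError("the fault scenario graph contains a cycle")
    return order


# The four-node example of fig. 4.
fig4 = {
    "pre-injection preparation": ["simple fault 1", "simple fault 2"],
    "simple fault 1": ["post-injection cleaning"],
    "simple fault 2": ["post-injection cleaning"],
}
print(topological_order(fig4))
# ['pre-injection preparation', 'simple fault 1', 'simple fault 2', 'post-injection cleaning']
```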
An example of complex fault orchestration is shown in fig. 5. The complex fault scenario orchestration interface comprises two parts: a directed acyclic graph display part and a parameter configuration part. The display part shows the directed acyclic graph and allows it to be edited: starting from the START node, the graph leads to the pre-fault preparation, then to simple fault 1 and simple fault 2, then to the post-fault cleaning, and ends at the END node. Nodes in the graph can be added, deleted and modified. The parameter configuration part configures parameters for the complex fault component corresponding to each node, that is, it receives the configuration information of that component. As shown in fig. 5, when the simple fault 2 node is selected, its name, description, parameter 1, parameter 2, parameter 3, parameter 4 and so on can be entered; these parameters may be the target device of the fault injection, the delay relative to the predecessor component (action), or other parameters.
The user edits the scenario graphically through the interface (as a directed acyclic graph). The fault description management module of the working logic layer converts between the graph and the complex fault configuration information (i.e., the complex fault description). Converting the graph into a complex fault description gives the complex fault scenario a formal representation that is convenient for automated processing. During this conversion, the names in the parameters of the complex fault components (such as simple faults) are converted into the corresponding description information through an index lookup. An example of the complex fault configuration information (complex fault description) obtained from a graph is as follows:
```yaml
name: Name Of The Task
description: Description Of The Task
actions:
  - name: Action1
    description: Description Of Action1
    next: [Action2, Action3]
    command: Command1
  - name: Action2
    description: Description Of Action2
    next: [Action4]
    command: Command2
  - name: Action3
    description: Description Of Action3
    next: [Action4]
    command: Command3
  - name: Action4
    description: Description Of Action4
    next: [NULL]
    command: Command4
```
In this complex fault configuration information, the name in the first line is the name of the complex fault injection task and the description in the second line is its description. The actions that follow each correspond to one complex fault component, and each component has a name (name), description information (description), command parameters (command) and execution timing (expressed, for example, by its successor actions in next).
In the complex fault configuration information, the parts beginning with a capital letter are variable parameters and the all-lowercase parts are fixed keywords. The actions describe the complex fault components of the complex fault scenario, and the command of each action is the command to be executed. For example, when an action is a simple fault, its command parameters may include a) the fault injection target device, which may be identified by a host IP address, a network switching device ID or the like, and b) the delay relative to the predecessor action; additional command parameters may also be included, such as parameters controlling the duration and the intensity of the fault.
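Since each action lists only its successors in next, a consumer of the description needs the inverse map to know when a component may start. The sketch below assumes the description has already been loaded into a Python dict with the same keys as the example above (for instance by a YAML parser) and treats NULL or None in next as "no successor"; the function name is hypothetical.

```python
def predecessors(description: dict) -> dict:
    """Invert the `next` lists of a complex fault description.

    Returns a map from each action name to the names of its predecessors.
    """
    actions = description["actions"]
    preds = {a["name"]: [] for a in actions}  # keeps the declaration order
    for action in actions:
        for succ in action.get("next") or []:
            if succ is None or succ == "NULL":  # end of a chain
                continue
            if succ not in preds:
                raise ValueError(f"{action['name']} points to unknown action {succ}")
            preds[succ].append(action["name"])
    return preds


example = {
    "name": "Name Of The Task",
    "description": "Description Of The Task",
    "actions": [
        {"name": "Action1", "next": ["Action2", "Action3"], "command": "Command1"},
        {"name": "Action2", "next": ["Action4"], "command": "Command2"},
        {"name": "Action3", "next": ["Action4"], "command": "Command3"},
        {"name": "Action4", "next": [None], "command": "Command4"},
    ],
}
print(predecessors(example))
```

Action1 has no predecessors, so it is the entry point, and Action4 waits for both Action2 and Action3, matching the shape of fig. 4.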
The fault injection subsystem implements complex fault injection and can execute fault injection tasks according to parameters given by the user. It works together with the workflow engine: the execution of a complex fault injection task can be divided into several steps, each performing the operation of one complex fault component. These steps are called fault injection subtasks. Fault injection subtasks can have predecessor, successor or parallel timing relationships, and at run time the workflow engine executes them according to these relationships.
The complex fault injection process of a distributed system is shown in fig. 6, where solid arrows indicate data flows, dashed arrows indicate control flows, dashed boxes represent task states, and solid boxes represent system components. The process of complex fault injection is as follows:
The user initiates a fault injection task through the fault injection interface.
Through the interface, the user initiates the task by specifying the ID of the complex fault configuration information to be injected. The complex fault configuration information is stored in the database with the ID as the primary key, so the corresponding configuration information can be retrieved once the ID is entered. Task parameters, such as the user-specified ID, are passed by the user interface through an API call.
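The lookup by primary key can be sketched with Python's built-in sqlite3 module. The table name fault_config and the choice to store each configuration as JSON text are assumptions for illustration; the document does not specify a database product or schema.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store; the id column is the primary key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fault_config ("
        "id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_config(conn: sqlite3.Connection, fault_id: str, config: dict) -> None:
    # INSERT OR REPLACE keeps exactly one row per ID.
    conn.execute(
        "INSERT OR REPLACE INTO fault_config (id, body) VALUES (?, ?)",
        (fault_id, json.dumps(config)),
    )
    conn.commit()


def load_config(conn: sqlite3.Connection, fault_id: str) -> dict:
    """Return the configuration matching fault_id, or raise KeyError."""
    row = conn.execute(
        "SELECT body FROM fault_config WHERE id = ?", (fault_id,)
    ).fetchone()
    if row is None:
        raise KeyError(fault_id)
    return json.loads(row[0])
```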
The fault injection task creation component creates a complex fault injection task according to the relevant parameters of the complex fault configuration information specified by the user.
The parameters may include a) the ID of the complex fault configuration information and b) the execution time. The execution time is optional: it may be a scheduled execution time, may be expressed through predecessors or successors, or may be left empty. If it is not configured, the task is executed immediately; if it is specified, the task is executed at the specified time.
As shown in fig. 6, the output of task creation is the task description of the created complex fault injection task. It may include the complex task name, the complex task description and the configuration information of the complex fault components shown in the example above, and may further include the runtime state of each fault injection subtask (such as its start time and current state). The task description of the created complex fault injection task can be stored in the database with the task running state set to created, meaning the task can now be called.
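The creation step can be sketched as a function that turns a loaded complex fault description into a task record in the created state. This is a hypothetical shape: the keys execute_at, subtasks, state and started_at, and the initial subtask state pending, are illustrative assumptions rather than a schema defined by the document.

```python
import copy
from datetime import datetime
from typing import Optional


def create_task(description: dict, execute_at: Optional[datetime] = None) -> dict:
    """Build a complex fault injection task record (hypothetical schema)."""
    subtasks = []
    for action in description["actions"]:
        subtasks.append({
            "name": action["name"],
            "description": action.get("description", ""),
            "command": copy.deepcopy(action.get("command")),
            # Drop NULL/None markers that terminate a chain.
            "next": [n for n in (action.get("next") or []) if n not in (None, "NULL")],
            "state": "pending",   # runtime state of this subtask
            "started_at": None,   # filled in when the subtask starts
        })
    return {
        "name": description["name"],
        "description": description["description"],
        "execute_at": execute_at,  # None means execute immediately
        "state": "created",        # callable by the scheduler
        "subtasks": subtasks,
    }
```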
The fault injection task scheduling component performs task scheduling and calls the created complex fault injection tasks stored in the database. It periodically scans for callable complex fault injection tasks, i.e., tasks in the created state; the period is configurable and may be set, for example, to 1 second. In each scan, for every task in the created state: if the user specified a scheduled execution time and that time is not later than the current time, the task is marked as executable; if the scheduled time has not yet been reached, the task is left unchanged; and if the user did not specify an execution time, the task is marked as executable directly. The running state of each executable task is set to ready (the executable state), and the modified task description is written back to the database.
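One scanning round of the scheduling component might look like the following sketch. It assumes task records are dicts with the hypothetical keys state and execute_at (None meaning "execute immediately"); writing the promoted tasks back to the database is left to the caller.

```python
from datetime import datetime


def scan_once(tasks: list, now: datetime) -> list:
    """One scheduling round: promote due 'created' tasks to 'ready'.

    Returns the promoted tasks so the caller can persist them.
    """
    promoted = []
    for task in tasks:
        if task["state"] != "created":
            continue  # only callable (created) tasks are considered
        when = task.get("execute_at")
        # No execution time means run now; otherwise wait until it is due.
        if when is None or when <= now:
            task["state"] = "ready"
            promoted.append(task)
    return promoted
```

In a running system this function would be called once per scan period (for example every second), with the promoted tasks written back to the database.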
After a task is marked as executable, the workflow engine can be called to execute the subtasks of the complex fault injection task, as follows:
a) The workflow engine periodically scans for tasks in the ready state; the period is configurable and may be set, for example, to 1 second.
b) For each task in the ready state, the create method of the workflow engine is called, according to the description of each complex fault component in the complex fault configuration information and their timing relationships, to create an actually executable workflow object.
Referring to fig. 7, the workflow is a directed acyclic graph (DAG). The workflow object in fig. 7 contains four subtask nodes corresponding to the four complex fault components of fig. 4. Logically, the workflow represents the overall task, and each subtask node is an executable piece of code, called a fault injection subtask, whose function is simple fault injection, preparation before fault injection, or cleaning after fault injection. Each fault injection subtask in the workflow object corresponds to one action in the complex fault configuration information, and the timing relationships between the subtasks match the predecessor and successor relationships between the actions.
c) The start method of the workflow object is called to start executing the workflow. Execution of the workflow object satisfies the following requirements:
A fault injection subtask (other than one without predecessors) can be executed only after all of its predecessor subtasks have been executed;
If any fault injection subtask fails or its code throws an exception, execution of all subsequent fault injection subtasks is stopped;
For any fault injection subtask, once all of its predecessor subtasks have executed successfully, it waits for the preset delay (given in the command parameters of the corresponding action in the fault description) and then starts executing. Referring to fig. 6, if an exception occurs during task execution, the task running state is set to failed; otherwise it is set to success. The modified task description is then written back to the database.
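The three execution rules above can be illustrated with a simplified, single-threaded runner. It is a sketch, not the workflow engine's actual behavior: subtasks on parallel branches run one after another rather than concurrently, each subtask is a hypothetical dict with preds, delay (seconds) and run (a callable returning True on success), and the sleep function is injected so that time.sleep can be passed in real use.

```python
from typing import Callable


def run_workflow(subtasks: dict, sleep: Callable[[float], None]) -> str:
    """Run subtasks in dependency order and return 'success' or 'failed'.

    subtasks maps a name to {"preds": [...], "delay": seconds, "run": callable}.
    """
    done = set()
    pending = list(subtasks)  # preserves insertion order
    while pending:
        # Rule 1: a subtask is eligible only when all predecessors have succeeded.
        eligible = [n for n in pending if all(p in done for p in subtasks[n]["preds"])]
        if not eligible:
            return "failed"  # unsatisfiable dependencies, e.g. a cycle
        name = eligible[0]
        pending.remove(name)
        spec = subtasks[name]
        # Rule 3: wait for the preset delay before starting.
        sleep(spec.get("delay", 0))
        try:
            ok = bool(spec["run"]())
        except Exception:
            ok = False
        if not ok:
            # Rule 2: a failure or exception stops all remaining subtasks.
            return "failed"
        done.add(name)
    return "success"
```

The returned string corresponds to the task running state (success or failed) that is written back to the database.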
Executing the workflow object executes its fault injection subtasks in order. The subtasks can correspond to complex fault components of three kinds: simple fault injection, preparation before fault injection, and cleaning after fault injection. The execution process of a fault injection subtask is shown in fig. 8.
For example, when a fault injection subtask performs simple fault injection, the injection can be implemented in two ways:
First, a fault simulation plug-in on the target device is called to simulate the simple fault. The command parameters of the action in the complex fault configuration information include the target device to be called and the corresponding fault simulation plug-in information, and the fault is simulated directly on the target device, for example single-disk space filling, single-CPU exhaustion or single-NIC latency.
In the plug-in mode, referring to fig. 8, the fault injection action component in the fault injection engine (located in a node of the workflow object) remotely invokes the agent on the target device (target machine) and passes it the subtask parameters, including the fault name (such as single-disk space filling, single-CPU exhaustion or single-NIC latency) and other parameters (such as fault intensity and fault duration); the agent then invokes the local fault simulation plug-in.
Second, the API of an external system device is called to inject the fault indirectly, for example shutting down a network switch.
In the external system call mode, referring to fig. 8, the fault injection action component in the fault injection engine calls the API of the external system (such as a switch management system), and the external system agent performs the specified action.
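The choice between the two injection modes can be sketched as a dispatcher driven by the action's command parameters. The command keys (mode, target, fault, params, api) and the two client callables are hypothetical stand-ins for the remote agent call and the external system API; no real agent protocol or switch management API is implied.

```python
from typing import Callable


def inject_simple_fault(
    command: dict,
    call_agent: Callable[[str, str, dict], str],
    call_external_api: Callable[[str, dict], str],
) -> str:
    """Dispatch one simple fault to the plug-in mode or the external-system mode."""
    mode = command.get("mode", "plugin")
    params = command.get("params", {})
    if mode == "plugin":
        # Mode 1: remote call to the agent on the target machine, which then
        # runs the local fault simulation plug-in named by "fault".
        return call_agent(command["target"], command["fault"], params)
    if mode == "external":
        # Mode 2: call the external system's API (e.g. a switch management system).
        return call_external_api(command["api"], params)
    raise ValueError(f"unknown injection mode: {mode!r}")
```

Injecting the two clients as parameters keeps the dispatcher independent of how the agent or external system is actually reached.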
Fault injection subtasks for preparation before fault injection and cleaning after fault injection are executed in a similar way and likewise fall into two cases:
In the first case, preparation and cleaning are performed in the fault simulation plug-in mode. Taking a single-disk space exhaustion fault as an example: before fault injection, the agent calls a disk selection plug-in that randomly selects one of the data disks on the local machine and records its current disk space water level (for fault recovery); after fault injection, the agent sends a disk space cleaning instruction to the single-disk space water level recovery plug-in, restoring the disk space to its pre-injection water level. Referring to fig. 8, the other action components remotely call the agent on the target device (target machine) and pass it the subtask parameters, and the agent then calls the local action execution plug-ins to perform the preparation before fault injection and the cleaning after fault injection.
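The record-and-restore pattern used in the single-disk example can be sketched with simulated disks. Everything here is hypothetical: the Disk class stands in for real data disks, and filling is simulated by setting usage to capacity rather than by writing data.

```python
import random
from dataclasses import dataclass


@dataclass
class Disk:
    """A simulated data disk (hypothetical stand-in for a real device)."""
    name: str
    capacity: int  # bytes
    used: int      # bytes currently in use (the "water level")


def prepare(disks: list, rng: random.Random) -> tuple:
    """Pre-fault preparation: pick one data disk at random and record its water level."""
    disk = rng.choice(disks)
    return disk, disk.used


def inject_fill(disk: Disk) -> None:
    """Simple fault: exhaust the disk's space (simulated)."""
    disk.used = disk.capacity


def cleanup(disk: Disk, recorded_used: int) -> None:
    """Post-fault cleaning: restore the recorded water level."""
    disk.used = recorded_used
```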
In the second case, preparation and cleaning are performed in the external system call mode. Taking a network switch shutdown fault as an example: before fault injection, the action component sends the switch management system an instruction to check whether the switch is in a normal state and then obtains operation permission for the switch; after fault injection, the action execution component sends the switch management system an instruction to turn the network switch back on. Referring to fig. 8, the other action components in the fault injection engine call the API of the external system, and the external system agent performs the specified action.
Alternatively, instead of using an agent, fault injection may be performed through ssh or another similar remote command execution mechanism.
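A minimal sketch of the ssh alternative, assuming the standard OpenSSH client is installed and that plug-ins accept --key=value arguments (both assumptions; any plug-in path passed in is hypothetical). shlex.quote protects each argument from the remote shell.

```python
import shlex
import subprocess


def ssh_command(host: str, plugin: str, args: dict) -> list:
    """Build an ssh argv that runs a fault simulation plug-in on host."""
    remote = " ".join(
        [shlex.quote(plugin)] + [shlex.quote(f"--{k}={v}") for k, v in args.items()]
    )
    return ["ssh", host, remote]


def run_over_ssh(host: str, plugin: str, args: dict, timeout: float = 60) -> str:
    """Run the plug-in remotely; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        ssh_command(host, plugin, args),
        check=True, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```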
Example III
The third embodiment of the invention provides a test system and a test method for a distributed system. As shown in fig. 9, the test system of the distributed system comprises a test terminal 1 and a fault simulation system 2.
The test terminal 1 is used for acquiring a system test instruction through a test instruction input interface and acquiring a fault injection instruction through a fault injection interface;
the fault simulation system 2 is used for displaying a corresponding fault injection interface to a user according to the test requirement included in the system test instruction, acquiring well-arranged complex fault configuration information according to the fault injection instruction, injecting complex faults into the tested system 3, and acquiring performance data of the tested system after the complex faults are injected.
The fault simulation system 2 is also used for providing test requirements and acquired performance data of the tested system to the test terminal, and the corresponding test terminal 1 is also used for displaying the test requirements and the performance data to a user through a test result display interface.
A test client can be installed on the test terminal 1 to provide testers with the various human-machine interaction interfaces used during testing. For example: a test instruction input interface, through which the user inputs a test instruction, such as an instruction to start a test or an instruction containing a test requirement; a fault injection interface, through which the user inputs a fault injection instruction and related fault injection information; a complex fault scene arrangement interface, through which the user inputs information related to complex fault arrangement, such as complex fault component configuration information; and an information display interface, which displays the test requirement input by the user and the corresponding test result information.
The structure and functions of the fault simulation system 2 are described in the first and second embodiments and are not repeated here.
Optionally, the system further includes an external system 4, which is used for invoking, according to a call instruction from the fault simulation system 2, the execution component required by a fault injection subtask, and performing the corresponding operation of that subtask on the corresponding target device in the tested system 3.
When one or more fault injection subtasks of the complex fault injection task for the tested system 3 must be implemented through an external system, the fault simulation system 2 can call the external system 4, invoking as needed the one or more devices in the external system that those subtasks require, so as to carry out the simulated injection of the corresponding fault injection subtasks.
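The choice between executing a subtask through an agent and delegating it to the external system 4 can be pictured as a simple dispatch step. In this sketch, the `Subtask` fields (in particular `mode`) and the two executor functions are hypothetical placeholders; in practice each executor would contact the agent on the target device or call the external system's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """Hypothetical fault injection subtask description."""
    name: str
    target: str   # device in the tested system
    mode: str     # "agent" or "external" (assumed field)
    action: str   # e.g. "kill_process", "shut_down_switch"


def dispatch(subtasks: List[Subtask],
             executors: Dict[str, Callable[[Subtask], str]]) -> List[str]:
    """Route each subtask, in order, to the executor registered for its mode."""
    results = []
    for task in subtasks:
        executor = executors.get(task.mode)
        if executor is None:
            raise ValueError(f"no executor registered for mode {task.mode!r}")
        results.append(executor(task))
    return results


def agent_executor(task: Subtask) -> str:
    # Placeholder for sending the action to the agent (and plug-in) on task.target.
    return f"agent@{task.target}:{task.action}"


def external_executor(task: Subtask) -> str:
    # Placeholder for calling the external system's API for task.target.
    return f"external@{task.target}:{task.action}"


plan = [Subtask("s1", "host-a", "agent", "kill_process"),
        Subtask("s2", "sw-1", "external", "shut_down_switch")]
print(dispatch(plan, {"agent": agent_executor, "external": external_executor}))
# ['agent@host-a:kill_process', 'external@sw-1:shut_down_switch']
```

A registry keyed by mode keeps the agent path and the external system path behind one interface, which matches the idea of unifying both as means of fault injection.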
As shown in FIG. 9, there may be multiple test terminals 1 for use by different testers. The fault simulation system 2 may be implemented on a single server, on a server cluster, or by cloud devices. The tested system 3 may include one or more network node devices such as cloud devices, personal computers, servers, mainframes, hosts, and switches. The external system 4 may include one or more external system devices such as cloud devices, personal computers, servers, and mainframes.
The flow of the distributed system testing method provided by the embodiment of the invention is shown in FIG. 10 and comprises the following steps:
Step S201, acquiring a system test instruction through a test instruction input interface, and displaying the corresponding fault injection interface to the user according to the test requirement included in the system test instruction;
Both the acquisition of the test instruction and the display of the fault injection interface can be implemented through the human-machine interaction interfaces on the test terminal.
Step S202, acquiring a fault injection instruction through the fault injection interface, acquiring the pre-arranged complex fault configuration information according to the fault injection instruction, and injecting the complex fault into the tested system using the fault simulation method described above;
Step S203, acquiring performance data of the tested system after the complex fault is injected.
After the complex fault is injected into the tested system, the performance indexes of the tested system are monitored and the corresponding performance data are collected, so as to determine the running state and fault tolerance of the tested system while the injected fault is present.
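Monitoring after injection can be as simple as polling a metric source at a fixed interval and summarizing each index. The sketch below assumes a caller-supplied `sample()` function returning a dict of metric values (for example CPU and memory usage in percent); how those values are read from the tested system is outside its scope. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def collect_performance(sample: Callable[[], Dict[str, float]],
                        samples: int, interval: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep
                        ) -> Dict[str, Dict[str, float]]:
    """Poll `sample` a fixed number of times and summarize each metric."""
    history: Dict[str, List[float]] = {}
    for i in range(samples):
        for metric, value in sample().items():
            history.setdefault(metric, []).append(value)
        if i < samples - 1:
            sleep(interval)  # wait between samples, not after the last one
    return {m: {"min": min(v), "max": max(v), "avg": mean(v)}
            for m, v in history.items()}


# Fake readings standing in for values read from the tested system.
readings = iter([{"cpu": 20.0, "mem": 40.0},
                 {"cpu": 80.0, "mem": 50.0},
                 {"cpu": 50.0, "mem": 60.0}])
print(collect_performance(lambda: next(readings), samples=3, sleep=lambda s: None))
# {'cpu': {'min': 20.0, 'max': 80.0, 'avg': 50.0}, 'mem': {'min': 40.0, 'max': 60.0, 'avg': 50.0}}
```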
Optionally, the method further comprises:
Step S204, providing the test requirements and the acquired performance data of the tested system to the test terminal, and displaying them to the user through a test result display interface.
Data on the performance indexes of the tested system after complex fault injection, such as CPU usage, memory usage, and system fault information, can be displayed to the user through the information display interface on the test terminal.
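Steps S201 to S204 can be wired together as in the following sketch. Every callback (`choose_fault`, `load_config`, `inject`, `measure`, `display`) and the `RunRecord` type are hypothetical placeholders for the test terminal interfaces and the fault simulation system; the sketch only shows the order in which the steps hand data to one another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RunRecord:
    """Hypothetical record of one distributed-system test run."""
    requirement: str
    fault_id: str = ""
    performance: Dict[str, float] = field(default_factory=dict)


def run_distributed_test(test_instruction: Dict[str, str],
                         choose_fault: Callable[[str], str],
                         load_config: Callable[[str], dict],
                         inject: Callable[[dict], None],
                         measure: Callable[[], Dict[str, float]],
                         display: Callable[[str], None]) -> RunRecord:
    run = RunRecord(requirement=test_instruction["requirement"])
    # S201: present the fault injection interface for this requirement; the
    # tester's selection comes back as a fault injection instruction (a fault id).
    run.fault_id = choose_fault(run.requirement)
    # S202: fetch the pre-arranged complex fault configuration and inject it.
    inject(load_config(run.fault_id))
    # S203: collect performance data from the tested system after injection.
    run.performance = measure()
    # S204 (optional): show the requirement and the results to the tester.
    display(f"requirement: {run.requirement}")
    for metric, value in sorted(run.performance.items()):
        display(f"{metric}: {value}")
    return run
```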
Optionally, the method further comprises: acquiring complex fault component configuration information through a complex fault scene arrangement interface, and pre-arranging the complex fault configuration information according to the acquired complex fault component configuration information and the logical relationships among the components.
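Pre-arranging a complex fault amounts to storing its components together with their logical (ordering) relationships and rejecting arrangements that cannot be executed. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to check that the relationships form a directed acyclic graph and to derive one valid execution order. The component dict shape and the `depends_on` mapping are assumptions for illustration, not the storage format of the embodiment.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def arrange_complex_fault(name: str, components: List[dict],
                          depends_on: Dict[str, List[str]]) -> dict:
    """Assemble a complex fault configuration and validate its relationships.

    Assumed shapes: each component is a dict with at least a "name" key;
    depends_on maps a component name to the components that must run before it.
    """
    names = {c["name"] for c in components}
    if len(names) != len(components):
        raise ValueError("component names must be unique")
    for comp, preds in depends_on.items():
        unknown = ({comp} | set(preds)) - names
        if unknown:
            raise ValueError(f"unknown components in dependencies: {sorted(unknown)}")
    graph = {c["name"]: set(depends_on.get(c["name"], [])) for c in components}
    try:
        # static_order() raises CycleError if the relationships are not a DAG.
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"dependencies form a cycle: {exc.args[1]}") from exc
    return {"name": name,
            "components": components,
            "depends_on": {k: sorted(v) for k, v in graph.items()},
            "execution_order": order}


cfg = arrange_complex_fault("partition-then-restore",
                            [{"name": "a"}, {"name": "b"}, {"name": "c"}],
                            {"b": ["a"], "c": ["b"]})
print(cfg["execution_order"])  # ['a', 'b', 'c']
```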
Other details of the testing method and system for the distributed system have been described in the first and second embodiments and are not repeated here.
The embodiment of the invention also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the fault simulation method or the distributed system testing method described above.
The embodiment of the invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the fault simulation method or the distributed system testing method described above.
The method and system provided by the embodiments of the invention are designed under the guidance of chaos engineering principles. They can simulate complex faults and support both the arrangement (orchestration) and the scheduled execution of complex faults, and they can be applied to cloud platform services or products such as proprietary cloud and block storage. The arrangement of complex faults is implemented using directed acyclic graphs (DAGs), which ensures that the complex fault components in a complex fault execute in order. The timing of a complex fault task is implemented by polling to check whether the scheduled time has been reached, and the timing of each sub-step within a complex fault is implemented using relative delays, so that the complex fault task and the simple tasks or actions it contains are executed at the required times. Complex fault injection is implemented with workflow technology, which ensures orderly and reasonable execution without manual intervention; simple fault injection against a single target is implemented with agents and agent plug-ins; and the agent mode and the external system call mode are unified as two means of achieving fault injection. As a result, every stage, from fault arrangement through fault task creation and execution to fault injection, can be executed automatically, and complex faults can be simulated and reproduced. When the fault simulation method is used to test a distributed system, the degree of automation of system testing can be greatly improved.
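The combination of DAG ordering, relative delays and polling described above can be sketched as follows. The delay rule is an assumption chosen for illustration: each component fires `delays[c]` seconds after the latest start of its predecessors (or after the task start if it has none). `wait_until` polls the clock until the scheduled time arrives, and `now`/`sleep` are injectable so the schedule can be checked with a fake clock. Component names in the example are hypothetical.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def start_offsets(deps: Dict[str, Set[str]], delays: Dict[str, float]) -> Dict[str, float]:
    """Seconds after task start at which each component should fire (assumed rule)."""
    offsets: Dict[str, float] = {}
    for comp in TopologicalSorter(deps).static_order():
        base = max((offsets[p] for p in deps.get(comp, ())), default=0.0)
        offsets[comp] = base + delays.get(comp, 0.0)
    return offsets


def wait_until(target: float, now: Callable[[], float] = time.time,
               sleep: Callable[[float], None] = time.sleep,
               poll_interval: float = 1.0) -> None:
    """Poll the clock until the scheduled time is reached."""
    while now() < target:
        sleep(min(poll_interval, max(target - now(), 0.0)))


def run_schedule(offsets: Dict[str, float], execute: Callable[[str], None],
                 start: float, now: Callable[[], float] = time.time,
                 sleep: Callable[[float], None] = time.sleep,
                 poll_interval: float = 1.0) -> List[str]:
    """Fire components in offset order, polling until each one's time arrives."""
    fired = []
    # sorted() is stable, so ties keep the topological insertion order.
    for comp, offset in sorted(offsets.items(), key=lambda kv: kv[1]):
        wait_until(start + offset, now, sleep, poll_interval)
        execute(comp)
        fired.append(comp)
    return fired


deps = {"kill_db": {"shut_switch"}, "fill_disk": {"shut_switch"},
        "restore": {"kill_db", "fill_disk"}}
delays = {"shut_switch": 0.0, "kill_db": 30.0, "fill_disk": 10.0, "restore": 60.0}
print(start_offsets(deps, delays)["restore"])  # 90.0
```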
Unless specifically stated otherwise, terms such as processing, computing, calculating, determining, displaying, or the like, may refer to an action and/or process of one or more processing or computing systems, or similar devices, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the processing system's registers or memories into other data similarly represented as physical quantities within the processing system's memories, registers or other such information storage, transmission or display devices. Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
It should be understood that the specific order or hierarchy of steps in the processes disclosed are examples of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. The processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described in this disclosure may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. These software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, as used in the specification or claims, the term "includes" is intended to be inclusive in a manner similar to the term "comprising," as interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean "non-exclusive or".