Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In the field of network control, arranging the instructions of a network controller is an important task. The inventor of the present application finds that the following characteristics need to be satisfied in order to ensure that network instruction arrangement can be performed smoothly:
Firstly, because the instructions have mutual dependence relationships, guaranteeing the atomicity of the arrangement is crucial for instruction arrangement. Secondly, if the arranged instructions or tasks are lost, the success rate of the instruction arrangement is directly affected, so the network controller also needs strong disaster tolerance capability to ensure that arranged instructions and tasks are not lost after the network controller goes down. Moreover, for the development of controller instruction arrangement, each network controller has its own logic, so the overall development efficiency needs to be improved.
However, the traditional network controller instruction arrangement process can hardly guarantee the atomicity of a group of instructions, and error correction can usually only be completed through later configuration examination. In addition, the network controller has no disaster tolerance capability, and once a software error occurs, the instructions cannot be recovered, which greatly affects the instruction arrangement effect. Moreover, at present, a corresponding instruction arrangement strategy needs to be separately developed for each set of network controllers, so the development efficiency of the whole system is low and the development cost is high.
In the related art, the distribution and execution of tasks may be implemented using an open-source task management framework, such as Airflow. These task management frameworks realize the arrangement of network instructions by embedding the network instructions into tasks.
However, the following drawbacks still exist with the task management framework:
First, these open-source task management frameworks are almost always deployed independently and cannot be embedded as components in the controller, which greatly delays the issuance of network instructions.
Secondly, these task management frameworks often lack rollback configuration, making it difficult, or at least inconvenient, to guarantee the atomicity of a set of instruction arrangements.
Thirdly, the existing open-source task management frameworks provide insufficient support for disaster tolerance: when a node goes down, tasks are easily lost.
To this end, the present application first provides a task processing method applied to a multi-node system. The task herein may be any task that can be represented by program code and used for processing a certain flow, such as a task that implements network instruction arrangement. Tasks are represented in program code as certain entities, for example, in a Java program, tasks may be represented in the form of objects. Therefore, the task processing method applied to the multi-node system provided by the embodiment of the application can be applied to various task processing scenarios.
When the task processing method applied to the multi-node system is used for a scene of arranging network instructions, the defects can be overcome, and the characteristics required by smoothly finishing the instruction arrangement can be realized.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied. This is described below in terms of the system architecture shown in fig. 1 for network instruction orchestration.
As shown in fig. 1, the system architecture 100 may include a network device (e.g., one or more of the switches 101, routers 102, and gateway devices 103 shown in fig. 1, although various other network devices may also be used), a network 104, and a multi-node system 110. The multi-node system 110 includes a first server 111, a second server 112, and a third server 113, and each of the servers included in the multi-node system 110 is a task processing node in the multi-node system 110. Network 104 is used to provide a medium for communication links between terminal devices and task processing nodes in the multi-node system 110. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the numbers of terminal devices, networks, and task processing nodes in the multi-node system in FIG. 1 are merely illustrative. The multi-node system can comprise any number of task processing nodes, and each task processing node can even be formed by a cluster. For example, the first server 111 may be a server cluster composed of a plurality of servers.
The task processing node in the multi-node system 110 arranges a set of network instructions included in a task by processing the task and sends the network instructions to the network devices corresponding to the network instructions. When the task processing method applied to the multi-node system provided by the embodiment of the application is used in a scenario of arranging network instructions, suppose a downtime event occurs while one task processing node is processing a task. With the task processing method applied to the multi-node system provided by the embodiment of the application, each task processing node can acquire the identification information of tasks to be consumed from the message queue, and the message queue contains the identification information of the tasks that were not processed and completed by the down task processing node. Therefore, the task processing nodes that have not gone down can acquire the identification information of the tasks that were not processed and completed by the down task processing node, and those unfinished tasks can be processed by the task processing nodes in which no downtime event occurred, which guarantees that distributed tasks are effectively processed.
It should be noted that, although the embodiment of the present application is used in a scenario in which network instructions are arranged, in practice the present application may be applied to any task processing flow, such as the control process of a workflow. Although the task processing node is a server in the embodiment of the present application, in other embodiments of the present application the task processing node may be any type of terminal device; the task processing node may be not only a physical node but also a virtualized node; in addition, the task processing node may also be a cluster of servers. Moreover, although the multi-node system in the embodiment of the present application only includes task processing nodes, the multi-node system may further include various other types of entities, such as a database, a message queue, and the like.
Moreover, it is easy to understand that the task processing method applied to the multi-node system provided by the embodiments of the present application is generally executed by a server, and accordingly, the task processing device applied to the multi-node system is generally disposed in the server. However, in other embodiments of the present application, the terminal device may also have functions similar to those of the server, so as to execute the task processing solution applied to the multi-node system provided in the embodiments of the present application.
The task processing method applied to the multi-node system provided by the embodiment of the application can be applied to the field of cloud computing (cloud computing) or cloud storage (cloud storage), and can also be applied to a Block Chain (Block Chain) network.
Cloud computing is a computing model that distributes computing tasks over a resource pool of large numbers of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user to be infinitely expandable, and can be obtained at any time, used on demand, expanded at any time, and paid for according to use.
As a basic capability provider of cloud computing, a cloud computing resource pool (referred to as an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for selective use by external clients.
According to the logic function division, a PaaS (Platform as a Service) layer can be deployed on an IaaS (Infrastructure as a Service) layer, a SaaS (Software as a Service) layer is deployed on the PaaS layer, and the SaaS can be directly deployed on the IaaS. PaaS is a platform on which software runs, such as a database, a web container, etc. SaaS is a variety of business software, such as web portal, sms, and mass texting. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
Cloud storage is a new concept extended and developed from the cloud computing concept. A distributed cloud storage system (hereinafter referred to as a storage system) refers to a storage system that, through functions such as cluster application, grid technology, and distributed storage file systems, integrates a large number of storage devices of different types in a network (storage devices are also referred to as storage nodes) to work cooperatively through application software or application interfaces, and provides data storage and service access functions externally.
At present, the storage method of a storage system is as follows: logical volumes are created, and when a logical volume is created, it is allocated physical storage space, which may be composed of the disks of a certain storage device or of several storage devices. A client stores data on a certain logical volume, that is, the data is stored on a file system. The file system divides the data into a plurality of parts, each part being an object; an object contains not only the data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can allow the client to access the data according to the storage location information of each object.
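The split-and-index storage method described above might be sketched as follows; the class and method names are hypothetical simplifications invented for illustration, not part of any real storage system's API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: the file system splits data into objects (data plus
// an identifier), writes each object into the logical volume, and records
// each object's storage location so the data can be found on later access.
public class ObjectStore {
    record StoredObject(String id, byte[] data) {}

    private final List<StoredObject> volume = new ArrayList<>();    // the "logical volume"
    private final Map<String, Integer> locations = new HashMap<>(); // object id -> position

    /** Split data into parts of at most partSize bytes and record each object's location. */
    public List<String> store(String dataId, byte[] data, int partSize) {
        List<String> ids = new ArrayList<>();
        for (int off = 0, part = 0; off < data.length; off += partSize, part++) {
            int end = Math.min(off + partSize, data.length);
            String id = dataId + "#" + part;
            locations.put(id, volume.size());
            volume.add(new StoredObject(id, Arrays.copyOfRange(data, off, end)));
            ids.add(id);
        }
        return ids;
    }

    /** Access an object through its recorded storage location information. */
    public byte[] read(String objectId) {
        return volume.get(locations.get(objectId)).data();
    }
}
```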
The process by which the storage system allocates physical storage space for the logical volume specifically includes: the physical storage space is divided in advance into stripes according to a set of capacity estimates for the objects to be stored in the logical volume (the estimates often have a large margin with respect to the capacity of the actual objects to be stored) and according to the Redundant Array of Independent Disks (RAID) scheme; one logical volume can be understood as one stripe, whereby physical storage space is allocated to the logical volume.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated using cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform can comprise processing modules such as user management, basic service, smart contract, and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, including maintenance of public and private key generation (account management), key management, and maintenance of the correspondence between a user's real identity and blockchain address (authority management); with authorization, it supervises and audits the transaction conditions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and used to verify the validity of service requests and, after consensus on a valid request is completed, record the request to storage; for a new service request, the basic service first performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and performs recording and storage. The smart contract module is responsible for registering and issuing contracts, triggering contracts, and executing contracts; developers can define contract logic through a certain programming language and issue it to the blockchain (contract registration), and according to the logic of the contract clauses the module invokes keys or other triggering events to execute and complete the contract logic, while also providing functions for upgrading and canceling contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract setting, cloud adaptation, and visual output of real-time states in product operation, such as alarms, monitoring network conditions, and monitoring node device health status.
The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.
When the embodiment of the application is applied to a scenario of arranging network instructions, it may be applied in particular to the field of edge computing and to the network architecture of a Software Defined Network (SDN), and the arranged network instructions may be used for controlling an edge gateway. The software defined network is a new open network architecture that, by separating control from forwarding, centralizing the control plane, and opening programmable interfaces, realizes flexible scheduling of network resources with a global view and rapid deployment of new services, and can simplify operation and maintenance, improve network resource utilization, and improve customer perception.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 2 illustrates a flowchart of a task processing method applied to a multi-node system according to an embodiment of the present application, which may be performed by a device having computing and communication functions, such as the first server 111 illustrated in fig. 1; the multi-node system includes a plurality of task processing nodes. Referring to fig. 2, the task processing method applied to the multi-node system comprises at least the following steps:
In step 220, the working states of the plurality of task processing nodes are monitored.
When the embodiment of the present application is used in a network instruction arrangement scenario, a task processing node may adopt the organization form shown in fig. 3. That is, the node includes a network controller, and the network controller further includes a northbound control API, a task orchestration management framework, an RPC Client (Remote Procedure Call Client), and functional modules for generating network instructions; the functional modules for generating network instructions specifically include a network arrangement and calculation module, a network model object management module, and a global quality monitoring traffic scheduling module. The northbound control API is a user management application program interface provided on the northbound side of the network controller and is used to provide an entry for network instruction arrangement. Each functional module for generating network instructions can generate network configuration instructions through operations such as network model object management, network arrangement, and calculation. The task orchestration management framework may be configured to execute the task processing method applied to the multi-node system provided in the embodiment of the present application. The RPC Client is used to send a network instruction from the task orchestration management framework to the network device corresponding to the network instruction. In the following, unless otherwise specified, the terms node and task processing node in a multi-node system are equivalent.
Each task processing node in the multi-node system can monitor the working state of other task processing nodes, and the steps shown in the embodiment of FIG. 2 can be executed by any node in the multi-node system. The working state of the task processing node may include a normal state and a down state.
In one embodiment of the present application, the operational status of a task processing node in a multi-node system may be monitored in a variety of ways.
FIG. 4 shows a flowchart of steps preceding step 220 in FIG. 2 and details of step 220 according to one embodiment of the present application.
Please refer to fig. 4, which specifically includes the following steps:
In step 210, a registration request is sent to a registration center, and after the registration is successful, a heartbeat connection with the registration center is established.
The registration center is used for registering each task processing node in the multi-node system and carrying out heartbeat detection on each successfully registered task processing node. The task processing node performs registration by sending a registration request to the registration center, where the registration request may carry information related to the task processing node, such as an identifier of the task processing node.
The heartbeat connection between the task processing node and the registration center is established through the sending of heartbeat packets. Specifically, the task processing node sends heartbeat packets to the registration center, or the registration center sends heartbeat packets to the task processing node, and the registration center completes heartbeat detection according to the results of the received or sent heartbeat packets.
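Step 210 and the heartbeat mechanism might be sketched as follows; the `Registry` interface is a hypothetical stand-in for whatever registration-center client the system actually uses, and the 3-second heartbeat interval is an arbitrary assumption:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical client-side sketch of step 210: a node registers with the
// registration center, then keeps the heartbeat connection alive.
public class NodeRegistration {
    /** Minimal stand-in for the registration-center client API (assumed). */
    public interface Registry {
        boolean register(String nodeId); // send a registration request
        void heartbeat(String nodeId);   // send one heartbeat packet
    }

    public static ScheduledExecutorService start(Registry registry, String nodeId) {
        if (!registry.register(nodeId)) {
            throw new IllegalStateException("registration failed for " + nodeId);
        }
        registry.heartbeat(nodeId); // first heartbeat right after registration
        // Keep sending heartbeat packets; the registration center marks the
        // node as down when the heartbeats stop.
        ScheduledExecutorService beat = Executors.newSingleThreadScheduledExecutor();
        beat.scheduleAtFixedRate(() -> registry.heartbeat(nodeId), 3, 3, TimeUnit.SECONDS);
        return beat;
    }
}
```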
In step 220', the state of the heartbeat connection maintained in the registry is monitored to monitor the working states of the plurality of task processing nodes.
The status of the heartbeat connection maintained in the registry can be represented by a variety of data.
For example, the registration information or identification information of task processing nodes whose heartbeat connections are normal may be maintained in the registration center. When a task processing node monitors the registration center, it obtains from the registration center all registration information or identification information of nodes that keep a normal heartbeat connection with the registration center and stores it locally. When a task processing node goes down, its heartbeat in the registration center stops, and the registration center removes that node's registration information or identification information. The other task processing nodes may then determine that the task processing node is down when they find that registration information or identification information stored locally has disappeared from the registration center.
For another example, the identification information and state information of the task processing nodes may also be maintained in the registration center, where the state information represents whether the heartbeat of the corresponding task processing node in the registration center has stopped or is normal. For example, the state information may be represented by 0 or 1, where 0 may represent that the heartbeat of the corresponding task processing node has stopped and 1 may represent that it is normal. When a task processing node monitors the state of the heartbeat connections maintained in the registration center, it acquires the identification information and corresponding state information of each task processing node; if the acquired state information represents that the heartbeat has stopped, the working state of the corresponding task processing node can be determined to be down.
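The second monitoring approach above, with state information represented by 0 and 1, can be sketched as a simple scan of the state map; the map layout is assumed for illustration only:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of step 220': decide which nodes are down from the state
// information maintained in the registration center (1 = heartbeat
// normal, 0 = heartbeat stopped), keyed by node identification info.
public class DownDetector {
    public static Set<String> downNodes(Map<String, Integer> registryState) {
        Set<String> down = new HashSet<>();
        for (Map.Entry<String, Integer> e : registryState.entrySet()) {
            if (e.getValue() == 0) { // heartbeat stopped: node is down
                down.add(e.getKey());
            }
        }
        return down;
    }
}
```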
It can be seen that the manner of monitoring the operating status of the task processing nodes and determining whether the task processing nodes are down may be various and is not limited to the above.
Referring to fig. 2, in step 230, identification information of tasks to be consumed is acquired from the message queue, where the message queue includes identification information of a first task that was not processed and completed by a first task processing node that went down, and the identification information of the first task is acquired by a second task processing node and released into the message queue after the second task processing node monitors that the first task processing node has gone down.
The identification information is information that can be used to uniquely identify a task, and is usually in a data format of a character string, and the identification information may be composed of various symbols such as letters and numbers, for example, the identification information may be a string of numbers.
Alternatively, the identification information of the task may be stored in a cache.
In one embodiment of the application, the identification information of the first task is posted to a location in the message queue that is consumed first.
In the embodiment of the present application, since the first task is a task that was not processed and completed by the first task processing node, putting the identification information of the first task at the position in the message queue that is consumed first allows the unfinished task to be processed preferentially.
The first task processing node and the second task processing node are both nodes in a multi-node system. After monitoring that the first task processing node is down, the second task processing node may acquire identification information of the first task that is not processed and completed by the first task processing node, and place the identification information into the message queue.
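The republishing of the first task's identification information at the position consumed first might be sketched with a double-ended queue standing in for the message queue (a simplification; in practice the message queue would typically be an external middleware component):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch of step 230: unfinished task ids of a down node are released
// into the message queue at the position that is consumed first.
public class TaskRequeue {
    private final Deque<String> queue = new ArrayDeque<>();

    /** Normal delivery: identification information joins the tail. */
    public void publish(String taskId) { queue.addLast(taskId); }

    /** Recovery delivery: unfinished task ids of a down node go to the head. */
    public void republishFirst(List<String> unfinishedTaskIds) {
        // Iterate in reverse so the original order is preserved at the head.
        for (int i = unfinishedTaskIds.size() - 1; i >= 0; i--) {
            queue.addFirst(unfinishedTaskIds.get(i));
        }
    }

    /** A surviving node consumes the next identification information. */
    public String consume() { return queue.pollFirst(); }
}
```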
In one embodiment of the present application, the method further comprises: if the first task processing node is monitored to be down, acquiring identification information of a first task which is not processed and completed by the first task processing node in a competitive mode with other task processing nodes in the plurality of task processing nodes; wherein the second task processing node comprises a task processing node that succeeds in the competition.
Specifically, the second task processing node may be a task processing node that succeeds in competition.
Acquiring the identification information of the first task in a competitive manner refers to a process of screening the task processing nodes according to a certain index to obtain the unique task processing node that has the right to legally acquire the identification information of the first task.
The competition rules employed by the competitive mode can be various. For example, one competition rule may be that the task processing node that first acquires the identification information of the first task is the node that succeeds in the competition, and that node broadcasts to the other task processing nodes to stop them from acquiring the identification information of the first task. Another competition rule may be that the task processing node that first acquires the identification information of the first task is the node that succeeds in the competition, and that node broadcasts to the other task processing nodes so that, even if they acquire the identification information of the first task, they do not have the right to deliver it to the message queue. In addition, the competition rule may be that the task processing node with the lowest CPU utilization or memory utilization is the node that succeeds in the competition. The advantage of using a competition rule is to ensure that the identification information of the first task can be released into the message queue timely and accurately.
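The "first to acquire wins" competition rule might be sketched with an atomic compare-and-set; the `AtomicReference` stands in for an atomic claim operation on the shared cache (an assumed simplification: a real deployment might instead use a distributed lock or a SETNX-style primitive):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the competition: nodes race to claim the down node's task
// list; exactly one compare-and-set succeeds, and that node becomes the
// second task processing node that republishes the identification info.
public class TaskListClaim {
    private final AtomicReference<String> owner = new AtomicReference<>(null);

    /** Returns true only for the single node that wins the competition. */
    public boolean tryClaim(String nodeId) {
        return owner.compareAndSet(null, nodeId);
    }

    public String winner() { return owner.get(); }
}
```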
In one embodiment of the present application, the second task processing node is a preset task processing node in the multi-node system.
For example, a task processing node with characteristics of high performance and configuration, high availability, and standby power supply may be preset in the multi-node system, and the task processing node is responsible for acquiring identification information of tasks that are not processed and completed by other task processing nodes. Since the preset task processing node itself has very high availability, if the task processing node is specified to acquire the identification information of the task, the high availability of the whole system can be ensured.
FIG. 5 is a process flow diagram of a multi-node system when a node is down according to an embodiment of the present application.
Referring to fig. 5, when a certain node goes down, the processing flow of the multi-node system may be as follows:
1. Node 01, node 02, and node 03 are all registered in the registration center, and after successful registration each establishes a heartbeat connection with the registration center.
2. Nodes 01 and 02 monitor the state of the heartbeat connections maintained in the registration center, and when the heartbeat of node 03 in the registration center stops, nodes 01 and 02 detect that node 03 has gone down.
3. Node 01 and node 02 acquire the task list of node 03 from a cache in a competitive manner; the task list of each node is stored in the cache, and a task list comprises the identification information of one or more tasks.
4. After node 02 succeeds in the competition, node 02 acquires the task list of node 03 and pushes the identification information of the tasks in the task list that were not processed and completed by node 03 back into the message queue.
5. Because nodes 01 and 02 are not down, they can acquire the identification information of tasks to be consumed from the message queue and thereby acquire the tasks to be consumed; when the identification information of a task that was not processed and completed by node 03 is acquired, that task can be processed.
Referring to fig. 2, in step 240, the task to be consumed is processed according to the acquired identification information of the task to be consumed.
After the identification information of the task to be consumed is acquired, the task to be consumed can be consumed, and then the task to be consumed is processed.
Any task processing node which does not go down in the multi-node system can acquire identification information of the tasks to be consumed from the message queue.
Under the condition that the current task processing node acquires the identification information of the first task, the current task processing node can process the first task; when the other task processing nodes acquire the identification information of the first task, the other task processing nodes may also perform processing on the first task.
Therefore, after the identification information of the first task is released into the message queue, the recovery processing of the first task can be realized regardless of whether the identification information is acquired by the current task processing node or by another task processing node, which greatly improves the availability of the system.
Fig. 6 shows a flowchart for processing a task to be consumed according to acquired identification information of the task to be consumed according to an embodiment of the present application. Referring to fig. 6, step 240 may specifically include the following steps:
In step 610, the task entity binary data corresponding to the identification information of the task to be consumed is loaded from a cache corresponding to the multi-node system.
The binary data of the task entity and the identification information of the task to be consumed can be stored in the same cache, and can also be stored in different caches respectively.
The cache may belong to a multi-node system or may be located outside the multi-node system. The cache is a high-speed storage space commonly used by each task processing node in the multi-node system, the cache can be located on a physical node or a virtual node, and the physical substance of the cache can be a memory. The advantage of using the cache is that the loading efficiency of binary data of the task entity can be improved, so that the recovery efficiency of the task after the downtime of the task processing node can be improved, and the availability of the system is improved.
In other embodiments of the present application, the task entity binary data may also be stored on disk.
The task entity binary data is a byte sequence obtained by serializing the task; it is flushed into the cache when the task is allocated. Storing the task as task entity binary data facilitates task storage and transmission.
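This serialize-and-flush step can be sketched as follows, assuming the shared cache can be modeled as a key-value store (a plain dict here; a real system might use an in-memory store such as Redis). The `Task` class and its fields are illustrative only:

```python
import pickle

# Hypothetical stand-ins: a dict models the shared cache, and Task is a
# minimal task entity holding identification information and instructions.
cache = {}

class Task:
    def __init__(self, task_id, instructions):
        self.task_id = task_id            # identification information
        self.instructions = instructions  # the network instructions

def allocate_task(task):
    """Serialize the task entity into a byte sequence and flush the
    resulting binary data into the cache when the task is allocated."""
    cache[task.task_id] = pickle.dumps(task)

allocate_task(Task("task-001", ["add-route", "set-acl"]))
```

Any node that later acquires `"task-001"` from the message queue can restore the full task entity from the cached bytes alone, which is what makes takeover after a downtime possible.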
In one embodiment of the present application, the method further comprises:
After the task entity binary data corresponding to the identification information of the task to be consumed is loaded from the cache corresponding to the multi-node system, the task entity binary data in the cache is persisted to disk.
Step 620: deserialize the task entity binary data to obtain the task to be consumed.
Deserializing the task entity binary data yields an object representing the task to be consumed; the object may take the form of, for example, a Java object.
Step 630, the task to be consumed is processed.
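Steps 610 through 630 can be sketched together as follows, under the assumptions that the cache is a dict mapping task IDs to pickled task entities and that "processing" an instruction is stubbed out as building a result string:

```python
import pickle

# Assumed layout: the cache maps each task ID to the serialized task entity.
cache = {"task-001": pickle.dumps(["instr-a", "instr-b"])}

def consume(task_id):
    entity_bytes = cache[task_id]            # step 610: load binary data
    task = pickle.loads(entity_bytes)        # step 620: deserialize
    return [f"executed {i}" for i in task]   # step 630: process the task

results = consume("task-001")
```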
In one embodiment of the application, the task to be consumed comprises at least one network instruction, and the task entity binary data comprises original binary data corresponding to the network instruction.
The step of processing the task to be consumed may comprise:
The network instructions contained in the task to be consumed are executed in sequence. After each network instruction is successfully executed, that network instruction is serialized, and the binary data obtained from the serialization replaces the instruction's original binary data in the cache.
When all network instructions in a task to be consumed have been successfully executed, the original binary data corresponding to each network instruction in the cache has been overwritten with the post-execution binary data.
The binary data of a network instruction, i.e., the data occupying the position of that instruction's original binary data in the cache, may also be referred to as a snapshot of the task entity binary data.
In a scenario where network instructions are orchestrated, a network instruction is serialized after it is successfully executed, and the resulting binary data replaces the instruction's original binary data in the cache. For a given network instruction, the original binary data and the post-execution binary data differ and represent different states of the instruction. It can therefore be determined that if original binary data exists in the cache, the corresponding network instruction has not yet been successfully executed, and if post-execution binary data exists in the cache, the corresponding network instruction has been successfully executed.
Based on this, after the identification information of the first task is put into the message queue, if some network instructions in the first task were executed successfully and others were not, the cache contains post-execution binary data for the former and original binary data for the latter. When a designated task processing node in the multi-node system acquires the identification information of the first task, it may load only the original binary data corresponding to the first task, so that only the network instructions that were not successfully executed are processed. This prevents the network instructions that were already successfully executed from being executed again and ensures the accuracy of recovering an instruction orchestration task after a node goes down.
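The recovery rule above can be sketched as follows, under an assumed cache layout in which each network instruction's cache entry records whether it has been executed, so that original (unexecuted) entries can be told apart from post-execution ones; all names and fields are illustrative:

```python
import pickle

# Assumed cache state after a partial failure: instr-1 finished, instr-2 did not.
cache = {
    "instr-1": pickle.dumps({"cmd": "add-route", "executed": True}),
    "instr-2": pickle.dumps({"cmd": "set-acl", "executed": False}),
}

def recover(instruction_ids):
    """Re-execute only instructions whose cache entry is still in the
    original state, preventing duplicate execution of finished ones."""
    replayed = []
    for iid in instruction_ids:
        instr = pickle.loads(cache[iid])
        if instr["executed"]:
            continue  # already executed successfully: skip
        instr["executed"] = True            # "execute" the instruction
        cache[iid] = pickle.dumps(instr)    # replace original binary data
        replayed.append(instr["cmd"])
    return replayed
```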
On this basis, each task processing node can distinguish which tasks, and which network instructions within those tasks, have been executed, and the scheme in this embodiment can be adapted accordingly.
Specifically, with continued reference to fig. 5, the node that did not go down in fig. 5 obtains the full task list of the node that went down, and that list may include the identification information of both completed and uncompleted tasks; in other embodiments, however, the node that did not go down may obtain only the identification information of the uncompleted tasks in the task list.
Moreover, when a node acquires the task list, the identification information it pushes to the message queue may be that of only the uncompleted tasks in the task list, or that of all tasks in the task list, in which case the identification information of all tasks is pushed to the message queue.
In one embodiment of the present application, the method further comprises: and after the task to be consumed is successfully executed, persisting the binary data corresponding to the task to be consumed in the cache into a disk.
According to the embodiment of the application, the binary data is persisted and the task processing records are written to disk, so that the processed tasks can be backtracked and audited afterwards.
The purpose of the persistence processing is to retain more data for backtracking and auditing, so the way persistence is implemented in the embodiments of the present application may be arbitrary. The persistence processing may be performed at any stage of task processing, periodically, or in response to a user instruction; it may be applied to the post-execution binary data only, or to the task entity binary data including the original binary data.
In one embodiment of the present application, the step of processing the task to be consumed further comprises: if a network instruction contained in the task to be consumed fails to execute, performing rollback processing on the network instructions in the task that have already been executed, so that the task to be consumed can be reprocessed.
Specifically, the task to be consumed comprises a plurality of network instructions that need to be executed in sequence. After one network instruction in the task fails to execute, the network instructions executed before it need to be rolled back before the task is re-executed. This ensures the atomicity of network instruction execution, avoids errors in network instruction execution, and improves the reliability of network instruction execution.
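The all-or-nothing behavior described above can be sketched as follows. Each instruction is assumed to provide `execute` and `undo` callables standing in for sending commands to, and reverting them on, the corresponding network devices:

```python
def run_atomically(instructions):
    """Execute instructions in order; if one fails, undo the already
    executed ones in reverse order so the whole task can be retried."""
    done = []
    try:
        for instr in instructions:
            instr["execute"]()
            done.append(instr)
    except Exception:
        for instr in reversed(done):
            instr["undo"]()  # roll back to preserve atomicity
        return False
    return True
```

Rolling back in reverse order mirrors the sequential execution, so the network is returned to its state before the task started.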
In an embodiment of the present application, a network controller is deployed on each task processing node in a multi-node system, and the network controller includes a task orchestration management framework, and the task orchestration management framework is configured to sequentially execute network instructions included in tasks to be consumed.
With continued reference to FIG. 3, which shows the organization of a node in a multi-node system, each node in the multi-node system may adopt this organization.
In the embodiment of the application, the task orchestration management framework is embedded into the network controller as a component, which greatly reduces the delay of issuing network instructions. Because the same task orchestration management framework can be embedded into the network controllers of different task processing nodes, the reusability of program code is also improved. Moreover, secondary development can be carried out based on the task orchestration management framework, so that different network controllers can implement different instruction orchestration strategies, improving development efficiency. In addition, each task orchestration management framework can be maintained and tested independently, reducing the coupling of the system.
In an embodiment of the application, the sequentially executing the network instructions included in the tasks to be consumed includes: and sequentially sending the network instructions contained in the tasks to be consumed to the network equipment corresponding to the network instructions.
The network device here may be various types of network devices such as an edge gateway.
Successful execution of a network instruction may mean that the network instruction is successfully transmitted to the network device corresponding to it, or that, after the network instruction is transmitted to the corresponding network device, response information indicating successful execution is returned by the network device and received.
In an embodiment of the application, the network controller further includes a remote invocation client, and the step of sequentially executing the network instructions included in the task to be consumed may include: and sequentially sending the network instructions in the tasks to be consumed to the remote calling client so that the remote calling client sequentially sends the network instructions to the network equipment corresponding to the network instructions.
Referring to fig. 3, the RPC Client in the network controller is the remote call Client, so that the embodiment of the present application further implements the arrangement of the network instruction based on the remote call protocol.
In one embodiment of the present application, the method is performed by a designated task processing node of a plurality of task processing nodes, the method further comprising:
if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring identification information of a second task that has not been completed in the multi-node system;
and putting the identification information of the second task into the message queue.
The task processing nodes in the multi-node system may form a cluster. The task processing node with the longest continuous normal working time is the node that has maintained a normal heartbeat connection with the registry for the longest uninterrupted duration, and may also be called the oldest node in the cluster.
In an embodiment of the present application, if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, the step of acquiring the identification information of the second task that has not been completed in the multi-node system may specifically include:
if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring the identification information of the second task that has not been completed in the multi-node system after waiting a predetermined time.
By acquiring the identification information of the second task after waiting for the predetermined time, the backlog of tasks at the designated task processing node can be reduced, and the load of the designated task processing node can be reduced.
The predetermined time may be arbitrarily set as needed, and may be set to 2 minutes, for example.
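Oldest-node election and the wait-then-recover behavior can be sketched as follows, assuming the registry can report, for each node, the start time of its current uninterrupted heartbeat connection; all names are illustrative:

```python
import time

def oldest_node(heartbeat_start_times):
    """The node whose heartbeat connection started earliest has the longest
    continuous normal working time, i.e. it is the oldest node."""
    return min(heartbeat_start_times, key=heartbeat_start_times.get)

def maybe_recover_cluster(self_id, heartbeat_start_times, wait_seconds=120):
    if oldest_node(heartbeat_start_times) != self_id:
        return False  # only the oldest node runs cluster recovery
    time.sleep(wait_seconds)  # wait a predetermined time, e.g. 2 minutes
    # ...then acquire the IDs of uncompleted second tasks and put them
    # into the message queue (omitted in this sketch).
    return True
```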
Fig. 7 is a schematic diagram illustrating a processing flow of a multi-node system when nodes are all down according to an embodiment of the present application, where a schematic diagram illustrating a relationship between a node status and time is shown in an upper right corner of fig. 7. Referring to fig. 7, the processing flow of the multi-node system when all nodes are down may be as follows:
1. Node 01 goes down and exits at time point T0.
2. Node 02 goes down at time point T0 but exits at time point T2. The time point at which node 02 goes down differs from the time point at which it exits because, due to heartbeat delay, the registry only detects that node 02 is down at time point T2.
3. Node 03 comes online at T1. According to the scheme in the foregoing embodiments, node 03 can detect that node 02 is down, acquire the task list of node 02 from the cache, and put the tasks in the task list into the message queue, so that the tasks not completed by node 02 resume execution and are taken over by node 03. The tasks not completed by node 01, however, cannot be taken over in this way.
4. When node 03 discovers through the registry that it is the oldest node of the current cluster, i.e., the node with the longest continuous normal working time, it executes a cluster recovery event after 2 minutes (an example value).
5. Based on the task list and the set of task execution times of each node in the cache, the cluster recovery event acquires the identification information of all tasks whose execution time is earlier than the birth time of the oldest node of the current cluster (node 03), and delivers that identification information to the message queue again.
6. Node 03 obtains the identification information of the tasks from the message queue and consumes the tasks, so that the tasks not completed by node 01 resume execution.
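The selection step of the cluster recovery event (step 5) can be sketched as follows; the dict shapes and field names are assumptions for illustration:

```python
# From the per-node task lists and task execution times kept in the cache,
# re-deliver the tasks whose execution time is earlier than the birth time
# of the current oldest node.
def cluster_recovery(task_lists, task_exec_times, oldest_birth_time):
    to_redeliver = []
    for node_tasks in task_lists.values():
        for task_id in node_tasks:
            if task_exec_times[task_id] < oldest_birth_time:
                to_redeliver.append(task_id)  # predates the oldest node
    return to_redeliver  # these IDs go back into the message queue
```

In the fig. 7 scenario, node 01's unfinished task was started before node 03 (the oldest surviving node) was born, so it is exactly what this filter picks up.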
In the embodiment of the application, even if all task processing nodes in the multi-node system are down, as long as one task processing node is on line, tasks which are not processed and completed by other task processing nodes can be continuously executed, so that the availability of the system is further improved.
In the foregoing example, the message queue contains only the identification information of first tasks that were not completed by a down node, but the message queue may also include the identification information of tasks normally allocated to the task processing nodes.
In an embodiment of the present application, the network controller further includes a task delivery client, and the message queue further includes identification information of a third task, where the identification information of the third task is delivered to the message queue by the task delivery client.
The task delivery client is responsible for sending, to the task orchestration management framework, the identification information of tasks that need to be processed under normal conditions. The task orchestration management framework can therefore process both tasks left unfinished by other nodes that have gone down and tasks that need to be executed under normal conditions; in other words, while taking over tasks not completed by other task processing nodes, the current task processing node can still process the tasks it needs to handle under normal conditions.
Referring to fig. 3, the task delivery client may include various functional modules for generating network instructions, such as a network orchestration and computation module, a network model object management module, and a global quality monitoring traffic scheduling module, and the location of the task delivery client in the network controller is similar to the location of these modules in the network controller. Therefore, each task processing node can comprise a task delivery client and a task orchestration management framework, which are both embedded on the network controller of the task processing node.
FIG. 8 illustrates a task execution flow diagram based on a task orchestration management framework according to an embodiment of the application.
The components and corresponding functionality of the task orchestration management framework based system are described below.
(1) Client: responsible for generating network instructions and combining a group of network instructions into a task for submission; the client corresponds to the task delivery client in the foregoing embodiments.
(2) Message queue: the bridge between the client and the task orchestration management framework, responsible for smoothing peaks in task volume.
(3) Storage/cache: completes the persistence and caching of tasks and instructions, ensures that tasks are neither lost nor unrecoverable, and enables backtracking and auditing.
(4) Registry: responsible for service registration and discovery, used to ensure the high availability and disaster tolerance of the task framework.
(5) Task orchestration management framework: the actual executor of network instructions, responsible for executing the group of instructions in a task; it continually refreshes the snapshot to ensure that tasks are not lost, so that when a node goes down, its tasks can be taken over by the task orchestration management frameworks of other nodes.
Among the above components, a client and a task orchestration management framework may be located on the same node, and each node may include one client and one task orchestration management framework. The registry and the message queue may be located on any node, and the storage/cache may be located on one or more nodes. In addition, multiple message queues may be deployed to improve the availability of the system.
The task execution flow based on the task orchestration management framework in the embodiment of fig. 8 is as follows:
a) At system startup, each task orchestration management framework registers with the registry and monitors the states of the other task orchestration management frameworks, ensuring that when another framework/node goes down, the fault can be discovered and migrated in time. Here, the system comprises the task orchestration management frameworks and the components described above.
b) The client sends the task ID to the message queue according to the user's instruction, and also puts the task ID and the corresponding task entity binary data into the cache.
c) The task orchestration management framework obtains the task ID from the message queue.
In an embodiment of the present application, the task orchestration management framework obtains, from the message queue, a task ID sent by a client corresponding to the task orchestration management framework.
In this case, the task list of a node in the cache in the foregoing embodiments may be written by the client; that is, the client specifies which tasks a node is to process. For a task ID left uncompleted by another node, the node that has taken over the task ID may write it into the cache immediately after acquiring it.
In other embodiments of the present application, the task orchestration management framework may also obtain task IDs sent by other clients.
d) Load the corresponding task entity binary data according to the task ID.
e) Deserialize the task entity binary data to obtain the task.
f) Execute the network instructions in the task in sequence; when a network instruction fails to execute, the framework automatically rolls back the network instructions that have already been executed. This step is key to ensuring atomicity. Executing a network instruction means sending it to the network device corresponding to the instruction.
g) Serialize the task to obtain the task binary data.
h) Flush a snapshot of the task binary data into the cache. The snapshot is the task binary data in the cache. This step is key to high availability and disaster tolerance: because the task snapshot is refreshed each time, when a framework/node goes down, other frameworks/nodes can resume executing the task according to the snapshot information, ensuring that the task is completed.
i) If the task is not completed, continue with step e) and the subsequent steps.
j) Persist the task to disk for storage.
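The flow above can be sketched end to end under simplifying assumptions: plain Python containers stand in for the message queue, cache, and disk, and "executing" a network instruction just records it in `device_log`:

```python
import pickle

queue, cache, disk, device_log = [], {}, {}, []

def client_submit(task_id, instructions):
    queue.append(task_id)                        # b) task ID to the queue
    cache[task_id] = pickle.dumps(instructions)  # b) entity to the cache

def framework_consume():
    task_id = queue.pop(0)               # c) obtain task ID from the queue
    task = pickle.loads(cache[task_id])  # d)-e) load and deserialize
    for instr in task:                   # f) execute instructions in order
        device_log.append(instr)         #    (send to the network device)
        cache[task_id] = pickle.dumps(task)  # g)-h) refresh the snapshot
    disk[task_id] = cache[task_id]       # j) persist the task to disk

client_submit("task-42", ["add-route", "set-acl"])
framework_consume()
```

Because the snapshot in the cache is refreshed after every instruction, another node that loads the cache entry after a mid-task failure sees the latest state rather than the original submission.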
In summary, according to the task processing method applied to the multi-node system in the embodiment of the present application, the atomicity of instruction arrangement can be ensured, the instruction arranging task is not lost after the controller is down, and the overall development efficiency can be improved.
The following describes an embodiment of an apparatus of the present application, which may be used to execute a task processing method applied to a multi-node system in the above-described embodiment of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the task processing method applied to a multi-node system described above.
FIG. 9 shows a block diagram of a task processing device applied to a multi-node system according to one embodiment of the present application.
Referring to fig. 9, a task processing apparatus 900 applied to a multi-node system according to an embodiment of the present application includes: a listening unit 910, an acquisition unit 920 and a processing unit 930.
The monitoring unit 910 is configured to monitor the working states of the plurality of task processing nodes; the obtaining unit 920 is configured to obtain identification information of a task to be consumed from a message queue, where the message queue includes identification information of a first task that is not processed and completed by a first task processing node that is down, and the identification information of the first task is obtained and released to the message queue by a second task processing node after monitoring that the first task processing node is down; the processing unit 930 is configured to process the to-be-consumed task according to the acquired identification information of the to-be-consumed task.
In some embodiments of the present application, based on the foregoing scheme, the obtaining unit 920 is further configured to: if the first task processing node is monitored to be down, acquiring identification information of a first task which is not processed and completed by the first task processing node in a competitive mode with other task processing nodes in the plurality of task processing nodes; wherein the second task processing node comprises a task processing node that succeeds in contention.
In some embodiments of the present application, based on the foregoing solution, the task processing apparatus 900 applied to the multi-node system is located in a designated task processing node of the plurality of task processing nodes, and the obtaining unit 920 is further configured to: if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring identification information of a second task which is not processed and completed in the multi-node system; and putting the identification information of the second task into the message queue.
In some embodiments of the present application, based on the foregoing scheme, the obtaining unit 920 is configured to: and if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring the identification information of the second task which is not processed and completed in the multi-node system after waiting for the preset time.
In some embodiments of the present application, based on the foregoing solution, before monitoring the working states of the plurality of task processing nodes, the monitoring unit 910 is further configured to: sending a registration request to a registration center, and establishing heartbeat connection with the registration center after the registration is successful; the listening unit 910 is configured to: and monitoring the state of the heartbeat connection maintained in the registration center so as to monitor the working states of the plurality of task processing nodes.
In some embodiments of the present application, based on the foregoing solution, the processing unit 930 is configured to: loading task entity binary data corresponding to the identification information of the task to be consumed from a cache corresponding to the multi-node system; performing deserialization processing on the binary data of the task entity to obtain the task to be consumed; and processing the task to be consumed.
In some embodiments of the present application, based on the foregoing solution, the task to be consumed includes at least one network instruction, and the task entity binary data includes original binary data corresponding to the network instruction; the processing unit 930 is configured to: and sequentially executing the network instructions contained in the tasks to be consumed, wherein after each network instruction is successfully executed, each network instruction is serialized, and the binary data obtained after serialization replaces the corresponding original binary data of each network instruction in the cache.
In some embodiments of the present application, based on the foregoing solution, the processing unit 930 is further configured to: and if the network instruction contained in the task to be consumed fails to be executed, performing rollback processing on the executed network instruction in the task to be consumed so as to reprocess the task to be consumed.
In some embodiments of the present application, based on the foregoing solution, the processing unit 930 is further configured to: and after the task entity binary data corresponding to the identification information of the task to be consumed is loaded from the cache corresponding to the multi-node system, the task entity binary data in the cache is persisted to a disk.
In some embodiments of the present application, based on the foregoing solution, a network controller is deployed on each task processing node in the multi-node system, where the network controller includes a task orchestration management framework, and the task orchestration management framework is configured to sequentially execute network instructions included in the tasks to be consumed.
In some embodiments of the present application, based on the foregoing solution, the network controller further includes a remote invocation client, and the processing unit 930 is configured to: and sequentially sending the network instructions in the tasks to be consumed to the remote call client so that the remote call client sequentially sends the network instructions to the network equipment corresponding to the network instructions.
In some embodiments of the present application, based on the foregoing scheme, the network controller further includes a task delivery client, and the message queue further includes identification information of a third task, where the identification information of the third task is delivered to the message queue by the task delivery client.
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage portion 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for system operation are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An Input/Output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. The names of these units do not in any way constitute a limitation on the units themselves.
As an aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the above embodiments.
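Purely as an illustrative sketch of a medium "carrying" a program, the Python fragment below uses a temporary file on disk to stand in for a computer readable storage medium; the loading device reads the program code from the medium and executes it. The file name, module name, and `run` function are hypothetical and assumed for illustration only:

```python
import importlib.util
import os
import tempfile

# Program code "carried" on a storage medium; here a temporary file
# on disk stands in for the computer readable medium.
program_source = "def run():\n    return 'method implemented'\n"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(program_source)
    medium_path = f.name

# The executing device loads the carried program from the medium
# and runs it, thereby performing the method the program encodes.
spec = importlib.util.spec_from_file_location("carried_program", medium_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

result = module.run()
os.unlink(medium_path)  # the medium may exist separately from the device
```

The same pattern applies whether the medium is assembled into the device (e.g., internal storage) or exists separately (e.g., a removable disk).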
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.