Movatterモバイル変換


[0]ホーム

URL:


CN113064744A - Task processing method and device, computer readable medium and electronic equipment - Google Patents

Task processing method and device, computer readable medium and electronic equipment
Download PDF

Info

Publication number
CN113064744A
CN113064744ACN202110492295.9ACN202110492295ACN113064744ACN 113064744 ACN113064744 ACN 113064744ACN 202110492295 ACN202110492295 ACN 202110492295ACN 113064744 ACN113064744 ACN 113064744A
Authority
CN
China
Prior art keywords
task
node
task processing
consumed
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110492295.9A
Other languages
Chinese (zh)
Other versions
CN113064744B (en
Inventor
王晓晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co LtdfiledCriticalTencent Technology Shenzhen Co Ltd
Priority to CN202110492295.9ApriorityCriticalpatent/CN113064744B/en
Publication of CN113064744ApublicationCriticalpatent/CN113064744A/en
Application grantedgrantedCritical
Publication of CN113064744BpublicationCriticalpatent/CN113064744B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Images

Classifications

Landscapes

Abstract

Translated fromChinese

本申请的实施例提供了一种应用于多节点系统的任务处理方法、装置、计算机可读介质及电子设备,所述多节点系统包括多个任务处理节点,该方法包括:监听所述多个任务处理节点的工作状态;从消息队列中获取待消费任务的标识信息,所述消息队列中包含有发生宕机的第一任务处理节点未处理完成的第一任务的标识信息,所述第一任务的标识信息是由第二任务处理节点在监听到所述第一任务处理节点宕机后获取并投放至所述消息队列中的;根据获取到的待消费任务的标识信息,处理所述待消费任务。本申请实施例能够在任务处理节点发生宕机时,保证由该任务处理节点处理的任务能够及时恢复执行,提高了系统的容灾能力,也大大提高了系统的可用性。

Figure 202110492295

Embodiments of the present application provide a task processing method, apparatus, computer-readable medium, and electronic device applied to a multi-node system, where the multi-node system includes a plurality of task processing nodes, and the method includes: monitoring the plurality of task processing nodes. The working state of the task processing node; the identification information of the task to be consumed is obtained from the message queue, and the message queue contains the identification information of the first task that has not been processed and completed by the first task processing node that is down, and the first task processing node The identification information of the task is obtained by the second task processing node after monitoring the downtime of the first task processing node and put it into the message queue; consumption task. The embodiments of the present application can ensure that the task processed by the task processing node can be resumed and executed in time when the task processing node is down, thereby improving the disaster tolerance capability of the system and greatly improving the availability of the system.

Figure 202110492295

Description

Task processing method and device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of task processing technologies, and in particular, to a task processing method and apparatus applied to a multi-node system, a computer-readable medium, and an electronic device.
Background
In a task processing system, a task distributor generally distributes tasks to task processing nodes, and the task processing nodes process the distributed tasks. However, in the related art, once a task processing node fails or goes down, the task processing node cannot process a task assigned thereto, so that the task cannot be processed, thereby making the entire task processing system low in availability.
Disclosure of Invention
Embodiments of the present application provide a task processing method and apparatus, a computer-readable medium, and an electronic device applied to a multi-node system, so that it is at least possible to avoid that a task that is allocated cannot be processed due to a downtime of a task processing node to a certain extent, and to improve disaster recovery capability and availability of the system.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a task processing method applied to a multi-node system including a plurality of task processing nodes, the method including: monitoring the working states of the plurality of task processing nodes; acquiring identification information of a task to be consumed from a message queue, wherein the message queue comprises identification information of a first task which is not processed and completed by a first task processing node which is down, and the identification information of the first task is acquired and released into the message queue by a second task processing node after monitoring that the first task processing node is down; and processing the task to be consumed according to the acquired identification information of the task to be consumed.
According to an aspect of an embodiment of the present application, there is provided a task processing apparatus applied to a multi-node system including a plurality of task processing nodes, the apparatus including: the monitoring unit is used for monitoring the working states of the plurality of task processing nodes; the system comprises an acquisition unit, a message queue and a processing unit, wherein the acquisition unit is used for acquiring identification information of a task to be consumed from the message queue, the message queue comprises identification information of a first task which is not processed and completed by a first task processing node which is down, and the identification information of the first task is acquired by a second task processing node after monitoring that the first task processing node is down and is put into the message queue; and the processing unit is used for processing the tasks to be consumed according to the acquired identification information of the tasks to be consumed.
In some embodiments of the present application, based on the foregoing solution, the obtaining unit is further configured to: if the first task processing node is monitored to be down, acquiring identification information of a first task which is not processed and completed by the first task processing node in a competitive mode with other task processing nodes in the plurality of task processing nodes; wherein the second task processing node comprises a task processing node that succeeds in contention.
In some embodiments of the present application, based on the foregoing solution, the task processing apparatus applied to the multi-node system is located in a designated task processing node among the plurality of task processing nodes, and the obtaining unit is further configured to: if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring identification information of a second task which is not processed and completed in the multi-node system; and putting the identification information of the second task into the message queue.
In some embodiments of the present application, based on the foregoing scheme, the obtaining unit is configured to: and if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring the identification information of the second task which is not processed and completed in the multi-node system after waiting for the preset time.
In some embodiments of the present application, based on the foregoing solution, before monitoring the working states of the plurality of task processing nodes, the monitoring unit is further configured to: sending a registration request to a registration center, and establishing heartbeat connection with the registration center after the registration is successful; the listening unit is configured to: and monitoring the state of the heartbeat connection maintained in the registration center so as to monitor the working states of the plurality of task processing nodes.
In some embodiments of the present application, based on the foregoing solution, the processing unit is configured to: loading task entity binary data corresponding to the identification information of the task to be consumed from a cache corresponding to the multi-node system; performing deserialization processing on the binary data of the task entity to obtain the task to be consumed; and processing the task to be consumed.
In some embodiments of the present application, based on the foregoing solution, the task to be consumed includes at least one network instruction, and the task entity binary data includes original binary data corresponding to the network instruction; the processing unit is configured to: and sequentially executing the network instructions contained in the tasks to be consumed, wherein after each network instruction is successfully executed, each network instruction is serialized, and the binary data obtained after serialization replaces the corresponding original binary data of each network instruction in the cache.
In some embodiments of the present application, based on the foregoing solution, the processing unit is further configured to: and if the network instruction contained in the task to be consumed fails to be executed, performing rollback processing on the executed network instruction in the task to be consumed so as to reprocess the task to be consumed.
In some embodiments of the present application, based on the foregoing solution, the processing unit is further configured to: and after the task entity binary data corresponding to the identification information of the task to be consumed is loaded from the cache corresponding to the multi-node system, the task entity binary data in the cache is persisted to a disk.
In some embodiments of the present application, based on the foregoing solution, a network controller is deployed on each task processing node in the multi-node system, where the network controller includes a task orchestration management framework, and the task orchestration management framework is configured to sequentially execute network instructions included in the tasks to be consumed.
In some embodiments of the present application, based on the foregoing solution, the network controller further includes a remote invocation client, and the processing unit is configured to: and sequentially sending the network instructions in the tasks to be consumed to the remote call client so that the remote call client sequentially sends the network instructions to the network equipment corresponding to the network instructions.
In some embodiments of the present application, based on the foregoing scheme, the network controller further includes a task delivery client, and the message queue further includes identification information of a third task, where the identification information of the third task is delivered to the message queue by the task delivery client.
According to an aspect of embodiments of the present application, there is provided a computer-readable medium on which a computer program is stored, the computer program, when executed by a processor, implementing a task processing method applied to a multi-node system as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the task processing method applied to the multi-node system as described in the above embodiments.
In the technical solutions provided in some embodiments of the present application, when a task processing node is down, other task processing nodes can re-deliver identification information of a task that is not processed and completed by the task processing node to a message queue, and the other task processing nodes can acquire identification information of a task to be consumed from the message queue, so that the task that is not processed and completed by the task processing node can be processed. Therefore, the embodiment of the application allows the task processing nodes in the multi-node system to be down, even if one task processing node is down, the tasks processed by the task processing node are not lost, the tasks which are not completed by the task processing node can be processed in time, the disaster tolerance capability of the system is improved, and the availability of the system is also greatly improved; in addition, the task processing node processes the tasks to be consumed according to the acquired identification information of the tasks to be consumed, so that the decoupling of task allocation and task processing is realized, and the message queue can also carry out peak clipping on the tasks, thereby further improving the reliability of the system and reducing the complexity of the system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 2 illustrates a flow diagram of a method of task processing as applied to a multi-node system according to one embodiment of the present application;
FIG. 3 illustrates an organizational form diagram of a network controller and task orchestration management framework on a node according to one embodiment of the present application;
FIG. 4 shows a flowchart ofsteps preceding step 220 in FIG. 2 and details ofstep 220 according to one embodiment of the present application;
FIG. 5 illustrates a process flow diagram of a multi-node system when a node is down according to one embodiment of the present application;
FIG. 6 shows a flowchart for processing a task to be consumed according to acquired identification information of the task to be consumed according to an embodiment of the present application;
FIG. 7 illustrates a process flow diagram for a multi-node system when nodes are all down according to one embodiment of the present application;
FIG. 8 illustrates a task execution flow diagram based on a task orchestration management framework according to an embodiment of the application;
FIG. 9 shows a block diagram of a task processing device applied to a multi-node system, according to an embodiment of the present application;
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In the field of network control, arranging instructions of a network controller is an important task in the field of network control. The inventor of the present application finds that the following characteristics need to be satisfied in order to ensure that the network instruction arrangement can be smoothly performed:
firstly, because the instructions have a mutual dependence relationship, for the instruction arrangement, the atomicity of the arrangement is guaranteed to be crucial; secondly, if the arranged instructions or tasks are lost, the success rate of the instruction arrangement is directly influenced, so that the network controller also needs to have stronger disaster tolerance capability to ensure that the instructions and tasks arranged after the network controller is down are not lost; moreover, for the development of the command arrangement of the controller, each network controller has its own logic, so that the overall development efficiency needs to be improved.
However, the traditional network controller instruction arrangement process is difficult to ensure the atomicity of a group of instructions, and usually, error correction can only be completed through later configuration examination; in addition, the network controller does not have disaster tolerance capability, and once software errors occur, the instructions cannot be recovered, so that the instruction arrangement effect is greatly influenced; in addition, at present, a corresponding instruction arrangement strategy needs to be separately developed for each set of network controller, so that the development efficiency of the whole system is low, and the development cost is high.
In the related art, the distribution and execution of tasks may be implemented using some open-source task management framework, which may include, for example, an AirFlow. The task management frameworks realize the arrangement of the network instructions by embedding the network instructions into the tasks.
However, the following drawbacks still exist with the task management framework:
first, these open source task management frameworks are almost independently deployed and cannot be embedded as components in the controller, which greatly delays the issuance of network instructions.
Secondly, these task management frameworks often do not have a rollback configuration, and are difficult to implement atomicity of a set of instruction layout, or difficult to implement atomicity of a set of instruction layout conveniently.
Thirdly, the existing open source task management framework has insufficient support for disaster tolerance capability, and when a node hangs up, a task is easily lost.
To this end, the present application first provides a task processing method applied to a multi-node system. The task herein may be any task that can be represented by program code and used for processing a certain flow, such as a task that implements network instruction arrangement. Tasks are represented in program code as certain entities, for example, in a Java program, tasks may be represented in the form of objects. Therefore, the task processing method applied to the multi-node system provided by the embodiment of the application can be applied to various task processing scenarios.
When the task processing method applied to the multi-node system is used for a scene of arranging network instructions, the defects can be overcome, and the characteristics required by smoothly finishing the instruction arrangement can be realized.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied. This is described below in terms of the system architecture shown in fig. 1 for network instruction orchestration.
As shown in fig. 1, thesystem architecture 100 may include a network device (e.g., one or more of theswitches 101,routers 102, andgateway devices 103 shown in fig. 1, but various other network devices, etc.), anetwork 104, and amulti-node system 110. Themulti-node system 110 includes afirst server 111, asecond server 112, and athird server 113, and each of the servers included in themulti-node system 110 is a task processing node in themulti-node system 110.Network 104 is used to provide a medium for communication links between end devices and task processing nodes inmulti-node system 110.Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of task processing nodes in the terminal device, network, and multi-node system in FIG. 1 is merely illustrative. The multi-node system can comprise any number of task processing nodes, and each task processing node can even be formed by a cluster. For example, thefirst server 111 may be a server cluster composed of a plurality of servers.
The task processing node in themulti-node system 110 organizes a set of network instructions included in the task by processing the task and sends the network instructions to the network device corresponding to the network instructions. When the task processing method applied to the multi-node system provided by the embodiment of the application is used in a scenario of arranging network instructions, if a downtime event occurs in a process of processing a task by one task processing node, then, by the task processing method applied to the multi-node system provided by the embodiment of the application, each task processing node can acquire the identification information of the task to be consumed from the message queue, the message queue contains the identification information of the tasks which are not processed and completed by the down task processing node, therefore, the task processing node which does not generate downtime can acquire the identification information of the tasks which are not processed and completed by the task processing node which generates downtime, therefore, other tasks which are not processed by the task processing node and are not completed by the task processing node can be processed by other task processing nodes which do not generate the downtime event, and the distributed tasks can be guaranteed to be effectively processed.
It should be noted that, although the embodiment of the present application is used in a scenario in which a network instruction is arranged, in practice, the present application may be applied to any task processing flow, such as in a control process of a workflow; although the task processing node is a server in the embodiment of the present application, in other embodiments of the present application, the task processing node may be any type of terminal device, and the task processing node may be not only a physical node but also a virtualized node, and in addition, the task processing node may also be a cluster of one server; moreover, although the multi-node system in the embodiment of the present application only includes the task processing node, the multi-node system may further include other various types of entities such as a database, a message queue, and the like.
Moreover, it is easy to understand that the task processing method applied to the multi-node system provided by the embodiments of the present application is generally executed by a server, and accordingly, the task processing device applied to the multi-node system is generally disposed in the server. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the solution applied to the task processing of the multi-node system provided in the embodiments of the present application.
The task processing method applied to the multi-node system provided by the embodiment of the application can be applied to the field of cloud computing (cloud computing) or cloud storage (cloud storage), and can also be applied to a Block Chain (Block Chain) network.
Cloud computing is a computing model that distributes computing tasks over a resource pool of large numbers of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
As a basic capability provider of cloud computing, a cloud computing resource pool (called as an ifas (Infrastructure as a Service) platform for short is established, and multiple types of virtual resources are deployed in the resource pool and are selectively used by external clients.
According to the logic function division, a PaaS (Platform as a Service) layer can be deployed on an IaaS (Infrastructure as a Service) layer, a SaaS (Software as a Service) layer is deployed on the PaaS layer, and the SaaS can be directly deployed on the IaaS. PaaS is a platform on which software runs, such as a database, a web container, etc. SaaS is a variety of business software, such as web portal, sms, and mass texting. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
Cloud storage is a new concept extended and developed from a cloud computing concept, and a distributed cloud storage system (hereinafter referred to as a storage system) refers to a storage system which integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work through functions of cluster application, a grid technology, a distributed storage file system and the like, and provides data storage and service access functions to the outside.
At present, a storage method of a storage system is as follows: logical volumes are created, and when created, each logical volume is allocated physical storage space, which may be the disk composition of a certain storage device or of several storage devices. The client stores data on a certain logical volume, that is, the data is stored on a file system, the file system divides the data into a plurality of parts, each part is an object, the object not only contains the data but also contains additional information such as data identification (ID, ID entry), the file system writes each object into a physical storage space of the logical volume, and the file system records storage location information of each object, so that when the client requests to access the data, the file system can allow the client to access the data according to the storage location information of each object.
The process of allocating physical storage space for the logical volume by the storage system specifically includes: physical storage space is divided in advance into stripes according to a group of capacity measures of objects stored in a logical volume (the measures often have a large margin with respect to the capacity of the actual objects to be stored) and Redundant Array of Independent Disks (RAID), and one logical volume can be understood as one stripe, thereby allocating physical storage space to the logical volume.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, and comprises public and private key generation maintenance (account management), key management, user real identity and blockchain address corresponding relation maintenance (authority management) and the like, and under the authorization condition, the user management module supervises and audits the transaction condition of certain real identities and provides rule configuration (wind control audit) of risk control; the basic service module is deployed on all block chain node equipment and used for verifying the validity of the service request, recording the service request to storage after consensus on the valid request is completed, for a new service request, the basic service firstly performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts service information (consensus management) through a consensus algorithm, transmits the service information to a shared account (network communication) completely and consistently after encryption, and performs recording and storage; the intelligent contract module is responsible for registering and issuing contracts, triggering the contracts and executing the contracts, developers can define contract logics through a certain programming language, issue the contract logics to a block chain (contract registration), call keys or other event triggering and executing according to the logics of contract clauses, complete the contract logics and simultaneously provide the function of upgrading and canceling the contracts; the operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, cloud adaptation in the product release process and visual output of real-time states in product operation, such as: alarm, monitoring network conditions, monitoring node equipment health status, and the like.
The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.
When the embodiment of the application is applied to a scenario of arranging a Network instruction, the embodiment of the application may be particularly applied to the field of edge computing and a Network architecture of a Software Defined Network (SDN), and the arranged Network instruction may be used for controlling an edge gateway. The software defined network is a new open network architecture, realizes flexible scheduling of global view network resources and rapid deployment of new services by separating control and forwarding, centralizing control plane and opening and programming interface, and can simplify operation and maintenance, improve network resource utilization rate and improve customer perception.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 2 illustrates a flowchart of a task processing method applied to a multi-node system according to an embodiment of the present application, which may be performed by a device having computing and communication functions, such as thefirst server 111 illustrated in fig. 1, the multi-node system including a plurality of task processing nodes. Referring to fig. 2, the task processing method applied to the multi-node system at least comprises the following steps:
instep 220, the working states of the plurality of task processing nodes are monitored.
When the embodiment of the present application is used in a network instruction arrangement scenario, a task processing node may adopt an organization form shown in fig. 3, that is, the node includes a network controller, the network controller further includes a northbound control API, a task arrangement management frame, an RPC Client (Remote Procedure Call Client), and a function module for generating a network instruction, the function module for generating a network instruction specifically includes a network arrangement and calculation module, a network model object management module, and a global quality monitoring traffic scheduling module, where the northbound control API is a user management application program interface provided by the northbound of the network controller, and is used to provide an entry for network instruction arrangement; each functional module for generating the network instructions can generate the network configuration instructions through operations such as network model object management, network arrangement, calculation and the like; the task orchestration management framework may be configured to execute the task processing method applied to the multi-node system provided in the embodiment of the present application; the RPC Client is used for sending the network instruction from the task orchestration management framework to the network device corresponding to the network instruction. In the following, nodes and task processing nodes in a multi-node system are equivalent, unless otherwise specified.
Each task processing node in the multi-node system can monitor the working state of other task processing nodes, and the steps shown in the embodiment of FIG. 2 can be executed by any node in the multi-node system. The working state of the task processing node may include a normal state and a down state.
In one embodiment of the present application, the operational status of a task processing node in a multi-node system may be monitored in a variety of ways.
FIG. 4 shows a flowchart ofsteps preceding step 220 in FIG. 2 and details ofstep 220 according to one embodiment of the present application.
Please refer to fig. 4, which specifically includes the following steps:
instep 210, a registration request is sent to a registration center, and after the registration is successful, a heartbeat connection with the registration center is established.
The registration center is used for registering each task processing node in the multi-node system and carrying out heartbeat detection on each successfully registered task processing node. The task processing node performs registration by sending a registration request to the registration center, where the registration request may carry information related to the task processing node, such as an identifier of the task processing node.
The establishment of the heartbeat connection between the task processing node and the registration center is performed by sending a heartbeat packet, specifically, the task processing node sends the heartbeat packet to the registration center, or the registration center sends the heartbeat packet to the task processing node, and the registration center completes heartbeat detection according to the result of the received or sent heartbeat packet.
In step 220', the state of the heartbeat connection maintained in the registry is monitored to monitor the working states of the plurality of task processing nodes.
The status of the heartbeat connection maintained in the registry can be represented by a variety of data.
For example, registration information or identification information of a task processing node with normal heartbeat connection may be maintained in a registration center; when the task processing node monitors the registration center, all registration information or identification information which keeps normal heartbeat connection with the registration center is obtained from the registration center and is stored locally; when a task processing node goes down, the heartbeat of the task processing node in the registration center stops, and the registration center removes the registration information or the identification information of the task processing node. The other task processing nodes may determine that the task processing node is down according to that the registration information or the identification information of the task processing node is locally stored, but the registration information or the identification information of the task processing node is found to disappear in the registry.
For another example, the identification information and the state information of the task processing node may also be maintained in the registry, where the state information represents that the heartbeat of the corresponding task processing node in the registry is stopped or normal, for example, the state information may be represented by 0 or 1, where 0 may represent that the heartbeat of the corresponding task processing node in the registry is stopped, and 1 may represent that the heartbeat of the corresponding task processing node in the registry is normal; when the task processing node monitors the state of the heartbeat connection maintained in the registration center, the identification information and the corresponding state information of each task processing node are acquired, and if the acquired state information represents that the heartbeat stops, the working state of the corresponding task processing node can be determined to be down.
It can be seen that the manner of monitoring the operating status of the task processing nodes and determining whether the task processing nodes are down may be various and is not limited to the above.
Referring to fig. 2, in step 230, identification information of the tasks to be consumed is acquired from the message queue, where the message queue includes identification information of a first task that is not processed and completed by a first task processing node that is down, and the identification information of the first task is acquired by a second task processing node and is released into the message queue after monitoring that the first task processing node is down.
The identification information is information that can be used to uniquely identify a task, and is usually in a data format of a character string, and the identification information may be composed of various symbols such as letters and numbers, for example, the identification information may be a string of numbers.
Alternatively, the identification information of the task may be stored in a cache.
In one embodiment of the application, the identification information of the first task is posted to a location in the message queue that is consumed first.
In the embodiment of the present application, since the first task is a task that is not processed and completed by the first task processing node, the task that is not processed and completed can be preferentially processed by putting the identification information of the first task to the position that is consumed first in the message queue.
The first task processing node and the second task processing node are both nodes in a multi-node system. After monitoring that the first task processing node is down, the second task processing node may acquire identification information of the first task that is not processed and completed by the first task processing node, and place the identification information into the message queue.
In one embodiment of the present application, the method further comprises: if the first task processing node is monitored to be down, acquiring identification information of a first task which is not processed and completed by the first task processing node in a competitive mode with other task processing nodes in the plurality of task processing nodes; wherein the second task processing node comprises a task processing node that succeeds in the competition.
Specifically, the second task processing node may be a task processing node that succeeds in competition.
The step of acquiring the identification information of the first task by each task processing node in a competitive manner refers to a process of screening each task processing node according to a certain index to obtain the only task processing node which has the right to legally acquire the identification information of the first task.
The competition rules employed by the competition mode can be various. For example, the competition rule may be that the task processing node that first acquires the identification information of the first task is a task processing node that succeeds in acquiring the competition, and the task processing node broadcasts to other task processing nodes to stop the other task processing nodes from acquiring the identification information of the first task; the competition rule may also be that the task processing node that first acquires the identification information of the first task is a task processing node that succeeds in competition, and the task processing node broadcasts to other task processing nodes, so that the other task processing nodes do not have the right to deliver the identification information of the first task to the message queue even if acquiring the identification information of the first task; in addition, the competition rule may be that the task processing node with the lowest CPU utilization rate or memory utilization rate is the task processing node that succeeds in competition, and the advantage of using the competition rule is to ensure that the identification information of the first task can be timely and accurately released into the message queue.
In one embodiment of the present application, the second task processing node is a preset task processing node in the multi-node system.
For example, a task processing node with characteristics of high performance and configuration, high availability, and standby power supply may be preset in the multi-node system, and the task processing node is responsible for acquiring identification information of tasks that are not processed and completed by other task processing nodes. Since the preset task processing node itself has very high availability, if the task processing node is specified to acquire the identification information of the task, the high availability of the whole system can be ensured.
FIG. 5 is a process flow diagram of a multi-node system when a node is down according to an embodiment of the present application.
Referring to fig. 5, when a certain node goes down, the processing flow of the multi-node system may be as follows:
1. node 01, node 02 and node 03 are all registered in the registry and after the registration is successful a heartbeat connection is established with the registry.
2. The nodes 01 and 02 monitor the state of heartbeat connection maintained in the registration center, and when the heartbeat of the node 03 in the registration center stops, the nodes 01 and 02 monitor that the node 03 goes down.
3. The node 01 and the node 02 acquire a task list of the node 03 from a cache in a competitive manner, the task list of each node is stored in the cache, and the task list comprises one or more pieces of task identification information.
4. After the node 02 succeeds in competition, the node 02 successfully acquires the task list of the node 03, and pushes the identification information of the tasks which are not processed and completed by the node 03 in the task list to the message queue again.
5. Because the nodes 01 and 02 are not down, the nodes 01 and 02 can acquire the identification information of the tasks to be consumed from the message queue, so as to acquire the tasks to be consumed, wherein when the identification information of the tasks which are not processed and completed by the node 03 is acquired, the tasks which are not processed and completed by the node 03 can be processed.
Referring to fig. 2, instep 240, the task to be consumed is processed according to the obtained identification information of the task to be consumed.
After the identification information of the task to be consumed is acquired, the task to be consumed can be consumed, and then the task to be consumed is processed.
Any task processing node which does not go down in the multi-node system can acquire identification information of the tasks to be consumed from the message queue.
Under the condition that the current task processing node acquires the identification information of the first task, the current task processing node can process the first task; when the other task processing nodes acquire the identification information of the first task, the other task processing nodes may also perform processing on the first task.
Therefore, after the identification information of the first task is released into the message queue, no matter the identification information of the first task is acquired by the current task node or other task processing nodes, the recovery processing of the first task can be realized, and the availability of the system is greatly improved.
Fig. 6 shows a flowchart for processing a task to be consumed according to acquired identification information of the task to be consumed according to an embodiment of the present application. Referring to fig. 6, step 240 may specifically include the following steps:
and step 610, loading binary data of the task entity corresponding to the identification information of the task to be consumed from a cache corresponding to the multi-node system.
The binary data of the task entity and the identification information of the task to be consumed can be stored in the same cache, and can also be stored in different caches respectively.
The cache may belong to a multi-node system or may be located outside the multi-node system. The cache is a high-speed storage space commonly used by each task processing node in the multi-node system, the cache can be located on a physical node or a virtual node, and the physical substance of the cache can be a memory. The advantage of using the cache is that the loading efficiency of binary data of the task entity can be improved, so that the recovery efficiency of the task after the downtime of the task processing node can be improved, and the availability of the system is improved.
In other embodiments of the present application, the task entity binary data may also be stored on disk.
The task entity binary data is a byte sequence obtained by serializing the task, the task entity binary data is flushed into a cache when the task is allocated, and the task is stored in a task entity binary data mode, so that the task storage and transmission can be facilitated.
In one embodiment of the present application, the method further comprises:
and after the task entity binary data corresponding to the identification information of the task to be consumed is loaded from the cache corresponding to the multi-node system, the task entity binary data in the cache is persisted to a disk.
And step 620, performing deserialization processing on the binary data of the task entity to obtain the task to be consumed.
The binary data of the task entity is subjected to deserialization processing, an object of the task to be consumed can be obtained, and the object can be represented in a Java object form.
Step 630, the task to be consumed is processed.
In one embodiment of the application, the task to be consumed comprises at least one network instruction, and the task entity binary data comprises original binary data corresponding to the network instruction.
The step of processing the task to be consumed may comprise:
and sequentially executing the network instructions contained in the tasks to be consumed, wherein after each network instruction is successfully executed, each network instruction is serialized, and the binary data obtained after serialization replaces the corresponding original binary data of each network instruction in the cache.
When all network instructions in a task to be consumed are successfully executed, the original binary data corresponding to each network instruction in the cache is covered into binary data.
The binary data of the network instruction is the corresponding original binary data of the network instruction in the cache, and can also be called as a snapshot of the binary data of the task entity.
It is obvious that, in a scenario where a network instruction is arranged, because after the network instruction is successfully executed, the network instruction is serialized, and binary data obtained after the serialization is used to replace original binary data corresponding to the network instruction in a cache, where the original binary data and the binary data corresponding to one network instruction are different and represent different states of the network instruction, it may be determined that, if original binary data exists in the cache, the network instruction corresponding to the original binary data is not successfully executed, and if binary data exists in the cache, the network instruction corresponding to the binary data is successfully executed.
Based on this, after the identification information of the first task is put into the message queue, if a part of network instructions in the first task are executed successfully and another part of network instructions are not executed successfully, the cache includes binary data corresponding to the part of network instructions and original binary data corresponding to the another part of network instructions. At this time, after the designated task processing node in the multi-node system acquires the identification information of the first task, only the original binary data corresponding to the first task may be loaded, so that only the network instructions that are not successfully executed in the first task are processed, thereby preventing the network instructions that have been successfully executed in the first task from being repeatedly executed, and ensuring the accuracy of the instruction arrangement task recovery after the node is down.
On the basis, each task processing node can distinguish which tasks and which network instructions in the tasks are executed, and the scheme in the embodiment can be adjusted adaptively based on the characteristics.
Specifically, with continued reference to fig. 5, although the node that did not go down in fig. 5 obtains the task list of the node that went down, the task list may include both the identification information of the tasks that were completed by processing and the identification information of the tasks that were not completed by processing; however, in other embodiments, only the identification information of the tasks that are completed by the node that is not down in the task list may be obtained.
Moreover, when a node acquires the task list, the identification information of the tasks pushed to the message queue by the node can be identification information of unprocessed completed tasks in the task list, and can also be identification information of all tasks in the task list, so that the identification information of all tasks can be pushed to the message queue.
In one embodiment of the present application, the method further comprises: and after the task to be consumed is successfully executed, persisting the binary data corresponding to the task to be consumed in the cache into a disk.
According to the embodiment of the application, the binary data are subjected to persistence processing, the task processing records are subjected to disk dropping, and the processed tasks can be backtracked and audited afterwards.
The purpose of the persistence processing is to retain more data for backtracking and auditing, and thus, the way of implementing persistence in the embodiments of the present application may be arbitrary. The persistence processing can be performed at any stage of the task processing, the persistence processing can be performed periodically, the persistence processing can be performed according to a user instruction, only the binary data can be subjected to the persistence processing, and the persistence processing can be performed on the task entity binary data containing the original binary data.
In one embodiment of the present application, the step of processing the task to be consumed further comprises: and if the network instruction contained in the task to be consumed fails to be executed, performing rollback processing on the executed network instruction in the task to be consumed so as to reprocess the task to be consumed.
Specifically, the task to be consumed comprises a plurality of network instructions which need to be executed in sequence, and after one network instruction contained in the task to be consumed fails to be executed, the executed network instruction needs to be re-executed before the network instruction is executed, so that the atomicity of network instruction execution is ensured, the error in network instruction execution is avoided, and the reliability of network instruction execution is improved.
In an embodiment of the present application, a network controller is deployed on each task processing node in a multi-node system, and the network controller includes a task orchestration management framework, and the task orchestration management framework is configured to sequentially execute network instructions included in tasks to be consumed.
With continued reference to FIG. 3, there is shown an organization of a node in a multi-node system, which each node in the multi-node system may adopt.
In the embodiment of the application, the task arranging and managing framework is embedded into the network controller as a component, so that the delay of issuing the network instruction is greatly reduced, and meanwhile, as a set of task arranging and managing framework can be respectively embedded into the network controllers of different task processing nodes, the reusability of program codes is also improved; moreover, secondary development can be carried out based on the task arranging and managing framework, so that different network controllers can realize different instruction arranging strategies, and the development efficiency can be improved; in addition, each task orchestration management framework can be maintained and tested independently, and the coupling of the system is reduced.
In an embodiment of the application, the sequentially executing the network instructions included in the tasks to be consumed includes: and sequentially sending the network instructions contained in the tasks to be consumed to the network equipment corresponding to the network instructions.
The network device here may be various types of network devices such as an edge gateway.
The successful execution of the network instruction may indicate that the network instruction is successfully transmitted to the network device corresponding to the network instruction, and may also indicate that response information representing the successful execution of the network instruction returned by the network instruction is received after the network instruction is transmitted to the corresponding network device.
In an embodiment of the application, the network controller further includes a remote invocation client, and the step of sequentially executing the network instructions included in the task to be consumed may include: and sequentially sending the network instructions in the tasks to be consumed to the remote calling client so that the remote calling client sequentially sends the network instructions to the network equipment corresponding to the network instructions.
Referring to fig. 3, the RPC Client in the network controller is the remote call Client, so that the embodiment of the present application further implements the arrangement of the network instruction based on the remote call protocol.
In one embodiment of the present application, the method is performed by a designated task processing node of a plurality of task processing nodes, the method further comprising:
if the appointed task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring the identification information of a second task which is not processed and completed in the multi-node system;
and putting the identification information of the second task into the message queue.
The task processing nodes in the multi-node system can form a cluster, and the lasting normal working time of one task processing node is the longest, namely the task processing node is the task processing node which keeps the lasting duration of normal heartbeat connection between the task processing node and the registration center in the multi-node system, and can also be called as the oldest node in the cluster.
In an embodiment of the present application, if the designated task processing node is a task processing node that lasts for the longest normal working time in the multi-node system, the step of acquiring the identification information of the second task that is not processed and completed in the multi-node system may specifically include:
and if the appointed task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring the identification information of the second task which is not processed and completed in the multi-node system after waiting for the preset time.
By acquiring the identification information of the second task after waiting for the predetermined time, the backlog of tasks at the designated task processing node can be reduced, and the load of the designated task processing node can be reduced.
The predetermined time may be arbitrarily set as needed, and may be set to 2 minutes, for example.
Fig. 7 is a schematic diagram illustrating a processing flow of a multi-node system when nodes are all down according to an embodiment of the present application, where a schematic diagram illustrating a relationship between a node status and time is shown in an upper right corner of fig. 7. Referring to fig. 7, the processing flow of the multi-node system when all nodes are down may be as follows:
1. node 01 crashes and exits at point T0.
2. The node 02 crashes at the time point T0 and exits at the time point T2, wherein the reason why the time point of the crash of the node 02 is different from the time point of the exit is that the node 02 is detected as being crashed by the registry only at the time point T2 due to the heartbeat delay.
3. The node 03 is on line at T1, the node 03 can monitor that the node 02 is down according to the scheme in the foregoing embodiment, and can acquire the task list of the node 02 from the cache and put the tasks in the task list into the message queue, so that the tasks that are not completed by the node 02 are recovered to be executed, the node 03 can take over the tasks, and the tasks that are not completed by the node 01 cannot be taken over.
4. When the node 03 finds itself as the oldest node of the current cluster through the registry, that is, finds itself as the node with the longest duration of normal operation, a cluster recovery event is executed after 2 minutes (for example only).
5. The cluster recovery event acquires identification information of tasks in the cache, the execution time of which is earlier than the birth time of the oldest node (node 03) of the current cluster, of all the tasks according to the task list and the task executed time set of each node in the cache, and delivers the identification information of the tasks to the message queue again.
6. The node 03 obtains the identification information of the task from the message queue to consume the task, so that the task which is not processed and completed by the node 01 can be resumed to be executed.
In the embodiment of the application, even if all task processing nodes in the multi-node system are down, as long as one task processing node is on line, tasks which are not processed and completed by other task processing nodes can be continuously executed, so that the availability of the system is further improved.
In the aforementioned example, only the identification information of the first task that is not completed by the node is contained in the message queue, but the message queue may also include the identification information of the task that is normally allocated to the task processing node.
In an embodiment of the present application, the network controller further includes a task delivery client, and the message queue further includes identification information of a third task, where the identification information of the third task is delivered to the message queue by the task delivery client.
The task delivery client is responsible for sending identification information of tasks which need to be processed under normal conditions to the task arranging and managing framework, the task arranging and managing framework can process tasks which are not processed and completed by other nodes when other nodes are down, can process tasks which need to be executed under normal conditions, and for the tasks which are not processed and completed by other task processing nodes, the current task processing node can process the tasks which need to be processed under normal conditions.
Referring to fig. 3, the task delivery client may include various functional modules for generating network instructions, such as a network orchestration and computation module, a network model object management module, and a global quality monitoring traffic scheduling module, and the location of the task delivery client in the network controller is similar to the location of these modules in the network controller. Therefore, each task processing node can comprise a task delivery client and a task orchestration management framework, which are both embedded on the network controller of the task processing node.
FIG. 8 illustrates a task execution flow diagram based on a task orchestration management framework according to an embodiment of the application.
The components and corresponding functionality of the task orchestration management framework based system are described below.
(1) A client: and the network command generation is responsible for combining a group of network commands into a task to be submitted, wherein the client is equivalent to the task delivery client in the embodiment.
(2) Message queue: and the client and the bridge of the task arrangement management framework are responsible for peak clipping of the task quantity.
(3) Storage/caching: and persistence and caching of tasks and instructions are completed, the tasks are guaranteed not to be lost and restored, and backtracking and auditing can be realized.
(4) The registration center: and the system is responsible for service registration and discovery and is used for ensuring high availability and disaster tolerance of the task framework.
(5) Task orchestration management framework: the specific executor of the network instructions is responsible for executing a group of instructions in the tasks, and refreshes the snapshot all the time to ensure that the tasks are not lost, and can be taken over by task scheduling frames of other nodes when the network instructions are down.
In the above components, a client and a task orchestration management frame may be located on the same node, each node may include a client and a task orchestration management frame, the registry and the message queue may be located on any node, and the storage/cache may be located on one or more nodes; in addition, a plurality of message queues can be arranged to improve the availability of the system.
The task execution flow based on the task orchestration management framework in the embodiment of fig. 8 is as follows:
a) each task arranging and managing framework needs to go to a registration center to complete registration at the beginning of system starting, and monitors the states of other task arranging and managing frameworks, so as to ensure that when other frameworks/nodes are down, faults can be found and migrated in time, and the system is a system comprising the task arranging and managing frameworks and the components.
b) The client sends the task ID to the message queue according to the instruction of the user, and the client also puts the task ID and the corresponding task entity binary system into a cache.
c) The task orchestration management framework obtains the task ID from the message queue.
In an embodiment of the present application, the task orchestration management framework obtains, from the message queue, a task ID sent by a client corresponding to the task orchestration management framework.
In this case, the task list of the node in the cache in the foregoing embodiment may be written by the client, that is, the client specifies a task to be processed by the node, and for a task ID that is not processed and completed by other nodes, the task list may be written into the cache by the node that has acquired the task ID immediately after acquiring the task ID.
In other embodiments of the present application, the task orchestration management framework may also obtain task IDs sent by other clients.
d) And loading the binary system of the corresponding task entity according to the task ID.
e) And performing binary deserialization on the task entity to obtain the task.
f) And sequentially executing the network instructions in the tasks, and when the network instructions fail to be executed, automatically rolling back the executed network instructions by the framework, wherein the step is a key step for ensuring atomicity, and the network instructions are sent to the network equipment corresponding to the network instructions when the network instructions are executed.
g) And serializing the tasks to obtain a task binary system.
h) And (5) snapshotting the binary system of the task and brushing the snapshot into a cache. The snapshot is the binary task in the cache. The step is a key step of high availability and disaster tolerance, and the task snapshot is refreshed every time, so that when the frame/node is down, other frames/nodes can restore to execute the task according to the snapshot information, and the task is ensured to be completed.
i) And if the task is not completed, continuing to execute the step e) and the subsequent steps.
j) And (5) persisting the task to a magnetic disk for storage, and performing disk dropping.
In summary, according to the task processing method applied to the multi-node system in the embodiment of the present application, the atomicity of instruction arrangement can be ensured, the instruction arranging task is not lost after the controller is down, and the overall development efficiency can be improved.
The following describes an embodiment of an apparatus of the present application, which may be used to execute a task processing method applied to a multi-node system in the above-described embodiment of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the task processing method applied to a multi-node system described above.
FIG. 9 shows a block diagram of a task processing device applied to a multi-node system according to one embodiment of the present application.
Referring to fig. 9, atask processing apparatus 900 applied to a multi-node system according to an embodiment of the present application includes: a listeningunit 910, anacquisition unit 920 and aprocessing unit 930.
Themonitoring unit 910 is configured to monitor the working states of the plurality of task processing nodes; the obtainingunit 920 is configured to obtain identification information of a task to be consumed from a message queue, where the message queue includes identification information of a first task that is not processed and completed by a first task processing node that is down, and the identification information of the first task is obtained and released to the message queue by a second task processing node after monitoring that the first task processing node is down; theprocessing unit 930 is configured to process the to-be-consumed task according to the acquired identification information of the to-be-consumed task.
In some embodiments of the present application, based on the foregoing scheme, the obtainingunit 920 is further configured to: if the first task processing node is monitored to be down, acquiring identification information of a first task which is not processed and completed by the first task processing node in a competitive mode with other task processing nodes in the plurality of task processing nodes; wherein the second task processing node comprises a task processing node that succeeds in contention.
In some embodiments of the present application, based on the foregoing solution, thetask processing apparatus 900 applied to the multi-node system is located in a designated task processing node of the plurality of task processing nodes, and the obtainingunit 920 is further configured to: if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring identification information of a second task which is not processed and completed in the multi-node system; and putting the identification information of the second task into the message queue.
In some embodiments of the present application, based on the foregoing scheme, the obtainingunit 920 is configured to: and if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, acquiring the identification information of the second task which is not processed and completed in the multi-node system after waiting for the preset time.
In some embodiments of the present application, based on the foregoing solution, before monitoring the working states of the plurality of task processing nodes, themonitoring unit 910 is further configured to: sending a registration request to a registration center, and establishing heartbeat connection with the registration center after the registration is successful; thelistening unit 910 is configured to: and monitoring the state of the heartbeat connection maintained in the registration center so as to monitor the working states of the plurality of task processing nodes.
In some embodiments of the present application, based on the foregoing solution, theprocessing unit 930 is configured to: loading task entity binary data corresponding to the identification information of the task to be consumed from a cache corresponding to the multi-node system; performing deserialization processing on the binary data of the task entity to obtain the task to be consumed; and processing the task to be consumed.
In some embodiments of the present application, based on the foregoing solution, the task to be consumed includes at least one network instruction, and the task entity binary data includes original binary data corresponding to the network instruction; theprocessing unit 930 is configured to: and sequentially executing the network instructions contained in the tasks to be consumed, wherein after each network instruction is successfully executed, each network instruction is serialized, and the binary data obtained after serialization replaces the corresponding original binary data of each network instruction in the cache.
In some embodiments of the present application, based on the foregoing solution, theprocessing unit 930 is further configured to: and if the network instruction contained in the task to be consumed fails to be executed, performing rollback processing on the executed network instruction in the task to be consumed so as to reprocess the task to be consumed.
In some embodiments of the present application, based on the foregoing solution, theprocessing unit 930 is further configured to: and after the task entity binary data corresponding to the identification information of the task to be consumed is loaded from the cache corresponding to the multi-node system, the task entity binary data in the cache is persisted to a disk.
In some embodiments of the present application, based on the foregoing solution, a network controller is deployed on each task processing node in the multi-node system, where the network controller includes a task orchestration management framework, and the task orchestration management framework is configured to sequentially execute network instructions included in the tasks to be consumed.
In some embodiments of the present application, based on the foregoing solution, the network controller further includes a remote invocation client, and theprocessing unit 930 is configured to: and sequentially sending the network instructions in the tasks to be consumed to the remote call client so that the remote call client sequentially sends the network instructions to the network equipment corresponding to the network instructions.
In some embodiments of the present application, based on the foregoing scheme, the network controller further includes a task delivery client, and the message queue further includes identification information of a third task, where the identification information of the third task is delivered to the message queue by the task delivery client.
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that thecomputer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 10, thecomputer system 1000 includes a Central Processing Unit (CPU)1001 that can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from astorage portion 1008 into a Random Access Memory (RAM) 1003. In theRAM 1003, various programs and data necessary for system operation are also stored. TheCPU 1001,ROM 1002, andRAM 1003 are connected to each other via abus 1004. An Input/Output (I/O) interface 1005 is also connected to thebus 1004.
The following components are connected to the I/O interface 1005: aninput section 1006 including a keyboard, a mouse, and the like; anoutput section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; astorage portion 1008 including a hard disk and the like; and acommunication section 1009 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. Thecommunication section 1009 performs communication processing via a network such as the internet. Thedriver 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on thedrive 1010 as necessary, so that a computer program read out therefrom is mounted into thestorage section 1008 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through thecommunication part 1009 and/or installed from theremovable medium 1011. When the computer program is executed by a Central Processing Unit (CPU)1001, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As an aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

Translated fromChinese
1.一种应用于多节点系统的任务处理方法,其特征在于,所述多节点系统包括多个任务处理节点,所述方法包括:1. A task processing method applied to a multi-node system, wherein the multi-node system comprises a plurality of task processing nodes, and the method comprises:监听所述多个任务处理节点的工作状态;monitoring the working status of the multiple task processing nodes;从消息队列中获取待消费任务的标识信息,所述消息队列中包含有发生宕机的第一任务处理节点未处理完成的第一任务的标识信息,所述第一任务的标识信息是由第二任务处理节点在监听到所述第一任务处理节点宕机后获取并投放至所述消息队列中的;The identification information of the task to be consumed is obtained from the message queue, where the message queue contains the identification information of the first task that has not been processed and completed by the first task processing node that has crashed, and the identification information of the first task is determined by the first task processing node. The second task processing node acquires and puts it into the message queue after monitoring that the first task processing node is down;根据获取到的待消费任务的标识信息,处理所述待消费任务。The task to be consumed is processed according to the acquired identification information of the task to be consumed.2.根据权利要求1所述的应用于多节点系统的任务处理方法,其特征在于,所述方法还包括:2. The task processing method applied to a multi-node system according to claim 1, wherein the method further comprises:若监听到所述第一任务处理节点发生宕机,则与所述多个任务处理节点中的其它任务处理节点通过竞争的方式获取所述第一任务处理节点未处理完成的第一任务的标识信息;其中,所述第二任务处理节点包括在竞争中取得成功的任务处理节点。If it is detected that the first task processing node is down, compete with other task processing nodes in the plurality of task processing nodes to obtain the identifier of the first task that has not been processed and completed by the first task processing node information; wherein, the second task processing node includes a task processing node that succeeds in the competition.3.根据权利要求1所述的应用于多节点系统的任务处理方法,其特征在于,所述方法由所述多个任务处理节点中的指定任务处理节点执行,所述方法还包括:3. The task processing method applied to a multi-node system according to claim 1, wherein the method is executed by a designated task processing node in the plurality of task processing nodes, and the method further comprises:若所述指定任务处理节点是所述多节点系统中持续正常工作时间最长的任务处理节点,则获取所述多节点系统中未处理完成的第二任务的标识信息;If the designated task processing node is the task processing node with the longest continuous working time in the multi-node system, acquiring the identification information of the unprocessed second task in the multi-node system;将所述第二任务的标识信息投放至所述消息队列中。Putting the identification information of the second task into the message queue.4.根据权利要求3所述的应用于多节点系统的任务处理方法,其特征在于,所述若所述指定任务处理节点是所述多节点系统中持续正常工作时间最长的任务处理节点,则获取所述多节点系统中未处理完成的第二任务的标识信息,包括:4. The task processing method applied to a multi-node system according to claim 3, wherein, if the designated task processing node is the task processing node with the longest continuous normal working time in the multi-node system, then acquiring the identification information of the unprocessed second task in the multi-node system, including:若所述指定任务处理节点是所述多节点系统中持续正常工作时间最长的任务处理节点,则在等待预定时间之后获取所述多节点系统中未处理完成的第二任务的标识信息。If the designated task processing node is the task processing node with the longest continuous working time in the multi-node system, the identification information of the unprocessed second task in the multi-node system is acquired after waiting for a predetermined time.5.根据权利要求1所述的应用于多节点系统的任务处理方法,其特征在于,在监听所述多个任务处理节点的工作状态之前,所述方法还包括:5. The task processing method applied to a multi-node system according to claim 1, wherein before monitoring the working status of the multiple task processing nodes, the method further comprises:向注册中心发送注册请求,并在注册成功后建立与所述注册中心之间的心跳连接;Send a registration request to the registration center, and establish a heartbeat connection with the registration center after the registration is successful;所述监听所述多个任务处理节点的工作状态,包括:The monitoring of the working status of the multiple task processing nodes includes:对所述注册中心中维护的心跳连接的状态进行监听,以监听所述多个任务处理节点的工作状态。The state of the heartbeat connection maintained in the registration center is monitored, so as to monitor the working state of the multiple task processing nodes.6.根据权利要求1所述的应用于多节点系统的任务处理方法,其特征在于,所述根据获取到的待消费任务的标识信息,处理所述待消费任务,包括:6. The task processing method applied to a multi-node system according to claim 1, wherein the processing of the task to be consumed according to the acquired identification information of the task to be consumed comprises:从所述多节点系统对应的缓存中加载所述待消费任务的标识信息所对应的任务实体二进制数据;Load the task entity binary data corresponding to the identification information of the task to be consumed from the cache corresponding to the multi-node system;对所述任务实体二进制数据进行反序列化处理,得到所述待消费任务;Deserialize the binary data of the task entity to obtain the task to be consumed;处理所述待消费任务。Process the to-be-consumed task.7.根据权利要求6所述的应用于多节点系统的任务处理方法,其特征在于,所述待消费任务包括至少一个网络指令,所述任务实体二进制数据包括与网络指令对应的原始二进制数据;7. The task processing method applied to a multi-node system according to claim 6, wherein the task to be consumed comprises at least one network command, and the task entity binary data comprises original binary data corresponding to the network command;所述处理所述待消费任务,包括:The processing of the task to be consumed includes:依次执行所述待消费任务中包含的网络指令,其中,在每个网络指令执行成功之后,将所述每个网络指令进行序列化处理,并通过序列化处理后得到的二进制数据替换所述每个网络指令在所述缓存中对应的原始二进制数据。Execute the network commands included in the tasks to be consumed in sequence, wherein, after each network command is successfully executed, perform serialization processing on each network command, and replace the each network command with binary data obtained after serialization processing. The original binary data corresponding to each network instruction in the cache.8.根据权利要求7所述的应用于多节点系统的任务处理方法,其特征在于,所述处理所述待消费任务,还包括:8. The task processing method applied to a multi-node system according to claim 7, wherein the processing the task to be consumed further comprises:若对所述待消费任务中包含的网络指令执行失败,则对所述待消费任务中已执行过的网络指令进行回滚处理,以重新处理所述待消费任务。If the execution of the network command included in the task to be consumed fails, the network command that has been executed in the task to be consumed is rolled back to reprocess the task to be consumed.9.根据权利要求6所述的应用于多节点系统的任务处理方法,其特征在于,所述方法还包括:9. The task processing method applied to a multi-node system according to claim 6, wherein the method further comprises:从所述多节点系统对应的缓存中加载所述待消费任务的标识信息所对应的任务实体二进制数据之后,将所述缓存中的任务实体二进制数据持久化到磁盘中。After loading the task entity binary data corresponding to the identification information of the task to be consumed from the cache corresponding to the multi-node system, the task entity binary data in the cache is persisted to a disk.10.根据权利要求7所述的应用于多节点系统的任务处理方法,其特征在于,所述多节点系统中的各个任务处理节点上部署有网络控制器,所述网络控制器包括任务编排管理框架,所述任务编排管理框架用于依次执行所述待消费任务中包含的网络指令。10 . The task processing method applied to a multi-node system according to claim 7 , wherein a network controller is deployed on each task processing node in the multi-node system, and the network controller includes task scheduling and management. 11 . A framework, where the task scheduling and management framework is configured to sequentially execute network instructions contained in the tasks to be consumed.11.根据权利要求10所述的应用于多节点系统的任务处理方法,其特征在于,所述网络控制器还包括远程调用客户端,所述依次执行所述待消费任务中包含的网络指令,包括:11 . The task processing method applied to a multi-node system according to claim 10 , wherein the network controller further comprises a remote invocation client, which sequentially executes the network instructions contained in the to-be-consumed tasks, 11 . include:依次将所述待消费任务中的网络指令发送至所述远程调用客户端,以使所述远程调用客户端依次向与所述网络指令对应的网络设备发送所述网络指令。The network instructions in the tasks to be consumed are sequentially sent to the remote invocation client, so that the remote invocation client sequentially sends the network instructions to the network devices corresponding to the network instructions.12.根据权利要求10所述的应用于多节点系统的任务处理方法,其特征在于,所述网络控制器还包括任务投放客户端,所述消息队列中还包含有第三任务的标识信息,所述第三任务的标识信息是由所述任务投放客户端投放至所述消息队列中的。12. The task processing method applied to a multi-node system according to claim 10, wherein the network controller further comprises a task delivery client, and the message queue also includes identification information of the third task, The identification information of the third task is put into the message queue by the task putting client.13.一种应用于多节点系统的任务处理装置,其特征在于,所述多节点系统包括多个任务处理节点,所述装置包括:13. A task processing device applied to a multi-node system, wherein the multi-node system comprises a plurality of task processing nodes, and the device comprises:监听单元,用于监听所述多个任务处理节点的工作状态;a monitoring unit, configured to monitor the working states of the multiple task processing nodes;获取单元,用于从消息队列中获取待消费任务的标识信息,所述消息队列中包含有发生宕机的第一任务处理节点未处理完成的第一任务的标识信息,所述第一任务的标识信息是由第二任务处理节点在监听到所述第一任务处理节点宕机后获取并投放至所述消息队列中的;The obtaining unit is used to obtain the identification information of the task to be consumed from the message queue, where the message queue contains the identification information of the first task that has not been processed and completed by the first task processing node that has crashed, and the identification information of the first task is The identification information is acquired by the second task processing node after monitoring that the first task processing node is down and put into the message queue;处理单元,用于根据获取到的待消费任务的标识信息,处理所述待消费任务。The processing unit is configured to process the task to be consumed according to the acquired identification information of the task to be consumed.14.一种计算机可读介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至12中任一项所述的应用于多节点系统的任务处理方法。14. A computer-readable medium on which a computer program is stored, wherein when the computer program is executed by a processor, the task applied to a multi-node system according to any one of claims 1 to 12 is realized Approach.15.一种电子设备,其特征在于,包括:15. An electronic device, comprising:一个或多个处理器;one or more processors;存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现如权利要求1至12中任一项所述的应用于多节点系统的任务处理方法。A storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of claims 1 to 12 A task processing method applied to a multi-node system.
CN202110492295.9A2021-05-062021-05-06 Task processing method, device, computer readable medium and electronic deviceActiveCN113064744B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202110492295.9ACN113064744B (en)2021-05-062021-05-06 Task processing method, device, computer readable medium and electronic device

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN202110492295.9ACN113064744B (en)2021-05-062021-05-06 Task processing method, device, computer readable medium and electronic device

Publications (2)

Publication NumberPublication Date
CN113064744Atrue CN113064744A (en)2021-07-02
CN113064744B CN113064744B (en)2025-07-01

Family

ID=76568065

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202110492295.9AActiveCN113064744B (en)2021-05-062021-05-06 Task processing method, device, computer readable medium and electronic device

Country Status (1)

CountryLink
CN (1)CN113064744B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN113590287A (en)*2021-07-282021-11-02百度在线网络技术(北京)有限公司Task processing method, device, equipment, storage medium and scheduling system
CN113726744A (en)*2021-08-022021-11-30南京南瑞信息通信科技有限公司Visual safety alarm processing system and method based on task arrangement
CN113839783A (en)*2021-09-272021-12-24中国工商银行股份有限公司Task processing method, device and equipment
CN113886040A (en)*2021-09-242022-01-04浙江大华技术股份有限公司Task scheduling method and device
CN114356542A (en)*2021-11-302022-04-15杭州光云科技股份有限公司Asynchronous processing method and device for mass tasks, computer equipment and storage medium
CN114880093A (en)*2022-04-292022-08-09蚂蚁区块链科技(上海)有限公司Under-link processing method and device for block chain task
CN114978892A (en)*2022-07-042022-08-30北京尽微致广信息技术有限公司Target node determination method and device
CN115314769A (en)*2022-07-282022-11-08深圳市思达仪表有限公司Remote meter reading method and device, computer equipment and medium
CN115766842A (en)*2022-11-022023-03-07成都长城开发科技股份有限公司Task execution method and device, computer readable medium and electronic equipment
CN115794802A (en)*2023-01-292023-03-14国网瑞嘉(天津)智能机器人有限公司Data real-time processing method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20170242735A1 (en)*2016-02-222017-08-24International Business Machines CorporationBalancing work of tasks at a sending node of a transaction server
CN110035103A (en)*2018-01-122019-07-19宁波中科集成电路设计中心有限公司A kind of transferable distributed scheduling system of internodal data
US20190268351A1 (en)*2018-02-272019-08-29Alibaba Group Holding LimitedMethod, apparatus, system, and electronic device for cross-blockchain interaction
US20190391844A1 (en)*2018-11-062019-12-26Beijing Baidu Netcom Science And Technology Co., Ltd.Task orchestration method and system
CN111290854A (en)*2020-01-202020-06-16腾讯科技(深圳)有限公司Task management method, device and system, computer storage medium and electronic equipment
WO2020238365A1 (en)*2019-05-312020-12-03深圳前海微众银行股份有限公司Message consumption method, apparatus and device, and computer storage medium
CN112667414A (en)*2020-12-232021-04-16平安普惠企业管理有限公司Message queue-based message consumption method and device, computer equipment and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20170242735A1 (en)*2016-02-222017-08-24International Business Machines CorporationBalancing work of tasks at a sending node of a transaction server
CN110035103A (en)*2018-01-122019-07-19宁波中科集成电路设计中心有限公司A kind of transferable distributed scheduling system of internodal data
US20190268351A1 (en)*2018-02-272019-08-29Alibaba Group Holding LimitedMethod, apparatus, system, and electronic device for cross-blockchain interaction
US20190391844A1 (en)*2018-11-062019-12-26Beijing Baidu Netcom Science And Technology Co., Ltd.Task orchestration method and system
WO2020238365A1 (en)*2019-05-312020-12-03深圳前海微众银行股份有限公司Message consumption method, apparatus and device, and computer storage medium
CN111290854A (en)*2020-01-202020-06-16腾讯科技(深圳)有限公司Task management method, device and system, computer storage medium and electronic equipment
CN112667414A (en)*2020-12-232021-04-16平安普惠企业管理有限公司Message queue-based message consumption method and device, computer equipment and medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN113590287A (en)*2021-07-282021-11-02百度在线网络技术(北京)有限公司Task processing method, device, equipment, storage medium and scheduling system
CN113590287B (en)*2021-07-282024-03-01百度在线网络技术(北京)有限公司Task processing method, device, equipment, storage medium and scheduling system
CN113726744A (en)*2021-08-022021-11-30南京南瑞信息通信科技有限公司Visual safety alarm processing system and method based on task arrangement
CN113886040A (en)*2021-09-242022-01-04浙江大华技术股份有限公司Task scheduling method and device
CN113839783A (en)*2021-09-272021-12-24中国工商银行股份有限公司Task processing method, device and equipment
CN114356542A (en)*2021-11-302022-04-15杭州光云科技股份有限公司Asynchronous processing method and device for mass tasks, computer equipment and storage medium
CN114880093A (en)*2022-04-292022-08-09蚂蚁区块链科技(上海)有限公司Under-link processing method and device for block chain task
CN114978892A (en)*2022-07-042022-08-30北京尽微致广信息技术有限公司Target node determination method and device
CN115314769A (en)*2022-07-282022-11-08深圳市思达仪表有限公司Remote meter reading method and device, computer equipment and medium
CN115766842A (en)*2022-11-022023-03-07成都长城开发科技股份有限公司Task execution method and device, computer readable medium and electronic equipment
CN115794802A (en)*2023-01-292023-03-14国网瑞嘉(天津)智能机器人有限公司Data real-time processing method and device, computer equipment and storage medium

Also Published As

Publication numberPublication date
CN113064744B (en)2025-07-01

Similar Documents

PublicationPublication DateTitle
CN113064744B (en) Task processing method, device, computer readable medium and electronic device
US10931599B2 (en)Automated failure recovery of subsystems in a management system
CN112073269B (en)Block chain network testing method, device, server and storage medium
CN102346460B (en)Transaction-based service control system and method
US20200241613A1 (en)Persistent reservations for virtual disk using multiple targets
JP6499986B2 (en) Fault tolerant batch processing
JP5258019B2 (en) A predictive method for managing, logging, or replaying non-deterministic operations within the scope of application process execution
US8381015B2 (en)Fault tolerance for map/reduce computing
CN113886089B (en)Task processing method, device, system, equipment and medium
CN104583960B (en) Transaction-level health monitoring for online services
CN113569987A (en) Model training method and device
WO2018103318A1 (en)Distributed transaction handling method and system
EP3765982B1 (en)Autonomous cross-scope secrets management
US12105735B2 (en)Asynchronous accounting method and apparatus for blockchain, medium and electronic device
CN109886693B (en)Consensus realization method, device, equipment and medium for block chain system
CN109117252B (en)Method and system for task processing based on container and container cluster management system
CN111464589A (en)Intelligent contract processing method, computer equipment and storage medium
CN114189439A (en) Method and device for automatic capacity expansion
CN115834600B (en)Multi-cloud nanotube data synchronization method and device, electronic equipment and storage medium
CN115686813A (en)Resource scheduling method and device, electronic equipment and storage medium
CN111666132A (en)Distributed transaction implementation method, device, computer system and readable storage medium
HK40048684A (en)Task processing method, device, computer readable medium and electronic equipment
US8180846B1 (en)Method and apparatus for obtaining agent status in a network management application
CN114598700A (en)Communication method and communication system
CN117493030B (en)Cluster system business flow control method and system based on distributed lock

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
REGReference to a national code

Ref country code:HK

Ref legal event code:DE

Ref document number:40048684

Country of ref document:HK

SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp