Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, an embodiment of the present invention provides a specific implementation manner of a distributed control method applied to space measurement and control software, and the method specifically includes the following steps:
Step 100: communicating with a plurality of space measurement and control software.
As noted in the background art, space measurement and control software in the prior art runs independently and cannot communicate with one another. With the rapid development of hardware technology, the hardware configuration available to each piece of space measurement and control software has improved greatly. As a result, in some time periods the hardware resources of some space measurement and control software sit idle and are wasted, while the hardware resources of others are severely short.
Step 200: responding to a task request of the space measurement and control software, and performing master-slave role distribution on the plurality of space measurement and control software to generate a first master-slave role distribution result.
In order to avoid the control node becoming a single point of failure, the control node of the distributed stream computing platform adopts a master-slave backup mechanism. The master-slave control node coordination module is mainly used for role adjudication, heartbeat keep-alive and role switching of the master-slave dual control nodes, see Table 1.
Table 1 Master-slave control node role adjudication table
During system installation, the master control node and the slave control node are deployed on two different physical machines. During initial system configuration they are assigned the roles "Master" and "Slave" respectively, and each is configured with the TCP listening socket information of the other control node. The two control nodes may be started in any order. On startup, each sends a registration message to the other control node and announces its own role. The control node A that starts first receives the registration message sent by the control node B that starts later; control node A arbitrates the roles according to the registration message and the role of control node B (see Table 1), and then replies to control node B with the arbitrated role. After role arbitration, the master control node and the slave control node execute different initialization processes. The master control node becomes the only central node of the system and is responsible for metadata management and for resource allocation and scheduling of the distributed platform, for receiving and replying to requests from the management interface, and for managing the computing node cluster.
The slave control node only maintains a connection with the master control node and monitors the state of the master control node. When the slave control node receives a request or registration information from a computing node or the management interface, it replies with a redirection message so that the requester can connect to the master control node and obtain service. After its initialization process is finished, the slave control node sends heartbeat messages to the master control node at regular intervals and continuously listens for the heartbeat replies sent by the master control node. If no heartbeat reply is received within the user-configured MASTER_HEARTBEAT_TIMEOUT, the slave control node determines that the master control node is abnormal, switches to the "Master" role, and notifies the daemon process on the physical node where the master control node is located to kill and restart the master control node process.
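The slave-side behaviour described above (redirect while in the slave role, promote after a missed heartbeat reply) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `SlaveControlNode`, the function `handle_request`, the message dictionaries and the 3-second timeout value are all assumptions introduced here; only the name MASTER_HEARTBEAT_TIMEOUT comes from the text. Notifying the daemon process is omitted.

```python
from dataclasses import dataclass

# User-configured in the text; the value here is purely illustrative.
MASTER_HEARTBEAT_TIMEOUT = 3.0  # seconds


@dataclass
class SlaveControlNode:
    """Slave-side view of the master's liveness (hypothetical helper)."""
    role: str = "slave"
    last_reply: float = 0.0

    def on_heartbeat_reply(self, now: float) -> None:
        # Record the arrival time of the latest heartbeat reply from the master.
        self.last_reply = now

    def check_master(self, now: float) -> str:
        # Promote to master once no reply has arrived within the timeout.
        if self.role == "slave" and now - self.last_reply > MASTER_HEARTBEAT_TIMEOUT:
            self.role = "master"
        return self.role


def handle_request(node: SlaveControlNode, master_addr: str) -> dict:
    # A slave redirects callers to the master; a master serves them directly.
    if node.role == "slave":
        return {"type": "redirect", "master": master_addr}
    return {"type": "ok"}
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.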
Step 300: distributing the tasks to the plurality of space measurement and control software according to the first master-slave role distribution result.
Specifically, the application software invokes the distributed coordination service and allocates node tasks according to the first master-slave role distribution result and its own software logic. In addition, the distributed coordination service runs on multiple nodes, which ensures its high availability and high reliability, and it feeds cluster information back to the service software efficiently and in real time.
As can be seen from the above description, the distributed control method applied to the space measurement and control software provided by the embodiment of the present invention first communicates with a plurality of space measurement and control software; next, in response to a task request of the space measurement and control software, performs master-slave role distribution on the plurality of space measurement and control software to generate a first master-slave role distribution result; and finally distributes tasks to the plurality of space measurement and control software according to the first master-slave role distribution result. The invention realizes software clustering through service invocation, maintains and dynamically adjusts the master-slave queue relationship of the software, and adjusts the master-slave state of the software according to the plan or according to the start-up order and online status of the service software.
In an embodiment, referring to fig. 2, the distributed regulation and control method applied to the space flight measurement and control software further includes:
Step 400: sending heartbeat messages to the plurality of space measurement and control software.
It is understood that the "heartbeat information" in step 400 refers to information periodically transmitted between the memory database cluster and each device during monitoring, so as to determine the health condition of the device and whether the peer is still "alive". Specifically, the space measurement and control software sends a heartbeat to the memory database cluster, and the memory database cluster returns a heartbeat response. This completes one handshake between the space measurement and control software and the memory database cluster, so both sides know that the connection between them has not been broken and that the space measurement and control software is online. If, after a threshold time, the space measurement and control software has not received a response from the memory database cluster, or the memory database cluster has not received a heartbeat from the space measurement and control software, the connection is considered lost: the space measurement and control software disconnects and re-establishes the connection, whereas the memory database cluster only needs to disconnect.
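The handshake and threshold rule described above can be illustrated with a small sketch. It models a single connection as seen by both sides; the class name `HeartbeatSession`, its fields and the numeric threshold used in the usage note are hypothetical, and real network I/O is deliberately left out.

```python
class HeartbeatSession:
    """One software-to-cluster connection tracked by both sides (illustrative)."""

    def __init__(self, threshold: float, now: float) -> None:
        self.threshold = threshold
        self.last_heartbeat = now  # last heartbeat seen by the cluster
        self.last_response = now   # last response seen by the software
        self.connected = True
        self.reconnects = 0

    def exchange(self, now: float) -> None:
        # A heartbeat plus its response completes one handshake.
        self.last_heartbeat = now
        self.last_response = now

    def poll(self, now: float) -> bool:
        # If either side has waited longer than the threshold, drop the link.
        overdue = (now - self.last_heartbeat > self.threshold
                   or now - self.last_response > self.threshold)
        if self.connected and overdue:
            self.connected = False
        return self.connected

    def reconnect(self, now: float) -> None:
        # Only the software side re-establishes the connection.
        self.connected = True
        self.reconnects += 1
        self.exchange(now)
```

With a threshold of 5 seconds, a session whose last handshake was at t=2 is still reported as connected at t=6 and as disconnected at t=8, after which the software side would call `reconnect`.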
In one embodiment, referring to fig. 3, step 200 further comprises:
Step 201: performing master-slave role distribution on the plurality of space measurement and control software according to the task request and the heartbeat information.
Specifically, (1) the master control node Master is started and initialized; (2) the slave control node SecondardyMaster is started and initialized (note: steps 1 and 2 may occur in either order; the control node started later sends registration information to the control node started first, and a heartbeat keep-alive connection is established between the control nodes); (3) the monitoring process Tracker of each computing node in the computing node cluster is started and initialized; (4) each computing node monitoring process collects metadata such as the hardware resource information of its physical machine, registers with the master control node and reports the hardware resource information; (5) a task allocation instruction is then sent to the master control node Master according to the hardware resource information.
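The registration and resource-reporting steps above can be sketched as a small registry held by the master control node. The text does not say how hardware resource information drives task allocation, so the placement rule below (pick the node with the most free memory that fits the task) is purely an assumption for illustration, as are the names `NodeInfo`, `MasterRegistry` and `pick_node`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeInfo:
    """Hardware resource report sent by a computing node's Tracker process."""
    name: str
    cpu_cores: int
    free_mem_mb: int


@dataclass
class MasterRegistry:
    """Metadata kept by the master control node (hypothetical structure)."""
    nodes: Dict[str, NodeInfo] = field(default_factory=dict)

    def register(self, info: NodeInfo) -> None:
        # A Tracker registers, or re-reports, its node's hardware resources.
        self.nodes[info.name] = info

    def pick_node(self, need_mem_mb: int) -> Optional[str]:
        # Assumed rule: most free memory first, CPU core count as tie-breaker.
        fits = [n for n in self.nodes.values() if n.free_mem_mb >= need_mem_mb]
        if not fits:
            return None
        return max(fits, key=lambda n: (n.free_mem_mb, n.cpu_cores)).name
```

A real scheduler would also track running tasks and release resources on completion; the sketch only shows how reported metadata can feed an allocation decision.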
In an embodiment, referring to fig. 4, the distributed regulation and control method applied to the space flight measurement and control software further includes:
Step 500: detecting the states of the plurality of space measurement and control software according to the heartbeat information;
Step 600: when the state changes, performing re-election to generate a second master-slave role distribution result.
In step 500 and step 600, the master-slave state information is judged according to the heartbeat; if the state has changed, a re-election is executed to regenerate the master-slave role distribution result. Specifically, in the cluster initialization stage, when only one space measurement and control software node, Server1, has started, it cannot complete a Leader election alone. When the second space measurement and control software node, Server2, starts, the two machines can communicate with each other; each tries to find the Leader, and the election process begins. The election process is as follows:
(1) Each Server casts a vote. In the initial state, Server1 and Server2 each vote for themselves as the Leader node. Each space measurement and control software node sends its vote to the other nodes. A vote contains an SID and a ZXID: the SID is the unique identifier (myid) of the machine, and the ZXID is a 64-bit transaction ID divided into upper 32 bits and lower 32 bits.
(2) Since the ZXIDs are the same at the time of the initial vote, the SIDs are compared; the larger the SID, the more likely the node is to become Leader.
(3) After the two space measurement and control software nodes have sent out their own votes, each decides whether to change its vote according to the votes received from the other nodes. The SID of the first node is 1 and the SID of the second node is 2, so the vote count for Server2 becomes 2, that is, two votes. Because there are three space measurement and control software nodes in total, Server2 now holds more than half of the votes and is determined to be the Leader (majority rule). Even when Server3 starts later, no further voting is needed because the Leader has already been decided; Server3 only needs to establish a connection with the Leader and synchronize state.
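The three election steps above can be condensed into a short sketch: every node starts by voting for itself, adopts any stronger vote it receives, and a Leader is declared only when one candidate holds more than half of the configured cluster size. The names `Vote`, `better` and `elect` are introduced here for illustration, and the single exchange round is a simplification of a real election protocol.

```python
from collections import Counter
from typing import Dict, NamedTuple, Optional


class Vote(NamedTuple):
    zxid: int  # transaction ID
    sid: int   # unique machine identifier (myid)


def better(a: Vote, b: Vote) -> Vote:
    # Larger ZXID wins; on a tie the larger SID wins (tuple comparison).
    return max(a, b)


def elect(own_votes: Dict[int, Vote], cluster_size: int) -> Optional[int]:
    """Run one exchange round; return the Leader SID, or None without a majority."""
    final = {}
    for sid, mine in own_votes.items():
        choice = mine
        for other in own_votes.values():
            choice = better(choice, other)  # adopt any stronger received vote
        final[sid] = choice
    leader, count = Counter(v.sid for v in final.values()).most_common(1)[0]
    return leader if count > cluster_size // 2 else None
```

With Server1 and Server2 started in a three-node cluster and equal ZXIDs, `elect({1: Vote(0, 1), 2: Vote(0, 2)}, 3)` returns 2, matching the example in the text; with only Server1 started it returns `None`, because one vote is not a majority of three.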
A special case is re-election after the Leader suddenly goes down. For example, suppose there are 5 space measurement and control software nodes, Server3 has been elected Leader, and the Leader (Server3) suddenly goes down. The other four space measurement and control software nodes then enter the LOOKING state and hold a new election. During operation their ZXIDs may differ, so the new round of Leader election compares not only the SID but also the ZXID; the larger the ZXID, the more likely the node is to be elected Leader. For the initial election, one can conclude that the node whose SID is in the middle is most likely to be elected Leader. This conclusion does not hold when the Leader goes down during operation and a re-election is held: the new Leader may be Server1, Server2, Server4 or Server5. Note that, as stated above, a larger ZXID makes a node more likely to be elected Leader, and the ZXID is divided into upper 32 bits and lower 32 bits; within the ZXID, a smaller value in the lower 32 bits makes election as Leader more likely. After each space measurement and control software node has sent out its vote, it also receives votes from the other machines; each machine processes the received votes according to certain rules and compares them with its own vote.
From the above description, it can be seen that the distributed control method applied to the space measurement and control software provided by the embodiment of the invention can simultaneously support joint debugging tests or task execution for various large-scale space tasks, and works well in practice. During task execution, for example, the service is used by software for telemetry calculation, information transmit/receive statistics, orbit calculation, databases, platform monitoring and the like, thereby expanding cluster processing capability.
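To make the 64-bit ZXID layout and the re-election comparison concrete, the sketch below packs and unpacks the upper and lower 32 bits and picks a new Leader among the surviving nodes. It treats the ZXID as a single integer (larger wins, then larger SID), which is a simplifying assumption; it does not model any separate rule for the lower 32 bits. The function names and the sample ZXID values are hypothetical.

```python
from typing import Dict, Tuple

MASK32 = 0xFFFFFFFF


def make_zxid(high: int, low: int) -> int:
    # Pack two 32-bit fields into one 64-bit transaction ID.
    return ((high & MASK32) << 32) | (low & MASK32)


def split_zxid(zxid: int) -> Tuple[int, int]:
    # Return (upper 32 bits, lower 32 bits) of a 64-bit ZXID.
    return (zxid >> 32) & MASK32, zxid & MASK32


def reelect(survivors: Dict[int, int]) -> int:
    """survivors maps SID -> last ZXID; return the SID of the new Leader."""
    # Assumed ordering: larger ZXID first, larger SID as tie-breaker.
    return max(survivors, key=lambda sid: (survivors[sid], sid))
```

For example, if Server3 fails and the survivors report `{1: make_zxid(1, 5), 2: make_zxid(1, 7), 4: make_zxid(1, 7), 5: make_zxid(1, 3)}`, then `reelect` returns 4: Server2 and Server4 share the largest ZXID, and the larger SID breaks the tie. This also shows why the highest SID (Server5) does not necessarily win a re-election.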
In order to further explain the scheme, the invention further provides a specific application example of the distributed regulation and control method applied to the space flight measurement and control software, which specifically includes the following contents, and refer to fig. 5.
Step S1: building a distributed coordination service end program according to the task requirement.
Step S2: the application software calls the initialization service to establish a connection with the memory database cluster.
Step S3: the application software sends a heartbeat message to the memory database cluster.
Step S4: the task software automatically generates a heartbeat thread and establishes a channel with the distributed coordination service, and the distributed coordination service distributes the roles in the cluster according to an algorithm.
Specifically, referring to fig. 6, service initialization is performed first and the parameters are filled in; the master and slave machines are elected according to the initialization parameters, and a heartbeat sending thread is started; the master-slave state information is then judged according to the heartbeat. If the state has changed, the state judgment returns information, a re-election is executed and a new master-slave result is generated; if the state has not changed, the state judgment returns information and the master-slave result is generated according to the original state. The data dictionary referred to in fig. 6 includes:
1: executing initialization parameters;
2: initializing election master and slave machine parameters;
3: starting heartbeat sending thread parameters;
4: judging state information;
5: status judgment information;
6: state judgment return information;
7: state judgment return information;
8: generating a master-slave result;
9: generating a master-slave result;
10: informing a result;
In addition, in step S4, a distributed global lock is designed using the memory database. The members of the cluster are determined using the automatic clearing of data on timeout and are sorted in order of entry, with the first member being the master.
Step S5: the application software invokes the distributed coordination service and allocates node tasks according to its software logic.
Specifically, referring to fig. 7, role A within a cluster group calls the service library with parameters; the cluster processes the election information and obtains master-slave information according to a certain algorithm; the multiple nodes of the redundantly backed-up memory database cluster record and maintain the cluster information; and role B within the same cluster group calls the service library with parameters. The data dictionary referred to in fig. 7 includes:
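The membership rule described above (entries that time out are cleared automatically, members are ordered by entry, the first is the master) can be sketched with an in-process stand-in for the memory database. The class `ExpiringMembership` and its methods are hypothetical; a real deployment would rely on the key-expiry and locking primitives of the actual in-memory database, which the text does not name.

```python
from typing import Dict, List, Optional


class ExpiringMembership:
    """In-process stand-in for an in-memory database with key expiry."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._joined: Dict[str, int] = {}     # member -> sequence number at entry
        self._expires: Dict[str, float] = {}  # member -> expiry time
        self._seq = 0

    def _purge(self, now: float) -> None:
        # Emulate automatic clearing of entries whose timeout has elapsed.
        for m in [m for m, t in self._expires.items() if t <= now]:
            del self._expires[m], self._joined[m]

    def heartbeat(self, member: str, now: float) -> None:
        # Each heartbeat refreshes the expiry; a new member joins at the end.
        self._purge(now)
        if member not in self._joined:
            self._seq += 1
            self._joined[member] = self._seq
        self._expires[member] = now + self.ttl

    def members(self, now: float) -> List[str]:
        # Live members in order of entry.
        self._purge(now)
        return sorted(self._joined, key=self._joined.get)

    def master(self, now: float) -> Optional[str]:
        # The first live member in entry order is the master.
        current = self.members(now)
        return current[0] if current else None
```

In this sketch a member that stops sending heartbeats drops out after `ttl` seconds, and the next member in entry order becomes master without any explicit election message. A member that rejoins later is placed at the end of the queue.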
1: the role A calls a service library parameter;
2: processing election information by a cluster;
3: the memory database cluster records and maintains cluster information;
4: the role B calls the parameters of the service library;
As can be seen from the above description, the distributed control method and apparatus applied to space measurement and control software provided in the embodiments of the present invention first communicate with a plurality of space measurement and control software; next, in response to a task request of the space measurement and control software, perform master-slave role distribution on the plurality of space measurement and control software to generate a first master-slave role distribution result; and finally distribute tasks to the plurality of space measurement and control software according to the first master-slave role distribution result. The distributed coordination service runs on multiple nodes, which ensures its high availability and high reliability, and it feeds cluster information back to the service software efficiently and in real time.
In summary, the present invention provides a highly available and highly reliable distributed software coordination service technology, which achieves software clustering by invoking services, achieves software master-slave queue relationship maintenance and dynamic adjustment, and adjusts software master-slave states according to a corresponding sequence of plan or service software start and an online condition. The distributed software coordination service software has the main functions of generating and modifying distributed software coordination service information and recording and storing the information. The distributed software coordination service software needs to have the distributed information maintenance function of the single target, and also needs to have the distributed information maintenance capability of different targets under the same task, and the distributed coordination information maintenance function of common configuration information of a plurality of targets or tasks. Specifically, the invention has the following beneficial effects:
1. Distributed master-slave election rules are provided.
The service realizes the distribution of the roles of the nodes in the cluster according to a first-come-first-serve principle and a state comprehensive election algorithm based on software heartbeat sending.
2. Cluster state records based on the in-memory database.
The distributed information record maintenance of the software cluster is realized by building a redundant backup mechanism of a plurality of service nodes. The redundant backup mechanism of a plurality of service nodes ensures the high availability of the service, and the failure of the distributed coordination service can not be caused by the disconnection of a single or a small number of service nodes.
Based on the same inventive concept, the embodiment of the present application further provides a distributed control device applied to space measurement and control software, which can be used to implement the method described in the above embodiment, such as the following embodiments. Because the principle of solving the problems of the distributed regulation and control device applied to the space flight measurement and control software is similar to that of the distributed regulation and control method applied to the space flight measurement and control software, the implementation of the distributed regulation and control device applied to the space flight measurement and control software can be referred to the implementation of the distributed regulation and control method applied to the space flight measurement and control software, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or a combination of software and hardware are also possible and contemplated.
The embodiment of the present invention provides a specific implementation manner of a distributed control device applied to space flight measurement and control software, which can implement a distributed control method applied to space flight measurement and control software, and referring to fig. 8, the distributed control device applied to space flight measurement and control software specifically includes the following contents:
the software communication unit 10 is used for communicating with a plurality of space measurement and control software;
the role distribution unit 20 is configured to perform master-slave role distribution on the plurality of space measurement and control software in response to a task request of the space measurement and control software to generate a first master-slave role distribution result;
and the task allocation unit 30 is configured to allocate the tasks to the plurality of space measurement and control software according to the first master-slave role distribution result.
In an embodiment, referring to fig. 9, the distributed control apparatus applied to the space measurement and control software further includes: a heartbeat information sending unit 40, used for sending heartbeat messages to the plurality of space measurement and control software.
In an embodiment, the role distribution unit 20 is specifically configured to perform master-slave role distribution on the plurality of space measurement and control software according to the task request and the heartbeat information.
In an embodiment, referring to fig. 10, the distributed control apparatus applied to the space measurement and control software further includes:
the state monitoring unit 50 is used for detecting the states of the plurality of space measurement and control software according to the heartbeat information;
and a re-election unit 60, configured to perform re-election when the state changes to generate a second master-slave role distribution result.
As can be seen from the above description, the distributed control device applied to the space measurement and control software provided in the embodiment of the present invention first communicates with a plurality of space measurement and control software; next, in response to a task request of the space measurement and control software, performs master-slave role distribution on the plurality of space measurement and control software to generate a first master-slave role distribution result; and finally distributes tasks to the plurality of space measurement and control software according to the first master-slave role distribution result. The invention realizes software clustering through service invocation, maintains and dynamically adjusts the master-slave queue relationship of the software, and adjusts the master-slave state of the software according to the plan or according to the start-up order and online status of the service software.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the steps of the distributed regulation and control method applied to the space measurement and control software, where the steps include:
step 100: communicating with a plurality of space measurement and control software;
step 200: responding to a task request of the space measurement and control software, and performing master-slave role distribution on the plurality of space measurement and control software to generate a first master-slave role distribution result;
step 300: and distributing the tasks to the plurality of space measurement and control software according to the first master-slave role distribution result.
Referring now to FIG. 11, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 11, the electronic device 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present invention includes a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above-mentioned distributed regulation and control method applied to space flight control software, where the steps include:
step 100: communicating with a plurality of space measurement and control software;
step 200: responding to a task request of the space measurement and control software, and performing master-slave role distribution on the plurality of space measurement and control software to generate a first master-slave role distribution result;
step 300: and distributing the tasks to the plurality of space measurement and control software according to the first master-slave role distribution result.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include transitory computer readable media such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.