Disclosure of Invention
The invention aims to overcome defects in the prior art by providing a combat network adaptive combination method, device, equipment and medium, and solves the technical problem that traditional manual decision-making based on commander experience cannot respond quickly to combat tasks, which reduces combat network capability.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, the present invention provides a combat network adaptive combination method, including:
acquiring a control node, a reconnaissance node, a hit node and a target node;
connecting all nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process according to the decision space network and the combat network;
and constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result.
Optionally, the directed edges representing the dependency relationship include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a hit node, and directed edges from a hit node to a target node.
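As an illustrative sketch (not part of the claimed method), the edge-type rules above can be encoded as a lookup table; the single-letter type labels are assumptions introduced here.

```python
# Allowed directed-edge types in the decision space network, following the
# rules above. The labels are assumptions: "S" reconnaissance, "D" control,
# "I" hit, "T" target.
ALLOWED = {
    "S": {"S", "D"},  # reconnaissance -> reconnaissance or control
    "D": {"D", "I"},  # control -> control or hit
    "I": {"T"},       # hit -> target
    "T": set(),       # target nodes have no outgoing edges
}

def edge_allowed(src_type: str, dst_type: str) -> bool:
    """Return True if a directed edge src -> dst respects the dependency rules."""
    return dst_type in ALLOWED.get(src_type, set())
```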
Optionally, the combat chain comprises a reconnaissance node, a control node, a hit node and a target node connected in sequence by directed edges;
The combat capability of the combat chain is as follows:
where E_OC(l_j) is the combat capability of the j-th combat chain l_j; s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, hit node and target node in chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and hit node i_j; and O_T(t_j) is the damage-degree value of target node t_j, with t_j ∈ T;
The combat capability of the combat network is obtained by summing over its combat chains:
E_N(G) = Σ_j E_OC(l_j)
where E_N(G) is the combat capability of the combat network G.
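A minimal sketch of the capability computation. The product form of the chain capability is an assumption introduced here (the patent's own chain formula is given as a figure and is not reproduced); the network capability is the stated sum over chains.

```python
def chain_capability(o_s: float, o_d: float, o_i: float, o_t: float) -> float:
    """Combat capability E_OC(l_j) of one combat chain.

    o_s, o_d, o_i are the capability values of the reconnaissance, control
    and hit nodes; o_t is the damage-degree value of the target node.
    The product form is an assumption made for this sketch.
    """
    return o_s * o_d * o_i * o_t

def network_capability(chains) -> float:
    """E_N(G): sum of the capabilities of all combat chains, as stated above."""
    return sum(chain_capability(*c) for c in chains)
```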
Optionally, the dependency relationship between the nodes connected by the directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij·O_i + (1 − α_ij)·SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i and O_j are the operating performances of nodes i and j; Average is the averaging function; α_ij and β_ij are respectively the dependency-strength (SOD) and dependency-criticality (COD) parameters between nodes i and j; and SE_j is the self (active) performance of node j;
where, when node i is a reconnaissance node, a control node or a hit node, the operating performance O_i is the capability value of node i; when node i is a target node, the operating performance O_i is the damage-degree value of node i.
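The dependency relations above can be sketched as follows; the function layout and the triple-based representation of each dependency are assumptions introduced for illustration.

```python
def node_performance(se_j: float, deps) -> float:
    """Operating performance O_j of node j, given its self performance SE_j
    and its dependencies. Each entry of deps is an assumed (O_i, alpha_ij,
    beta_ij) triple for one predecessor node i.
    """
    # SOD_O_j: average of alpha_ij * O_i + (1 - alpha_ij) * SE_j over all i
    sod = sum(a * o + (1 - a) * se_j for o, a, _ in deps) / len(deps)
    # COD_O_j: minimum of O_i + beta_ij over all i
    cod = min(o + b for o, _, b in deps)
    return min(sod, cod)  # O_j = min(SOD_O_j, COD_O_j)
```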
Optionally, the constructing of the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed-edge set;
taking each directed edge in the combat network as a removable action, and each directed edge in the decision space network that can connect a node to the combat network as an addable action; taking the removable and addable actions together as decision actions, denoted x_t = (n_t, e_t), where x_t is the decision action at time t and n_t and e_t are the node and directed edge corresponding to that action; and constructing the decision action space from these decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
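A minimal sketch of the decision action and state transition defined above; the `Action` type and its "add"/"remove" labels are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """Decision action x_t = (n_t, e_t); the kind labels are assumptions."""
    kind: str    # "add" or "remove"
    node: int    # n_t
    edge: tuple  # e_t, as a (source, destination) pair

def step(nodes: set, edges: set, action: Action):
    """State transition G_{t+1} = (N_t +/- n_t, E_t +/- e_t)."""
    nodes, edges = set(nodes), set(edges)
    if action.kind == "add":
        nodes.add(action.node)
        edges.add(action.edge)
    else:
        nodes.discard(action.node)
        edges.discard(action.edge)
    return nodes, edges
```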
Optionally, the constructing of the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of steps;
constructing, based on the decision action sequence, an objective function that maximizes the cumulative return value, where ΔC_t is the return value at time t;
expanding the objective function and deriving backwards to obtain the Bellman optimality equation:
V_t(G_t, x_t) = max_{x_{t+1}} [ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1})]
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
Optionally, the solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η·δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting the decision action by the ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected; with probability 1 − ε, a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
performing forward state transitions from the initial state G_0 according to the optimal decision actions x_t* to generate the optimal decision action sequence;
and obtaining the combat network combination result at each moment according to the execution result of the optimal decision action sequence.
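The temporal-difference update and ε-greedy selection above can be sketched as follows; the dictionary-based value function is an assumption made for this sketch.

```python
import random

def select_action(V: dict, state, actions, eps: float):
    """epsilon-greedy selection as described above: with probability eps take
    the action with the largest V(state, action), otherwise pick at random."""
    if random.random() < eps:
        return max(actions, key=lambda a: V.get((state, a), 0.0))
    return random.choice(actions)

def td_update(V: dict, s, a, reward: float, s_next, a_next,
              eta: float = 0.01, gamma: float = 0.9) -> dict:
    """Temporal-difference update of the state-decision-action value function:
    delta_t = dC_{t+1} + gamma * V(s', a') - V(s, a);  V(s, a) += eta * delta_t."""
    delta = reward + gamma * V.get((s_next, a_next), 0.0) - V.get((s, a), 0.0)
    V[(s, a)] = V.get((s, a), 0.0) + eta * delta
    return V
```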
In a second aspect, the present invention provides a combat network adaptive combination device, the device comprising:
the node acquisition module is used for acquiring control nodes, reconnaissance nodes, hit nodes and target nodes;
the space construction module is used for connecting the nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
the network construction module is used for constructing a combat chain aiming at each target node and combining each combat chain to construct a combat network of the target node;
the capacity calculation module is used for calculating and summing the combat capacity of each combat chain to obtain the combat capacity of the combat network;
the process construction module is used for constructing a Markov decision process according to the decision space network and the combat network;
and the process solving module is used for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
In a third aspect, the present invention provides an electronic device, including a processor and a storage medium;
the storage medium is used for storing instructions;
the processor operates according to the instructions to perform the steps of the method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a combat network self-adaptive combination method, device, equipment and medium, which combines a function-dependent network analysis method to establish a decision action space and bring cascade effect into a system combat capability evaluation index; solving an optimal strategy function solution aiming at a target node, and drawing out a combat network with maximum combat capability; when the weapon equipment encounters serious faults or damages, a new combat network is reconfigured and combat capability is partially restored, so that the elasticity and flexibility of the combat network are improved; and the method can provide reference for mosaic combat design and planning.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Embodiment one:
As shown in fig. 1, the present invention provides a combat network adaptive combination method, which includes:
1. Acquiring a control node, a reconnaissance node, a hit node and a target node.
2. Connecting all nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
2.1, the directed edges representing the dependency relationship include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a hit node, and directed edges from a hit node to a target node;
2.2, the dependency relationship between nodes connected by directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij·O_i + (1 − α_ij)·SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i and O_j are the operating performances of nodes i and j; Average is the averaging function; α_ij and β_ij are respectively the dependency-strength (SOD) and dependency-criticality (COD) parameters between nodes i and j; and SE_j is the self (active) performance of node j;
where, when node i is a reconnaissance node, a control node or a hit node, the operating performance O_i is the capability value of node i; when node i is a target node, the operating performance O_i is the damage-degree value of node i.
3. Constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
3.1, the combat chain comprises a reconnaissance node, a control node, a hit node and a target node connected in sequence by directed edges.
4. Calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
4.1, the combat capability of the combat chain is as follows:
where E_OC(l_j) is the combat capability of the j-th combat chain l_j; s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, hit node and target node in chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and hit node i_j; and O_T(t_j) is the damage-degree value of target node t_j, with t_j ∈ T;
4.2, the combat capability of the combat network is obtained by summing over its combat chains:
E_N(G) = Σ_j E_OC(l_j)
where E_N(G) is the combat capability of the combat network G.
5. Constructing a Markov decision process according to the decision space network and the combat network;
The construction of the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed-edge set;
taking each directed edge in the combat network as a removable action, and each directed edge in the decision space network that can connect a node to the combat network as an addable action; taking the removable and addable actions together as decision actions, denoted x_t = (n_t, e_t), where x_t is the decision action at time t and n_t and e_t are the node and directed edge corresponding to that action; and constructing the decision action space from these decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
6. Constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result.
6.1, constructing the Bellman optimality equation of the Markov decision process includes the following steps:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of steps;
constructing, based on the decision action sequence, an objective function that maximizes the cumulative return value, where ΔC_t is the return value at time t;
expanding the objective function and deriving backwards to obtain the Bellman optimality equation:
V_t(G_t, x_t) = max_{x_{t+1}} [ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1})]
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
6.2, solving to obtain the combination result includes the following steps:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η·δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting the decision action by the ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected; with probability 1 − ε, a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
performing forward state transitions from the initial state G_0 according to the optimal decision actions x_t* to generate the optimal decision action sequence;
and obtaining the combat network combination result at each moment according to the execution result of the optimal decision action sequence.
As shown in fig. 2, assume a decision space network formed by 104 nodes and 732 edges, with the unidirectional arrows between nodes as directed edges representing the dependency relationships; the node types and numbering are divided into: reconnaissance nodes S = {1, 2, …, 52}, control nodes D = {53, 54, …, 68}, hit nodes I = {69, 70, …, 99} and target nodes T = {100, 101, …, 104};
Starting state transitions from the initial state G_0, decision actions are selected with the ε-greedy strategy. The maximum number of nodes of the combat network is set to k_N. The algorithm parameters are set to learning rate η = 0.01, discount factor γ = 0.9, greedy-strategy probability ε = 0.75 and maximum node number k_N = 10. A training round ends when the combat capability gain of a decision action satisfies ΔC_t < 0 or when the number of nodes exceeds k_N = 10. The combat network is trained interactively against the decision space, and the obtained return values are used to update the policy function until it converges; in this example, 3000 training rounds are used.
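The round-termination condition described above (a negative capability gain, or a node count above the cap k_N) can be sketched as:

```python
def episode_done(delta_c: float, num_nodes: int, k_n: int = 10) -> bool:
    """A training round ends when the capability gain of the executed action
    is negative (delta_c < 0) or the combat network exceeds the node cap k_N."""
    return delta_c < 0 or num_nodes > k_n
```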
Taking the initial state G_0 as the starting point, the decision action with the largest value of the optimal policy function V_t(G_t, x_t) is selected and state transitions continue until the number of nodes of the combat network reaches k_N or the combat capability gain of a decision action satisfies ΔC_t < 0; state transitions then stop, and the resulting state is the optimal combat network. The optimal combat network for target node 103 in this example is shown in fig. 3, with a corresponding combat capability of 125658;
To demonstrate the resilience of the proposed method, a random node attack is performed on fig. 2: the attacked nodes and all their incident edges are removed, and the removed edges can no longer be selected by decision actions. Under the random-node-attack strategy, nodes 80 and 85 of the combat network in fig. 3 are attacked and removed, and the combat capability falls to 87880. Then, starting again from the initial state G_0, state transitions are performed according to the optimal policy function V_t(G_t, x_t), with the removed edges excluded from selection. The recombined combat network is shown in fig. 4, with a combat capability of 103250; the recovery rate of the lost combat capability is 68.59%.
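The random node attack described above (remove the attacked nodes and all their incident edges, and forbid the removed edges from later selection) can be sketched as:

```python
def attack(nodes: set, edges: set, attacked: set):
    """Remove the attacked nodes and every edge incident to them; return the
    surviving network plus the removed (now forbidden) edges."""
    nodes = set(nodes) - set(attacked)
    surviving = {e for e in edges if e[0] in nodes and e[1] in nodes}
    forbidden = set(edges) - surviving
    return nodes, surviving, forbidden
```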
Embodiment two:
The embodiment of the invention provides a combat network adaptive combination device, which comprises:
the node acquisition module is used for acquiring control nodes, reconnaissance nodes, hit nodes and target nodes;
the space construction module is used for connecting the nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
the network construction module is used for constructing a combat chain aiming at each target node and combining each combat chain to construct a combat network of the target node;
the capacity calculation module is used for calculating and summing the combat capacity of each combat chain to obtain the combat capacity of the combat network;
the process construction module is used for constructing a Markov decision process according to the decision space network and the combat network;
and the process solving module is used for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
Embodiment III:
based on the first embodiment, the embodiment of the invention provides electronic equipment, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor operates according to the instructions to perform the steps of the method described above.
Embodiment four:
based on the first embodiment, the embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the above method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.