CN116489193A - Combat network self-adaptive combination method, device, equipment and medium - Google Patents

Combat network self-adaptive combination method, device, equipment and medium

Info

Publication number
CN116489193A
Authority
CN
China
Prior art keywords
combat
node
network
decision
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310487406.6A
Other languages
Chinese (zh)
Other versions
CN116489193B (en)
Inventor
张婷婷
孙云鹏
陈岩
李辉
肖春霞
宦蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA University of Science and Technology
Original Assignee
PLA University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
2023-05-04: Application filed by PLA University of Science and Technology
2023-05-04: Priority to CN202310487406.6A
2023-07-25: Publication of CN116489193A
2024-01-23: Application granted
2024-01-23: Publication of CN116489193B
Status: Active
Anticipated expiration

Abstract

The invention discloses a combat network adaptive combination method, device, equipment and medium, wherein the method comprises the following steps: acquiring control nodes, reconnaissance nodes, strike nodes and target nodes; connecting the nodes with directed edges representing dependency relationships to construct a decision space network; constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node; calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network; constructing a Markov decision process according to the decision space network and the combat network; and constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result. The invention can adapt to complex combat environments and improves the resilience and flexibility of the combat network.

Description

Combat network self-adaptive combination method, device, equipment and medium
Technical Field
The invention relates to a combat network adaptive combination method, device, equipment and medium, and belongs to the technical field of systems management.
Background
With the rapid development of network information technology and military technology, the scale and style of warfare have changed profoundly. The traditional centralized command-and-control mode struggles to adapt to highly dynamic, highly uncertain battlefields. In response, the Strategic Technology Office of the U.S. Defense Advanced Research Projects Agency proposed the "mosaic warfare" concept, which seeks to rapidly integrate low-cost, low-complexity autonomous systems into sensor networks and multi-domain command-and-control systems, thereby imposing complexity on an adversary. In mosaic warfare, manned and unmanned combat platforms dispersed across the battlefield can be dynamically assembled through communication into a resilient combat network.
However, combined decision making for the combat network is difficult because of cascading effects arising from the interdependencies among heterogeneous weapon systems, and capturing and quantifying these dependencies is critical to maximizing the combat network's capability. A highly uncertain battlefield environment can damage the combat network, for example when major faults in or damage to weapon equipment degrade the combat network's capability, and traditional manual decisions based on commander experience respond too slowly to battlefield tasks.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a combat network adaptive combination method, device, equipment and medium, and solves the technical problem that traditional manual decisions based on commander experience are difficult to adapt quickly to combat tasks, which causes the combat network's capability to decline.
To achieve the above purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a combat network adaptive combination method, including:
acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process according to the decision space network and the combat network;
and constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result.
Optionally, the directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or a control node, directed edges from a control node to another control node or a strike node, and directed edges from a strike node to a target node.
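As a minimal illustration of these connection rules, the Python sketch below encodes which directed edges may be added to the decision space network; the NodeType names and the edge_allowed helper are illustrative assumptions, not part of the patent:

```python
from enum import Enum

class NodeType(Enum):
    RECON = "S"    # reconnaissance node
    CONTROL = "D"  # control node
    STRIKE = "I"   # strike node
    TARGET = "T"   # target node

# Directed-edge rules stated above: reconnaissance -> reconnaissance/control,
# control -> control/strike, strike -> target.
ALLOWED_EDGES = {
    NodeType.RECON: {NodeType.RECON, NodeType.CONTROL},
    NodeType.CONTROL: {NodeType.CONTROL, NodeType.STRIKE},
    NodeType.STRIKE: {NodeType.TARGET},
}

def edge_allowed(src: NodeType, dst: NodeType) -> bool:
    """True if a dependency edge src -> dst may appear in the decision space network."""
    return dst in ALLOWED_EDGES.get(src, set())
```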
Optionally, the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges;
the combat capability of the j-th combat chain l_j is E_OC(l_j), where s_j, d_j, i_j, t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j), O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j; and O_T(t_j) is the damage degree value of target node t_j, with t_j ∈ T;
the combat capability of the combat network is:
E_N(G) = Σ_j E_OC(l_j)
where E_N(G) is the combat capability of the combat network G and the sum runs over all combat chains l_j in G.
Optionally, the dependency relationship between nodes connected by directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij · O_i + (1 − α_ij) · SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i, O_j are the operating performances of nodes i and j; Average is the averaging function; α_ij and β_ij are respectively the dependency strength (SOD) and dependency criticality (COD) parameters between nodes i and j; and SE_j is the active performance of node j;
when node i is a reconnaissance node, a control node or a strike node, the operating performance O_i is the capability value of node i; when node i is the target node, the operating performance O_i is the damage degree value of node i.
Optionally, constructing the Markov decision process includes:
taking the nodes and directed edges in the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t, E_t are respectively the combat network at time t and the node set and directed edge set corresponding to the combat network;
taking directed edges in the combat network as removable actions and directed edges in the decision space network that can connect to nodes of the combat network as addable actions; taking removable and addable actions together as decision actions, denoted x_t = (n_t, e_t), where x_t, e_t, n_t are respectively the decision action at time t and the directed edge and node corresponding to the decision action; and constructing a decision action space from the decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat network's combat capability as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}), E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
Optionally, constructing the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of time steps;
constructing, from the decision action sequence, an objective function that maximizes the return:
max Σ_{t=1}^{k} ΔC_t
where ΔC_t is the return value at time t;
expanding and deriving the objective function yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ · max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
Optionally, solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η · δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ · V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions by the ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected; with probability 1 − ε, a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
transferring forward from the initial state G_0 according to the optimal decision actions x_t*, based on the state transition, to generate the optimal decision action sequence;
and obtaining the combat network combination result at each moment from the execution result of the optimal decision action sequence.
In a second aspect, the present invention provides a combat network adaptive combination device, the device comprising:
a node acquisition module, configured to acquire control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module, configured to connect the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module, configured to construct a combat chain for each target node and combine the combat chains to construct the combat network of the target node;
a capability calculation module, configured to calculate and sum the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module, configured to construct a Markov decision process according to the decision space network and the combat network;
and a process solving module, configured to construct the Bellman optimality equation of the Markov decision process and solve it to obtain the combination result.
In a third aspect, the present invention provides an electronic device, including a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform steps according to the method described above.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a combat network adaptive combination method, device, equipment and medium, which uses functional dependency network analysis to establish the decision action space and incorporates cascading effects into the system's combat capability evaluation index; it solves for the optimal value function with respect to a target node and derives the combat network with the maximum combat capability; when weapon equipment suffers a serious fault or damage, a new combat network is reconfigured and the combat capability is partially restored, improving the resilience and flexibility of the combat network; and the method can serve as a reference for mosaic warfare design and planning.
Drawings
Fig. 1 is a flowchart of a combat network adaptive combination method according to a first embodiment of the present invention;
Fig. 2 is a schematic diagram of a decision space network according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a combat network according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the combat network after recombination according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Embodiment one:
As shown in fig. 1, the present invention provides a combat network adaptive combination method, which includes:
1. Acquiring control nodes, reconnaissance nodes, strike nodes and target nodes.
2. Connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
2.1, the directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or a control node, directed edges from a control node to another control node or a strike node, and directed edges from a strike node to a target node;
2.2, the dependency relationship between nodes connected by directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij · O_i + (1 − α_ij) · SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i, O_j are the operating performances of nodes i and j; Average is the averaging function; α_ij and β_ij are respectively the dependency strength (SOD) and dependency criticality (COD) parameters between nodes i and j; and SE_j is the active performance of node j;
when node i is a reconnaissance node, a control node or a strike node, the operating performance O_i is the capability value of node i; when node i is the target node, the operating performance O_i is the damage degree value of node i.
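A minimal sketch of this dependency calculation in Python; the function signature and the list-based encoding of a node's predecessors are illustrative assumptions, and the form COD_O_ji = O_i + β_ij follows the functional dependency network analysis convention the description refers to:

```python
def operating_performance(parent_perfs, alphas, betas, se_j):
    """Operating performance O_j of node j from its predecessors.

    parent_perfs : performances O_i of the predecessor nodes i
    alphas       : dependency strengths alpha_ij, one per predecessor
    betas        : dependency criticalities beta_ij, one per predecessor
    se_j         : active performance SE_j of node j
    """
    if not parent_perfs:
        return se_j  # a node with no dependencies runs on its own performance
    # SOD_O_ji = alpha_ij * O_i + (1 - alpha_ij) * SE_j, then averaged
    sod = sum(a * o + (1 - a) * se_j
              for o, a in zip(parent_perfs, alphas)) / len(parent_perfs)
    # COD_O_ji = O_i + beta_ij, then the minimum over predecessors
    cod = min(o + b for o, b in zip(parent_perfs, betas))
    return min(sod, cod)  # O_j = min(SOD_O_j, COD_O_j)
```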
3. Constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
3.1, the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges.
4. Calculating the combat capability of each combat chain and summing to obtain the combat capability of the combat network;
4.1, the combat capability of the j-th combat chain l_j is E_OC(l_j), where s_j, d_j, i_j, t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j), O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j; and O_T(t_j) is the damage degree value of target node t_j, with t_j ∈ T;
4.2, the combat capability of the combat network is:
E_N(G) = Σ_j E_OC(l_j)
where E_N(G) is the combat capability of the combat network G.
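The aggregation of step 4 can be sketched as follows in Python. The patent's closed-form expression for E_OC(l_j) is not reproduced in this text, so the product form below is only an assumed stand-in; the summation for E_N(G), by contrast, follows directly from step 4:

```python
def chain_capability(o_s, o_d, o_i, o_t):
    """E_OC(l_j) for one combat chain; multiplying the reconnaissance,
    control and strike capability values O_S, O_D, O_I by the target
    damage-degree value O_T is an assumption made for illustration."""
    return o_s * o_d * o_i * o_t

def network_capability(chains):
    """E_N(G): sum of E_OC(l_j) over all combat chains l_j in network G."""
    return sum(chain_capability(*chain) for chain in chains)

# Example: a two-chain combat network with illustrative node values.
print(network_capability([(0.9, 0.8, 0.7, 0.6), (0.5, 0.9, 0.6, 0.4)]))
```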
5. Constructing a Markov decision process according to the decision space network and the combat network;
the construction of the Markov decision process includes:
taking the nodes and directed edges in the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t, E_t are respectively the combat network at time t and the node set and directed edge set corresponding to the combat network;
taking directed edges in the combat network as removable actions and directed edges in the decision space network that can connect to nodes of the combat network as addable actions; taking removable and addable actions together as decision actions, denoted x_t = (n_t, e_t), where x_t, e_t, n_t are respectively the decision action at time t and the directed edge and node corresponding to the decision action; constructing a decision action space from the decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat network's combat capability as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}), E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
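A Python sketch of this construction is shown below; DecisionAction and the apply_action interface are illustrative assumptions about how the state G_t, the decision action x_t = (n_t, e_t) and the return value ΔC_{t+1} might be encoded:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class DecisionAction:
    node: str              # n_t: node touched by the action
    edge: Tuple[str, str]  # e_t: directed edge added or removed
    add: bool              # True = addable action, False = removable action

def step(network, action: DecisionAction,
         capability: Callable) -> Tuple[object, float]:
    """Execute decision action x_t on combat network G_t.

    Returns (G_{t+1}, reward), where the reward is the capability change
    Delta C_{t+1} = E_N(G_{t+1}) - E_N(G_t) and the transition is
    G_{t+1} = (N_t +/- n_t, E_t +/- e_t).
    """
    before = capability(network)
    next_network = network.apply_action(action)  # assumed graph method
    return next_network, capability(next_network) - before
```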
6. Constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result.
6.1, constructing the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of time steps;
constructing, from the decision action sequence, an objective function that maximizes the return:
max Σ_{t=1}^{k} ΔC_t
where ΔC_t is the return value at time t;
expanding and deriving the objective function yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ · max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
6.2, solving to obtain the combination result includes the following steps:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η · δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ · V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
decision actions are selected by the ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected; with probability 1 − ε, a decision action is selected at random;
the decision action x_t with the largest V_t(G_t, x_t) value is selected as the optimal decision action x_t*;
based on the state transition, the process transfers forward from the initial state G_0 according to the optimal decision actions x_t* to generate the optimal decision action sequence;
the combat network combination result at each moment is obtained from the execution result of the optimal decision action sequence.
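Steps 6.1 and 6.2 can be sketched as a tabular temporal-difference loop in Python. The env object with initial_state(), actions() and step() methods is an assumed interface (states must be hashable), and the ε-greedy convention follows the text above: exploit with probability ε, explore with probability 1 − ε:

```python
import random
from collections import defaultdict

def choose_action(V, state, actions, eps):
    """epsilon-greedy: with probability eps pick the action maximizing
    V(G_t, x_t); with probability 1 - eps pick a random action."""
    if random.random() < eps:
        return max(actions, key=lambda a: V[(state, a)])
    return random.choice(actions)

def td_train(env, eta=0.01, gamma=0.9, eps=0.75, episodes=3000):
    """Temporal-difference update V <- V + eta * delta_t, with
    delta_t = DeltaC_{t+1} + gamma * V(G_{t+1}, x_{t+1}) - V(G_t, x_t)."""
    V = defaultdict(float)
    for _ in range(episodes):
        state = env.initial_state()  # G_0
        action = choose_action(V, state, env.actions(state), eps)
        done = False
        while not done:
            next_state, reward, done = env.step(state, action)
            if done or not env.actions(next_state):
                target, next_action, done = reward, None, True
            else:
                next_action = choose_action(V, next_state,
                                            env.actions(next_state), eps)
                target = reward + gamma * V[(next_state, next_action)]
            V[(state, action)] += eta * (target - V[(state, action)])
            state, action = next_state, next_action
    return V
```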
As shown in fig. 2, assume a decision space network formed by 104 nodes and 732 edges, in which the unidirectional arrows between nodes serve as directed edges representing dependency relationships. The node types and numbering are divided into: reconnaissance nodes S = {1, 2, …, 52}, control nodes D = {53, 54, …, 68}, strike nodes I = {69, 70, …, 99}, and target nodes T = {100, 101, …, 104};
starting the state transition from the initial state G_0, decision actions are selected with the ε-greedy strategy; the maximum number of combat network nodes is set to k_N; the algorithm parameters are set to learning rate η = 0.01, discount factor γ = 0.9, greedy strategy probability ε = 0.75, and maximum node number k_N = 10; a training round ends when the combat capability gain of a decision action satisfies ΔC_t < 0 or the number of nodes exceeds k_N = 10; the combat network is trained interactively against the decision space, and the obtained return values are used to update the value function; the value function is trained to convergence, which in this example takes 3000 training rounds.
The initial state G_0 is selected as the starting point; according to the optimal value function V_t(G_t, x_t), the decision action with the largest function value is repeatedly selected and state transitions continue until the number of combat network nodes reaches k_N or the combat capability gain of a decision action satisfies ΔC_t < 0, at which point the state transition stops and the state at that moment is the optimal combat network state. The optimal combat network for target node 103 in this example is shown in fig. 3, with a corresponding combat capability of 125658;
to embody the flexibility of the proposed method, a random node attack is performed on fig. 2, the attacked node and all the relevant edges thereof are removed, and the removed edges cannot be selected by a decision action; under the random node attack strategy, nodes 80 and 85 of the combat network in FIG. 3 areAfter attack, the attack is removed, and the combat capability is reduced to 87880; then from the initial state G0 According to an optimal strategy function V as a starting pointt (Gt ,xt ) State transition is carried out, but the removed edge is not selected in the process; the recombined combat network is shown in fig. 4, and the combat capability is 103250; the recovery rate of lost combat ability is 68.59%.
Embodiment two:
The embodiment of the invention provides a combat network adaptive combination device, which comprises:
a node acquisition module, configured to acquire control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module, configured to connect the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module, configured to construct a combat chain for each target node and combine the combat chains to construct the combat network of the target node;
a capability calculation module, configured to calculate and sum the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module, configured to construct a Markov decision process according to the decision space network and the combat network;
and a process solving module, configured to construct the Bellman optimality equation of the Markov decision process and solve it to obtain the combination result.
Embodiment three:
based on the first embodiment, the embodiment of the invention provides electronic equipment, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform steps according to the method described above.
Embodiment four:
based on the first embodiment, the embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the above method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

Translated from Chinese

1. A combat network adaptive combination method, characterized by comprising:
acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process according to the decision space network and the combat network;
and constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.

2. The combat network adaptive combination method according to claim 1, characterized in that the directed edges representing dependency relationships comprise:
directed edges from a reconnaissance node to another reconnaissance node or a control node, directed edges from a control node to another control node or a strike node, and directed edges from a strike node to a target node.

3. The combat network adaptive combination method according to claim 1, characterized in that the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges;
the combat capability of the j-th combat chain l_j is E_OC(l_j), where s_j, d_j, i_j, t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j), O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j; O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T;
the combat capability of the combat network is E_N(G) = Σ_j E_OC(l_j), where E_N(G) is the combat capability of the combat network G.

4. The combat network adaptive combination method according to claim 1, characterized in that the dependency relationship between nodes connected by directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij · O_i + (1 − α_ij) · SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i, O_j are the operating performances of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the dependency strength (SOD) and dependency criticality (COD) parameters between nodes i and j, and SE_j is the active performance of node j;
when node i is a reconnaissance node, a control node or a strike node, the operating performance O_i is the capability value of node i; when node i is a target node, the operating performance O_i is the damage degree value of node i.

5. The combat network adaptive combination method according to claim 1, characterized in that constructing the Markov decision process comprises:
taking the nodes and directed edges in the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t, E_t are respectively the combat network at time t and the node set and directed edge set corresponding to the combat network;
taking directed edges in the combat network as removable actions and directed edges in the decision space network that can connect to nodes of the combat network as addable actions; taking removable and addable actions as decision actions, denoted x_t = (n_t, e_t), where x_t, e_t, n_t are respectively the decision action at time t and the directed edge and node corresponding to the decision action; constructing a decision action space from the decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat network's combat capability as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}), E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.

6. The combat network adaptive combination method according to claim 5, characterized in that constructing the Bellman optimality equation of the Markov decision process comprises:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of time steps;
constructing, from the decision action sequence, an objective function that maximizes the return:
max Σ_{t=1}^{k} ΔC_t
where ΔC_t is the return value at time t;
expanding and deriving the objective function yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ · max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1, and V_t(G_t, x_t) is the state-decision-action value function.

7. The combat network adaptive combination method according to claim 6, characterized in that solving to obtain the combination result comprises:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η · δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ · V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
decision actions are selected by the ε-greedy strategy: with probability ε, the decision action maximizing V_t(G_t, x_t) is selected; with probability 1 − ε, a decision action is selected at random;
the decision action x_t with the largest V_t(G_t, x_t) value is selected as the optimal decision action x_t*;
based on the state transition, transferring forward from the initial state G_0 according to the optimal decision actions x_t* to generate the optimal decision action sequence;
and obtaining the combat network combination result at each moment from the execution result of the optimal decision action sequence.

8. A combat network adaptive combination device, characterized in that the device comprises:
a node acquisition module, configured to acquire control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module, configured to connect the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module, configured to construct a combat chain for each target node and combine the combat chains to construct the combat network of the target node;
a capability calculation module, configured to calculate and sum the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module, configured to construct a Markov decision process according to the decision space network and the combat network;
and a process solving module, configured to construct the Bellman optimality equation of the Markov decision process and solve it to obtain the combination result.

9. An electronic device, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1-7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-7 are implemented.
CN202310487406.6A | Priority date: 2023-05-04 | Filing date: 2023-05-04 | A combat network adaptive combination method, device, equipment and medium | Active | Granted as CN116489193B (en)

Priority Applications (1)

Application Number: CN202310487406.6A
Priority Date: 2023-05-04
Filing Date: 2023-05-04
Title: A combat network adaptive combination method, device, equipment and medium
Publications (2)

Publication Number | Publication Date
CN116489193A | 2023-07-25
CN116489193B | 2024-01-23

Family

Family ID: 87215475

Family Applications (1)

Application Number: CN202310487406.6A
Title: Combat network self-adaptive combination method, device, equipment and medium
Status: Active, granted as CN116489193B
Priority Date: 2023-05-04
Filing Date: 2023-05-04

Country Status (1)

Country: CN
Publication: CN116489193B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party

KR20200092457A (en) * | 2019-01-07 | 2020-08-04 | KAIST (한국과학기술원) | System and method for predicting human choice behavior and underlying strategy using meta-reinforcement learning
CA3144397A1 (en) * | 2019-07-19 | 2021-01-28 | Mark Gorski | An unmanned aerial vehicle (UAV)-based system for collecting and distributing animal data for monitoring
US20210111988A1 (en) * | 2019-10-10 | 2021-04-15 | United States of America as represented by the Secretary of the Navy | Reinforcement learning-based intelligent control of packet transmissions within ad-hoc networks
CN112632744A (en) * | 2020-11-13 | 2021-04-09 | National University of Defense Technology | Combat system architecture modeling method and space exploration algorithm based on hyper-network model
CN112947581A (en) * | 2021-03-25 | 2021-06-11 | Northwestern Polytechnical University | Multi-UAV collaborative air combat maneuver decision method based on multi-agent reinforcement learning
CN113093802A (en) * | 2021-04-03 | 2021-07-09 | Northwestern Polytechnical University | Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN114202010A (en) * | 2021-10-25 | 2022-03-18 | Beijing Simulation Center | Information-entropy-based complex system networked modeling method, device and medium
CN115034067A (en) * | 2022-06-16 | 2022-09-09 | National University of Defense Technology | Game optimization method and device for link-based combat network attack and defense strategies
CN115169131A (en) * | 2022-07-18 | 2022-10-11 | National University of Defense Technology | Resilience-based node protection method, device and electronic device for combat systems
CN115906673A (en) * | 2023-01-10 | 2023-04-04 | Army Engineering University of PLA | Integrated modeling method and system for combat entity behavior models

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party

Yanhong Duan et al.: "A New Model for Evaluation of Radar Anti-Stealth Preplan in Mosaic Warfare", 2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT) *
Zhang Tingting et al.: "Recursive jigsaw computing architecture for the mosaic warfare mode" (马赛克作战模式的递归拼图计算体系), Journal of Command and Control (指挥与控制学报) *
Pan Bo; Tao Qian; Liu Tonghao: "Air combat effectiveness evaluation method based on dynamic influence networks" (基于动态影响网络的空战效能评估方法), Fire Control & Command Control (火力与指挥控制), no. 4 *
Wang Nan et al.: "Resource optimization of network information systems based on evolutionary games" (基于演化博弈的网络信息体系资源优选), Computer Systems & Applications (计算机系统应用) *

Also Published As

Publication Number | Publication Date
CN116489193B (en) | 2024-01-23


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
