Disclosure of Invention
The invention aims to overcome defects in the prior art by providing a combat network adaptive combination method, device, equipment and medium, and solves the technical problem that traditional manual decision-making based on commander experience cannot respond quickly to combat tasks, which reduces combat network capability.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, the present invention provides a combat network adaptive combination method, including:
acquiring a control node, a reconnaissance node, a hit node and a target node;
connecting all nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process according to the decision space network and the combat network;
and constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result.
Optionally, the directed edges representing the dependency relationship include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a hit node, and directed edges from a hit node to a target node.
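As an illustrative sketch (not part of the claimed method), the edge-type rules above can be encoded as a lookup table; the single-letter type labels are assumptions introduced here.

```python
# Allowed directed-edge types in the decision space network, following the
# rules above. The labels are assumptions: "S" reconnaissance, "D" control,
# "I" hit, "T" target.
ALLOWED = {
    "S": {"S", "D"},  # reconnaissance -> reconnaissance or control
    "D": {"D", "I"},  # control -> control or hit
    "I": {"T"},       # hit -> target
    "T": set(),       # target nodes have no outgoing edges
}

def edge_allowed(src_type: str, dst_type: str) -> bool:
    """Return True if a directed edge src -> dst respects the dependency rules."""
    return dst_type in ALLOWED.get(src_type, set())
```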
Optionally, the combat chain comprises a reconnaissance node, a control node, a hit node and a target node connected in sequence by directed edges;
The combat capability of the combat chain is as follows:
where E_OC(l_j) is the combat capability of the j-th combat chain l_j; s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, hit node and target node in chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and hit node i_j; and O_T(t_j) is the damage-degree value of target node t_j, with t_j ∈ T;
The combat capability of the combat network is obtained by summing over its combat chains:
E_N(G) = Σ_j E_OC(l_j)
where E_N(G) is the combat capability of the combat network G.
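A minimal sketch of the capability computation. The product form of the chain capability is an assumption introduced here (the patent's own chain formula is given as a figure and is not reproduced); the network capability is the stated sum over chains.

```python
def chain_capability(o_s: float, o_d: float, o_i: float, o_t: float) -> float:
    """Combat capability E_OC(l_j) of one combat chain.

    o_s, o_d, o_i are the capability values of the reconnaissance, control
    and hit nodes; o_t is the damage-degree value of the target node.
    The product form is an assumption made for this sketch.
    """
    return o_s * o_d * o_i * o_t

def network_capability(chains) -> float:
    """E_N(G): sum of the capabilities of all combat chains, as stated above."""
    return sum(chain_capability(*c) for c in chains)
```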
Optionally, the dependency relationship between the nodes connected by the directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij·O_i + (1 − α_ij)·SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i and O_j are the operating performances of nodes i and j; Average is the averaging function; α_ij and β_ij are respectively the dependency-strength (SOD) and dependency-criticality (COD) parameters between nodes i and j; and SE_j is the self (active) performance of node j;
where, when node i is a reconnaissance node, a control node or a hit node, the operating performance O_i is the capability value of node i; when node i is a target node, the operating performance O_i is the damage-degree value of node i.
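The dependency relations above can be sketched as follows; the function layout and the triple-based representation of each dependency are assumptions introduced for illustration.

```python
def node_performance(se_j: float, deps) -> float:
    """Operating performance O_j of node j, given its self performance SE_j
    and its dependencies. Each entry of deps is an assumed (O_i, alpha_ij,
    beta_ij) triple for one predecessor node i.
    """
    # SOD_O_j: average of alpha_ij * O_i + (1 - alpha_ij) * SE_j over all i
    sod = sum(a * o + (1 - a) * se_j for o, a, _ in deps) / len(deps)
    # COD_O_j: minimum of O_i + beta_ij over all i
    cod = min(o + b for o, _, b in deps)
    return min(sod, cod)  # O_j = min(SOD_O_j, COD_O_j)
```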
Optionally, the constructing of the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed-edge set;
taking each directed edge in the combat network as a removable action, and each directed edge in the decision space network that can connect a node to the combat network as an addable action; taking the removable and addable actions together as decision actions, denoted x_t = (n_t, e_t), where x_t is the decision action at time t and n_t and e_t are the node and directed edge corresponding to that action; and constructing the decision action space from these decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
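A minimal sketch of the decision action and state transition defined above; the `Action` type and its "add"/"remove" labels are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """Decision action x_t = (n_t, e_t); the kind labels are assumptions."""
    kind: str    # "add" or "remove"
    node: int    # n_t
    edge: tuple  # e_t, as a (source, destination) pair

def step(nodes: set, edges: set, action: Action):
    """State transition G_{t+1} = (N_t +/- n_t, E_t +/- e_t)."""
    nodes, edges = set(nodes), set(edges)
    if action.kind == "add":
        nodes.add(action.node)
        edges.add(action.edge)
    else:
        nodes.discard(action.node)
        edges.discard(action.edge)
    return nodes, edges
```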
Optionally, the constructing of the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of steps;
constructing, based on the decision action sequence, an objective function that maximizes the cumulative return value, where ΔC_t is the return value at time t;
expanding the objective function and deriving backwards to obtain the Bellman optimality equation:
V_t(G_t, x_t) = max_{x_{t+1}} [ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1})]
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
Optionally, the solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η·δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting the decision action by the ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected; with probability 1 − ε, a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
performing forward state transitions from the initial state G_0 according to the optimal decision actions x_t* to generate the optimal decision action sequence;
and obtaining the combat network combination result at each moment according to the execution result of the optimal decision action sequence.
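The temporal-difference update and ε-greedy selection above can be sketched as follows; the dictionary-based value function is an assumption made for this sketch.

```python
import random

def select_action(V: dict, state, actions, eps: float):
    """epsilon-greedy selection as described above: with probability eps take
    the action with the largest V(state, action), otherwise pick at random."""
    if random.random() < eps:
        return max(actions, key=lambda a: V.get((state, a), 0.0))
    return random.choice(actions)

def td_update(V: dict, s, a, reward: float, s_next, a_next,
              eta: float = 0.01, gamma: float = 0.9) -> dict:
    """Temporal-difference update of the state-decision-action value function:
    delta_t = dC_{t+1} + gamma * V(s', a') - V(s, a);  V(s, a) += eta * delta_t."""
    delta = reward + gamma * V.get((s_next, a_next), 0.0) - V.get((s, a), 0.0)
    V[(s, a)] = V.get((s, a), 0.0) + eta * delta
    return V
```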
In a second aspect, the present invention provides a combat network adaptive combination device, the device comprising:
the node acquisition module is used for acquiring control nodes, reconnaissance nodes, hit nodes and target nodes;
the space construction module is used for connecting the nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
the network construction module is used for constructing a combat chain aiming at each target node and combining each combat chain to construct a combat network of the target node;
the capacity calculation module is used for calculating and summing the combat capacity of each combat chain to obtain the combat capacity of the combat network;
the process construction module is used for constructing a Markov decision process according to the decision space network and the combat network;
and the process solving module is used for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
In a third aspect, the present invention provides an electronic device, including a processor and a storage medium;
the storage medium is used for storing instructions;
the processor operates according to the instructions to perform the steps of the method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a combat network self-adaptive combination method, device, equipment and medium, which combines a function-dependent network analysis method to establish a decision action space and bring cascade effect into a system combat capability evaluation index; solving an optimal strategy function solution aiming at a target node, and drawing out a combat network with maximum combat capability; when the weapon equipment encounters serious faults or damages, a new combat network is reconfigured and combat capability is partially restored, so that the elasticity and flexibility of the combat network are improved; and the method can provide reference for mosaic combat design and planning.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Embodiment one:
As shown in fig. 1, the present invention provides a combat network adaptive combination method, which includes:
1. Acquiring a control node, a reconnaissance node, a hit node and a target node.
2. Connecting all nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
2.1, the directed edges representing the dependency relationship include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a hit node, and directed edges from a hit node to a target node;
2.2, the dependency relationship between nodes connected by directed edges satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij·O_i + (1 − α_ij)·SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i + β_ij
where O_i and O_j are the operating performances of nodes i and j; Average is the averaging function; α_ij and β_ij are respectively the dependency-strength (SOD) and dependency-criticality (COD) parameters between nodes i and j; and SE_j is the self (active) performance of node j;
where, when node i is a reconnaissance node, a control node or a hit node, the operating performance O_i is the capability value of node i; when node i is a target node, the operating performance O_i is the damage-degree value of node i.
3. Constructing a combat chain for each target node, and combining the combat chains to construct the combat network of the target node;
3.1, the combat chain comprises a reconnaissance node, a control node, a hit node and a target node connected in sequence by directed edges.
4. Calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
4.1, the combat capability of the combat chain is as follows:
where E_OC(l_j) is the combat capability of the j-th combat chain l_j; s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, hit node and target node in chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and hit node i_j; and O_T(t_j) is the damage-degree value of target node t_j, with t_j ∈ T;
4.2, the combat capability of the combat network is obtained by summing over its combat chains:
E_N(G) = Σ_j E_OC(l_j)
where E_N(G) is the combat capability of the combat network G.
5. Constructing a Markov decision process according to the decision space network and the combat network;
The construction of the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), where G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed-edge set;
taking each directed edge in the combat network as a removable action, and each directed edge in the decision space network that can connect a node to the combat network as an addable action; taking the removable and addable actions together as decision actions, denoted x_t = (n_t, e_t), where x_t is the decision action at time t and n_t and e_t are the node and directed edge corresponding to that action; and constructing the decision action space from these decision actions;
selecting a decision action from the decision action space and executing it, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), where E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1} = (N_{t+1}, E_{t+1}) = (N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
6. Constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result.
6.1, constructing the Bellman optimality equation of the Markov decision process includes the following steps:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, where k is the total number of steps;
constructing, based on the decision action sequence, an objective function that maximizes the cumulative return value, where ΔC_t is the return value at time t;
expanding the objective function and deriving backwards to obtain the Bellman optimality equation:
V_t(G_t, x_t) = max_{x_{t+1}} [ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1})]
where γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
6.2, solving to obtain the combination result includes the following steps:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η·δ_t
where η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ·V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting the decision action by the ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected; with probability 1 − ε, a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
performing forward state transitions from the initial state G_0 according to the optimal decision actions x_t* to generate the optimal decision action sequence;
and obtaining the combat network combination result at each moment according to the execution result of the optimal decision action sequence.
As shown in fig. 2, assume a decision space network formed by 104 nodes and 732 edges, with the unidirectional arrows between nodes as directed edges representing the dependency relationships; the node types and numbering are divided into: reconnaissance nodes S = {1, 2, …, 52}, control nodes D = {53, 54, …, 68}, hit nodes I = {69, 70, …, 99} and target nodes T = {100, 101, …, 104};
Starting state transitions from the initial state G_0, decision actions are selected with the ε-greedy strategy. The maximum number of nodes of the combat network is set to k_N. The algorithm parameters are set to learning rate η = 0.01, discount factor γ = 0.9, greedy-strategy probability ε = 0.75 and maximum node number k_N = 10. A training round ends when the combat capability gain of a decision action satisfies ΔC_t < 0 or when the number of nodes exceeds k_N = 10. The combat network is trained interactively against the decision space, and the obtained return values are used to update the policy function until it converges; in this example, 3000 training rounds are used.
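The round-termination condition described above (a negative capability gain, or a node count above the cap k_N) can be sketched as:

```python
def episode_done(delta_c: float, num_nodes: int, k_n: int = 10) -> bool:
    """A training round ends when the capability gain of the executed action
    is negative (delta_c < 0) or the combat network exceeds the node cap k_N."""
    return delta_c < 0 or num_nodes > k_n
```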
Taking the initial state G_0 as the starting point, the decision action with the largest value of the optimal policy function V_t(G_t, x_t) is selected and state transitions continue until the number of nodes of the combat network reaches k_N or the combat capability gain of a decision action satisfies ΔC_t < 0; state transitions then stop, and the resulting state is the optimal combat network. The optimal combat network for target node 103 in this example is shown in fig. 3, with a corresponding combat capability of 125658;
To demonstrate the resilience of the proposed method, a random node attack is performed on fig. 2: the attacked nodes and all their incident edges are removed, and the removed edges can no longer be selected by decision actions. Under the random-node-attack strategy, nodes 80 and 85 of the combat network in fig. 3 are attacked and removed, and the combat capability falls to 87880. Then, starting again from the initial state G_0, state transitions are performed according to the optimal policy function V_t(G_t, x_t), with the removed edges excluded from selection. The recombined combat network is shown in fig. 4, with a combat capability of 103250; the recovery rate of the lost combat capability is 68.59%.
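The random node attack described above (remove the attacked nodes and all their incident edges, and forbid the removed edges from later selection) can be sketched as:

```python
def attack(nodes: set, edges: set, attacked: set):
    """Remove the attacked nodes and every edge incident to them; return the
    surviving network plus the removed (now forbidden) edges."""
    nodes = set(nodes) - set(attacked)
    surviving = {e for e in edges if e[0] in nodes and e[1] in nodes}
    forbidden = set(edges) - surviving
    return nodes, surviving, forbidden
```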
Embodiment two:
The embodiment of the invention provides a combat network adaptive combination device, which comprises:
the node acquisition module is used for acquiring control nodes, reconnaissance nodes, hit nodes and target nodes;
the space construction module is used for connecting the nodes by adopting directed edges representing the dependency relationship to construct a decision space network;
the network construction module is used for constructing a combat chain aiming at each target node and combining each combat chain to construct a combat network of the target node;
the capacity calculation module is used for calculating and summing the combat capacity of each combat chain to obtain the combat capacity of the combat network;
the process construction module is used for constructing a Markov decision process according to the decision space network and the combat network;
and the process solving module is used for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
Embodiment III:
based on the first embodiment, the embodiment of the invention provides electronic equipment, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor operates according to the instructions to perform the steps of the method described above.
Embodiment four:
based on the first embodiment, the embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the above method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.