Technical Field
The present invention belongs to the technical field of unmanned aerial vehicles (UAVs), and in particular relates to a data-driven adaptive dynamic programming method for air combat decision-making.
Background Art
The purpose of decision-making for an unmanned combat aerial vehicle is to enable it to gain the advantage in combat, or to turn a disadvantage into an advantage; the key research problem is therefore the design of an efficient autonomous decision-making mechanism. Autonomous decision-making for unmanned combat aerial vehicles concerns how tactical plans are formulated or flight actions are selected in real time according to the actual combat environment, and the quality of this mechanism reflects the level of intelligence of the aircraft in modern air combat. The inputs of the autonomous decision-making mechanism are the various parameters related to the air combat, such as the flight parameters of the aircraft, weapon parameters, three-dimensional scene parameters, and the relative situation of the two sides; the decision-making process is the information processing and computation carried out inside the system; and the output is the tactical plan or the specific flight actions produced by the decision.
Adaptive dynamic programming (ADP) combines the ideas of dynamic programming and reinforcement learning: it inherits the advantages of dynamic programming while overcoming the "curse of dimensionality" from which dynamic programming suffers. The principle of adaptive dynamic programming is to use function approximation structures to approximate the performance function and the control policy of traditional dynamic programming, and to use the idea of reinforcement learning to obtain the optimal value function and control policy satisfying the Bellman optimality principle. The idea of adaptive dynamic programming is illustrated in FIG. 1.
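To make the idea concrete, the following Python sketch shows a generic actor-critic ADP sweep on a discrete-time surrogate problem. It is illustrative only: the feature maps phi_c/phi_a, the step model f, the stage cost r, the discount factor and the candidate action grid are all assumptions for this sketch and are not the specific structures defined later in this document.

```python
import numpy as np

def adp_sweep(states, f, r, phi_c, phi_a, Wc, u_candidates, gamma=0.99):
    """One value-iteration-style sweep with function approximation.

    Critic:  V(x) ~ Wc @ phi_c(x)   (approximate performance function)
    Actor:   u(x) ~ phi_a(x) @ Wa   (approximate control policy)
    """
    Phi_c, targets = [], []
    Phi_a, best_us = [], []
    for x in states:
        # Policy improvement: greedy action over a sampled candidate set,
        # scored by one-step cost plus the critic's estimate of future cost.
        scores = [r(x, u) + gamma * (Wc @ phi_c(f(x, u))) for u in u_candidates]
        u_star = u_candidates[int(np.argmin(scores))]
        # Policy evaluation target for the critic (Bellman backup).
        Phi_c.append(phi_c(x)); targets.append(min(scores))
        Phi_a.append(phi_a(x)); best_us.append(u_star)
    # Refit critic and actor weights by least squares.
    Wc = np.linalg.lstsq(np.array(Phi_c), np.array(targets), rcond=None)[0]
    Wa = np.linalg.lstsq(np.array(Phi_a), np.array(best_us), rcond=None)[0]
    return Wc, Wa
```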
Air combat decision-making is a complex task involving a large amount of information and many variables, so manually specified decision rules struggle to keep up with a constantly changing battlefield environment. Existing air combat decision-making methods therefore often suffer from the following problems:
1. Static planning methods cannot cope with dynamic environments: traditional decision-making methods are usually based on rules or models fixed in advance, and have difficulty adapting to a battlefield environment and enemy situation that change in real time.
2. Manual decision-making costs a great deal of time and effort: the decision process has to handle a large amount of information and many variables, which consumes considerable time and effort and is prone to omissions and misjudgments.
3. Lack of comprehensive consideration and flexibility: traditional methods usually decide on the basis of a single factor or only a few factors, and find it difficult to weigh multiple factors together and respond flexibly, which may lead to biased or inaccurate decisions.
4. Inability to meet the needs of information-based warfare: the modern air combat environment is information-rich and changes rapidly, and manually formulated decision rules can no longer meet the needs of information-based warfare.
Summary of the Invention
The purpose of the present invention is to provide a data-driven adaptive dynamic programming method for air combat decision-making, which mainly solves the problem that manually formulated decision rules have difficulty adapting to a constantly changing battlefield environment.
To achieve the above purpose, the present invention adopts the following technical solution:
A data-driven adaptive dynamic programming air combat decision-making method, comprising the following steps:
S1: assume that the two opposing UAVs are a red UAV and a blue UAV, and establish the UAV pursuit-evasion system models for the red-pursuit/blue-escape problem and the red-escape/blue-pursuit problem respectively;
S2: solve the above UAV pursuit-evasion problem with model-free adaptive dynamic programming, and improve the policy with a bounded exploration signal;
S3: obtain the real-time control laws of the red UAV and the blue UAV with an offline neural network training algorithm, and collect the red-side control law information and the state information of both sides in real time;
S4: update the neural networks online with an online training algorithm, realizing adaptive dynamic programming air combat decision-making for the red and blue UAVs in the pursuit-evasion problem.
Further, in the present invention, the red-pursuit/blue-escape problem model is established as follows:
Let the real-time position of the red UAV be Xr(t) and the position of the blue UAV be Xb(t); the position difference between the two sides is then
e = Xb(t) − Xr(t)   (1)
and the tracking-error system is
ė = Ẋb(t) − Ẋr(t)   (2)
where ė is the time derivative of the position difference e, Ẋb(t) is the time derivative of the blue UAV's real-time position Xb(t), and Ẋr(t) is the time derivative of the red UAV's real-time position Xr(t).
Assuming that the red pursuer can only measure the three-dimensional velocity of the blue UAV, equation (2) can be written in the concrete form of equation (3), and the system model of the red side pursuing the blue side is then expressed as equation (4), where Vr is the red UAV's speed (in Mach); χr is the red UAV's heading angle (in radians); γr is the red UAV's flight-path inclination angle (in radians); ex, ey, ez are the distance errors (in kilometres); the dotted quantities are their respective time derivatives; g is the gravitational acceleration; Vc is the speed of sound; and nx, ny, nz are the overload control quantities of the red UAV.
Further, in the present invention, the red-escape/blue-pursuit problem model is established as follows:
A "virtual displacement" method is adopted: by minimizing the distance between the reverse displacement of the own aircraft and the enemy aircraft, the effect of maximizing the distance between the own position and the enemy position is achieved, where the virtual displacement is the displacement produced by the "virtual displacement velocity" V′. The system model of the red side escaping and the blue side pursuing is then expressed accordingly, with the same symbols and meanings as in the pursuit problem.
Further, in the present invention, the red-pursuit/blue-escape system model is processed as follows:
S11: the nonlinear continuous state-space equation of the UAV is abbreviated as equation (5), where x = [Vr, χr, γr, ex, ey, ez]^T is the red aircraft state vector, u = [nx, ny, nz]^T is the red aircraft control vector, and F(x) and G(x) are the corresponding system functions;
S12: the performance index function is defined as equation (7), where Q(x, t) is the index function related to the state and R(u, t) is the index function related to the control quantity;
S13: establish the UAV angle advantage function. Let the red UAV velocity direction vector be
Vr = [cosγr cosχr, cosγr sinχr, sinγr]^T,
and the blue UAV velocity direction vector be
Vb = [cosγb cosχb, cosγb sinχb, sinγb]^T.
The distance vector from the red UAV to the blue UAV is erb = [ex, ey, ez]^T, with the geometric relationship given by equation (8). The angle advantage function is obtained as
Qα = cαr + (1 − c)αb   (9)
where c = (αr + αb)/(2π);
S14: define the distance advantage function as
Qd = e^T Q1 e   (10)
where e = [ex, ey, ez]^T and Q1 is a positive definite matrix; the state index function of the red side can then be expressed as
Q(x, t) = Qd + Q2 Qα   (11)
where Q2 is a weight coefficient;
S15: define the controller index function as
R(u, t) = (u − u0)^T R (u − u0)   (12)
where R is the control weight matrix and u0 = [sinγr, 0, cosγr]^T is the control quantity of the UAV in steady flight.
Further, in the present invention, step S2 is implemented as follows:
Define a bounded exploration signal ue; the red UAV system model (5) can then be rewritten as equation (15), and the corresponding performance index function as equation (16). The derivative of the performance index function (7) with respect to time is expressed by equation (17). When the performance index function attains its minimum, the following Bellman equation (18) is satisfied, where r(j) = Q(x, t) + R(u, t). Combining equations (17) and (18) yields equation (19). The optimal control quantity of the real system is given by equation (20); solving equation (20) for G and substituting into equation (19) gives equation (21), and integrating both sides of equation (21) from t0 to t gives equation (22).
Neural networks are used to approximate the cost function and the control input, as in equation (23), where Wc and Wa are the ideal neural network weights of the critic (evaluation) network and the actor (execution) network, L1 and L2 are the numbers of hidden-layer neurons of the critic network and the actor network, and the remaining terms are their activation functions and reconstruction errors. Let the estimates produced by the critic network and the actor network be as in equation (24), where Ŵc and Ŵa are the estimates of the ideal weights Wc and Wa. Substituting equation (24) into equation (22) gives the residual error of equation (25), in which the control quantity obtained from the improved policy is expressed by equation (26), where Ω is the exploration set of the control quantity obtained by adding a bounded random exploration signal. The weights Ŵc and Ŵa are then optimized by the least-squares algorithm, as in equations (27) and (28).
Further, in step S3 of the present invention, the offline neural network training algorithm comprises the following steps:
S31: by specifying different initial states, obtain the data set {xk(t0)} and initialize the weights;
S32: obtain the control quantities corresponding to the states according to equation (26), forming the corresponding data set;
S33: using this data set, update Ŵc according to equation (27) and Ŵa according to equation (28);
S34: if the change in Ŵa is smaller than ∈a, or the change in Ŵc is smaller than ∈c, terminate the algorithm; otherwise set j = j + 1 and return to step S32, where ∈a and ∈c are the convergence accuracies.
Further, in step S4, the neural networks are updated online by the online training algorithm as follows:
S41: with the current neural network weights Wc, Wa and online learning rate α, sample at a fixed time interval δt to obtain the real-time data set {x(t), u(t)}; after several groups of data have been collected, go to step S42;
S42: obtain the control quantities corresponding to the states according to equation (26), forming the corresponding data set;
S43: using this data set, compute the new critic weights according to equation (27) and the new actor weights according to equation (28);
S44: update the neural network weights online and return to step S41.
Brief Description of the Drawings
FIG. 1 is a structural diagram of adaptive dynamic programming in the prior art.
FIG. 2 is a schematic flow diagram of the present invention.
FIG. 3 is a schematic diagram of the UAV angle advantage in an embodiment of the present invention.
FIG. 4 is a schematic diagram of the virtual displacement principle in an embodiment of the present invention.
Detailed Description
The present invention is further described below in conjunction with the accompanying drawings and embodiments; the implementations of the present invention include, but are not limited to, the following embodiments.
As shown in FIG. 2, the present invention discloses a data-driven adaptive dynamic programming air combat decision-making method. In the pursuit-evasion scenario there is one pursuer and one evader, represented in this embodiment by the red and blue sides. The problem is described here with the red side pursuing and the blue side escaping: the red UAV maneuvers to reduce its distance to the blue UAV while avoiding capture by the blue UAV, i.e. avoiding having the blue UAV's nose pointed at it and thus falling into a disadvantageous situation.
This embodiment establishes the UAV pursuit-evasion system models for the red-pursuit/blue-escape problem and the red-escape/blue-pursuit problem respectively.
First, let the real-time position of the red UAV be Xr(t) and the position of the blue UAV be Xb(t); the position difference between the two sides is then
e = Xb(t) − Xr(t)   (1)
and the tracking-error system is
ė = Ẋb(t) − Ẋr(t)   (2)
where ė is the time derivative of the position difference e, Ẋb(t) is the time derivative of the blue UAV's real-time position Xb(t), and Ẋr(t) is the time derivative of the red UAV's real-time position Xr(t).
Assuming that the red pursuer can only measure the three-dimensional velocity of the blue UAV, equation (2) can be written in the concrete form of equation (3), and the system model of the red side pursuing the blue side is then expressed as equation (4), where Vr is the red UAV's speed (in Mach); χr is the red UAV's heading angle (in radians); γr is the red UAV's flight-path inclination angle (in radians); ex, ey, ez are the distance errors (in kilometres); the dotted quantities are their respective time derivatives; g is the gravitational acceleration; Vc is the speed of sound; and nx, ny, nz are the overload control quantities of the red UAV, which are normally subject to saturation constraints.
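Since the explicit form of equation (4) is not reproduced above, the following Python sketch gives one plausible reading of the pursuit dynamics: a standard three-degree-of-freedom point-mass model with overload controls, chosen so that u0 = [sinγr, 0, cosγr]^T is indeed the steady-flight control as stated later. The exact form, the value of the speed of sound and the unit conversions are assumptions of this sketch, not a verbatim copy of the patent's formula.

```python
import numpy as np

G = 9.81      # gravitational acceleration, m/s^2
VC = 340.0    # assumed speed of sound, m/s

def pursuit_error_dynamics(x, u, vb_xyz):
    """Time derivative of the pursuit state x = [Vr, chi_r, gamma_r, ex, ey, ez].

    u = [nx, ny, nz] are the overload controls; vb_xyz is the measured blue-side
    velocity vector in km/s. Speeds are in Mach, angles in radians, distances in km.
    """
    Vr, chi, gamma, ex, ey, ez = x
    nx, ny, nz = u
    V = VC * Vr                                   # Mach -> m/s
    dVr    = G * (nx - np.sin(gamma)) / VC        # back to Mach per second
    dchi   = G * ny / (V * np.cos(gamma))
    dgamma = G * (nz - np.cos(gamma)) / V
    # Error dynamics: blue velocity minus red velocity (red velocity in km/s).
    vr_xyz = (V / 1000.0) * np.array([np.cos(gamma) * np.cos(chi),
                                      np.cos(gamma) * np.sin(chi),
                                      np.sin(gamma)])
    dex, dey, dez = np.asarray(vb_xyz, dtype=float) - vr_xyz
    return np.array([dVr, dchi, dgamma, dex, dey, dez])
```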
For convenience of description, the nonlinear continuous state-space equation of the UAV is abbreviated as equation (5), where x = [Vr, χr, γr, ex, ey, ez]^T is the red aircraft state vector, u = [nx, ny, nz]^T is the red aircraft control vector, and F(x) and G(x) are the corresponding system functions. Since the UAV pursuit-evasion problem is a nonlinear optimal control problem with saturated actuators, the performance index function is defined as equation (7), where Q(x, t) is the index function related to the state and R(u, t) is the index function related to the control quantity.
Next, the UAV angle advantage function is established. Let the red UAV velocity direction vector be
Vr = [cosγr cosχr, cosγr sinχr, sinγr]^T,
and the blue UAV velocity direction vector be
Vb = [cosγb cosχb, cosγb sinχb, sinγb]^T.
The distance vector from the red UAV to the blue UAV is erb = [ex, ey, ez]^T; as shown in FIG. 3, the geometric relationship is given by equation (8).
In air combat it is desirable for both αr and αb to be as small as possible so that the red side holds the angular advantage. Taking the red side as an example, when αr − (π − αb) < 0, i.e. αr + αb < π, the red side has the attack-angle advantage; conversely, if αr + αb > π, the red side is at an attack-angle disadvantage; and when αr + αb = π, the two sides are in angular balance. The angle advantage function is set as
Qα = cαr + (1 − c)αb   (9)
where c = (αr + αb)/(2π). The weight c dynamically adjusts the optimization priority of the angles αr and αb: when c < 0.5 the red side has the attack-angle advantage and the optimization should focus on αb to prevent the blue side from gaining an advantageous angular situation; when c > 0.5 the red side is at an attack-angle disadvantage and the optimization should focus on αr so that the red side gains an advantageous angular situation.
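A minimal sketch of the angle advantage computation is given below. Since the geometric relation (8) is not reproduced above, this sketch assumes that αr is the angle between the red velocity direction and the red-to-blue line of sight and αb is the angle between the blue velocity direction and the same line of sight; this reading is an assumption.

```python
import numpy as np

def angle_advantage(vr_dir, vb_dir, e_rb):
    """Angle advantage Q_alpha = c*alpha_r + (1-c)*alpha_b with c = (alpha_r+alpha_b)/(2*pi)."""
    los = np.array(e_rb, dtype=float)
    los /= np.linalg.norm(los)                    # unit line-of-sight vector (red -> blue)
    vr = np.array(vr_dir, dtype=float); vr /= np.linalg.norm(vr)
    vb = np.array(vb_dir, dtype=float); vb /= np.linalg.norm(vb)
    alpha_r = np.arccos(np.clip(np.dot(vr, los), -1.0, 1.0))
    alpha_b = np.arccos(np.clip(np.dot(vb, los), -1.0, 1.0))
    c = (alpha_r + alpha_b) / (2.0 * np.pi)       # dynamic weighting of eq. (9)
    return c * alpha_r + (1.0 - c) * alpha_b
```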
In the pursuit problem, the red side's goal is to shorten its distance to the blue side, so the distance advantage function is defined as
Qd = e^T Q1 e   (10)
where e = [ex, ey, ez]^T and Q1 is a positive definite matrix. The state index function of the red side can then be expressed as
Q(x, t) = Qd + Q2 Qα   (11)
where Q2 is a weight coefficient.
To satisfy the control constraints while keeping the controller stable when the UAV is in steady flight, the controller index function is defined as
R(u, t) = (u − u0)^T R (u − u0)   (12)
where R is the control weight matrix and u0 = [sinγr, 0, cosγr]^T is the control quantity of the UAV in steady flight.
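Putting equations (10)–(12) together, the stage cost used by the pursuit problem can be evaluated as in the sketch below. The weights Q1, Q2 and R are tuning parameters supplied by the user; the angle advantage term is assumed to have been computed as above.

```python
import numpy as np

def stage_cost(x, u, Q1, Q2, R, angle_adv):
    """Stage cost Q(x,t) + R(u,t) assembled from eqs. (10)-(12)."""
    Vr, chi, gamma, ex, ey, ez = x
    u = np.asarray(u, dtype=float)
    e = np.array([ex, ey, ez])
    Qd = e @ Q1 @ e                               # distance advantage, eq. (10)
    Qx = Qd + Q2 * angle_adv                      # state index, eq. (11)
    u0 = np.array([np.sin(gamma), 0.0, np.cos(gamma)])   # steady-flight control
    Ru = (u - u0) @ R @ (u - u0)                  # control index, eq. (12)
    return Qx + Ru
```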
For the red-escape/blue-pursuit problem model, the escape problem differs from the pursuit problem in that the objective is the opposite: to maximize the distance between the two aircraft. At the same time, to evade a missile, when the distance between the UAV and the missile is small the UAV needs to make large maneuvers, changing its heading and climb angle. To handle the maximization of the distance between the two aircraft, the "virtual displacement" method is adopted: minimizing the distance between the own aircraft's reverse displacement and the enemy aircraft achieves the effect of maximizing the distance between the own position and the enemy position.
As shown in FIG. 4, the own aircraft is being pursued by the enemy aircraft and we want to maximize the distance to the enemy; for the "virtual displacement velocity" V′, opposite in direction to the own velocity vector V, this becomes minimizing the distance between the virtual displacement and the enemy aircraft. The virtual displacement is the displacement produced by V′. The system model of the red side escaping and the blue side pursuing is then expressed accordingly, with the same symbols and meanings as in the pursuit problem.
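The virtual displacement trick can be sketched as follows. The explicit displacement formula is not reproduced above, so the finite horizon dt used here is an assumption; the point of the sketch is only that the error to be minimized is measured against the position the evader would reach with the reversed velocity.

```python
import numpy as np

def virtual_error(xr, xb, v_r, dt):
    """Escape-problem error built from the 'virtual displacement' of the red UAV.

    Instead of maximizing the distance to the pursuer, minimize the distance
    between the pursuer xb and the position the evader would reach if it flew
    with the reversed velocity -v_r for dt seconds.
    """
    x_virtual = np.asarray(xr, dtype=float) - np.asarray(v_r, dtype=float) * dt
    return np.asarray(xb, dtype=float) - x_virtual   # error to be minimized
```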
Generally speaking, an accurate UAV system model cannot be obtained in practice, while existing data-based model-free adaptive dynamic programming depends heavily on the data and cannot improve the policy on the basis of the data already collected. This embodiment therefore solves the above UAV pursuit-evasion problem with model-free adaptive dynamic programming and improves the policy with a bounded exploration signal.
Define a bounded exploration signal ue; the red UAV system model (5) can then be rewritten as equation (15), and the corresponding performance index function as equation (16). The derivative of the performance index function (7) with respect to time is expressed by equation (17). When the performance index function (16) attains its minimum, the following Bellman equation (18) is satisfied, where r(j) = Q(x, t) + R(u, t). Combining equations (17) and (18) yields equation (19). The optimal control quantity of the real system is given by equation (20); solving equation (20) for G and substituting into equation (19) gives equation (21), and integrating both sides of equation (21) from t0 to t gives equation (22).
Neural networks are used to approximate the cost function and the control input, as in equation (23), where Wc and Wa are the ideal neural network weights of the critic network and the actor network, L1 and L2 are the numbers of hidden-layer neurons of the critic network and the actor network, and the remaining terms are their activation functions and reconstruction errors. Let the estimates produced by the critic network and the actor network be as in equation (24), where Ŵc and Ŵa are the estimates of the ideal weights Wc and Wa. Substituting equation (24) into equation (22) gives the residual error of equation (25), in which the control quantity obtained from the improved policy is expressed by equation (26), where Ω is the exploration set of the control quantity obtained by adding a bounded random exploration signal. The weights Ŵc and Ŵa are then optimized by the least-squares algorithm, as in equations (27) and (28).
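The batch least-squares refit of the two networks can be sketched as below. The construction of the critic targets from the integrated Bellman relation (22)/(25) and of the improved controls from the exploration set (26) is not reproduced here, so both are passed in as pre-built samples; the helper names are assumptions of this sketch.

```python
import numpy as np

def least_squares_update(critic_samples, actor_samples, phi_c, phi_a):
    """Least-squares refit of critic and actor weights, in the spirit of eqs. (27)-(28).

    critic_samples: list of (state, value_target) pairs.
    actor_samples:  list of (state, improved_control) pairs.
    """
    Phi_c = np.array([phi_c(x) for x, _ in critic_samples])
    y_c = np.array([v for _, v in critic_samples])
    Wc = np.linalg.lstsq(Phi_c, y_c, rcond=None)[0]          # critic weights

    Phi_a = np.array([phi_a(x) for x, _ in actor_samples])
    U = np.array([u for _, u in actor_samples])
    Wa = np.linalg.lstsq(Phi_a, U, rcond=None)[0]            # actor weights
    return Wc, Wa
```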
This embodiment obtains the real-time control laws of the red UAV and the blue UAV with an offline neural network training algorithm, and collects the red-side control law information and the state information of both sides in real time. Specifically:
S31: by specifying different initial states, obtain the data set {xk(t0)} and initialize the weights;
S32: obtain the control quantities corresponding to the states according to equation (26), forming the corresponding data set;
S33: using this data set, update Ŵc according to equation (27) and Ŵa according to equation (28);
S34: if the change in Ŵa is smaller than ∈a, or the change in Ŵc is smaller than ∈c, terminate the algorithm; otherwise set j = j + 1 and return to step S32, where ∈a and ∈c are the convergence accuracies.
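Steps S31–S34 can be arranged as the following structural sketch, reusing the least_squares_update helper above. The rollout and build_samples callables stand in for the data-generation steps specified through equations (26)–(28) but not reproduced here; their names and the convergence test on the weight change are assumptions of this sketch.

```python
import numpy as np

def offline_training(initial_states, rollout, build_samples, phi_c, phi_a,
                     eps_c=1e-4, eps_a=1e-4, max_iter=100):
    """Offline iteration S31-S34 (structural sketch)."""
    Wc, Wa = None, None
    for j in range(max_iter):
        # S31/S32: simulate the current actor from each initial state and
        # assemble the data set of states and corresponding controls.
        trajectories = [rollout(x0, Wa) for x0 in initial_states]
        critic_samples, actor_samples = build_samples(trajectories, Wc, Wa)
        # S33: refit both networks by least squares.
        Wc_new, Wa_new = least_squares_update(critic_samples, actor_samples,
                                              phi_c, phi_a)
        # S34: stop when either weight vector has converged, otherwise iterate.
        if Wc is not None and (np.linalg.norm(Wa_new - Wa) < eps_a
                               or np.linalg.norm(Wc_new - Wc) < eps_c):
            return Wc_new, Wa_new
        Wc, Wa = Wc_new, Wa_new
    return Wc, Wa
```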
In this embodiment, the neural networks are updated online at regular intervals by the online training algorithm, realizing adaptive dynamic programming air combat decision-making for the red and blue UAVs in the pursuit-evasion problem. Specifically:
S41: with the current neural network weights Wc, Wa and online learning rate α, sample at a fixed time interval δt to obtain the real-time data set {x(t), u(t)}; after several groups of data have been collected, go to step S42;
S42: obtain the control quantities corresponding to the states according to equation (26), forming the corresponding data set;
S43: using this data set, compute the new critic weights according to equation (27) and the new actor weights according to equation (28);
S44: update the neural network weights online and return to step S41.
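A structural sketch of one online step is given below, again reusing least_squares_update. The exact weight-update rule of S44 is not reproduced above; blending the freshly fitted weights into the running ones with the learning rate α is one plausible reading and is stated here as an assumption.

```python
def online_update(Wc, Wa, buffer, build_samples, phi_c, phi_a, alpha=0.1):
    """Online step S41-S44 (structural sketch).

    buffer holds the real-time data {x(t), u(t)} sampled every delta_t seconds;
    alpha is the online learning rate.
    """
    critic_samples, actor_samples = build_samples(buffer, Wc, Wa)        # S42
    Wc_new, Wa_new = least_squares_update(critic_samples, actor_samples,
                                          phi_c, phi_a)                  # S43
    Wc = (1 - alpha) * Wc + alpha * Wc_new                               # S44 (assumed form)
    Wa = (1 - alpha) * Wa + alpha * Wa_new
    return Wc, Wa
```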
Through the above method, the present invention improves the ability of the online adaptive adjustment strategy and the adaptability of UAV air combat decision-making to different scenarios. Moreover, the present invention does not depend on the aircraft system model and has strong generalization ability; it can be extended to the control of other equipment, such as unmanned ground vehicles and robotic arms, among other application scenarios. Compared with the prior art, the present invention therefore has outstanding substantive features and represents significant progress.
The above embodiment is only one of the preferred implementations of the present invention and should not be used to limit the protection scope of the present invention. Any insubstantial change or refinement made within the main design concept and spirit of the present invention, provided the technical problem it solves remains consistent with that of the present invention, shall fall within the protection scope of the present invention.