Abstract
This paper considers a UAV swarm engagement scenario in which a defender swarm cooperatively intercepts an attacker swarm to prevent it from entering the target area. Unlike previous attack–defense strategies based on deep reinforcement learning, we remove the assumption that the swarms have perfect knowledge of both sides' situation and of the environment. On this basis, we propose a double experience pool strategic interaction Q-learning (DEP-SIQ) swarm attack and defense algorithm, which reduces the dimension of the network input and lets UAVs with the same task share the same network. The algorithm maintains separate experience pools for the attacker and the defender; during training, each side samples from its own experience pool to train its own network. The estimated reward of a UAV is decomposed into the sum of its interaction reward values with other friendly UAVs, which makes the approach suitable for large-scale swarms. Simulation experiments show the feasibility of the proposed algorithm. Compared with other algorithms, the proposed DEP-SIQ algorithm achieves higher learning efficiency, a higher win rate, and better applicability.
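The two structural ideas in the abstract, one experience pool per side and a per-UAV value formed by summing pairwise interaction terms over friendly UAVs, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ExperiencePool`, `decomposed_value`, and `toy_pair_value` are hypothetical, observations are reduced to single floats, and the pairwise function is a hand-written stand-in for what would be a learned network shared by all UAVs with the same task.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity replay buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement, capped at the pool size.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def decomposed_value(pair_value, own_obs, friendly_obs, action):
    """Sum pairwise interaction values over the friendly UAVs."""
    return sum(pair_value(own_obs, other, action) for other in friendly_obs)


def toy_pair_value(own_obs, other_obs, action):
    # Hypothetical stand-in for a shared per-role network (assumption):
    # the value is higher when the action moves this UAV toward the other one.
    return -abs((own_obs + action) - other_obs)


# One pool per side; each side samples only from its own pool.
pools = {"attacker": ExperiencePool(1000), "defender": ExperiencePool(1000)}
rng = random.Random(0)
for step in range(20):
    pools["attacker"].push(("attacker", step))
    pools["defender"].push(("defender", step))

defender_batch = pools["defender"].sample(8, rng)
value = decomposed_value(toy_pair_value, 0.0, [1.0, 2.0], 1.0)
```

Because the pairwise function takes only two observations at a time, its input size does not grow with the number of UAVs, which is one plausible reading of why the abstract describes the decomposition as suitable for large swarms.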
Data availability
All relevant data are within the paper.
Funding
This work was supported in part by the Aeronautical Science Foundation of China under Grant 2020Z023053001.
Author information
Authors and Affiliations
School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
Xiaowei Fu, Zhe Qiao & Zhe Xu
Corresponding author
Correspondence to Xiaowei Fu.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fu, X., Qiao, Z. & Xu, Z. Attack–defense strategy of UAV swarm based on DEP-SIQ in the active target defense scenario. Soft Comput 28, 10463–10473 (2024). https://doi.org/10.1007/s00500-024-09826-5