{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:12:23Z","timestamp":1775743943355,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T00:00:00Z","timestamp":1721606400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T00:00:00Z","timestamp":1721606400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"the Graduate Research and Practice Innovation Program Project of Jiangsu Province","award":["KYCX23_3400"],"award-info":[{"award-number":["KYCX23_3400"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In air combat missions, unmanned combat aerial vehicles (UCAVs) must take strategic actions to establish combat advantages, enabling effective tracking and attacking of enemy UCAVs. Currently, many reinforcement learning algorithms have been applied to the air combat missions of unmanned fighter aircraft. However, most algorithms can only select policies based on the current state of both sides. This leads to an inability to track and attack effectively when the enemy performs large-angle maneuvers. Additionally, these algorithms cannot adapt to different situations, leaving the unmanned fighter aircraft at a disadvantage in some cases. To solve these problems, this paper proposes a predictive air combat decision model with segmented reward allocation for air combat tracking and attacking. 
On the basis of the air combat environment, we propose the prediction soft actor-critic (Pre-SAC) algorithm, which combines the prediction of enemy states with the states of the UCAV for model training. This enables the UCAV to predict the next move of the enemy UCAV in advance and thereby establish a greater air combat advantage. Furthermore, by adopting a segmented reward allocation model and combining it with the Pre-SAC algorithm, we propose the segmented reward allocation soft actor-critic (Sra-SAC) algorithm, which solves the problem of UCAVs being unable to adapt to different situations. The results show that the prediction-based segmented reward allocation Sra-SAC algorithm outperforms the traditional soft actor-critic (SAC) algorithm in terms of overall reward, travel distance, and relative advantage.<\/jats:p>","DOI":"10.1007\/s40747-024-01556-3","type":"journal-article","created":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T04:01:39Z","timestamp":1721620899000},"page":"7513-7530","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Predictive air combat decision model with segmented reward allocation"],"prefix":"10.1007","volume":"10","author":[{"given":"Yundi","family":"Li","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3982-7860","authenticated-orcid":false,"given":"Yinlong","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Yun","family":"Cheng","sequence":"additional","affiliation":[]},{"given":"Liang","family":"Hua","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,22]]},"reference":[{"key":"1556_CR1","doi-asserted-by":"publisher","DOI":"10.1016\/j.futures.2021.102848","volume":"134","author":"J Jordan","year":"2021","unstructured":"Jordan J (2021) The future of unmanned combat aerial vehicles: an analysis using the three horizons framework. Futures 134:102848. 
https:\/\/doi.org\/10.1016\/j.futures.2021.102848","journal-title":"Futures"},{"key":"1556_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.conengprac.2023.105513","volume":"135","author":"XN Song","year":"2023","unstructured":"Song XN, Wu CL, Stojanovic V, Zhang W, Song S (2023) 1 bit encoding-decoding-based event-triggered fixed-time adaptive control for unmanned surface vehicle with guaranteed tracking performance. Control Eng Pract 135:105513. https:\/\/doi.org\/10.1016\/j.conengprac.2023.105513","journal-title":"Control Eng Pract"},{"key":"1556_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2023.107832","volume":"131","author":"XN Song","year":"2024","unstructured":"Song XN, Wu CL, Song S, Stojanovic V, Tejado I (2024) Fuzzy wavelet neural adaptive finite-time self-triggered fault-tolerant control for a quadrotor unmanned aerial vehicle with scheduled performance. Eng Appl Artif Intell 131:107832. https:\/\/doi.org\/10.1016\/j.engappai.2023.107832","journal-title":"Eng Appl Artif Intell"},{"key":"1556_CR4","doi-asserted-by":"publisher","first-page":"1697","DOI":"10.1016\/j.dt.2021.09.014","volume":"18","author":"Y Li","year":"2021","unstructured":"Li Y, Shi J, Jiang W, Zhang W, Lyu Y (2021) Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm. Defence Technol 18:1697\u20131714. https:\/\/doi.org\/10.1016\/j.dt.2021.09.014","journal-title":"Defence Technol"},{"key":"1556_CR5","doi-asserted-by":"publisher","first-page":"2397","DOI":"10.3906\/elk-1201-50","volume":"21","author":"ARAR Of","year":"2013","unstructured":"Of ARAR, Ayan K (2013) A flexible rule-based framework for pilot performance analysis in air combat simulation systems. Turk J Electr Eng Comput Sci 21:2397\u20132415. 
https:\/\/doi.org\/10.3906\/elk-1201-50","journal-title":"Turk J Electr Eng Comput Sci"},{"key":"1556_CR6","doi-asserted-by":"publisher","unstructured":"Chappell AR (1992) Knowledge-based reasoning in the paladin tactical decision generation system. In: Proceedings of 11th IEEE\/AIAA digital avionics systems conference, pp 155\u2013160. https:\/\/doi.org\/10.1109\/DASC.1992.282166","DOI":"10.1109\/DASC.1992.282166"},{"key":"1556_CR7","doi-asserted-by":"publisher","first-page":"2167-0374","DOI":"10.4172\/2167-0374.1000144","volume":"6","author":"N Ernest","year":"2016","unstructured":"Ernest N, Carroll D, Schumacher C, Clark M, Cohen K, Lee G (2016) Genetic fuzzy based artificial intelligence for unmanned combat aerial vehicle control in simulated air combat missions. J Defense Manag 6:2167\u20130374. https:\/\/doi.org\/10.4172\/2167-0374.1000144","journal-title":"J Defense Manag"},{"key":"1556_CR8","doi-asserted-by":"publisher","unstructured":"Rao N, Kashyap S, Gopalaratnam G, Mandal D (2011) Situation and threat assessment in BVR combat. In: AIAA guidance, navigation, and control conference, vol 6241. https:\/\/doi.org\/10.2514\/6.2011-6241","DOI":"10.2514\/6.2011-6241"},{"key":"1556_CR9","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA et al (2015) Human-level control through deep reinforcement learning. Nature 518:529\u2013533. https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"1556_CR10","doi-asserted-by":"publisher","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, et al (2016) Continuous control with deep reinforcement learning. In: International conference on learning representations. 
https:\/\/doi.org\/10.48550\/arXiv.1509.02971","DOI":"10.48550\/arXiv.1509.02971"},{"key":"1556_CR11","doi-asserted-by":"publisher","unstructured":"Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International conference on machine learning, pp 1861\u20131870. https:\/\/doi.org\/10.48550\/arXiv.1801.01290","DOI":"10.48550\/arXiv.1801.01290"},{"key":"1556_CR12","doi-asserted-by":"publisher","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. https:\/\/doi.org\/10.48550\/arXiv.1707.06347","DOI":"10.48550\/arXiv.1707.06347"},{"key":"1556_CR13","doi-asserted-by":"publisher","first-page":"1228","DOI":"10.1109\/TNNLS.2020.3041469","volume":"33","author":"Y Zhu","year":"2020","unstructured":"Zhu Y, Zhao D (2020) Online minimax Q network learning for two-player zero-sum markov games. IEEE Trans Neural Netw Learn Syst 33:1228\u20131241. https:\/\/doi.org\/10.1109\/TNNLS.2020.3041469","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1556_CR14","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of Go without human knowledge. Nature 550:354\u2013359. https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"key":"1556_CR15","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518:529\u2013533. 
https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"1556_CR16","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575:350\u2013354. https:\/\/doi.org\/10.1038\/s41586-019-1724-z","journal-title":"Nature"},{"issue":"4","key":"1556_CR17","doi-asserted-by":"publisher","first-page":"2093","DOI":"10.1109\/TNNLS.2021.3105869","volume":"34","author":"J Chai","year":"2023","unstructured":"Chai J, Li W, Zhu Y, Zhao D, Ma Z, Sun K, Ding J (2023) UNMAS: multiagent reinforcement learning for unshaped cooperative scenarios. IEEE Trans Neural Netw Learn Syst 34(4):2093\u20132104. https:\/\/doi.org\/10.1109\/TNNLS.2021.3105869","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1556_CR18","doi-asserted-by":"publisher","unstructured":"Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S (2020) Monotonic value function factorisation for deep multi-agent reinforcement learning. J Mach Learn Res 21(178):1-51. https:\/\/doi.org\/10.48550\/arXiv.2003.08839","DOI":"10.48550\/arXiv.2003.08839"},{"issue":"10","key":"1556_CR19","doi-asserted-by":"publisher","first-page":"6443","DOI":"10.1109\/TCYB.2022.3179775","volume":"53","author":"Y Zhu","year":"2022","unstructured":"Zhu Y, Li W, Zhao M, Hao J, Zhao D (2022) Empirical policy optimization for n-player markov games. IEEE Trans Cybern 53(10):6443\u20136455. 
https:\/\/doi.org\/10.1109\/TCYB.2022.3179775","journal-title":"IEEE Trans Cybern"},{"issue":"8","key":"1556_CR20","doi-asserted-by":"publisher","first-page":"4823","DOI":"10.1109\/TSMC.2021.3105663","volume":"52","author":"Y Zhang","year":"2021","unstructured":"Zhang Y, Zhao B, Liu D, Zhang S (2021) Event-triggered control of discrete-time zero-sum games via deterministic policy gradient adaptive dynamic programming. IEEE Trans Syst Man Cybern Syst 52(8):4823\u20134835. https:\/\/doi.org\/10.1109\/TSMC.2021.3105663","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"key":"1556_CR21","doi-asserted-by":"publisher","unstructured":"Li W, Zhu Y, Zhao D (2022) Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target. Complex Intell Syst 8(2):1205\u20131216. https:\/\/doi.org\/10.1007\/s40747-021-00577-6","DOI":"10.1007\/s40747-021-00577-6"},{"issue":"2","key":"1556_CR22","doi-asserted-by":"publisher","first-page":"29","DOI":"10.3969\/j.issn.1006-141X.2018.02.06","volume":"49","author":"LJ Ding","year":"2018","unstructured":"Ding LJ, Yang QM et al (2018) Research on air combat maneuver decision of UAVs based on reinforcement learning. Avion Technol 49(2):29\u201335. https:\/\/doi.org\/10.3969\/j.issn.1006-141X.2018.02.06","journal-title":"Avion Technol"},{"key":"1556_CR23","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1007\/978-981-10-6463-0_24","volume":"751","author":"P Liu","year":"2017","unstructured":"Liu P, Ma Y (2017) A deep reinforcement learning based intelligent decision method for UCAV air combat. Asian Simul Conf 751:274\u2013286. https:\/\/doi.org\/10.1007\/978-981-10-6463-0_24","journal-title":"Asian Simul Conf"},{"key":"1556_CR24","doi-asserted-by":"publisher","unstructured":"Li L, Zhou Z, Chai J, Liu Z, Zhu Y, Yi J (2022) Learning continuous 3-DOF air-to-air close-in combat strategy using proximal policy optimization. In: IEEE conference on games (COG), pp 616\u2013619. 
https:\/\/doi.org\/10.1109\/CoG51982.2022.9893690","DOI":"10.1109\/CoG51982.2022.9893690"},{"issue":"3297","key":"1556_CR25","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/S0262-4079(20)31477-9","volume":"245","author":"D Hambling","year":"2020","unstructured":"Hambling D (2020) AI outguns a human fighter pilot. New Sci 245(3297):12. https:\/\/doi.org\/10.1016\/S0262-4079(20)31477-9","journal-title":"New Sci"},{"key":"1556_CR26","doi-asserted-by":"publisher","unstructured":"Kulkarni TD, Narasimhan KR, Saeedi A et al (2016) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In: Advances in neural information processing systems, pp 3675\u20133683. https:\/\/doi.org\/10.48550\/arXiv.1604.06057","DOI":"10.48550\/arXiv.1604.06057"},{"key":"1556_CR27","doi-asserted-by":"publisher","unstructured":"Vezhnevets AS, Silver D, Kavukcuoglu K, et al (2017) FeUdal networks for hierarchical reinforcement learning. https:\/\/doi.org\/10.48550\/arXiv.1703.01161","DOI":"10.48550\/arXiv.1703.01161"},{"key":"1556_CR28","doi-asserted-by":"publisher","unstructured":"Levy A, Konidaris GD, Platt R et al (2019) Learning multi-level hierarchies with hindsight. In: International conference on learning representations. https:\/\/doi.org\/10.48550\/arXiv.1712.00948","DOI":"10.48550\/arXiv.1712.00948"},{"key":"1556_CR29","doi-asserted-by":"publisher","unstructured":"Bacon PL, Harb J, Precup D (2017) The option-critic architecture. In: Proceedings of the national conference on artificial intelligence (AAAI). https:\/\/doi.org\/10.1609\/aaai.v31i1.10916","DOI":"10.1609\/aaai.v31i1.10916"},{"key":"1556_CR30","doi-asserted-by":"publisher","unstructured":"Tessler C, Givony S, Zahavy T et al (2017) A deep hierarchical approach to lifelong learning in minecraft. In: Proceedings of the national conference on artificial intelligence (AAAI). 
https:\/\/doi.org\/10.1609\/aaai.v31i1.10744","DOI":"10.1609\/aaai.v31i1.10744"},{"key":"1556_CR31","doi-asserted-by":"publisher","unstructured":"Harb J, Bacon PL, Klissarov M et al (2018) When waiting is not an option: learning options with a deliberation cost. National conference on artificial intelligence. In: Proceedings of the national conference on artificial intelligence (AAAI). https:\/\/doi.org\/10.1609\/aaai.v32i1.11831","DOI":"10.1609\/aaai.v32i1.11831"},{"issue":"9","key":"1556_CR32","doi-asserted-by":"publisher","first-page":"5417","DOI":"10.1109\/TSMC.2023.3270444","volume":"53","author":"JJ Chai","year":"2023","unstructured":"Chai JJ, Chen WZ, Zhu YH, Yao ZX, Zhao DB (2023) A hierarchical deep reinforcement learning framework for 6-DOF UCAV air-to-air combat. IEEE Trans Syst Man Cybern Syst 53(9):5417\u20135429. https:\/\/doi.org\/10.1109\/TSMC.2023.3270444","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"key":"1556_CR33","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1007\/s10846-023-01953-9","volume":"109","author":"YL Yuan","year":"2023","unstructured":"Yuan YL, Yang J, Yu ZL et al (2023) Hierarchical goal-guided learning for the evasive Maneuver of fixed-wing UAVs based on deep reinforcement learning. J Intell Robot Syst 109:2. https:\/\/doi.org\/10.1007\/s10846-023-01953-9","journal-title":"J Intell Robot Syst"},{"key":"1556_CR34","doi-asserted-by":"publisher","first-page":"7451","DOI":"10.1007\/s40747-023-01135-y","volume":"9","author":"ZL Peng","year":"2023","unstructured":"Peng ZL, Song XN, Song S, Stojanovic V (2023) Hysteresis quantified control for switched reaction-diffusion systems and its application. Complex Intell Syst 9:7451\u20137460. 
https:\/\/doi.org\/10.1007\/s40747-023-01135-y","journal-title":"Complex Intell Syst"},{"issue":"5","key":"1556_CR35","doi-asserted-by":"publisher","first-page":"1673","DOI":"10.3969\/j.issn.1673-3819.2022.05.005","volume":"44","author":"B Li","year":"2022","unstructured":"Li B, Bai S, Meng B, Liang S, Li Z (2022) Autonomous air combat decision-making algorithm of UAVs based on SAC algorithm. Command Control Simul 44(5):1673\u20133819. https:\/\/doi.org\/10.3969\/j.issn.1673-3819.2022.05.005","journal-title":"Command Control Simul"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01556-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01556-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01556-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T22:07:41Z","timestamp":1729116461000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01556-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,22]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["1556"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01556-3","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,22]]},"assertion":[{"value":"27 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 July 
2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors do not have any conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}}]}}