{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:44:46Z","timestamp":1773193486441,"version":"3.50.1"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,6,18]],"date-time":"2024-06-18T00:00:00Z","timestamp":1718668800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2024,8,31]]},"abstract":"<jats:p>\n            Artificial Intelligence (AI) has achieved a wide range of successes in autonomous air combat decision-making recently. Previous research demonstrated that AI-enabled air combat approaches could even acquire beyond human-level capabilities. However, there remains a lack of evidence regarding two major difficulties. First, the existing methods with fixed decision intervals are mostly devoted to solving what to act but merely pay attention to when to act, which occasionally misses optimal decision opportunities. Second, the method of an expert-crafted finite maneuver library leads to a lack of tactics diversity, which is vulnerable to an opponent equipped with new tactics. In view of this, we propose a novel Deep Reinforcement Learning (DRL) and prior knowledge hybrid autonomous air combat tactics discovering algorithm, namely deep\n            <jats:bold>E<\/jats:bold>\n            xcitatory-i\n            <jats:bold>N<\/jats:bold>\n            hibitory f\n            <jats:bold>ACT<\/jats:bold>\n            or\n            <jats:bold>I<\/jats:bold>\n            zed maneu\n            <jats:bold>VE<\/jats:bold>\n            r (\n            <jats:bold>ENACTIVE<\/jats:bold>\n            ) learning. The algorithm consists of two key modules, i.e., ENHANCE and FACTIVE. Specifically, ENHANCE learns to adjust the air combat decision-making intervals and appropriately seize key opportunities. FACTIVE factorizes maneuvers and then jointly optimizes them with significant tactics diversity increments. Extensive experimental results reveal that the proposed method outperforms state-of-the-art algorithms with a 62% winning rate and further obtains a margin of a 2.85-fold increase in terms of global tactic space coverage. It also demonstrates that a variety of discovered air combat tactics are comparable to human experts\u2019 knowledge.\n          <\/jats:p>","DOI":"10.1145\/3653979","type":"journal-article","created":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T11:53:32Z","timestamp":1711540412000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Discovering Expert-Level Air Combat Knowledge via Deep Excitatory-Inhibitory Factorized Reinforcement Learning"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8519-4750","authenticated-orcid":false,"given":"Hai Yin","family":"Piao","sequence":"first","affiliation":[{"name":"School of electronics and information, Northwestern Polytechnical University, Xi'an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7489-7082","authenticated-orcid":false,"given":"Shengqi","family":"Yang","sequence":"additional","affiliation":[{"name":"SADRI Institute, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7835-9556","authenticated-orcid":false,"given":"Hechang","family":"Chen","sequence":"additional","affiliation":[{"name":"Jilin University, Changchun, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1154-657X","authenticated-orcid":false,"given":"Junnan","family":"Li","sequence":"additional","affiliation":[{"name":"SADRI Institute, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0004-9934","authenticated-orcid":false,"given":"Jin","family":"Yu","sequence":"additional","affiliation":[{"name":"SADRI Institute, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3553-7062","authenticated-orcid":false,"given":"Xuanqi","family":"Peng","sequence":"additional","affiliation":[{"name":"SADRI Institute, Shenyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8046-722X","authenticated-orcid":false,"given":"Xin","family":"Yang","sequence":"additional","affiliation":[{"name":"Dalian University of Technology, Dalian, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7728-916X","authenticated-orcid":false,"given":"Zhen","family":"Yang","sequence":"additional","affiliation":[{"name":"Northwestern Polytechnical University, Xi'an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0018-2337","authenticated-orcid":false,"given":"Zhixiao","family":"Sun","sequence":"additional","affiliation":[{"name":"Northwestern Polytechnical University, Xi'an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2697-8093","authenticated-orcid":false,"given":"Yi","family":"Chang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Jilin University, Changchun, China"}]}],"member":"320","published-online":{"date-parts":[[2024,6,18]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1726","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence","author":"Bacon Pierre-Luc","year":"2017","unstructured":"Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. The option-critic architecture. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. AAAI Press, 1726\u20131734. https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/10916"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2888402"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1207\/s15327108ijap0803_4"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1869397.1869402"},{"key":"e_1_3_2_6_2","first-page":"914","volume-title":"Proceedings of the 38th International Conference on Machine Learning","volume":"139","author":"Biedenkapp Andr\u00e9","year":"2021","unstructured":"Andr\u00e9 Biedenkapp, Raghu Rajan, Frank Hutter, and Marius Lindauer. 2021. TempoRL: Learning when to act. In Proceedings of the 38th International Conference on Machine Learning, Vol. 139. PMLR, 914\u2013924. https:\/\/proceedings.mlr.press\/v139\/biedenkapp21a.html"},{"key":"e_1_3_2_7_2","volume-title":"Design of an All-attitude Flight Control System to Execute Commanded Bank Angles and Angles of Attack","author":"Burgin G.","year":"1976","unstructured":"G. Burgin and D. Eggleston. 1976. Design of an All-attitude Flight Control System to Execute Commanded Bank Angles and Angles of Attack. Technical Report. NASA Langley Technical Report Server."},{"key":"e_1_3_2_8_2","volume-title":"An Adaptive Maneuvering Logic Computer Program for the Simulation of One-on-one Air-to-air Combat","author":"Burgin G. H.","year":"1975","unstructured":"G. H. Burgin and A. J. Owens. 1975. An Adaptive Maneuvering Logic Computer Program for the Simulation of One-on-one Air-to-air Combat. Technical Report. NASA Langley Technical Report Server."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-99247-1_38"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00221-003-1684-1"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1107470"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.116156"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.4172\/2167-0374.1000144"},{"key":"e_1_3_2_14_2","volume-title":"International Conference on Learning Representations","author":"Eysenbach Benjamin","year":"2019","unstructured":"Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2019. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJx63jRqFm"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-neuro-071714-034002"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-05125-4"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3060426"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107071"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40435-021-00803-6"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aau6249"},{"key":"e_1_3_2_21_2","volume-title":"Application of Artificial Intelligence (AI) Programming Techniques to Tactical Guidance For Fighter Aircraft","author":"W. John","year":"1989","unstructured":"John W. McManus and Kenneth H. Goodrich. 1989. Application of Artificial Intelligence (AI) Programming Techniques to Tactical Guidance For Fighter Aircraft. Technical Report. NASA Langley Technical Report Server."},{"key":"e_1_3_2_22_2","first-page":"1008","volume-title":"Proceedings of the 12th International Conference on Neural Information Processing Systems","volume":"12","author":"Konda Vijay R.","year":"2000","unstructured":"Vijay R. Konda and John N. Tsitsiklis. 2000. Actor-critic algorithms. In Proceedings of the 12th International Conference on Neural Information Processing Systems, Vol. 12. MIT Press, 1008\u20131014. https:\/\/papers.nips.cc\/paper\/1999\/hash\/6449f44a102fde848669bdd9eb6b76fa-Abstract.html"},{"key":"e_1_3_2_23_2","volume-title":"International Conference on Learning Representations","author":"Kuba Jakub Grudzien","year":"2022","unstructured":"Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang. 2022. Trust region policy optimisation in multi-agent reinforcement learning. In International Conference on Learning Representations."},{"issue":"86","key":"e_1_3_2_24_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Laurens Van Der Maaten","year":"2008","unstructured":"Van Der Maaten Laurens and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 86 (Nov. 2008), 2579\u20132605. http:\/\/jmlr.org\/papers\/v9\/vandermaaten08a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dt.2021.01.005"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3078845"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.2514\/1.46815"},{"key":"e_1_3_2_28_2","first-page":"1928","volume-title":"Proceedings of the 33rd International Conference on Machine Learning","volume":"48","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, Vol. 48. PMLR, 1928\u20131937. https:\/\/proceedings.mlr.press\/v48\/mniha16.html"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2838738"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2015.07.033"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.5139\/IJASS.2016.17.2.204"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.2514\/6.2021-0526"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN48605.2020.9207088"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICUAS51884.2021.9476700"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455894"},{"key":"e_1_3_2_36_2","article-title":"Proximal policy optimization algorithms","author":"Schulman J.","year":"2017","unstructured":"J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (July 2017). https:\/\/arxiv.org\/abs\/1707.06347","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"e_1_3_2_37_2","volume-title":"International Conference on Learning Representations","author":"Sharma Archit","year":"2020","unstructured":"Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, and Karol Hausman. 2020. Dynamics-aware unsupervised discovery of skills. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=HJgLZR4KvH"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.115380"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3480969"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2020.104112"},{"key":"e_1_3_2_41_2","first-page":"1057","volume-title":"Proceedings of the 12th International Conference on Neural Information Processing Systems","volume":"12","author":"Sutton Richard S.","year":"2000","unstructured":"Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, Vol. 12. MIT Press, 1057\u20131063. https:\/\/papers.nips.cc\/paper\/1999\/hash\/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.274.5293.1724"},{"key":"e_1_3_2_43_2","first-page":"24611","article-title":"The surprising effectiveness of PPO in cooperative multi-agent games","volume":"35","author":"Yu Chao","year":"2022","unstructured":"Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. 2022. The surprising effectiveness of PPO in cooperative multi-agent games. Advances in Neural Information Processing Systems 35 (2022), 24611\u201324624.","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653979","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3653979","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:36Z","timestamp":1750291416000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653979"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,18]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8,31]]}},"alternative-id":["10.1145\/3653979"],"URL":"https:\/\/doi.org\/10.1145\/3653979","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,18]]},"assertion":[{"value":"2022-06-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-06","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}