{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,22]],"date-time":"2026-02-22T04:58:29Z","timestamp":1771736309801,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"2",
"license":[{"start":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T00:00:00Z","timestamp":1637712000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T00:00:00Z","timestamp":1637712000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],
"funder":[{"name":"National Key Research and Development Program of China","award":["2018AAA0101005"],"award-info":[{"award-number":["2018AAA0101005"]}]},{"name":"strategic priority research program of chinese academy of sciences","award":["XDA27030400"],"award-info":[{"award-number":["XDA27030400"]}]},{"DOI":"10.13039\/501100004739","name":"youth innovation promotion association of the chinese academy of sciences","doi-asserted-by":"publisher","award":["2021132"],"award-info":[{"award-number":["2021132"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"publisher"}]}],
"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,4]]},
"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In missile guidance, pursuit performance is seriously degraded due to the uncertainty and randomness in target maneuverability, detection delay, and environmental noise. In many methods, accurately estimating the acceleration of the target or the time-to-go is needed to intercept the maneuvering target, which is hard in an environment with uncertainty. In this paper, we propose an assisted deep reinforcement learning (ARL) algorithm to optimize the neural network-based missile guidance controller for head-on interception. Based on the relative velocity, distance, and angle, ARL can control the missile to intercept the maneuvering target and achieve large terminal intercept angle. To reduce the influence of environmental uncertainty, ARL predicts the target\u2019s acceleration as an auxiliary supervised task. The supervised learning task improves the ability of the agent to extract information from observations. To exploit the agent\u2019s good trajectories, ARL presents the Gaussian self-imitation learning to make the mean of action distribution approach the agent\u2019s good actions. Compared with vanilla self-imitation learning, Gaussian self-imitation learning improves the exploration in continuous control. Simulation results validate that ARL outperforms traditional methods and proximal policy optimization algorithm with higher hit rate and larger terminal intercept angle in the simulation environment with noise, delay, and maneuverable target.<\/jats:p>",
"DOI":"10.1007\/s40747-021-00577-6","type":"journal-article","created":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T20:47:22Z","timestamp":1637786842000},"page":"1205-1216","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":36,
"title":["Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target"],"prefix":"10.1007","volume":"8",
"author":[{"given":"Weifan","family":"Li","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5384-423X","authenticated-orcid":false,"given":"Yuanheng","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Dongbin","family":"Zhao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,11,24]]},
"reference":[{"issue":"3","key":"577_CR1","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s40747-016-0024-6","volume":"2","author":"A Anuse","year":"2016","unstructured":"Anuse A, Vyas V (2016) A novel training algorithm for convolutional neural network. Complex Intell Syst 2(3):221\u2013234","journal-title":"Complex Intell Syst"},
{"issue":"3","key":"577_CR2","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1007\/s40747-017-0062-8","volume":"4","author":"TR Caskey","year":"2018","unstructured":"Caskey TR, Wasek JS, Franz AY (2018) Deter and protect: crime modeling with multi-agent learning. Complex Intell Syst 4(3):155\u2013169","journal-title":"Complex Intell Syst"},
{"key":"577_CR3","doi-asserted-by":"crossref","unstructured":"Chen Y, Zhao D, Li H (2019) Deep Kalman filter with optical flow for multiple object tracking. In: 2019 IEEE international conference on systems, man and cybernetics (SMC), pp 3036\u20133041","DOI":"10.1109\/SMC.2019.8914078"},
{"issue":"2","key":"577_CR4","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s40747-019-0113-4","volume":"6","author":"CAC Coello","year":"2020","unstructured":"Coello CAC, Brambila SG, Gamboa JF, Tapia MGC, G\u00f3mez RH (2020) Evolutionary multiobjective optimization: open research areas and some challenges lying ahead. Complex Intell Syst 6(2):221\u2013236","journal-title":"Complex Intell Syst"},
{"key":"577_CR5","unstructured":"Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J (2019) Go-explore: a new approach for hard-exploration problems. arXiv preprint. arXiv:1901.10995"},
{"key":"577_CR6","doi-asserted-by":"crossref","unstructured":"Gao Y, Liu Y, Zhang Q, Wang Y, Zhao D, Ding D, Pang Z, Zhang Y (2019) Comparison of control methods based on imitation learning for autonomous driving. In: 2019 tenth international conference on intelligent control and information processing (ICICIP), pp 274\u2013281","DOI":"10.1109\/ICICIP47338.2019.9012185"},
{"key":"577_CR7","doi-asserted-by":"crossref","unstructured":"Gaudet B, Furfaro R (2012) Missile homing-phase guidance law design using reinforcement learning. In: AIAA guidance, navigation, and control conference, p 4470","DOI":"10.2514\/6.2012-4470"},
{"key":"577_CR8","doi-asserted-by":"publisher","first-page":"105746","DOI":"10.1016\/j.ast.2020.105746","volume":"99","author":"B Gaudet","year":"2020","unstructured":"Gaudet B, Furfaro R, Linares R (2020) Reinforcement learning for angle-only intercept guidance of maneuvering targets. Aerosp Sci Technol 99:105746","journal-title":"Aerosp Sci Technol"},
{"key":"577_CR9","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1016\/j.ast.2018.03.042","volume":"78","author":"J Guo","year":"2018","unstructured":"Guo J, Xiong Y, Zhou J (2018) A new sliding mode control design for integrated missile guidance and control system. Aerosp Sci Technol 78:54\u201361","journal-title":"Aerosp Sci Technol"},
{"key":"577_CR10","doi-asserted-by":"crossref","unstructured":"Hernandez-Leal P, Kartal B, Taylor ME (2019) Agent modeling as auxiliary task for deep reinforcement learning. In: Proceedings of the AAAI conference on artificial intelligence and interactive digital entertainment, vol 15, pp 31\u201337","DOI":"10.1609\/aiide.v15i1.5221"},
{"issue":"3","key":"577_CR11","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1016\/j.cja.2013.04.035","volume":"26","author":"M Hou","year":"2013","unstructured":"Hou M, Liang X, Duan G (2013) Adaptive block dynamic surface control for integrated missile guidance and autopilot. Chin J Aeronaut 26(3):741\u2013750","journal-title":"Chin J Aeronaut"},
{"issue":"1","key":"577_CR12","doi-asserted-by":"publisher","first-page":"136","DOI":"10.2514\/1.G003620","volume":"42","author":"Q Hu","year":"2019","unstructured":"Hu Q, Han T, Xin M (2019) Sliding-mode impact time guidance law design for various target motions. J Guid Control Dyn 42(1):136\u2013148","journal-title":"J Guid Control Dyn"},
{"key":"577_CR13","unstructured":"Jaderberg M, Mnih V, Czarnecki WM, Schaul T, Leibo JZ, Silver D, Kavukcuoglu K (2016) Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint. arXiv:1611.05397"},
{"issue":"13","key":"577_CR14","doi-asserted-by":"publisher","first-page":"3521","DOI":"10.1073\/pnas.1611835114","volume":"114","author":"J Kirkpatrick","year":"2017","unstructured":"Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A et al (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci 114(13):3521\u20133526","journal-title":"Proc Natl Acad Sci"},
{"key":"577_CR15","unstructured":"Laskin M, Srinivas A, Abbeel P (2020) CURL: contrastive unsupervised representations for reinforcement learning. In: Proceedings of the 37th international conference on machine learning, vol 119. PMLR, pp 5639\u20135650"},
{"issue":"2","key":"577_CR16","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1109\/MCI.2019.2901089","volume":"14","author":"D Li","year":"2019","unstructured":"Li D, Zhao D, Zhang Q, Chen Y (2019) Reinforcement learning and deep learning based lateral control for autonomous driving [application notes]. IEEE Comput Intell Mag 14(2):83\u201398","journal-title":"IEEE Comput Intell Mag"},
{"key":"577_CR17","doi-asserted-by":"publisher","first-page":"47353","DOI":"10.1109\/ACCESS.2019.2909579","volume":"7","author":"C Liang","year":"2019","unstructured":"Liang C, Wang W, Liu Z, Lai C, Zhou B (2019) Learning to guide: guidance law based on deep meta-learning and model predictive path integral control. IEEE Access 7:47353\u201347365","journal-title":"IEEE Access"},
{"key":"577_CR18","unstructured":"Lin X, Baweja H, Kantor G, Held D (2019) Adaptive auxiliary task weighting for reinforcement learning. In: Advances in neural information processing systems, pp 4773\u20134784"},
{"issue":"6","key":"577_CR19","doi-asserted-by":"publisher","first-page":"1448","DOI":"10.1109\/TCST.2009.2039572","volume":"18","author":"W MacKunis","year":"2010","unstructured":"MacKunis W, Patre PM, Kaiser MK, Dixon WE (2010) Asymptotic tracking for aircraft via robust and adaptive dynamic inversion methods. IEEE Trans Control Syst Technol 18(6):1448\u20131456","journal-title":"IEEE Trans Control Syst Technol"},
{"key":"577_CR20","unstructured":"Oh J, Guo Y, Singh S, Lee H (2018) Self-imitation learning. In: Proceedings of the 35th international conference on machine learning, vol 80, pp 3878\u20133887"},
{"issue":"2","key":"577_CR21","doi-asserted-by":"publisher","first-page":"377","DOI":"10.2514\/1.54892","volume":"35","author":"H Prasanna","year":"2012","unstructured":"Prasanna H, Ghose D (2012) Retro-proportional-navigation: a new guidance law for interception of high speed targets. J Guid Control Dyn 35(2):377\u2013386","journal-title":"J Guid Control Dyn"},
{"key":"577_CR22","doi-asserted-by":"crossref","unstructured":"Sang D, Min BM, Tahk MJ (2007) Impact angle control guidance law using Lyapunov function and PSO method. In: SICE annual conference 2007, pp 2253\u20132257","DOI":"10.1109\/SICE.2007.4421363"},
{"key":"577_CR23","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint. arXiv:1707.06347"},
{"key":"577_CR24","unstructured":"Shao K, Tang Z, Zhu Y, Li N, Zhao D (2019) A survey of deep reinforcement learning in video games. arXiv preprint. arXiv:1912.10944"},
{"key":"577_CR25","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge"},
{"issue":"21","key":"577_CR26","doi-asserted-by":"publisher","first-page":"5671","DOI":"10.1109\/TSP.2019.2941119","volume":"67","author":"M Tang","year":"2019","unstructured":"Tang M, Rong Y, De Maio A, Chen C, Zhou J (2019) Adaptive radar detection in Gaussian disturbance with structured covariance matrix via invariance theory. IEEE Trans Signal Process 67(21):5671\u20135685","journal-title":"IEEE Trans Signal Process"},
{"issue":"3","key":"577_CR27","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1007\/s40747-020-00175-y","volume":"6","author":"H Zhang","year":"2020","unstructured":"Zhang H, Zhou A, Lin X (2020) Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis. Complex Intell Syst 6(3):741\u2013753","journal-title":"Complex Intell Syst"},
{"issue":"4","key":"577_CR28","doi-asserted-by":"publisher","first-page":"960","DOI":"10.1016\/j.cja.2013.04.037","volume":"26","author":"Y Zhang","year":"2013","unstructured":"Zhang Y, Ma G, Liu A (2013) Guidance law with impact time and impact angle constraints. Chin J Aeronaut 26(4):960\u2013966","journal-title":"Chin J Aeronaut"},
{"key":"577_CR29","doi-asserted-by":"crossref","unstructured":"Zhang Y, Zhang M (2020) Machine learning model-based two-dimensional matrix computation model for human motion and dance recovery. Complex Intell Syst 7(4):1805\u20131815","DOI":"10.1007\/s40747-020-00186-9"},
{"issue":"6","key":"577_CR30","first-page":"701","volume":"33","author":"D Zhao","year":"2016","unstructured":"Zhao D, Shao K, Zhu Y, Li D, Chen Y, Wang H, Liu DR, Zhou T, Wang CH (2016) Review of deep reinforcement learning and discussions on the development of computer Go. Control Theory Appl 33(6):701\u2013717","journal-title":"Control Theory Appl"},
{"issue":"11","key":"577_CR31","doi-asserted-by":"publisher","first-page":"2089","DOI":"10.1007\/s00500-013-1110-y","volume":"17","author":"D Zhao","year":"2013","unstructured":"Zhao D, Wang B, Liu D (2013) A supervised actor-critic approach for adaptive cruise control. Soft Comput 17(11):2089\u20132099","journal-title":"Soft Comput"},
{"key":"577_CR32","doi-asserted-by":"publisher","first-page":"818","DOI":"10.1016\/j.ast.2019.01.047","volume":"86","author":"J Zhu","year":"2019","unstructured":"Zhu J, Su D, Xie Y, Sun H (2019) Impact time and angle control guidance independent of time-to-go prediction. Aerosp Sci Technol 86:818\u2013825","journal-title":"Aerosp Sci Technol"},
{"key":"577_CR33","doi-asserted-by":"crossref","unstructured":"Zhu Y, Xu J (2019) An energy optimal guidance law for non-linear systems considering impact angle constraints. In: Proceedings of the 2019 international conference on artificial intelligence, robotics and control, pp 99\u2013105","DOI":"10.1145\/3388218.3388227"},
{"key":"577_CR34","doi-asserted-by":"publisher","unstructured":"Zhu Y, Zhao D (2019) Vision-based control in the open racing car simulator with deep and reinforcement learning. J Ambient Intell Humaniz Comput pp 1\u201313. https:\/\/doi.org\/10.1007\/s12652-019-01503-y","DOI":"10.1007\/s12652-019-01503-y"},
{"key":"577_CR35","doi-asserted-by":"publisher","unstructured":"Zhu Y, Zhao D (2020) Online minimax Q network learning for two-player zero-sum Markov games. IEEE Trans Neural Netw Learn Syst pp 1\u201314. https:\/\/doi.org\/10.1109\/TNNLS.2020.3041469","DOI":"10.1109\/TNNLS.2020.3041469"}],
"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en",
"link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00577-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00577-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00577-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],
"deposited":{"date-parts":[[2023,1,16]],"date-time":"2023-01-16T05:39:48Z","timestamp":1673847588000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00577-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,24]]},"references-count":35,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["577"],"URL":"https:\/\/doi.org\/10.1007\/s40747-021-00577-6","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,24]]},
"assertion":[{"value":"26 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 November 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all the authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}