{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T05:33:52Z","timestamp":1773552832180,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T00:00:00Z","timestamp":1605657600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004750","name":"Aeronautical Science Foundation of China","doi-asserted-by":"publisher","award":["2017ZC53021"],"award-info":[{"award-number":["2017ZC53021"]}],"id":[{"id":"10.13039\/501100004750","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University","award":["ZZ2019159"],"award-info":[{"award-number":["ZZ2019159"]}]},{"name":"the Open Project Fund of CETC Key Laboratory of Data Link Technology","award":["CLDL-20182101"],"award-info":[{"award-number":["CLDL-20182101"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), to realize the control of unmanned aerial vehicle (UAV), allowing a UAV to quickly track a target in an environment where the motion of a target is uncertain. This approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We consider a multi-task experience replay buffer to provide data for the multi-task learning of the DRL algorithm, and we combine meta-learning to develop a multi-task reinforcement learning update method to ensure the generalization capability of reinforcement learning. Compared with the state-of-the-art algorithms, namely the deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), experimental results show that the Meta-TD3 algorithm has achieved a great improvement in terms of both convergence value and convergence rate. 
In the UAV target tracking problem, Meta-TD3 requires only a few training steps to enable the UAV to adapt quickly to a new target movement mode and to maintain better tracking effectiveness.<\/jats:p>","DOI":"10.3390\/rs12223789","type":"journal-article","created":{"date-parts":[[2020,11,18]],"date-time":"2020-11-18T09:59:47Z","timestamp":1605693587000},"page":"3789","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":80,"title":["UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1415-4444","authenticated-orcid":false,"given":"Bo","family":"Li","sequence":"first","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Zhigang","family":"Gan","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0030-1199","authenticated-orcid":false,"given":"Daqing","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Engineering, London South Bank University, London SE1 0AA, UK"}]},{"given":"Dyachenko","family":"Sergey Aleksandrovich","sequence":"additional","affiliation":[{"name":"School of Robotic and Intelligent Systems, Moscow Aviation Institute, 125993 Moscow, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2020,11,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fu, C., Carrio, A., Olivares-Mendez, M.A., Suarez-Fernandez, R., and Campoy, P. (June, January 31). Robust real-time vision-based aircraft tracking from unmanned aerial vehicles. Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China.","DOI":"10.1109\/ICRA.2014.6907659"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"31362","DOI":"10.3390\/s151229861","article-title":"Towards an autonomous vision-based unmanned aerial system against wildlife poachers","volume":"15","author":"Fu","year":"2015","journal-title":"Sensors"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1007\/s10846-011-9546-8","article-title":"Safety, security, and rescue missions with an unmanned aerial vehicle (UAV)","volume":"64","author":"Birk","year":"2011","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Fu, C., Carrio, A., and Campoy, P. (2015, January 9\u201312). Efficient visual odometry and mapping for unmanned aerial vehicle using ARM-based stereo vision pre-processing system.
Proceedings of the 2015 International Conference on Unmanned Aircraft Systems (ICUAS), Denver, CO, USA.","DOI":"10.1109\/ICUAS.2015.7152384"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"29064","DOI":"10.1109\/ACCESS.2020.2971780","article-title":"Path Planning for UAV Ground Target Tracking via Deep Reinforcement Learning","volume":"8","author":"Li","year":"2020","journal-title":"IEEE Access"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1109\/TVT.2019.2952549","article-title":"Deep reinforcement learning for UAV navigation through massive MIMO technique","volume":"69","author":"Huang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"117227","DOI":"10.1109\/ACCESS.2019.2933002","article-title":"UAV autonomous target search based on deep reinforcement learning in complex disaster scene","volume":"7","author":"Wu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2124","DOI":"10.1109\/TVT.2018.2890773","article-title":"Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach","volume":"68","author":"Wang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_10","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2019, January 22\u201324). Continuous control with deep reinforcement learning. Proceedings of the Chinese Automation Congress (CAC), Hangzhou, China."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wan, K., Gao, X., Hu, Z., and Wu, G. (2020). Robust Motion Control for UAV in Dynamic Uncertain Environments Using Deep Reinforcement Learning. Remote. Sens., 12.","DOI":"10.3390\/rs12040640"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Bhagat, S., and Sujit, P.B. (2020, January 1\u20134). UAV Target Tracking in Urban Environments Using Deep Reinforcement Learning. Proceedings of the 2020 International Conference on Unmanned Aircraft Systems (ICUAS), Athens, Greece.","DOI":"10.1109\/ICUAS48674.2020.9213856"},{"key":"ref_13","unstructured":"Hayat, S., Yanmaz, E., Brown, T.X., and Bettstetter, C. (June, January 29). Multi-objective UAV path planning for search and rescue. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"107038","DOI":"10.1016\/j.comnet.2019.107038","article-title":"Distributed aerial processing for IoT-based edge UAV swarms in smart farming","volume":"167","author":"Mukherjee","year":"2020","journal-title":"Comput. Netw."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yang, B., Cao, X., Yuen, C., and Qian, L. (2020). Offloading Optimization in Edge Computing for Deep Learning Enabled Target Tracking by Internet-of-UAVs. IEEE Internet Things J., 1.","DOI":"10.1109\/JIOT.2020.3016694"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., and Meger, D. (2017). Deep reinforcement learning that matters. arXiv.","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"ref_17","unstructured":"Zhang, A., Wu, Y., and Pineau, J. (2018). 
Natural environment benchmarks for reinforcement learning. arXiv."},{"key":"ref_18","unstructured":"Liu, H., Socher, R., and Xiong, C. (2019, January 10\u201315). Taming maml: Efficient unbiased meta-reinforcement learning. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_19","unstructured":"Fujimoto, S., van Hoof, H., and Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. arXiv."},{"key":"ref_20","unstructured":"Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. arXiv."},{"key":"ref_21","unstructured":"Li, Z., Zhou, F., Chen, F., and Li, H. (2017). Meta-sgd: Learning to learn quickly for few-shot learning. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Mellinger, D., and Kumar, V. (2011, January 9\u201313). Minimum snap trajectory generation and control for quadrotors. Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980409"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Imanberdiyev, N., Fu, C., Kayacan, E., and Chen, I.-M. (2016, January 13\u201315). Autonomous navigation of UAV by using real-time model-based reinforcement learning. Proceedings of the 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand.","DOI":"10.1109\/ICARCV.2016.7838739"},{"key":"ref_24","unstructured":"Zhou, D., and Schwager, M. (June, January 31). Vector field following for quadrotors using differential flatness. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China."},{"key":"ref_25","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_26","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_27","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 6\u201311). Trust region policy optimization. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_28","unstructured":"Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_29","unstructured":"Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation. arXiv."},{"key":"ref_30","unstructured":"Roderick, M., MacGlashan, J., and Tellex, S. (2017). Implementing the deep q-network. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"765","DOI":"10.1007\/s12046-014-0275-0","article-title":"AI-based adaptive control and design of autopilot system for nonlinear UAV","volume":"39","author":"Yadav","year":"2014","journal-title":"Sadhana"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Peters, J., and Schaal, S. (2006, January 9\u201315). Policy gradient methods for robotics. 
Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Beijing, China.","DOI":"10.1109\/IROS.2006.282564"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1007\/BF00992699","article-title":"Self-improving reactive agents based on reinforcement learning, planning and teaching","volume":"8","author":"Lin","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_34","unstructured":"Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., and Lillicrap, T. (2016, January 19\u201324). Meta-learning with memory-augmented neural networks. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"e253","DOI":"10.1017\/S0140525X16001837","article-title":"Building machines that learn and think like people","volume":"40","author":"Lake","year":"2017","journal-title":"Behav. Brain Sci."},{"key":"ref_36","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Harley, T., Lillicrap, T.P., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/22\/3789\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:33:57Z","timestamp":1760178837000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/22\/3789"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,18]]},"references-count":36,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2020,11]]}},"alternative-id":["rs12223789"],"URL":"https:\/\/doi.org\/10.3390\/rs12223789","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,18]]}}}
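The abstract above describes Meta-TD3 as TD3 combined with a multi-task experience replay buffer and a meta-learning update over tasks (different target movement modes). Below is a minimal, illustrative sketch of that idea, not the authors' implementation: it assumes PyTorch, toy state and action dimensions, randomly generated transitions standing in for real UAV tracking rollouts, and a first-order (Reptile-style) outer update standing in for the paper's meta-learning step.

```python
# Illustrative sketch of a Meta-TD3-style multi-task meta-update (not the authors' code).
# Assumptions: PyTorch, toy dimensions, fake transitions instead of UAV tracking rollouts,
# and a Reptile-style first-order outer step in place of the paper's exact meta-update.
import copy
import random
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, MAX_ACTION = 8, 2, 1.0

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = mlp(STATE_DIM, ACTION_DIM)
    def forward(self, s):
        return MAX_ACTION * torch.tanh(self.net(s))

class Critic(nn.Module):
    # Twin Q-networks, as in TD3, to reduce overestimation bias.
    def __init__(self):
        super().__init__()
        self.q1 = mlp(STATE_DIM + ACTION_DIM, 1)
        self.q2 = mlp(STATE_DIM + ACTION_DIM, 1)
    def forward(self, s, a):
        sa = torch.cat([s, a], dim=-1)
        return self.q1(sa), self.q2(sa)

def td3_inner_update(actor, critic, actor_t, critic_t, batch, steps=10,
                     gamma=0.99, tau=0.005, policy_delay=2, noise=0.2, clip=0.5):
    """A few TD3 gradient steps on one task's replay data (the inner loop)."""
    a_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    for i in range(steps):
        s, a, r, s2, done = batch  # one pre-sampled mini-batch, for brevity
        with torch.no_grad():
            eps = (torch.randn_like(a) * noise).clamp(-clip, clip)   # target policy smoothing
            a2 = (actor_t(s2) + eps).clamp(-MAX_ACTION, MAX_ACTION)
            q1_t, q2_t = critic_t(s2, a2)
            target = r + gamma * (1 - done) * torch.min(q1_t, q2_t)  # clipped double-Q target
        q1, q2 = critic(s, a)
        c_loss = nn.functional.mse_loss(q1, target) + nn.functional.mse_loss(q2, target)
        c_opt.zero_grad(); c_loss.backward(); c_opt.step()
        if i % policy_delay == 0:                                     # delayed policy update
            a_loss = -critic(s, actor(s))[0].mean()
            a_opt.zero_grad(); a_loss.backward(); a_opt.step()
            for p, pt in zip(actor.parameters(), actor_t.parameters()):
                pt.data.mul_(1 - tau).add_(tau * p.data)              # soft target updates
            for p, pt in zip(critic.parameters(), critic_t.parameters()):
                pt.data.mul_(1 - tau).add_(tau * p.data)

def fake_batch(batch=64):
    # Stand-in for sampling from a task-specific replay buffer of UAV tracking transitions.
    return (torch.randn(batch, STATE_DIM), torch.rand(batch, ACTION_DIM) * 2 - 1,
            torch.randn(batch, 1), torch.randn(batch, STATE_DIM), torch.zeros(batch, 1))

# Outer (meta) loop over tasks, i.e., different target movement modes,
# each with its own slot in a multi-task replay buffer.
meta_actor, meta_critic = Actor(), Critic()
task_buffers = {f"motion_mode_{k}": fake_batch() for k in range(3)}
META_LR = 0.1
for meta_iter in range(5):
    deltas_a = [torch.zeros_like(p) for p in meta_actor.parameters()]
    deltas_c = [torch.zeros_like(p) for p in meta_critic.parameters()]
    tasks = random.sample(list(task_buffers), k=2)
    for name in tasks:
        # Adapt a copy of the meta-parameters to this task with a few TD3 steps.
        actor, critic = copy.deepcopy(meta_actor), copy.deepcopy(meta_critic)
        actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
        td3_inner_update(actor, critic, actor_t, critic_t, task_buffers[name])
        for d, p, mp in zip(deltas_a, actor.parameters(), meta_actor.parameters()):
            d += (p.data - mp.data) / len(tasks)
        for d, p, mp in zip(deltas_c, critic.parameters(), meta_critic.parameters()):
            d += (p.data - mp.data) / len(tasks)
    # Reptile-style outer step: move meta-parameters toward the task-adapted parameters.
    with torch.no_grad():
        for p, d in zip(meta_actor.parameters(), deltas_a):
            p.add_(META_LR * d)
        for p, d in zip(meta_critic.parameters(), deltas_c):
            p.add_(META_LR * d)
```

In a real setting, each per-task buffer would be filled by rolling out the current policy against one target movement mode, and the meta-parameters learned here would serve as the initialization that adapts to a new movement mode in a few gradient steps, which is the fast-adaptation behaviour the abstract claims for Meta-TD3.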