{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T21:16:23Z","timestamp":1772918183466,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,2,14]],"date-time":"2020-02-14T00:00:00Z","timestamp":1581638400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61573285"],"award-info":[{"award-number":["61573285"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004750","name":"Aeronautical Science Foundation of China","doi-asserted-by":"publisher","award":["20175553027"],"award-info":[{"award-number":["20175553027"]}],"id":[{"id":"10.13039\/501100004750","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In this paper, a novel deep reinforcement learning (DRL) method, and robust deep deterministic policy gradient (Robust-DDPG), is proposed for developing a controller that allows robust flying of an unmanned aerial vehicle (UAV) in dynamic uncertain environments. This technique is applicable in many fields, such as penetration and remote surveillance. The learning-based controller is constructed with an actor-critic framework, and can perform a dual-channel continuous control (roll and speed) of the UAV. To overcome the fragility and volatility of original DDPG, three critical learning tricks are introduced in Robust-DDPG: (1) Delayed-learning trick, providing stable learnings, while facing dynamic environments; (2) adversarial attack trick, improving policy\u2019s adaptability to uncertain environments; (3) mixed exploration trick, enabling faster convergence of the model. The training experiments show great improvement in its convergence speed, convergence effect, and stability. The exploiting experiments demonstrate high efficiency in providing the UAV a shorter and smoother path. 
While, the generalization experiments verify its better adaptability to complicated, dynamic and uncertain environments, comparing to Deep Q Network (DQN) and DDPG algorithms.<\/jats:p>","DOI":"10.3390\/rs12040640","type":"journal-article","created":{"date-parts":[[2020,2,20]],"date-time":"2020-02-20T03:20:03Z","timestamp":1582168803000},"page":"640","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":91,"title":["Robust Motion Control for UAV in Dynamic Uncertain Environments Using Deep Reinforcement Learning"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1359-7112","authenticated-orcid":false,"given":"Kaifang","family":"Wan","sequence":"first","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Xiaoguang","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Zijian","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Gaofeng","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,2,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.1109\/LRA.2018.2795643","article-title":"DroNet: Learning to fly by driving","volume":"3","author":"Loquercio","year":"2018","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Fraga, P., and Ramos, L. (2019). A review on IoT deep Learning UAV systems for autonomous obstacle detection and collision avoidance. Remote Sens., 11.","DOI":"10.3390\/rs11182144"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/MRA.2012.2206473","article-title":"Toward a fully autonomous UAV: Research platform for indoor and outdoor urban search and rescue","volume":"19","author":"Tomic","year":"2012","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zha, H., and Miao, Y. (2020). Improving unmanned aerial vehicle remote sensing-based rice nitrogen nutrition index prediction with machine learning. Remote Sens., 12.","DOI":"10.3390\/rs12020215"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Emery, W., and Schmalzel, J. (2018). Editorial for \u201cremote sensing from unmanned aerial vehicles\u201d. Remote Sens., 10.","DOI":"10.3390\/rs10121877"},{"key":"ref_6","first-page":"1","article-title":"Unmanned aerial vehicles (UAV): A survey on civil applications and key research challenges","volume":"7","author":"Shakhatreh","year":"2018","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Darrah, M., and Niland, W. (2006, January 21\u201324). UAV cooperative task assignments for a SEAD mission using genetic algorithms. 
Proceedings of the AIAA Guidance, Navigation & Control Conference & Exhibit, Keystone, CO, USA.","DOI":"10.2514\/6.2006-6456"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.proeng.2014.12.098","article-title":"Path planning with modified A star algorithm for a mobile robot","volume":"96","author":"Duchon","year":"2014","journal-title":"Procedia Eng."},{"key":"ref_9","unstructured":"Rahul, K., and Kevin, W. (2011, January 1\u20132). Planning of multiple autonomous vehicles using RRT. Proceedings of the 2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS), London, UK."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bounini, F., Gingras, D., and Pollart, H. (2017, January 11\u201314). Modified artificial potential field method for online path planning applications. Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA.","DOI":"10.1109\/IVS.2017.7995717"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Panchpor, A.A., Shue, S., and Conrad, J.M. (2018, January 4\u20135). A survey of methods for mobile robot localization and mapping in dynamic indoor environments. Proceedings of the 2018 Conference on Signal Processing and Communication Engineering Systems (SPACES), Vijayawada, India.","DOI":"10.1109\/SPACES.2018.8316333"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Koch, T., K\u00f6rner, M., and Fraundorfer, F. (2019). Automatic and semantically-aware 3D UAV flight planning for image-based 3D reconstruction. Remote Sens., 11.","DOI":"10.3390\/rs11131550"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chuang, H., He, D., and Namiki, A. (2019). Autonomous target tracking of UAV using high-speed visual feedback. Appl. Sci., 9.","DOI":"10.3390\/app9214552"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"545","DOI":"10.21629\/JSEE.2019.03.12","article-title":"Modeling of UAV path planning based on IMM under POMDP framework","volume":"30","author":"Yang","year":"2019","journal-title":"J. Syst. Eng. Electron."},{"key":"ref_15","unstructured":"Sutton, R., and Barto, A. (2017). Reinforcement Learning: An Introduction, MIT Press. [2nd ed.]."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Junell, J., Kampen, E., and Visser, C. (2015, January 5\u20139). Reinforcement learning applied to a quadrotor guidance law in autonomous flight. Proceedings of the AIAA Guidance, Navigation, and Control Conference, Kissimmee, FL, USA.","DOI":"10.2514\/6.2015-1990"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Luo, W., Tang, Q., and Fu, C. (2018, January 16). Deep-sarsa based multi-UAV path planning and obstacle avoidance in a dynamic environment. Proceedings of the International Conference on Sensing & Imaging, Cham, Switzerland.","DOI":"10.1007\/978-3-319-93818-9_10"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Imanberdiyev, N., Fu, C., and Kayacan, E. (2016, January 13\u201315). Autonomous navigation of UAV by using real-time model-based reinforcement learning. 
Proceedings of the International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand.","DOI":"10.1109\/ICARCV.2016.7838739"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"353","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Van, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep reinforcement learning with double Q-learning. Proceedings of the 30th AAAI Conference on Artificial Intelligence, Menlo Park, CA, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_21","unstructured":"Wang, Z., and Freitas, N. (2016, January 19\u201324). Dueling network architectures for deep reinforcement learning. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_22","unstructured":"Tom, S., and John, Q. (2016, January 2\u20134). Prioritized experience replay. Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico."},{"key":"ref_23","first-page":"1","article-title":"A dynamic adjusting reward function method for deep reinforcement learning with adjustable parameters","volume":"2019","author":"Hu","year":"2019","journal-title":"Math. Probl. Eng."},{"key":"ref_24","unstructured":"Kjell, K. (2017). Deep Reinforcement Learning as Control Method for Autonomous UAV, Universitat Politecnica de Catalunya."},{"key":"ref_25","first-page":"1","article-title":"A deep reinforcement learning strategy for UAV autonomous landing on a moving platform","volume":"2","author":"Rodriguez","year":"2018","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_26","unstructured":"Conde, R., and Llata, J. (2017). Time-varying formation controllers for unmanned aerial vehicles using deep reinforcement learning. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Peters, J., and Schaal, S. (2006, January 9\u201315). Policy gradient methods for robotics. Proceedings of the 2006 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Beijing, China.","DOI":"10.1109\/IROS.2006.282564"},{"key":"ref_28","unstructured":"Silver, D., and Lever, G. (2014, January 3\u20136). Deterministic policy gradient algorithms. Proceedings of the International Conference on International Conference on Machine Learning, Detroit, MI, USA."},{"key":"ref_29","first-page":"180","article-title":"Continuous control with deep reinforcement learning","volume":"8","author":"Lillicrap","year":"2015","journal-title":"Comput. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1109\/ACCESS.2019.2961426","article-title":"Maneuver decision of UAV in short-range air combat based on deep reinforcement learning","volume":"8","author":"Yang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2124","DOI":"10.1109\/TVT.2018.2890773","article-title":"Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach","volume":"68","author":"Wang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_32","unstructured":"John, S., and Sergey, L. (2015, January 6\u201311). Trust region policy optimization. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France."},{"key":"ref_33","unstructured":"John, S., Filip, W., and Prafulla, D. 
(2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_34","unstructured":"Cory, D. (2010). Controlled Mobility of Unmanned Aircraft Chains to Optimize Network Capacity in Realistic Communication Environments, University of Colorado."},{"key":"ref_35","first-page":"12","article-title":"Mobility control of unmanned aerial vehicle as communication relay in airborne multi-user systems","volume":"6","author":"Wu","year":"2019","journal-title":"Chin. J. Aeronaut."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Beard, R., and McLain, T. (2012). Small Unmanned Aircraft: Theory and Practice, Princeton University Press.","DOI":"10.1515\/9781400840601"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Quintero, S., and Collins, G. (2013, January 17\u201319). Flocking with fixed-wing UAVs for distributed sensing: A stochastic optimal control approach. Proceedings of the American Control Conference (ACC), Washington, DC, USA.","DOI":"10.1109\/ACC.2013.6580133"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"74","DOI":"10.21629\/JSEE.2018.01.08","article-title":"Using approximate dynamic programming for multi-ESM scheduling to track ground moving targets","volume":"29","author":"Wan","year":"2018","journal-title":"J. Syst. Eng. Electron."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Lin, Y.C., and Cheng, Y.T. (2019). Evaluation of UAV LiDAR for mapping coastal environments. Remote Sens., 11.","DOI":"10.3390\/rs11242893"},{"key":"ref_40","unstructured":"Kyriakos, E., and Daniel, K. (2013, January 11\u201313). Using plan-based reward shaping to learn strategies in StarCraft: Brood war. Proceedings of the 2013 IEEE Conference on Computational Intelligence in Games (CIG), Niagara Falls, ON, Canada."},{"key":"ref_41","unstructured":"Scott, F., Herke, V., and David, M. (2018, January 10\u201315). Addressing function approximation error in actor-critic methods. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholmsm\u00e4ssan, Stockholm, Sweden."},{"key":"ref_42","unstructured":"Ian, J., and Jonathon, S. (2014). Explaining and harnessing adversarial examples. arXiv."},{"key":"ref_43","unstructured":"Jernej, K., and Dawn, S. (2017). Delving into adversarial attacks on deep policies. arXiv."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/4\/640\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T08:57:56Z","timestamp":1760173076000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/4\/640"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,14]]},"references-count":43,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,2]]}},"alternative-id":["rs12040640"],"URL":"https:\/\/doi.org\/10.3390\/rs12040640","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,14]]}}}
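The record above is the standard Crossref REST API response envelope (status, message-type, message) for this article's DOI. As a minimal sketch of how such a record is retrieved and unpacked, the following Python snippet queries the public api.crossref.org endpoint; it assumes the third-party requests library is installed, and note that live counters such as is-referenced-by-count will have changed since this snapshot was taken.

import requests

# Fetch the Crossref work record for this article's DOI.
# The response carries the same envelope shown above:
# {"status": "ok", "message-type": "work", ..., "message": {...}}
resp = requests.get("https://api.crossref.org/works/10.3390/rs12040640", timeout=10)
resp.raise_for_status()
work = resp.json()["message"]

# Unpack a few of the fields present in the record.
title = work["title"][0]
authors = ", ".join(f'{a["given"]} {a["family"]}' for a in work["author"])
print(title)                                        # article title
print(authors)                                      # Kaifang Wan, Xiaoguang Gao, ...
print("References:", work["references-count"])      # 43 in this snapshot
print("Cited by:", work["is-referenced-by-count"])  # changes over time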