{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T14:34:26Z","timestamp":1774449266033,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,2,25]],"date-time":"2020-02-25T00:00:00Z","timestamp":1582588800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/R02572X\/1"],"award-info":[{"award-number":["EP\/R02572X\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses for the first time the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for autonomous UAV landing. Our method is based on a divide-and-conquer paradigm that splits a task into sequential sub-tasks, each one assigned to a Deep Q-Network (DQN), hence the name Sequential Deep Q-Network (SDQN). Each DQN in an SDQN is activated by an internal trigger, and it represents a component of a high-level control policy, which can navigate the UAV towards the marker. Different technical solutions have been implemented, for example the combination of vanilla and double DQNs and the introduction of a partitioned buffer replay to address the problem of sample efficiency. 
One of the main contributions of this work consists in showing how an SDQN trained in a simulator via domain randomization can effectively generalize to real-world scenarios of increasing complexity. The performance of SDQNs is comparable with that of a state-of-the-art algorithm and human pilots while being quantitatively better in noisy conditions.<\/jats:p>","DOI":"10.3390\/robotics9010008","type":"journal-article","created":{"date-parts":[[2020,2,26]],"date-time":"2020-02-26T04:18:29Z","timestamp":1582690709000},"page":"8","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":38,"title":["Sim-to-Real Quadrotor Landing via Sequential Deep Q-Networks and Domain Randomization"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8318-7269","authenticated-orcid":false,"given":"Riccardo","family":"Polvara","sequence":"first","affiliation":[{"name":"Lincoln Centre for Autonomous Systems, University of Lincoln, Lincoln LN6 7TS, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9500-6899","authenticated-orcid":false,"given":"Massimiliano","family":"Patacchiola","sequence":"additional","affiliation":[{"name":"School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7728-1849","authenticated-orcid":false,"given":"Marc","family":"Hanheide","sequence":"additional","affiliation":[{"name":"Lincoln Centre for Autonomous Systems, University of Lincoln, Lincoln LN6 7TS, UK"}]},{"given":"Gerhard","family":"Neumann","sequence":"additional","affiliation":[{"name":"Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany"},{"name":"Bosch Center for Artificial Intelligence, 72076 T\u00fcbingen, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2020,2,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement 
learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017, May 29\u2013June 3). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, Singapore.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/0278364919887447","article-title":"Learning dexterous in-hand manipulation","volume":"39","author":"Andrychowicz","year":"2020","journal-title":"Int. J. Rob. Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tai, L., Paolo, G., and Liu, M. (2017, September 24\u201328). Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202134"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., and Farhadi, A. (2017, May 29\u2013June 3). Target-driven visual navigation in indoor scenes using deep reinforcement learning. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, Singapore.","DOI":"10.1109\/ICRA.2017.7989381"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kahn, G., Villaflor, A., Ding, B., Abbeel, P., and Levine, S. (2018, May 21\u201325). Self-supervised deep reinforcement learning with generalized computation graphs for robot navigation. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8460655"},{"key":"ref_7","unstructured":"Ha, D., and Schmidhuber, J. (2018, December 2\u20138). 
Recurrent world models facilitate policy evolution. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Thabet, M., Patacchiola, M., and Cangelosi, A. (2019). Sample-efficient Deep Reinforcement Learning with Imaginary Rollouts for Human-Robot Interaction. arXiv.","DOI":"10.1109\/IROS40897.2019.8967834"},{"key":"ref_9","unstructured":"Zhang, F., Leitner, J., Milford, M., and Corke, P. (2016). Modular deep q networks for sim-to-real transfer of visuo-motor policies. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017, September 24\u201328). Domain randomization for transferring deep neural networks from simulation to the real world. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Tobin, J., Biewald, L., Duan, R., Andrychowicz, M., Handa, A., Kumar, V., McGrew, B., Ray, A., Schneider, J., and Welinder, P. (2018, October 1\u20135). Domain randomization and generative models for robotic grasping. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593933"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Polvara, R., Patacchiola, M., Sharma, S., Wan, J., Manning, A., Sutton, R., and Cangelosi, A. (2018, June 12\u201315). Toward End-to-End Control for UAV Autonomous Landing via Deep Reinforcement Learning. Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA.","DOI":"10.1109\/ICUAS.2018.8453449"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2016, February 12\u201317). 
Deep Reinforcement Learning with Double Q-Learning. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_14","unstructured":"Thrun, S., and Schwartz, A. (1993). Issues in using function approximation for reinforcement learning. Proceedings of the 1993 Connectionist Models Summer School, Psychology Press. [1st ed.]."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Forster, C., Faessler, M., Fontana, F., Werlberger, M., and Scaramuzza, D. (2015, May 26\u201330). Continuous on-board monocular-vision-based elevation mapping applied to autonomous landing of micro aerial vehicles. Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7138988"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1109\/70.768189","article-title":"A high integrity IMU\/GPS navigation loop for autonomous land vehicle applications","volume":"15","author":"Sukkarieh","year":"1999","journal-title":"IEEE Trans. Rob. Autom."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Baca, T., Stepan, P., and Saska, M. (2017, September 6\u20138). Autonomous landing on a moving car with unmanned aerial vehicle. Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France.","DOI":"10.1109\/ECMR.2017.8098700"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Beul, M., Houben, S., Nieuwenhuisen, M., and Behnke, S. (2017, September 6\u20138). Fast autonomous landing on a moving target at MBZIRC. Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France.","DOI":"10.1109\/ECMR.2017.8098669"},{"key":"ref_19","first-page":"78","article-title":"The ETH-MAV Team in the MBZ International Robotics Challenge","volume":"36","author":"Pantic","year":"2019","journal-title":"J. 
Field Rob."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/s10846-013-9819-5","article-title":"Airborne vision-based navigation method for UAV accuracy landing using infrared lamps","volume":"72","author":"Gui","year":"2013","journal-title":"J. Intell. Rob. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"67","DOI":"10.5772\/62027","article-title":"Ground stereo vision-based navigation for autonomous take-off and landing of UAVs: A Chan-Vese model approach","volume":"13","author":"Tang","year":"2016","journal-title":"Int. J. Adv. Rob. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1007\/s10514-016-9564-2","article-title":"Monocular vision-based real-time target recognition and tracking for autonomously landing an UAV in a cluttered shipboard environment","volume":"41","author":"Lin","year":"2017","journal-title":"Auton. Robots"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Falanga, D., Zanchettin, A., Simovic, A., Delmerico, J., and Scaramuzza, D. (2017, October 11\u201313). Vision-based Autonomous Quadrotor Landing on a Moving Platform. Proceedings of the 2017 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Shanghai, China.","DOI":"10.1109\/SSRR.2017.8088164"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1524","DOI":"10.1109\/TRO.2016.2604495","article-title":"Landing of a Quadrotor on a Moving Target Using Dynamic Image-Based Visual Servo Control","volume":"32","author":"Serra","year":"2016","journal-title":"IEEE Trans. Rob."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lee, D., Ryan, T., and Kim, H.J. (2012, May 14\u201318). Autonomous landing of a VTOL UAV on a moving platform using image-based visual servoing. Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA.","DOI":"10.1109\/ICRA.2012.6224828"},{"key":"ref_26","unstructured":"Kersandt, K. 
(2017). Deep Reinforcement Learning as Control Method for Autonomous UAVs. [Master\u2019s Thesis, Universitat Polit\u00e8cnica de Catalunya]."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xu, Y., Liu, Z., and Wang, X. (2018, July 25\u201327). Monocular Vision based Autonomous Landing of Quadrotor through Deep Reinforcement Learning. Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China.","DOI":"10.23919\/ChiCC.2018.8482830"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lee, S., Shim, T., Kim, S., Park, J., Hong, K., and Bang, H. (2018, June 12\u201315). Vision-Based Autonomous Landing of a Multi-Copter Unmanned Aerial Vehicle using Reinforcement Learning. Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA.","DOI":"10.1109\/ICUAS.2018.8453315"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/s10846-018-0891-8","article-title":"A deep reinforcement learning strategy for UAV autonomous landing on a moving platform","volume":"93","author":"Sampedro","year":"2019","journal-title":"J. Intell. Rob. Syst."},{"key":"ref_30","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_31","unstructured":"Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015). Prioritized experience replay. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Nonami, K., Kendoul, F., Suzuki, S., Wang, W., and Nakazawa, D. (2010). Autonomous Flying Robots: Unmanned Aerial Vehicles and Micro Aerial Vehicles, Springer. [1st ed.].","DOI":"10.1007\/978-4-431-53856-1"},{"key":"ref_33","unstructured":"Goldstein, H. (1980). Classical Mechanics, Addison-Wesley. 
[2nd ed.]."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1023\/A:1025696116075","article-title":"Recent advances in hierarchical reinforcement learning","volume":"13","author":"Barto","year":"2003","journal-title":"Discrete Event Dyn. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Narasimhan, K., Kulkarni, T., and Barzilay, R. (2015). Language understanding for text-based games using deep reinforcement learning. arXiv.","DOI":"10.18653\/v1\/D15-1001"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1016\/j.neunet.2012.11.007","article-title":"Autonomous reinforcement learning with experience replay","volume":"41","author":"Tanwani","year":"2013","journal-title":"Neural Netw."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Polvara, R., Sharma, S., Wan, J., Manning, A., and Sutton, R. (2017, September 6\u20138). Towards autonomous landing on a moving vessel through fiducial markers. Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France.","DOI":"10.1109\/ECMR.2017.8098671"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/9\/1\/8\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:01:38Z","timestamp":1760173298000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/9\/1\/8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,25]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,3]]}},"alternative-id":["robotics9010008"],"URL":"https:\/\/doi.org\/10.3390\/robotics9010008","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,25]]}}}