{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T04:23:45Z","timestamp":1775881425973,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"4-5","license":[{"start":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T00:00:00Z","timestamp":1717632000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T00:00:00Z","timestamp":1717632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100009534","name":"Universit\u00e4t Stuttgart","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100009534","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Robot"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Multi-rotor UAVs suffer from a restricted range and flight duration due to limited battery capacity. Autonomous landing on a 2D moving platform offers the possibility to replenish batteries and offload data, thus increasing the utility of the vehicle. Classical approaches rely on accurate, complex and difficult-to-derive models of the vehicle and the environment. Reinforcement learning (RL) provides an attractive alternative due to its ability to learn a suitable control policy exclusively from data during a training procedure. However, current methods require several hours to train, have limited success rates and depend on hyperparameters that need to be tuned by trial-and-error. We address all these issues in this work. First, we decompose the landing procedure into a sequence of simpler, but similar learning tasks. 
This is enabled by applying two instances of the same RL-based controller trained for 1D motion for controlling the multi-rotor\u2019s movement in both the longitudinal and the lateral directions. Second, we introduce a powerful state space discretization technique that is based on i) kinematic modeling of the moving platform to derive information about the state space topology and ii) structuring the training as a sequential curriculum using transfer learning. Third, we leverage the kinematics model of the moving platform to also derive interpretable hyperparameters for the training process that ensure sufficient maneuverability of the multi-rotor vehicle. The training is performed using the tabular RL method <jats:italic>Double Q-Learning<\/jats:italic>. Through extensive simulations we show that the presented method significantly increases the rate of successful landings, while requiring less training time compared to other deep RL approaches. Furthermore, for two comparison scenarios it achieves performance comparable to that of a cascaded PI controller. Finally, we deploy and demonstrate our algorithm on real hardware. For all evaluation scenarios we provide statistics on the agent\u2019s performance. 
Source code is openly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/robot-perception-group\/rl_multi_rotor_landing\">https:\/\/github.com\/robot-perception-group\/rl_multi_rotor_landing<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s10514-024-10162-8","type":"journal-article","created":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T15:02:08Z","timestamp":1717686128000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Reinforcement learning based autonomous multi-rotor landing on moving platforms"],"prefix":"10.1007","volume":"48","author":[{"given":"Pascal","family":"Goldschmid","sequence":"first","affiliation":[]},{"given":"Aamir","family":"Ahmad","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,6]]},"reference":[{"key":"10162_CR1","unstructured":"Anderson, C.W., & Crawford-Hines, S. G. (1994). Multigrid q-learning. technical report cs-94-121"},{"issue":"2","key":"10162_CR2","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1007\/s10846-016-0399-z","volume":"85","author":"O Araar","year":"2017","unstructured":"Araar, O., Aouf, N., & Vitanov, I. (2017). Vision based autonomous landing of multirotor uav on moving platform. Journal of Intelligent & Robotic Systems, 85(2), 369\u2013384. https:\/\/doi.org\/10.1007\/s10846-016-0399-z","journal-title":"Journal of Intelligent & Robotic Systems"},{"issue":"1","key":"10162_CR3","doi-asserted-by":"publisher","first-page":"10488","DOI":"10.1016\/j.ifacol.2017.08.1980","volume":"50","author":"A Borowczyk","year":"2017","unstructured":"Borowczyk, A., Nguyen, D. T., Phu-Van Nguyen, A., et al. (2017). Autonomous landing of a multirotor micro air vehicle on a high velocity ground vehicle. IFAC-PapersOnLine, 50(1), 10488\u201310494. 
https:\/\/doi.org\/10.1016\/j.ifacol.2017.08.1980","journal-title":"IFAC-PapersOnLine"},{"key":"10162_CR4","first-page":"1","volume":"5","author":"E Even-Dar","year":"2004","unstructured":"Even-Dar, E., & Mansour, Y. (2004). Learning rates for q-learning. Journal of Machine Learning Research, 5, 1\u201325.","journal-title":"Journal of Machine Learning Research"},{"key":"10162_CR5","doi-asserted-by":"publisher","unstructured":"Falanga, D., Zanchettin, A., & Simovic, A., et\u00a0al. (2017). Vision-based autonomous quadrotor landing on a moving platform. In 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), pp 200\u2013207, https:\/\/doi.org\/10.1109\/SSRR.2017.8088164","DOI":"10.1109\/SSRR.2017.8088164"},{"key":"10162_CR6","doi-asserted-by":"crossref","unstructured":"Furrer, F., Burri, M., & Achtelik, M., et\u00a0al. (2016). Robot Operating System (ROS): The Complete Reference (Volume 1), Springer International Publishing, Cham, chap RotorS\u2014A Modular Gazebo MAV Simulator Framework, pp. 595\u2013625.","DOI":"10.1007\/978-3-319-26054-9_23"},{"key":"10162_CR7","doi-asserted-by":"publisher","unstructured":"Gautam, A., Sujit, P., & Saripalli, S. (2015). Application of guidance laws to quadrotor landing. In 2015 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 372\u2013379, https:\/\/doi.org\/10.1109\/ICUAS.2015.7152312","DOI":"10.1109\/ICUAS.2015.7152312"},{"key":"10162_CR8","unstructured":"Hasselt, H., et al. (2010). Double q-learning. In J. Lafferty, C. Williams, & J. Shawe-Taylor (Eds.), Advances in Neural Information Processing Systems.  (Vol. 23). Curran Associates Inc."},{"key":"10162_CR9","doi-asserted-by":"publisher","unstructured":"Hu, B., Lu, L., & Mishra, S. (2015). Fast, safe and precise landing of a quadrotor on an oscillating platform. In 2015 American Control Conference (ACC), pp. 
3836\u20133841, https:\/\/doi.org\/10.1109\/ACC.2015.7171928","DOI":"10.1109\/ACC.2015.7171928"},{"key":"10162_CR10","doi-asserted-by":"publisher","unstructured":"Kooi, J. E., & Babu\u0161ka, R. (2021). Inclined quadrotor landing using deep reinforcement learning. In 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 2361\u20132368, https:\/\/doi.org\/10.1109\/IROS51168.2021.9636096","DOI":"10.1109\/IROS51168.2021.9636096"},{"key":"10162_CR11","doi-asserted-by":"publisher","unstructured":"Lampton, A., & Valasek, J. (2009). Multiresolution state-space discretization method for q-learning. In 2009 American Control Conference, pp. 1646\u20131651, https:\/\/doi.org\/10.1109\/ACC.2009.5160474","DOI":"10.1109\/ACC.2009.5160474"},{"key":"10162_CR12","doi-asserted-by":"publisher","unstructured":"Lee, S., Shim, T., & Kim, S., et\u00a0al. (2018). Vision-based autonomous landing of a multi-copter unmanned aerial vehicle using reinforcement learning. In 2018 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 108\u2013114, https:\/\/doi.org\/10.1109\/ICUAS.2018.8453315","DOI":"10.1109\/ICUAS.2018.8453315"},{"key":"10162_CR13","doi-asserted-by":"publisher","unstructured":"Ling, K., Chow, D., & Das, A., et\u00a0al. (2014). Autonomous maritime landings for low-cost vtol aerial vehicles. In 2014 Canadian Conference on Computer and Robot Vision, pp. 32\u201339, https:\/\/doi.org\/10.1109\/CRV.2014.13","DOI":"10.1109\/CRV.2014.13"},{"key":"10162_CR14","doi-asserted-by":"publisher","unstructured":"Miyazaki, R., Jiang, R., & Paul, H., et\u00a0al. (2018). Airborne docking for multi-rotor aerial manipulations. In 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4708\u20134714, https:\/\/doi.org\/10.1109\/IROS.2018.8594513","DOI":"10.1109\/IROS.2018.8594513"},{"key":"10162_CR15","unstructured":"Mnih, V., Kavukcuoglu, K., & Silver, D., et\u00a0al. (2013). 
Playing atari with deep reinforcement learning"},{"issue":"2","key":"10162_CR16","doi-asserted-by":"publisher","first-page":"989","DOI":"10.1002\/asjc.1758","volume":"21","author":"H Mo","year":"2019","unstructured":"Mo, H., & Farid, G. (2019). Nonlinear and adaptive intelligent control techniques for quadrotor uav-a survey. Asian Journal of Control, 21(2), 989\u20131008. https:\/\/doi.org\/10.1002\/asjc.1758","journal-title":"Asian Journal of Control"},{"issue":"181","key":"10162_CR17","first-page":"1","volume":"21","author":"S Narvekar","year":"2020","unstructured":"Narvekar, S., Peng, B., Leonetti, M., et al. (2020). Curriculum learning for reinforcement learning domains: A framework and survey. Journal of Machine Learning Research, 21(181), 1\u201350.","journal-title":"Journal of Machine Learning Research"},{"key":"10162_CR18","doi-asserted-by":"publisher","unstructured":"Polvara, R., Patacchiola, M., & Sharma, S., et\u00a0al. (2018). Toward end-to-end control for uav autonomous landing via deep reinforcement learning. In 2018 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 115\u2013123, https:\/\/doi.org\/10.1109\/ICUAS.2018.8453449","DOI":"10.1109\/ICUAS.2018.8453449"},{"issue":"11","key":"10162_CR19","doi-asserted-by":"publisher","first-page":"1867","DOI":"10.1017\/S0263574719000316","volume":"37","author":"R Polvara","year":"2019","unstructured":"Polvara, R., Sharma, S., Wan, J., et al. (2019). Autonomous vehicular landings on the deck of an unmanned surface vehicle using deep reinforcement learning. Robotica, 37(11), 1867\u20131882. https:\/\/doi.org\/10.1017\/S0263574719000316","journal-title":"Robotica"},{"key":"10162_CR20","doi-asserted-by":"publisher","unstructured":"Rodriguez-Ramos, A., Sampedro, C., & Bavle, H., et\u00a0al. (2018). A deep reinforcement learning technique for vision-based autonomous multirotor landing on a moving platform. 
In 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 1010\u20131017, https:\/\/doi.org\/10.1109\/IROS.2018.8594472","DOI":"10.1109\/IROS.2018.8594472"},{"issue":"1","key":"10162_CR21","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1007\/s10846-018-0891-8","volume":"93","author":"A Rodriguez-Ramos","year":"2019","unstructured":"Rodriguez-Ramos, A., Sampedro, C., Bavle, H., et al. (2019). A deep reinforcement learning strategy for uav autonomous landing on a moving platform. Journal of Intelligent & Robotic Systems, 93(1), 351\u2013366. https:\/\/doi.org\/10.1007\/s10846-018-0891-8","journal-title":"Journal of Intelligent & Robotic Systems"},{"key":"10162_CR22","doi-asserted-by":"publisher","DOI":"10.1155\/2017\/1823056","author":"P Sanchez-Cuevas","year":"2017","unstructured":"Sanchez-Cuevas, P., Heredia, G., & Ollero, A. (2017). Characterization of the aerodynamic ground effect and its influence in multirotor control. International Journal of Aerospace Engineering. https:\/\/doi.org\/10.1155\/2017\/1823056","journal-title":"International Journal of Aerospace Engineering"},{"key":"10162_CR23","doi-asserted-by":"publisher","unstructured":"Shi, G., Shi, X., & O\u2019Connell, M., et\u00a0al. (2019). Neural lander: Stable drone landing control using learned dynamics. In 2019 International Conference on Robotics and Automation (ICRA), pp. 9784\u20139790, https:\/\/doi.org\/10.1109\/ICRA.2019.8794351","DOI":"10.1109\/ICRA.2019.8794351"},{"key":"10162_CR24","volume-title":"Reinforcement Learning: An Introduction Second edition, in progress","author":"RS Sutton","year":"2015","unstructured":"Sutton, R. S., & Barto, A. G. (2015). Reinforcement Learning: An Introduction Second edition, in progress. London, England: The MIT Press, Cambridge, Massachusetts."},{"key":"10162_CR25","doi-asserted-by":"publisher","unstructured":"Vlantis, P., Marantos, P., & Bechlioulis, C. P., et\u00a0al. (2015). 
Quadrotor landing on an inclined platform of a moving ground vehicle. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2202\u20132207, https:\/\/doi.org\/10.1109\/ICRA.2015.7139490","DOI":"10.1109\/ICRA.2015.7139490"},{"key":"10162_CR26","doi-asserted-by":"publisher","unstructured":"Wang, P., Man, Z., & Cao, Z., et\u00a0al. (2016). Dynamics modelling and linear control of quadcopter. In 2016 International Conference on Advanced Mechatronic Systems (ICAMechS), pp. 498\u2013503, https:\/\/doi.org\/10.1109\/ICAMechS.2016.7813499","DOI":"10.1109\/ICAMechS.2016.7813499"},{"issue":"1","key":"10162_CR27","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s10846-010-9473-0","volume":"61","author":"KE Wenzel","year":"2011","unstructured":"Wenzel, K. E., Masselli, A., & Zell, A. (2011). Automatic take off, tracking and landing of a miniature uav on a moving carrier vehicle. Journal of Intelligent & Robotic Systems, 61(1), 221\u2013238. https:\/\/doi.org\/10.1007\/s10846-010-9473-0","journal-title":"Journal of Intelligent & Robotic Systems"},{"key":"10162_CR28","doi-asserted-by":"publisher","unstructured":"Zhong, D., Zhang, X., & Sun, H., et\u00a0al. (2016). A vision-based auxiliary system of multirotor unmanned aerial vehicles for autonomous rendezvous and docking. In 2016 International Joint Conference on Neural Networks (IJCNN), pp. 
4586\u20134592, https:\/\/doi.org\/10.1109\/IJCNN.2016.7727801","DOI":"10.1109\/IJCNN.2016.7727801"}],"container-title":["Autonomous Robots"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-024-10162-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10514-024-10162-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-024-10162-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,10]],"date-time":"2024-07-10T10:08:22Z","timestamp":1720606102000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10514-024-10162-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,6]]},"references-count":28,"journal-issue":{"issue":"4-5","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10162"],"URL":"https:\/\/doi.org\/10.1007\/s10514-024-10162-8","relation":{},"ISSN":["0929-5593","1573-7527"],"issn-type":[{"value":"0929-5593","type":"print"},{"value":"1573-7527","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,6]]},"assertion":[{"value":"4 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 June 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Both authors\u2019 primary affiliation is the Institute of Flight Mechanics and Controls, University of 
Stuttgart. Pascal Goldschmid is employed as a research associate and Aamir Ahmad as a tenure-track professor. Both authors are also affiliated with the Perceiving Systems department of the Max Planck Institute for Intelligent Systems.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"13"}}