{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T19:21:13Z","timestamp":1777490473696,"version":"3.51.4"},"reference-count":52,"publisher":"Cambridge University Press (CUP)","issue":"11","license":[{"start":{"date-parts":[[2019,4,8]],"date-time":"2019-04-08T00:00:00Z","timestamp":1554681600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotica"],"published-print":{"date-parts":[[2019,11]]},"abstract":"<jats:title>Summary<\/jats:title><jats:p>Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is the minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The solution proposed consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: the marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as the combination of vanilla and double DQNs, and a partitioned buffer replay. Simulation studies demonstrated the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.<\/jats:p>","DOI":"10.1017\/s0263574719000316","type":"journal-article","created":{"date-parts":[[2019,4,8]],"date-time":"2019-04-08T06:52:03Z","timestamp":1554706323000},"page":"1867-1882","source":"Crossref","is-referenced-by-count":42,"title":["Autonomous Vehicular Landings on the Deck of an Unmanned Surface Vehicle using Deep Reinforcement Learning"],"prefix":"10.1017","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8318-7269","authenticated-orcid":false,"given":"Riccardo","family":"Polvara","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5062-3199","authenticated-orcid":false,"given":"Sanjay","family":"Sharma","sequence":"additional","affiliation":[]},{"given":"Jian","family":"Wan","sequence":"additional","affiliation":[]},{"given":"Andrew","family":"Manning","sequence":"additional","affiliation":[]},{"given":"Robert","family":"Sutton","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,4,8]]},"reference":[{"key":"S0263574719000316_ref40","unstructured":"40. G. Brockman , V. Cheung , L. Pettersson , J. Schneider , J. Schulman , J. Tang and W. Zaremba , \u201cOpenai gym,\u201d arXiv preprint arXiv:1606.01540 (2016)."},{"key":"S0263574719000316_ref37","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1001"},{"key":"S0263574719000316_ref46","unstructured":"46. M. Abadi , A. Agarwal , P. Barham , E. Brevdo , Z. Chen , C. Citro , G. S. Corrado , A. Davis , J. Dean , M. Devin , et al., \u201cTensorflow: Large-scale machine learning on heterogeneous distributed systems,\u201d arXiv preprint arXiv:1603.04467 (2016)."},{"key":"S0263574719000316_ref36","unstructured":"36. T. Schaul , J. Quan , I. Antonoglou and D. Silver , \u201cPrioritized experience replay,\u201d arXiv preprint arXiv:1511.05952 (2015)."},{"key":"S0263574719000316_ref12","doi-asserted-by":"publisher","DOI":"10.1109\/ICUAS.2014.6842381"},{"key":"S0263574719000316_ref35","volume-title":"Proceedings of the 1993 Connectionist Models Summer School","author":"Thrun","year":"1993"},{"key":"S0263574719000316_ref29","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003"},{"key":"S0263574719000316_ref25","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487175"},{"key":"S0263574719000316_ref23","unstructured":"23. J. A. Bagnell and J. G. Schneider , \u201cAutonomous Helicopter Control Using Reinforcement Learning Policy Search Methods,\u201d IEEE International Conference on Robotics and Automation, 2001. Proceedings 2001 ICRA, vol. 2 (IEEE, 2001) pp. 1615\u20131620."},{"key":"S0263574719000316_ref22","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"S0263574719000316_ref14","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-009-9355-5"},{"key":"S0263574719000316_ref21","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2017.06.007"},{"key":"S0263574719000316_ref10","doi-asserted-by":"publisher","DOI":"10.1007\/978-90-481-8764-5_12"},{"key":"S0263574719000316_ref9","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2011.2163435"},{"key":"S0263574719000316_ref8","doi-asserted-by":"publisher","DOI":"10.1007\/10991459_27"},{"key":"S0263574719000316_ref41","unstructured":"41. N. Koenig and A. Howard , \u201cDesign and Use Paradigms for Gazebo, an Open-Source Multi-robot Simulator,\u201d IEEE\/RSJ International Conference on Intelligent Robots and Systems, (IROS 2004) Sendai, Japan. Proceedings, vol. 3 (IEEE, 2004) pp. 2149\u20132154."},{"key":"S0263574719000316_ref7","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574713000611"},{"key":"S0263574719000316_ref43","doi-asserted-by":"publisher","DOI":"10.1109\/MRA.2015.2511678"},{"key":"S0263574719000316_ref5","doi-asserted-by":"publisher","DOI":"10.1109\/MFI.2014.6997750"},{"key":"S0263574719000316_ref16","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2012.09.004"},{"key":"S0263574719000316_ref20","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2008.02.003"},{"key":"S0263574719000316_ref31","unstructured":"31. V. Mnih , A. P. Badia , M. Mirza , A. Graves , T. Lillicrap , T. Harley , D. Silver and K. Kavukcuoglu , \u201cAsynchronous Methods for Deep Reinforcement Learning,\u201d International Conference on Machine Learning, ICML, New York City, NY, USA (2016) pp. 1928\u20131937."},{"key":"S0263574719000316_ref30","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917710318"},{"key":"S0263574719000316_ref2","first-page":"225","volume-title":"Advances in Cooperative Robotics","author":"Polvara","year":"2017"},{"key":"S0263574719000316_ref34","first-page":"2094","volume-title":"Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence","author":"Van Hasselt","year":"2016"},{"key":"S0263574719000316_ref19","doi-asserted-by":"publisher","DOI":"10.1111\/j.1934-6093.1999.tb00014.x"},{"key":"S0263574719000316_ref39","unstructured":"39. M. Bellemare , Y. Naddaf , J. Veness and M. Bowling , \u201cThe arcade learning environment: An evaluation platform for general agents,\u201d Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina (2015)."},{"key":"S0263574719000316_ref3","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"S0263574719000316_ref33","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2012.11.007"},{"key":"S0263574719000316_ref38","doi-asserted-by":"publisher","DOI":"10.1023\/A:1025696116075"},{"key":"S0263574719000316_ref1","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574708004141"},{"key":"S0263574719000316_ref17","unstructured":"17. M. Sereewattana , M. Ruchanurucks and S. Siddhichai , \u201cDepth Estimation of Markers for UAV Automatic Landing Control Using Stereo Vision with a Single Camera,\u201d International Conference of Information and Communication Technology for Embedded System, Ayutthaya, Thailand (2014)."},{"key":"S0263574719000316_ref45","unstructured":"45. X. Glorot and Y. Bengio , \u201cUnderstanding the Difficulty of Training Deep Feedforward Neural Networks,\u201d In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics ( Y. W. Teh and M. Titterington , eds.), vol. 9 (PMLR, 2010) pp. 249\u2013256."},{"key":"S0263574719000316_ref26","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2017.02.013"},{"key":"S0263574719000316_ref6","doi-asserted-by":"publisher","DOI":"10.1109\/ICUAS.2014.6842377"},{"key":"S0263574719000316_ref27","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.06.009"},{"key":"S0263574719000316_ref24","doi-asserted-by":"publisher","DOI":"10.1007\/11552246_35"},{"key":"S0263574719000316_ref32","first-page":"315","volume-title":"Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics","volume":"15","author":"Glorot","year":"2011"},{"key":"S0263574719000316_ref13","doi-asserted-by":"publisher","DOI":"10.1109\/TAES.2002.1145742"},{"key":"S0263574719000316_ref11","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-014-0062-5"},{"key":"S0263574719000316_ref4","first-page":"1","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"S0263574719000316_ref47","doi-asserted-by":"publisher","DOI":"10.3390\/drones2020015"},{"key":"S0263574719000316_ref49","unstructured":"49. J. Schulman , F. Wolski , P. Dhariwal , A. Radford and O. Klimov , \u201cProximal policy optimization algorithms,\u201d arXiv preprint arXiv:1707.06347 (2017)."},{"key":"S0263574719000316_ref48","unstructured":"48. R. Polvara , M. Patacchiola , S. Sharma , J. Wan , A. Manning , R. Sutton and A. Cangelosi , \u201cAutonomous quadrotor landing using deep reinforcement learning,\u201d arXiv preprint arXiv:1709.03339 (2017)."},{"key":"S0263574719000316_ref50","unstructured":"50. I. Osband , D. Russo and B. Van Roy , \u201c(More) Efficient Reinforcement Learning via Posterior Sampling,\u201d In: Proceedings of the 26th International Conference on Neural Information Processing Systems, vol. 2 (Curran Associates Inc., Lake Tahoe, Nevada, 2013) pp. 3003\u20133011."},{"key":"S0263574719000316_ref51","unstructured":"51. I. Osband , C. Blundell , A. Pritzel and B. Van Roy , \u201cDeep Exploration via Bootstrapped DQN,\u201d In: Advances in Neural Information Processing Systems ( D. D. Lee , M. Sugiyama , U. V. Luxburg , I. Guyon and R. Garnett , eds.), vol. 29 (Curran Associates, Inc., 2016) pp. 4026\u20134034."},{"key":"S0263574719000316_ref52","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8202133"},{"key":"S0263574719000316_ref15","unstructured":"15. C. Theodore , D. Rowley , A. Ansar , L. Matthies , S. Goldberg , D. Hubbard and M. Whalley , \u201cFlight Trials of a Rotorcraft Unmanned Aerial Vehicle Landing Autonomously at Unprepared Sites,\u201d Annual Forum Proceedings\u2013American Helicopter Society, vol. 62, no. 2 (American Helicopter Society, Inc., Phoenix, AZ, USA, 2006) p. 1250."},{"key":"S0263574719000316_ref44","first-page":"26","article-title":"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman","year":"2012","journal-title":"COURSERA: Neural Networks Mach. Learn."},{"key":"S0263574719000316_ref18","unstructured":"18. H. Shi and H. Wang , \u201cA Vision System for Landing an Unmanned Helicopter in a Complex Environment,\u201d Sixth International Symposium on Multispectral Image Processing and Pattern Recognition (International Society for Optics and Photonics, Yichang, China, 2009) pp. 74 962G\u201374 962G."},{"key":"S0263574719000316_ref42","first-page":"5","volume":"3","author":"Quigley","year":"2009","journal-title":"ICRA Workshop on Open Source Software"},{"key":"S0263574719000316_ref28","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2017.01.003"}],"container-title":["Robotica"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0263574719000316","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T04:49:42Z","timestamp":1570510182000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0263574719000316\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,8]]},"references-count":52,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2019,11]]}},"alternative-id":["S0263574719000316"],"URL":"https:\/\/doi.org\/10.1017\/s0263574719000316","relation":{},"ISSN":["0263-5747","1469-8668"],"issn-type":[{"value":"0263-5747","type":"print"},{"value":"1469-8668","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,8]]}}}