{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,13]],"date-time":"2025-05-13T21:58:21Z","timestamp":1747173501903,"version":"3.40.5"},"reference-count":58,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2020,3,16]],"date-time":"2020-03-16T00:00:00Z","timestamp":1584316800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AIEDAM"],"published-print":{"date-parts":[[2020,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., robots or vehicles) to sense the environment, assess the situation, and select the optimal actions to avoid collision and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. The lack of available ship steering data of human ship masters has made it necessary to acquire collision avoidance knowledge through reinforcement learning (RL). Given that the learned neural network tends to be a black box, it is desirable that a method is available which can be used to design an agent's behavior so that the desired knowledge can be captured. Furthermore, RL with complex tasks can be either time consuming or unfeasible. A multi-stage learning method is needed in which agents can learn from simple tasks and then transfer their learned knowledge to closely related but more complex tasks. In this paper, we explore the ways of designing agent behaviors through tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. The computer simulation-based agent training results have shown that it is important to understand the roles of each component in a reward function and the various design parameters in transfer RL. The settings of these parameters are all dependent on the complexity of the tasks and the similarities between them.<\/jats:p>","DOI":"10.1017\/s0890060420000141","type":"journal-article","created":{"date-parts":[[2020,3,16]],"date-time":"2020-03-16T08:38:36Z","timestamp":1584347916000},"page":"207-222","source":"Crossref","is-referenced-by-count":11,"title":["Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer"],"prefix":"10.1017","volume":"34","author":[{"given":"Xiongqing","family":"Liu","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6502-5837","authenticated-orcid":false,"given":"Yan","family":"Jin","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2020,3,16]]},"reference":[{"unstructured":"Parisotto, E , Ba, J and Salakhutdinov, R (2016) Actor-mimic: deep multitask and transfer reinforcement learning. 
arXiv:1511.06342v4 [cs.LG] 22 Feb 2016.","key":"S0890060420000141_ref41"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref34","DOI":"10.1007\/978-1-4757-6451-2_4"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref47","DOI":"10.2200\/S00268ED1V01Y201005AIM009"},{"key":"S0890060420000141_ref23","first-page":"183","article-title":"On the design of marine traffic control system (1st report)","volume":"162","author":"Jin","year":"1987","journal-title":"Journal of the Society of Naval Architects of Japan"},{"volume-title":"Neuro-Dynamic Programming","year":"1996","author":"Bertsekas","key":"S0890060420000141_ref4"},{"unstructured":"Bojarski, M , Del Testa, D , Dworakowski, D , Firner, B , Flepp, B , Goyal, P , Jackel, LD , Monfort, M , Muller, U , Zhang, J , Zhang, X , Zhao, J and Zieba, K (2016) End to end learning for self-driving cars. arXiv: 1604.07316 [cs.LG].","key":"S0890060420000141_ref5"},{"key":"S0890060420000141_ref15","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1016\/j.ssci.2016.02.001","article-title":"Statistical analysis of ship accidents and review of safety level","volume":"85","author":"Eleftheria","year":"2016","journal-title":"Safety Science"},{"unstructured":"Coates, A , Huval, B , Wang, T , Wu, D and Ng, A (2013) Deep learning with COTS HPC systems. Proceedings of the 30th International Conference on Machine Learning. PMLR, Vol. 28. pp. 1337\u20131345.","key":"S0890060420000141_ref11"},{"key":"S0890060420000141_ref57","first-page":"1","volume-title":"Deep Reinforcement Learning for Simulated Autonomous Vehicle Control. Course Project Reports: Winter 2016 (CS231n: Convolutional Neural Networks for Visual Recognition)","author":"Yu","year":"2016"},{"doi-asserted-by":"crossref","unstructured":"Torrey, L , Shavlik, J , Walker, T and Maclin, R (2006) Skill acquisition via transfer learning and advice taking. European Conference on Machine Learning. Berlin, Heidelberg: Springer, pp. 425\u2013436.","key":"S0890060420000141_ref50","DOI":"10.1007\/11871842_41"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref24","DOI":"10.1109\/ICUAS.2016.7502631"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref16","DOI":"10.1017\/S0263574708004438"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref6","DOI":"10.1108\/01439919610108828"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref14","DOI":"10.1109\/ICASSP.2016.7472110"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref8","DOI":"10.1109\/MCSE.2016.74"},{"volume-title":"U.S. Patent No. 
9,492,235","year":"2016","author":"Hourtash","key":"S0890060420000141_ref22"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref1","DOI":"10.1007\/978-3-642-32723-0_15"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref19","DOI":"10.1016\/j.ssci.2013.09.010"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref30","DOI":"10.1038\/nature14539"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref37","DOI":"10.1109\/TITS.2015.2409109"},{"key":"S0890060420000141_ref40","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref33","DOI":"10.1177\/0278364907084441"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref49","DOI":"10.1145\/1273496.1273607"},{"unstructured":"Wang, Z , Schaul, T , Hessel, M , van Hasselt, H , Lanctot, M and de Freitas, N (2016b) Dueling network architectures for deep reinforcement learning. arXiv:1511.06581v3 [cs.LG] 5 Apr.","key":"S0890060420000141_ref53"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref28","DOI":"10.1109\/ICRA.2015.7139555"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref39","DOI":"10.1109\/TIV.2016.2571067"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref2","DOI":"10.1109\/ICDMW.2007.109"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref27","DOI":"10.1145\/3065386"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref3","DOI":"10.1007\/s10115-013-0647-5"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref9","DOI":"10.1109\/ICRA.2017.7989037"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref10","DOI":"10.7551\/mitpress\/11207.001.0001"},{"unstructured":"Dean, J , Corrado, G , Monga, R , Chen, K , Devin, M , Mao, M , Ranzato, M , Senior, A , Tucker, P , Yang, K , Le, QV and Ng, AY (2012) Large scale distributed deep networks. NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1. Red Hook, NY, USA: Curran Associates Inc.","key":"S0890060420000141_ref12"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref44","DOI":"10.1038\/nature24270"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref13","DOI":"10.1109\/ICASSP.2014.6854950"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref17","DOI":"10.1145\/1160633.1160762"},{"volume-title":"A Seaman's Guide to the Rule of the Road","year":"2009","author":"Ford","key":"S0890060420000141_ref18"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref20","DOI":"10.1109\/TENCONSpring.2016.7519416"},{"unstructured":"Hinton, G , Vinyals, O and Dean, J (2015) Distilling the Knowledge in a Neural Network. arXiv:1503.02531v1 [stat.ML] 9 Mar.","key":"S0890060420000141_ref21"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref25","DOI":"10.1177\/027836498600500106"},{"unstructured":"Kingma, DP and Ba, J (2015) Adam: A method for stochastic optimization, in Proceedings of ICLR, 2015.","key":"S0890060420000141_ref26"},{"unstructured":"Le, Q , Ranzato, M , Monga, R , Devin, M , Chen, K , Corrado, G , Dean, J and Ng, A (2012) Building high-level features using large scale unsupervised learning. 
International Conference on Machine Learning: arXiv: 1112.6209v5 [cs.LG].","key":"S0890060420000141_ref29"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref32","DOI":"10.1109\/ICRA.2016.7487477"},{"unstructured":"Mericli, C , Mericli, T and Akin, HL (2010) A reward function generation method using genetic algorithms: a robot soccer case study (extended abstract). Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Vol. 1\u20133, 10\u201314 May 2010, Toronto, Canada.","key":"S0890060420000141_ref35"},{"unstructured":"Mnih, V , Kavukcuoglu, K , Silver, D , Graves, A , Antonoglou, I , Wierstra, D and Riedmiller, M (2013) Playing Atari with deep reinforcement learning. arXiv:1312.5602v1 [cs.LG].","key":"S0890060420000141_ref36"},{"unstructured":"Ng, AY and Russell, S (2000) Algorithms for inverse reinforcement learning, in Proceedings of ICML 2000.","key":"S0890060420000141_ref38"},{"unstructured":"Watkins, CJCH (1989) Learning from delayed rewards (Doctoral dissertation). Cambridge, UK: University of Cambridge.","key":"S0890060420000141_ref54"},{"unstructured":"Schaul, T , Quan, J , Antonoglou, I and Silver, D (2016) Prioritized experience replay. arXiv:1511.05952v4 [cs.LG] 25 Feb 2016.","key":"S0890060420000141_ref42"},{"volume-title":"Reinforcement Learning: An Introduction","year":"2018","author":"Sutton","key":"S0890060420000141_ref46"},{"unstructured":"Tang, S and Kumar, V (2015) A complete algorithm for generating safe trajectories for multi-robot teams. In: Bicchi A., Burgard W. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 3. New York, NY, USA: Springer, pp 599\u2013616.","key":"S0890060420000141_ref48"},{"doi-asserted-by":"crossref","unstructured":"van Hasselt, H , Guez, A and Silver, D (2015) Deep reinforcement learning with double Q-learning. arXiv:1509.06461v3 [cs.LG].","key":"S0890060420000141_ref51","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"S0890060420000141_ref52","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1109\/JAS.2016.7471613","article-title":"Where does AlphaGo go: from Church-Turing thesis to AlphaGo thesis and beyond","volume":"3","author":"Wang","year":"2016","journal-title":"IEEE\/CAA Journal of Automatica Sinica"},{"unstructured":"Yosinski, J , Clune, J , Bengio, Y and Lipson, H (2014) How transferable are features in deep neural networks? NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2. Cambridge, MA, USA: MIT Press.","key":"S0890060420000141_ref56"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref55","DOI":"10.1007\/s12239-017-0007-7"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref58","DOI":"10.1109\/DSN-W.2016.12"},{"unstructured":"Liu, X and Jin, Y (2018) Design of transfer reinforcement learning under low task similarity. ASME 2018 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, IDETC2018-86013, 26\u201329 August 2018. 
Quebec City, Quebec, Canada: American Society of Mechanical Engineers Digital Collection.","key":"S0890060420000141_ref31"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref45","DOI":"10.1038\/nature16961"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref43","DOI":"10.1007\/s12369-014-0238-y"},{"doi-asserted-by":"publisher","key":"S0890060420000141_ref7","DOI":"10.1093\/mnras\/stu1065"}],"container-title":["Artificial Intelligence for Engineering Design, Analysis and Manufacturing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0890060420000141","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,18]],"date-time":"2022-10-18T21:41:38Z","timestamp":1666129298000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0890060420000141\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,16]]},"references-count":58,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,5]]}},"alternative-id":["S0890060420000141"],"URL":"https:\/\/doi.org\/10.1017\/s0890060420000141","relation":{},"ISSN":["0890-0604","1469-1760"],"issn-type":[{"type":"print","value":"0890-0604"},{"type":"electronic","value":"1469-1760"}],"subject":[],"published":{"date-parts":[[2020,3,16]]}}}