{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:32:52Z","timestamp":1760149972121,"version":"build-2065373602"},"reference-count":59,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T00:00:00Z","timestamp":1695859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"University of Moratuwa and CodeGen International (Pvt) Ltd under the Q-Bits Scholar grant"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Recent advancements in artificial intelligence have enabled reinforcement learning (RL) agents to exceed human-level performance in various gaming tasks. However, despite the state-of-the-art performance demonstrated by model-free RL algorithms, they suffer from high sample complexity. Hence, it is uncommon to find their applications in robotics, autonomous navigation, and self-driving, as gathering many samples is impractical in real-world hardware systems. Therefore, developing sample-efficient learning algorithms for RL agents is crucial in deploying them in real-world tasks without sacrificing performance. This paper presents an advisor-based learning algorithm, incorporating prior knowledge into the training by modifying the deep deterministic policy gradient algorithm to reduce the sample complexity. Also, we propose an effective method of employing an advisor in data collection to train autonomous navigation agents to maneuver physical platforms, minimizing the risk of collision. We analyze the performance of our methods with the support of simulation and physical experimental setups. Experiments reveal that incorporating an advisor into the training phase significantly reduces the sample complexity without compromising the agent\u2019s performance compared to various benchmark approaches. 
Also, they show that the advisor\u2019s constant involvement in the data collection process diminishes the agent\u2019s performance, while the limited involvement makes training more effective.<\/jats:p>","DOI":"10.3390\/robotics12050133","type":"journal-article","created":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T07:50:26Z","timestamp":1695887426000},"page":"133","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An Advisor-Based Architecture for a Sample-Efficient Training of Autonomous Navigation Agents with Reinforcement Learning"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4003-6337","authenticated-orcid":false,"given":"Rukshan Darshana","family":"Wijesinghe","sequence":"first","affiliation":[{"name":"Department of Electronic and Telecommunication Engineering, Faculty of Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka"},{"name":"CODEGEN QBITS LAB, University of Moratuwa, Moratuwa 10400, Sri Lanka"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7461-0165","authenticated-orcid":false,"given":"Dumindu","family":"Tissera","sequence":"additional","affiliation":[{"name":"Department of Electronic and Telecommunication Engineering, Faculty of Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka"},{"name":"CODEGEN QBITS LAB, University of Moratuwa, Moratuwa 10400, Sri Lanka"}]},{"given":"Mihira Kasun","family":"Vithanage","sequence":"additional","affiliation":[{"name":"CODEGEN QBITS LAB, University of Moratuwa, Moratuwa 10400, Sri Lanka"},{"name":"Department of Computational Mathematics, Faculty of Information Technology, University of Moratuwa, Moratuwa 10400, Sri Lanka"}]},{"given":"Alex","family":"Xavier","sequence":"additional","affiliation":[{"name":"CODEGEN QBITS LAB, University of Moratuwa, Moratuwa 10400, Sri Lanka"},{"name":"Department of Computer Science and Engineering, Faculty of Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2621-5291","authenticated-orcid":false,"given":"Subha","family":"Fernando","sequence":"additional","affiliation":[{"name":"CODEGEN QBITS LAB, University of Moratuwa, Moratuwa 10400, Sri Lanka"},{"name":"Department of Computational Mathematics, Faculty of Information Technology, University of Moratuwa, Moratuwa 10400, Sri Lanka"}]},{"given":"Jayathu","family":"Samarawickrama","sequence":"additional","affiliation":[{"name":"Department of Electronic and Telecommunication Engineering, Faculty of Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka"},{"name":"CODEGEN QBITS LAB, University of Moratuwa, Moratuwa 10400, Sri Lanka"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,28]]},"reference":[{"key":"ref_1","unstructured":"Yang, Y., and Wang, J. (2020). An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Lample, G., and Chaplot, D.S. (2017, January 4\u20139). Playing FPS games with deep reinforcement learning. 
Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.10827"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3543846","article-title":"Reinforcement learning based recommender systems: A survey","volume":"55","author":"Afsar","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1177\/0278364913495721","article-title":"Reinforcement learning in robotics: A survey","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1080\/01691864.2020.1757504","article-title":"Learning efficient push and grasp policy in a tote box from simulation","volume":"34","author":"Ni","year":"2020","journal-title":"Adv. Robot."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"9688","DOI":"10.1016\/j.ifacol.2020.12.2619","article-title":"Deep reinforcement learning for snake robot locomotion","volume":"53","author":"Shi","year":"2020","journal-title":"IFAC-PapersOnLine"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"13159","DOI":"10.1109\/ACCESS.2021.3052024","article-title":"Robotic information gathering with reinforcement learning assisted by domain knowledge: An application to gas source localization","volume":"9","author":"Wiedemann","year":"2021","journal-title":"IEEE Access"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MM.2009.77","article-title":"Dynamic multicore resource management: A machine learning approach","volume":"29","author":"Martinez","year":"2009","journal-title":"IEEE Micro"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/1394608.1382172","article-title":"Self-optimizing memory controllers: A reinforcement learning approach","volume":"36","author":"Ipek","year":"2008","journal-title":"ACM SIGARCH Comput. Archit. News"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2124","DOI":"10.1109\/TVT.2018.2890773","article-title":"Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach","volume":"68","author":"Wang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1359","DOI":"10.1080\/01691864.2021.1977696","article-title":"SegVisRL: Development of a robot\u2019s neural visuomotor and planning system for lunar exploration","volume":"35","author":"Blum","year":"2021","journal-title":"Adv. Robot."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1080\/01691864.2021.1938671","article-title":"Navigation system with SLAM-based trajectory topological map and reinforcement learning-based local planner","volume":"35","author":"Xue","year":"2021","journal-title":"Adv. Robot."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_15","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016, January 2\u20134). 
Continuous control with deep reinforcement learning. Proceedings of the International Conference on Learning Representations (ICLR), San Juan, PR, USA."},{"key":"ref_16","unstructured":"Fujimoto, S., Hoof, H., and Meger, D. (2018, January 10\u201315). Addressing function approximation error in actor\u2013critic methods. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_17","unstructured":"Hasselt, H. (2010, January 6\u20138). Double Q-learning. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, R., Nageotte, F., Zanne, P., de Mathelin, M., and Dresp-Langley, B. (2021). Deep reinforcement learning for the control of robotic manipulation: A focussed mini-review. Robotics, 10.","DOI":"10.3390\/robotics10010022"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"122","DOI":"10.3390\/robotics2030122","article-title":"Reinforcement learning in robotics: Applications and real-world challenges","volume":"2","author":"Kormushev","year":"2013","journal-title":"Robotics"},{"key":"ref_20","unstructured":"Wijesinghe, R., Vithanage, K., Tissera, D., Xavier, A., Fernando, S., and Samarawickrama, J. (2021). Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Introduction to Reinforcement Learning, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_22","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic policy gradient algorithms. Proceedings of the International Conference Machine Learning (ICML), Beijing, China."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1180","DOI":"10.1016\/j.neucom.2007.11.026","article-title":"Natural actor\u2013critic","volume":"71","author":"Peters","year":"2008","journal-title":"Neurocomputing"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep reinforcement learning with double q-learning. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_25","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor\u2013critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv."},{"key":"ref_26","unstructured":"Gu, S., Lillicrap, T., Sutskever, I., and Levine, S. (2016, January 19\u201321). Continuous deep q-learning with model-based acceleration. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_27","unstructured":"Dadvar, M., Nayyar, R.K., and Srivastava, S. (2022). Learning Dynamic Abstract Representations for Sample-Efficient Reinforcement Learning. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1007\/s11370-019-00310-w","article-title":"Path planning for active SLAM based on deep reinforcement learning under unknown environments","volume":"13","author":"Wen","year":"2020","journal-title":"Intell. Serv. Robot."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tai, L., Paolo, G., and Liu, M. (2017, January 24\u201328). Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. 
Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202134"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kahn, G., Villaflor, A., Ding, B., Abbeel, P., and Levine, S. (2018, January 21\u201325). Self-supervised deep reinforcement learning with generalized computation graphs for robot navigation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8460655"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Nagabandi, A., Kahn, G., Fearing, R.S., and Levine, S. (2018, January 21\u201325). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8463189"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Arzate Cruz, C., and Igarashi, T. (2020, January 6\u201310). A survey on interactive reinforcement learning: Design principles and open challenges. Proceedings of the 2020 ACM Designing Interactive Systems Conference, Virtual.","DOI":"10.1145\/3357236.3395525"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"120757","DOI":"10.1109\/ACCESS.2020.3006254","article-title":"A review on interactive reinforcement learning from human social feedback","volume":"8","author":"Lin","year":"2020","journal-title":"IEEE Access"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"18215","DOI":"10.1007\/s00521-021-06850-6","article-title":"Human engagement providing evaluative and informative advice for interactive reinforcement learning","volume":"35","author":"Bignold","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"104242","DOI":"10.1109\/ACCESS.2021.3099071","article-title":"A robust approach for continuous interactive actor\u2013critic algorithms","volume":"9","author":"Fernandes","year":"2021","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Bignold, A., Cruz, F., Dazeley, R., Vamplew, P., and Foale, C. (2021). Persistent rule-based interactive reinforcement learning. Neural Comput. Appl., 1\u201318.","DOI":"10.1007\/s00521-021-06466-w"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, J., Springenberg, J.T., Boedecker, J., and Burgard, W. (2017, January 24\u201328). Deep reinforcement learning with successor features for navigation across similar environments. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206049"},{"key":"ref_38","unstructured":"Parisotto, E., Ba, J.L., and Salakhutdinov, R. (2015). Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv."},{"key":"ref_39","unstructured":"Ross, S., Gordon, G., and Bagnell, D. (2011, January 11\u201313). A reduction of imitation learning and structured prediction to no-regret online learning. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Lauderdale, FL, USA."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1143","DOI":"10.1109\/LRA.2020.2966414","article-title":"Learning robust control policies for end-to-end autonomous driving from data-driven simulation","volume":"5","author":"Amini","year":"2020","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1109\/LRA.2019.2926677","article-title":"Learning-based model predictive control for autonomous racing","volume":"4","author":"Kabzan","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Taylor, M.E., Kuhlmann, G., and Stone, P. (2008, January 12\u201316). Autonomous transfer for reinforcement learning. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008)-Volume 1, Estoril, Portugal.","DOI":"10.1145\/1329125.1329248"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.1080\/01691864.2020.1778521","article-title":"Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model","volume":"34","author":"Kinose","year":"2020","journal-title":"Adv. Robot."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1080\/01691864.2020.1778523","article-title":"Hierarchical and parameterized learning of pick-and-place manipulation from under-specified human demonstrations","volume":"34","author":"Qian","year":"2020","journal-title":"Adv. Robot."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Sosa-Ceron, A.D., Gonzalez-Hernandez, H.G., and Reyes-Avenda\u00f1o, J.A. (2022). Learning from Demonstrations in Human\u2013Robot Collaborative Scenarios: A Survey. Robotics, 11.","DOI":"10.3390\/robotics11060126"},{"key":"ref_46","unstructured":"Oh, J., Guo, Y., Singh, S., and Lee, H. (2018). Self-imitation learning. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2018, January 21\u201325). Overcoming exploration in reinforcement learning with demonstrations. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2018), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8463162"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Horgan, D., Quan, J., Sendonaris, A., and Osband, I. (2018, January 2\u20137). Deep q-learning from demonstrations. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1312","DOI":"10.1109\/LRA.2021.3057023","article-title":"Badgr: An autonomous self-supervised learning-based navigation system","volume":"6","author":"Kahn","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1007\/s10846-006-9055-3","article-title":"A fuzzy\u2013braitenberg navigation strategy for differential drive mobile robots","volume":"47","author":"Yang","year":"2006","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_51","unstructured":"Konda, V.R., and Tsitsiklis, J.N. (2000, January 4\u20139). Actor\u2013critic algorithms. Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_52","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous methods for deep reinforcement learning. 
Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA."},{"key":"ref_53","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (June, January 29). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the International Conference on Robotics and Automation (ICRA), Singapore."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1103\/PhysRev.36.823","article-title":"On the theory of the Brownian motion","volume":"36","author":"Uhlenbeck","year":"1930","journal-title":"Phys. Rev."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Rohmer, E., Singh, S.P., and Freese, M. (2013, January 3\u20137). V-REP: A versatile and scalable robot simulation framework. Proceedings of the International Conference Intelligent Robots and Systems (IROS), Tokyo, Japan.","DOI":"10.1109\/IROS.2013.6696520"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Jia, T., Sun, N.L., and Cao, M.Y. (2008, January 1\u20133). Moving object detection based on blob analysis. Proceedings of the 2008 IEEE International Conference on Automation and Logistics, Qingdao, China.","DOI":"10.1109\/ICAL.2008.4636168"},{"key":"ref_57","unstructured":"Li, Y., and Yuan, Y. (2017, January 4\u20139). Convergence analysis of two-layer neural networks with relu activation. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_58","unstructured":"Kumar, S.K. (2017). On weight initialization in deep neural networks. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"4423","DOI":"10.1109\/LRA.2018.2869644","article-title":"Reinforced imitation: Sample efficient deep reinforcement learning for mapless navigation by leveraging prior demonstrations","volume":"3","author":"Pfeiffer","year":"2018","journal-title":"IEEE Robot. Autom. Lett."}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/5\/133\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:01:01Z","timestamp":1760130061000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/5\/133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,28]]},"references-count":59,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["robotics12050133"],"URL":"https:\/\/doi.org\/10.3390\/robotics12050133","relation":{},"ISSN":["2218-6581"],"issn-type":[{"type":"electronic","value":"2218-6581"}],"subject":[],"published":{"date-parts":[[2023,9,28]]}}}