{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T14:47:41Z","timestamp":1772290061764,"version":"3.50.1"},"reference-count":40,"publisher":"Cambridge University Press (CUP)","issue":"6","license":[{"start":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T00:00:00Z","timestamp":1714608000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotica"],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online RL does not lend itself readily to this paradigm because agent-environment interaction is costly and time-consuming. Therefore, many offline RL algorithms have recently been proposed to learn robotic tasks. Yet most such methods focus on single-task or multitask learning, which requires retraining whenever a new task must be learned. Continuously learning tasks without forgetting previous knowledge, combined with the power of offline deep RL, would allow us to scale the number of tasks by adding them one after another. This paper investigates the effectiveness of regularisation-based methods like synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline-RL setup. We evaluate the performance of this combined framework against common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyse the effect of task ordering. We also investigated the effect of the number of object configurations and the density of robot trajectories. 
We found that learning tasks sequentially helps in the retention of knowledge from previous tasks, thereby reducing the time required to learn a new task. Regularisation-based approaches for continuous learning, like the synaptic intelligence method, help mitigate catastrophic forgetting but have shown only limited transfer of knowledge from previous tasks.<\/jats:p>","DOI":"10.1017\/s0263574724000389","type":"journal-article","created":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T07:43:16Z","timestamp":1714635796000},"page":"1715-1730","source":"Crossref","is-referenced-by-count":5,"title":["Learning vision-based robotic manipulation tasks sequentially in offline reinforcement learning settings"],"prefix":"10.1017","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-4879-3811","authenticated-orcid":false,"given":"Sudhir Pratap","family":"Yadav","sequence":"first","affiliation":[]},{"given":"Rajendra","family":"Nagar","sequence":"additional","affiliation":[]},{"given":"Suril V.","family":"Shah","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2024,5,2]]},"reference":[{"key":"S0263574724000389_ref22","unstructured":"[22] Kalashnikov, D. , Varley, J. , Chebotar, Y. , Swanson, B. , Jonschkowski, R. , Finn, C. , Levine, S. and Hausman, K. , \u201cScaling Up Multi-Task Robotic Reinforcement Learning,\u201d In:\u00a0Conference on Robot Learning (CoRL), (2021)."},{"key":"S0263574724000389_ref3","doi-asserted-by":"crossref","unstructured":"[3] Devin, C. , Gupta, A. , Darrell, T. , Abbeel, P. and Levine, S. , \u201cLearning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer,\u201d In:\u00a0International Conference on Robotics and Automation (ICRA), (2017) pp. 
2169\u20132176.","DOI":"10.1109\/ICRA.2017.7989250"},{"key":"S0263574724000389_ref15","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574721000618"},{"key":"S0263574724000389_ref21","doi-asserted-by":"crossref","unstructured":"[21] Gupta, A. , Yu, J. , Zhao, T. , Kumar, V. , Rovinsky, A. , Xu, K. , Devlin, T. and Levine, S. , \u201cReset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors Without Human Intervention,\u201d In:\u00a0International Conference on Robotics and Automation (ICRA), (2021) pp. 6664\u20136671.","DOI":"10.1109\/ICRA48506.2021.9561384"},{"key":"S0263574724000389_ref6","article-title":"Category level pick and place using deep reinforcement learning","author":"Gualtieri","year":"2017","journal-title":"Computing Research Repository"},{"key":"S0263574724000389_ref8","unstructured":"[8] Lee, A. , Devin, C. , Zhou, Y. , Lampe, T. , Bousmalis, K. , Springenberg, J. , Byravan, A. , Abdolmaleki, A. , Gileadi, N. and Khosid, D. , \u201cBeyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes.\u201d In:\u00a0Conference on Robot Learning, (2021)."},{"key":"S0263574724000389_ref9","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574722000650"},{"key":"S0263574724000389_ref1","unstructured":"[1] Mnih, V. , Kavukcuoglu, K. , Silver, D. , Graves, A. , Antonoglou, I. , Wierstra, D. and Riedmiller, M. , \u201cPlaying atari with deep reinforcement learning,\u201d (2013). arXiv preprint arXiv: 1312.5602, 2013."},{"key":"S0263574724000389_ref32","unstructured":"[32] Caccia, M. , Mueller, J. , Kim, T. , Charlin, L. and Fakoor, R. , \u201cTask-agnostic continual reinforcement learning: In praise of a simple baseline,\u201d (2022). 
arXiv preprint arXiv: 2205.14495, 2022."},{"key":"S0263574724000389_ref34","doi-asserted-by":"publisher","DOI":"10.1016\/S1364-6613(99)01294-2"},{"key":"S0263574724000389_ref37","first-page":"1179","article-title":"Conservative q-learning for offline reinforcement learning","volume":"33","author":"Kumar","year":"2020","journal-title":"Adv Neur Info Pro Syst"},{"key":"S0263574724000389_ref7","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3003865"},{"key":"S0263574724000389_ref16","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2021.103569"},{"key":"S0263574724000389_ref5","doi-asserted-by":"crossref","unstructured":"[5] Haarnoja, T. , Pong, V. , Zhou, A. , Dalal, M. , Abbeel, P. and Levine, S. , \u201cComposable Deep Reinforcement Learning for Robotic Manipulation,\u201d In:\u00a0International Conference on Robotics and Automation (ICRA), (2018) pp. 6244\u20136251.","DOI":"10.1109\/ICRA.2018.8460756"},{"key":"S0263574724000389_ref27","first-page":"7765","volume-title":"Computer Vision and Pattern Recognition","author":"Mallya","year":"2018"},{"key":"S0263574724000389_ref23","unstructured":"[23] Sodhani, S. , Zhang, A. and Pineau, J. , \u201cMulti-Task Reinforcement Learning with Context-based Representations,\u201d In:\u00a0International Conference on Machine Learning, (2021) pp. 9767\u20139779."},{"key":"S0263574724000389_ref25","first-page":"5824","article-title":"Gradient surgery for multi-task learning","volume":"33","author":"Yu","year":"2020","journal-title":"Adv Neur Infor Pro Syst"},{"key":"S0263574724000389_ref18","doi-asserted-by":"publisher","DOI":"10.1002\/aisy.202100095"},{"key":"S0263574724000389_ref33","unstructured":"[33] Haarnoja, T. , Zhou, A. , Abbeel, P. and Levine, S. , \u201cSoft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor,\u201d In:\u00a0International Conference on Machine Learning, (2018) pp. 1861\u20131870."},{"key":"S0263574724000389_ref36","unstructured":"[36] Singh, A. 
, Yu, A. , Yang, J. , Zhang, J. , Kumar, A. and Levine, S. , \u201cCog: Connecting new skills to past experience with offline reinforcement learning,\u201d (2020). arXiv preprint arXiv: 2010.14500."},{"key":"S0263574724000389_ref19","doi-asserted-by":"crossref","unstructured":"[19] Nair, A. , Chen, D. , Agrawal, P. , Isola, P. , Abbeel, P. , Malik, J. and Levine, S. , \u201cCombining Self-Supervised Learning and Imitation for Vision-based Rope Manipulation,\u201d In:\u00a0International Conference on Robotics and Automation (ICRA), (2017) pp. 2146\u20132153.","DOI":"10.1109\/ICRA.2017.7989247"},{"key":"S0263574724000389_ref28","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2773081"},{"key":"S0263574724000389_ref35","unstructured":"[35] Zenke, F. , Poole, B. and Ganguli, S. , \u201cContinual Learning through Synaptic Intelligence,\u201d In:\u00a0International Conference on Machine Learning, (2017) pp. 3987\u20133995."},{"key":"S0263574724000389_ref38","article-title":"Deep reinforcement learning with double Q-learning","volume":"30","author":"Van Hasselt","year":"2016","journal-title":"Proceed AAAI Conf Arti Intell"},{"key":"S0263574724000389_ref29","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"S0263574724000389_ref40","volume-title":"PyBullet, a Python Module for Physics Simulation for Games, Robotics and Machine Learning","author":"Erwin","year":"2016"},{"key":"S0263574724000389_ref30","article-title":"OpenAI gym","author":"Brockman","year":"2016","journal-title":"Comp Res Reposit"},{"key":"S0263574724000389_ref4","unstructured":"[4] Gu, S. , Holly, E. , Lillicrap, T. and Levine, S. , \u201cDeep reinforcement learning for robotic manipulation,\u201d (2016). arXiv preprint arXiv: 1610.00633 1, 2016."},{"key":"S0263574724000389_ref24","volume-title":"Advances in Neural Information Processing Systems","author":"Teh","year":"2017"},{"key":"S0263574724000389_ref39","unstructured":"[39] Kingma, D. and Ba, J. 
, \u201cAdam: A method for stochastic optimization,\u201d (2014). arXiv preprint arXiv: 1412.6980."},{"key":"S0263574724000389_ref31","first-page":"28496","article-title":"Continual world: A robotic benchmark for continual reinforcement learning","volume":"34","author":"Wo\u0142czyk","year":"2021","journal-title":"Adv Neur Infor Pro Syst"},{"key":"S0263574724000389_ref10","doi-asserted-by":"crossref","unstructured":"[10] Wu, X. , Zhang, D. , Qin, F. and Xu, D. , \u201cDeep reinforcement learning of robotic precision insertion skill accelerated by demonstrations,\u201d In:\u00a0International Conference on Automation Science and Engineering (CASE), 1651-1656, (2019).","DOI":"10.1109\/COASE.2019.8842940"},{"key":"S0263574724000389_ref12","doi-asserted-by":"crossref","unstructured":"[12] Schoettler, G. , Nair, A. , Luo, J. , Bahl, S. , Ojea, J. , Solowjow, E. and Levine, S. , \u201cDeep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards,\u201d In:\u00a0International Conference on Intelligent Robots and Systems (IROS), (2020) pp. 5548\u20135555.","DOI":"10.1109\/IROS45743.2020.9341714"},{"key":"S0263574724000389_ref11","doi-asserted-by":"crossref","unstructured":"[11] Yasutomi, A. , Mori, H. and Ogata, T. , \u201cA Peg-in-Hole Task Strategy for Holes in Concrete,\u201d In:\u00a0International Conference on Robotics and Automation (ICRA), (2021) pp. 2205\u20132211.","DOI":"10.1109\/ICRA48506.2021.9561370"},{"key":"S0263574724000389_ref17","doi-asserted-by":"publisher","DOI":"10.1007\/s00170-022-09877-8"},{"key":"S0263574724000389_ref2","unstructured":"[2] Kalashnikov, D. , Irpan, A. , Pastor, P. , Ibarz, J. , Herzog, A. , Jang, E. , Quillen, D. , Holly, E. , Kalakrishnan, M. and Vanhoucke, V. , \u201cQt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation, (2018). 
arXiv preprint arXiv: 1806.10293, 2018."},{"key":"S0263574724000389_ref14","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574722001230"},{"key":"S0263574724000389_ref26","unstructured":"[26] Goodfellow, I. , Mirza, M. , Xiao, D. , Courville, A. and Bengio, Y. , \u201cAn empirical investigation of catastrophic forgetting in gradient-based neural networks,\u201d (2013). arXiv preprint arXiv: 1312.6211, 2013."},{"key":"S0263574724000389_ref13","doi-asserted-by":"crossref","unstructured":"[13] Nemec, B. , \u017dlajpah, L. and Ude, A. , \u201cDoor Opening by Joining Reinforcement Learning and Intelligent Control,\u201d In:\u00a0International Conference on Advanced Robotics (ICAR, (2017) pp. 222\u2013228.","DOI":"10.1109\/ICAR.2017.8023522"},{"key":"S0263574724000389_ref20","unstructured":"[20] Lee, R. , Ward, D. , Cosgun, A. , Dasagi, V. , Corke, P. and Leitner, J. , \u201cLearning arbitrary-goal fabric folding with one hour of real robot experience (2020). arXiv preprint arXiv: 2010.03209."}],"container-title":["Robotica"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0263574724000389","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T13:05:13Z","timestamp":1738847113000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0263574724000389\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,2]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["S0263574724000389"],"URL":"https:\/\/doi.org\/10.1017\/s0263574724000389","relation":{},"ISSN":["0263-5747","1469-8668"],"issn-type":[{"value":"0263-5747","type":"print"},{"value":"1469-8668","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,2]]}}}