{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T15:44:08Z","timestamp":1778255048898,"version":"3.51.4"},"reference-count":21,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Science and Technology Innovation 2030","doi-asserted-by":"publisher","award":["2022ZD0208800"],"award-info":[{"award-number":["2022ZD0208800"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Science and Technology Innovation 2030","doi-asserted-by":"publisher","award":["62176215"],"award-info":[{"award-number":["62176215"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NSFC General Program","award":["2022ZD0208800"],"award-info":[{"award-number":["2022ZD0208800"]}]},{"name":"NSFC General Program","award":["62176215"],"award-info":[{"award-number":["62176215"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The ability to learn continuously is crucial for a robot to achieve a high level of intelligence and autonomy. In this paper, we consider continual reinforcement learning (RL) for quadruped robots, which includes the ability to continuously learn subsequent tasks (plasticity) and maintain performance on previous tasks (stability). The policy obtained by the proposed method enables robots to learn multiple tasks sequentially, while overcoming both catastrophic forgetting and loss of plasticity. At the same time, it achieves the above goals with as little modification to the original RL learning process as possible. 
The proposed method uses the Piggyback algorithm to select protected parameters for each task, and reinitializes the unused parameters to increase plasticity. Meanwhile, we encourage the policy network to explore by promoting the entropy of its soft network. Our experiments show that traditional continual learning algorithms do not perform well on robot locomotion problems, and that our algorithm is more stable and less disruptive to the RL training process. Several robot locomotion experiments validate the effectiveness of our method.<\/jats:p>","DOI":"10.3390\/e26010093","type":"journal-article","created":{"date-parts":[[2024,1,23]],"date-time":"2024-01-23T08:28:28Z","timestamp":1705998508000},"page":"93","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Continual Reinforcement Learning for Quadruped Robot Locomotion"],"prefix":"10.3390","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9671-7924","authenticated-orcid":false,"given":"Sibo","family":"Gai","sequence":"first","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai 200433, China"},{"name":"School of Engineering, Westlake University, Hangzhou 310030, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shangke","family":"Lyu","sequence":"additional","affiliation":[{"name":"School of Engineering, Westlake University, Hangzhou 310030, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongyin","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Engineering, Westlake University, Hangzhou 310030, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Donglin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Engineering, Westlake University, Hangzhou 310030, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,22]]},"reference":[{"key":"ref_1","unstructured":"Zitkovich, B., Yu, T., Xu, S., Xu, P., Xiao, T., Xia, F., Wu, J., Wohlhart, P., Welker, S., and Wahid, A. (2023, January 6\u20139). Rt-2: Vision-language-action models transfer web knowledge to robotic control. Proceedings of the 7th Annual Conference on Robot Learning, Atlanta, GA, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Dabis, J., Finn, C., Gopalakrishnan, K., Hausman, K., Herzog, A., and Hsu, J. (2022). Rt-1: Robotics transformer for real-world control at scale. arXiv.","DOI":"10.15607\/RSS.2023.XIX.025"},{"key":"ref_3","unstructured":"Kang, Y., Shi, D., Liu, J., He, L., and Wang, D. (2023). Beyond reward: Offline preference-guided policy optimization. arXiv."},{"key":"ref_4","unstructured":"Liu, J., Zhang, H., Zhuang, Z., Kang, Y., Wang, D., and Wang, B. (2023). Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization. arXiv."},{"key":"ref_5","unstructured":"Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S.G., Novikov, A., Barth-Maron, G., Gimenez, M., Sulsky, Y., Kay, J., and Springenberg, J.T. (2022). A generalist agent. arXiv."},{"key":"ref_6","first-page":"4767","article-title":"Multi-task reinforcement learning with soft modularization","volume":"33","author":"Yang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_7","first-page":"72","article-title":"Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights","volume":"Volume 11208","author":"Ferrari","year":"2018","journal-title":"Computer Vision\u2014ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8\u201314 September 2018"},{"key":"ref_8","unstructured":"Kang, H., Yoon, J., Madjid, S.R., Hwang, S.J., and Yoo, C.D. (2023). 
Forget-free Continual Learning with Soft-Winning SubNetworks. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A., and Rastegari, M. (2020, January 13\u201319). What\u2019s hidden in a randomly weighted neural network?. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01191"},{"key":"ref_10","unstructured":"Zhang, H., Yang, S., and Wang, D. (2023). A Real-World Quadrupedal Locomotion Benchmark for Offline Reinforcement Learning. arXiv."},{"key":"ref_11","unstructured":"Van de Ven, G.M., and Tolias, A.S. (2019). Three scenarios for continual learning. arXiv."},{"key":"ref_12","unstructured":"Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T.P., and Wayne, G. (2019, January 8\u201314). Experience Replay for Continual Learning. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_13","unstructured":"Chaudhry, A., Ranzato, M., Rohrbach, M., and Elhoseiny, M. (2019, January 6\u20139). Efficient Lifelong Learning with A-GEM. Proceedings of the ICLR, New Orleans, LA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3521","DOI":"10.1073\/pnas.1611835114","article-title":"Overcoming catastrophic forgetting in neural networks","volume":"114","author":"Kirkpatrick","year":"2017","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chaudhry, A., Dokania, P.K., Ajanthan, T., and Torr, P.H. (2018, January 8\u201314). Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_33"},{"key":"ref_16","unstructured":"Zenke, F., Poole, B., and Ganguli, S. (2017, January 6\u201311). 
Continual Learning Through Synaptic Intelligence. Proceedings of the International Conference on Machine Learning. PMLR, Sydney, Australia."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Mallya, A., and Lazebnik, S. (2018, January 18\u201323). Packnet: Adding multiple tasks to a single network by iterative pruning. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00810"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gai, S., Wang, D., and He, L. (2023, January 18). Offline Experience Replay for Continual Offline Reinforcement Learning. Proceedings of the 26th European Conference on Artificial Intelligence, Krak\u00f3w, Poland.","DOI":"10.3233\/FAIA230343"},{"key":"ref_19","unstructured":"Nahrendra, I.M.A., Yu, B., and Myung, H. (June, January 29). DreamWaQ: Learning Robust Quadrupedal Locomotion with Implicit Terrain Imagination via Deep Reinforcement Learning. Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2023, London, UK."},{"key":"ref_20","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA."},{"key":"ref_21","unstructured":"Lyle, C., Zheng, Z., Nikishin, E., Pires, B.\u00c1., Pascanu, R., and Dabney, W. (2023, January 23\u201329). Understanding Plasticity in Neural Networks. 
Proceedings of the International Conference on Machine Learning, ICML 2023, Honolulu, HI, USA."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/1\/93\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:47:16Z","timestamp":1760104036000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/1\/93"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,22]]},"references-count":21,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["e26010093"],"URL":"https:\/\/doi.org\/10.3390\/e26010093","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,22]]}}}