{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,10]],"date-time":"2026-07-10T13:37:30Z","timestamp":1783690650945,"version":"3.55.0"},"reference-count":35,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T00:00:00Z","timestamp":1687824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Deep reinforcement learning (RL) agents often suffer from catastrophic forgetting, forgetting previously found solutions in parts of the input space when training on new data. Replay memories are a common solution to the problem by decorrelating and shuffling old and new training samples. They naively store state transitions as they arrive, without regard for redundancy. We introduce a novel cognitive-inspired replay memory approach based on the Grow-When-Required (GWR) self-organizing network, which resembles a map-based mental model of the world. Our approach organizes stored transitions into a concise environment-model-like network of state nodes and transition edges, merging similar samples to reduce the memory size and increase pair-wise distance among samples, which increases the relevancy of each sample. 
Overall, our study shows that map-based experience replay allows for significant memory reduction with only small decreases in performance.<\/jats:p>","DOI":"10.3389\/fnbot.2023.1127642","type":"journal-article","created":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T09:00:05Z","timestamp":1687856405000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Map-based experience replay: a memory-efficient solution to catastrophic forgetting in reinforcement learning"],"prefix":"10.3389","volume":"17","author":[{"given":"Muhammad Burhan","family":"Hafez","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tilman","family":"Immisch","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tom","family":"Weber","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefan","family":"Wermter","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2023,6,27]]},"reference":[{"key":"B1","first-page":"1474","article-title":"\u201cStratified experience replay: correcting multiplicity bias in off-policy reinforcement learning,\u201d","author":"Daley","year":"2021","journal-title":"Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS"},{"key":"B2","first-page":"1120","article-title":"\u201cModel-free generative replay for lifelong reinforcement learning: application to starcraft-2,\u201d","author":"Daniels","year":"2022","journal-title":"Conference on Lifelong Learning Agents"},{"key":"B3","first-page":"2587","article-title":"\u201cAddressing function approximation error in actor-critic methods,\u201d","author":"Fujimoto","year":"2018","journal-title":"35th International Conference on Machine Learning, ICML 
2018"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2019.XV.011","article-title":"Learning to walk via deep reinforcement learning","author":"Haarnoja","year":"2019","journal-title":"Robotics: Science and Systems (RSS)"},{"key":"B5","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1515\/pjbr-2019-0005","article-title":"Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning","volume":"10","author":"Hafez","year":"","journal-title":"Paladyn J. Behav. Robot"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1109\/DEVLRN.2019.8850723","article-title":"\u201cEfficient intrinsically motivated robotic grasping with learning-adaptive imagination in latent space,\u201d","author":"Hafez","year":"","journal-title":"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)"},{"key":"B7","doi-asserted-by":"crossref","first-page":"6739","DOI":"10.1109\/IROS51168.2021.9636297","article-title":"\u201cBehavior self-organization supports task inference for continual robot learning,\u201d","volume-title":"2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Hafez","year":"2021"},{"key":"B8","first-page":"8387","article-title":"\u201cTemporal difference learning for model predictive control,\u201d","author":"Hansen","year":"2022","journal-title":"International Conference on Machine Learning"},{"key":"B9","doi-asserted-by":"publisher","first-page":"3302","DOI":"10.1609\/aaai.v32i1.11595","article-title":"Selective experience replay for lifelong learning","volume":"32","author":"Isele","year":"2018","journal-title":"AAAI"},{"key":"B10","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1109\/IJCNN.1999.831553","article-title":"\u201cAn instantaneous topological mapping model for correlated stimuli,\u201d","author":"Jockusch","year":"1999","journal-title":"IJCNN'99. International Joint Conference on Neural Networks. 
Proceedings (Cat. No. 99CH36339)"},{"key":"B11","doi-asserted-by":"crossref","DOI":"10.1109\/ROMAN.2017.8172289","article-title":"\u201cNico\u2014neuro-inspired companion: A developmental humanoid robot platform for multimodal interaction,\u201d","volume-title":"2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)","author":"Kerzel","year":"2017"},{"key":"B12","first-page":"1334","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine","year":"2016","journal-title":"J. Mach. Lear. Res"},{"key":"B13","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1007\/s10489-020-01786-1","article-title":"SLER: Self-generated long-term experience replay for continual reinforcement learning","volume":"51","author":"Li","year":"2021","journal-title":"Appl. Intellig"},{"key":"B14","article-title":"\u201cTune: A research platform for distributed model selection and training,\u201d","author":"Liaw","year":"2018","journal-title":"Proceedings of the ICML Workshop on Automatic Machine Learning (AutoML"},{"key":"B15","article-title":"\u201cContinuous control with deep reinforcement learning,\u201d","author":"Lillicrap","year":"2016","journal-title":"4th International Conference on Learning Representations, ICLR 2016"},{"key":"B16","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/S0079-7421(08)60536-8","article-title":"Catastrophic interference in connectionist networks: The sequential learning problem","volume":"24","author":"McCloskey","year":"1989","journal-title":"Psychol. Learn. 
Motivat"},{"key":"B17","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"B18","first-page":"4851","article-title":"\u201cRemember and forget for experience replay,\u201d","author":"Novati","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning"},{"key":"B19","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1109\/ROMAN.2016.7745093","article-title":"\u201cHuman motion assessment in real time using recurrent self-organization,\u201d","author":"Parisi","year":"2016","journal-title":"25th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2016"},{"key":"B20","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1016\/j.neunet.2017.09.001","article-title":"Lifelong learning of human actions with deep neural network self-organization","volume":"96","author":"Parisi","year":"2017","journal-title":"Neural Networks"},{"key":"B21","doi-asserted-by":"crossref","DOI":"10.1109\/Humanoids53995.2022.10000092","article-title":"\u201cLearning to autonomously reach objects with nico and grow-when-required networks,\u201d","volume-title":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","author":"Rahrakhshan","year":"2022"},{"key":"B22","article-title":"\u201cLearning to learn without forgetting by maximizing transfer and minimizing interference,\u201d","author":"Riemer","year":"2019","journal-title":"International Conference on Learning Representations. 
International Conference on Learning Representations, ICLR"},{"key":"B23","article-title":"\u201cExperience replay for continual learning,\u201d","author":"Rolnick","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B24","article-title":"\u201cPrioritized Experience Replay,\u201d","author":"Schaul","year":"2015","journal-title":"4th International Conference on Learning Representations, ICLR 2016"},{"key":"B25","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering atari, go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2214840120","article-title":"Superhuman artificial intelligence can improve human decision-making by increasing novelty","author":"Shin","year":"2023","journal-title":"Proc. Natl. Acad. Sci. U.S.A"},{"key":"B27","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"B29","doi-asserted-by":"crossref","DOI":"10.1109\/IROS.2012.6386109","article-title":"\u201cMujoco: A physics engine for model-based control,\u201d","volume-title":"2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Todorov","year":"2012"},{"key":"B30","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in starcraft II using multi-agent reinforcement 
learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"B31","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1145\/3147.3165","article-title":"Random sampling with a reservoir","volume":"11","author":"Vitter","year":"1985","journal-title":"ACM Trans. Math. Softw"},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.5220\/0010107904040411","article-title":"\u201cBootstrapping a DQN replay memory with synthetic experiences,\u201d","author":"von Pilchau","year":"2020","journal-title":"IJCCI 2020 - Proceedings of the 12th International Joint Conference on Computational Intelligence"},{"key":"B33","article-title":"\u201cA framework of dual replay buffer: balancing forgetting and generalization in reinforcement learning,\u201d","author":"Zhang","year":"2019","journal-title":"Workshop on Scaling Up Reinforcement Learning (SURL), International Joint Conference on Artificial Intelligence (IJCAI"},{"key":"B34","doi-asserted-by":"crossref","DOI":"10.1109\/IROS47612.2022.9981510","article-title":"\u201cImpact makes a sound and sound makes an impact: Sound guides representations and explorations,\u201d","volume-title":"2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Zhao","year":"2022"},{"key":"B35","doi-asserted-by":"crossref","DOI":"10.1109\/ICRA.2017.7989381","article-title":"\u201cTarget-driven visual navigation in indoor scenes using deep reinforcement learning,\u201d","volume-title":"IEEE International Conference on Robotics and Automation (ICRA)","author":"Zhu","year":"2017"}],"container-title":["Frontiers in 
Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1127642\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T09:00:55Z","timestamp":1687856455000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1127642\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,27]]},"references-count":35,"alternative-id":["10.3389\/fnbot.2023.1127642"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2023.1127642","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,27]]},"article-number":"1127642"}}