{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:05:13Z","timestamp":1753891513397,"version":"3.41.2"},"reference-count":66,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T00:00:00Z","timestamp":1743033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Neurosci."],
"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Recent advances in computational neuroscience highlight the significance of prefrontal cortical meta-control mechanisms in facilitating flexible and adaptive human behavior. In addition, hippocampal function, particularly mental simulation capacity, proves essential in this adaptive process. Rooted in these neuroscientific insights, we present <jats:italic>Meta-Dyna<\/jats:italic>, a novel neuroscience-inspired reinforcement learning architecture that demonstrates rapid adaptation to environmental dynamics whilst managing variable goal states and state-transition uncertainties.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>This architectural framework implements prefrontal meta-control mechanisms integrated with hippocampal replay function, which in turn optimizes task performance with limited experience. We evaluated this approach through comprehensive experimental simulations across three distinct paradigms: the two-stage Markov decision task, which is widely used in human learning and decision-making research; <jats:italic>stochastic GridWorldLoCA<\/jats:italic>, an established benchmark suite for model-based reinforcement learning; and a <jats:italic>stochastic Atari Pong<\/jats:italic> variant incorporating multiple goals under uncertainty.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Experimental results demonstrate <jats:italic>Meta-Dyna<\/jats:italic>'s superior performance compared with baseline reinforcement learning algorithms across multiple metrics: average reward, choice optimality, and the number of trials required for success.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>These findings advance our understanding of computational reinforcement learning whilst contributing to the development of brain-inspired learning agents capable of flexible, goal-directed behavior within dynamic environments.<\/jats:p><\/jats:sec>",
"DOI":"10.3389\/fncom.2025.1559915","type":"journal-article","created":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T23:41:39Z","timestamp":1743118899000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Prefrontal meta-control incorporating mental simulation enhances the adaptivity of reinforcement learning agents in dynamic environments"],"prefix":"10.3389","volume":"19","author":[{"given":"JiHun","family":"Kim","sequence":"first","affiliation":[]},{"given":"Jee Hang","family":"Lee","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,3,27]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1016\/j.neuron.2008.10.019","article-title":"Theoretical neuroscience","volume":"60","author":"Abbott","year":"2001","journal-title":"Comput. Math. Model Neural"},
{"key":"B2","doi-asserted-by":"publisher","first-page":"29302","DOI":"10.1073\/pnas.1912341117","article-title":"Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning","volume":"117","author":"Allen","year":"2020","journal-title":"Proc. Nat. Acad. Sci"},{"key":"B3","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1016\/j.cobeha.2021.02.019","article-title":"Reinforcement-guided learning in frontal neocortex: emerging computational concepts","volume":"38","author":"Banerjee","year":"2021","journal-title":"Curr. Opin. Behav. Sci"},{"key":"B4","article-title":"Mbmf: Model-based priors for model-free reinforcement learning","author":"Bansal","year":"2017","journal-title":"arXiv preprint arXiv:1709.03153"},{"key":"B5","article-title":"Stealing that free lunch: exposing the limits of dyna-style reinforcement learning","author":"Barkley","year":"2024","journal-title":"arXiv preprint arXiv:2412.14312"},{"key":"B6","doi-asserted-by":"publisher","first-page":"210","DOI":"10.3390\/automation4030013","article-title":"Deep dyna-q for rapid learning and improved formation achievement in cooperative transportation","volume":"4","author":"Budiyanto","year":"2023","journal-title":"Automation"},{"key":"B7","article-title":"Transdreamer: reinforcement learning with transformer world models","author":"Chen","year":"2022","journal-title":"arXiv preprint arXiv:2202.09481"},{"key":"B8","first-page":"15084","article-title":"\u201cDecision transformer: reinforcement learning via sequence modeling,\u201d","author":"Chen","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B9","doi-asserted-by":"publisher","first-page":"20130478","DOI":"10.1098\/rstb.2013.0478","article-title":"The algorithmic anatomy of model-based evaluation","volume":"369","author":"Daw","year":"2014","journal-title":"Philos. Trans. R. Soc"},
{"key":"B10","doi-asserted-by":"publisher","first-page":"1204","DOI":"10.1016\/j.neuron.2011.02.027","article-title":"Model-based influences on humans' choices and striatal prediction errors","volume":"69","author":"Daw","year":"2011","journal-title":"Neuron"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1704","DOI":"10.1038\/nn1560","article-title":"Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control","volume":"8","author":"Daw","year":"2005","journal-title":"Nat. Neurosci"},{"key":"B12","doi-asserted-by":"publisher","first-page":"473","DOI":"10.3758\/s13415-014-0277-8","article-title":"Model-based and model-free pavlovian reward learning: revaluation, revision, and revelation","volume":"14","author":"Dayan","year":"2014","journal-title":"Cogn. Affect. Behav. Neurosci"},{"key":"B13","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1098\/rstb.1985.0010","article-title":"Actions and habits: the development of behavioural autonomy","volume":"308","author":"Dickinson","year":"1985","journal-title":"Philos. Trans. R. Soc. London B"},{"key":"B14","doi-asserted-by":"publisher","first-page":"312","DOI":"10.1016\/j.neuron.2013.09.007","article-title":"Goals and habits in the brain","volume":"80","author":"Dolan","year":"2013","journal-title":"Neuron"},{"key":"B15","doi-asserted-by":"publisher","first-page":"2758","DOI":"10.1109\/TNNLS.2020.3008249","article-title":"Intelligent trainer for dyna-style model-based deep reinforcement learning","volume":"32","author":"Dong","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B16","doi-asserted-by":"publisher","first-page":"114106","DOI":"10.1016\/j.est.2024.114106","article-title":"Deep dyna reinforcement learning based energy management system for solar operated hybrid electric vehicle using load scheduling technique","volume":"102","author":"Ghode","year":"2024","journal-title":"J. Energy Storage"},
{"key":"B17","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1016\/j.neuron.2010.01.034","article-title":"Hippocampal replay is not a simple function of experience","volume":"65","author":"Gupta","year":"2010","journal-title":"Neuron"},{"key":"B18","article-title":"World models","author":"Ha","year":"2018","journal-title":"arXiv preprint arXiv:1803.10122"},{"key":"B19","first-page":"1861","article-title":"\u201cSoft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor,\u201d","volume-title":"International Conference on Machine Learning","author":"Haarnoja","year":"2018"},{"key":"B20","first-page":"2555","article-title":"\u201cLearning latent dynamics for planning from pixels,\u201d","author":"Hafner","year":"2019","journal-title":"International Conference on Machine Learning"},{"key":"B21","doi-asserted-by":"publisher","first-page":"e1009003","DOI":"10.1371\/journal.pcbi.1009003","article-title":"Effects of subclinical depression on prefrontal-striatal model-based and model-free learning","volume":"17","author":"Heo","year":"2021","journal-title":"PLoS Comput. Biol"},{"key":"B22","doi-asserted-by":"publisher","first-page":"1454","DOI":"10.1126\/science.1217230","article-title":"Awake hippocampal sharp-wave ripples support spatial memory","volume":"336","author":"Jadhav","year":"2012","journal-title":"Science"},{"key":"B23","doi-asserted-by":"publisher","first-page":"913","DOI":"10.1038\/nn.2344","article-title":"Awake replay of remote experiences in the hippocampus","volume":"12","author":"Karlsson","year":"2009","journal-title":"Nat. Neurosci"},
{"key":"B24","article-title":"Fast exploration with simplified models and approximately optimistic planning in model based reinforcement learning","author":"Keramati","year":"2018","journal-title":"arXiv preprint"},{"key":"B25","doi-asserted-by":"publisher","first-page":"110185","DOI":"10.1016\/j.celrep.2021.110185","article-title":"Prefrontal solution to the bias-variance tradeoff during reinforcement learning","volume":"37","author":"Kim","year":"2021","journal-title":"Cell Rep"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.21203\/rs.3.rs-3080402\/v1","article-title":"Long short-term prediction guides human metacognitive reinforcement learning","author":"Kim","year":"2023","journal-title":"Res Sq."},{"key":"B27","doi-asserted-by":"publisher","first-page":"5738","DOI":"10.1038\/s41467-019-13632-1","article-title":"Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning","volume":"10","author":"Kim","year":"2019","journal-title":"Nat. Commun"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1321","DOI":"10.1177\/0956797617708288","article-title":"Cost-benefit arbitration between multiple reinforcement-learning systems","volume":"28","author":"Kool","year":"2017","journal-title":"Psychol. Sci"},{"key":"B29","doi-asserted-by":"publisher","first-page":"17951","DOI":"10.1073\/pnas.0905191106","article-title":"Genetic variation in dopaminergic neuromodulation influences the ability to rapidly and flexibly adapt decisions","volume":"106","author":"Krugel","year":"2009","journal-title":"Proc. Nat. Acad. Sci"},{"key":"B30","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1080\/02724990344000141","article-title":"The role of associative history in models of associative learning: a selective review and a hybrid model","volume":"57","author":"Le Pelley","year":"2004","journal-title":"Quart. J. Exper. Psychol. Section B"},
{"key":"B31","doi-asserted-by":"publisher","first-page":"113702","DOI":"10.1016\/j.celrep.2024.113702","article-title":"Controlling human causal inference through in silico task design","volume":"43","author":"Lee","year":"2024","journal-title":"Cell Rep"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1060101","DOI":"10.3389\/fncom.2022.1060101","article-title":"Importance of prefrontal meta control in human-like reinforcement learning","volume":"16","author":"Lee","year":"2022","journal-title":"Front. Comput. Neurosci"},{"key":"B33","doi-asserted-by":"publisher","first-page":"eaav2975","DOI":"10.1126\/scirobotics.aav2975","article-title":"Toward high-performance, memory-efficient, and fast reinforcement learning\u2014lessons from decision neuroscience","volume":"4","author":"Lee","year":"2019","journal-title":"Sci. Robot"},{"key":"B34","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1016\/j.neuron.2013.11.028","article-title":"Neural computations underlying arbitration between model-based and model-free learning","volume":"81","author":"Lee","year":"2014","journal-title":"Neuron"},{"key":"B35","doi-asserted-by":"publisher","first-page":"1250","DOI":"10.1038\/nn.2904","article-title":"Differential roles of human striatum and amygdala in associative learning","volume":"14","author":"Li","year":"2011","journal-title":"Nat. Neurosci"},{"key":"B36","article-title":"When to trust your data: enhancing dyna-style model-based reinforcement learning with data filter","author":"Li","year":"2024","journal-title":"arXiv preprint arXiv:2410.12160"},{"key":"B37","doi-asserted-by":"publisher","first-page":"112526","DOI":"10.1016\/j.est.2024.112526","article-title":"Dyna algorithm-based reinforcement learning energy management for fuel cell hybrid engineering vehicles","volume":"94","author":"Liu","year":"2024","journal-title":"J. Energy Storage"},
{"key":"B38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s40860-025-00243-5","article-title":"A smart grid computational offloading policy generation method for end-edge-cloud environments","volume":"11","author":"Liu","year":"2025","journal-title":"J. Reliable Intell. Environ"},{"key":"B39","doi-asserted-by":"publisher","first-page":"20210618","DOI":"10.1098\/rspa.2021.0618","article-title":"Physics-informed dyna-style model-based deep reinforcement learning for dynamic control","volume":"477","author":"Liu","year":"2021","journal-title":"Proc. R. Soc. A"},{"key":"B40","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1016\/j.cell.2019.06.012","article-title":"Human replay spontaneously reorganizes experience","volume":"178","author":"Liu","year":"2019","journal-title":"Cell"},{"key":"B41","doi-asserted-by":"publisher","first-page":"1609","DOI":"10.1038\/s41593-018-0232-z","article-title":"Prioritized memory access explains planning and hippocampal replay","volume":"21","author":"Mattar","year":"2018","journal-title":"Nat. Neurosci"},{"key":"B42","article-title":"Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics","author":"Mei","year":"2025","journal-title":"arXiv preprint arXiv:2501.06762"},{"key":"B43","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.1038\/nn.4613","article-title":"Dorsal hippocampus contributes to model-based planning","volume":"20","author":"Miller","year":"2017","journal-title":"Nat. Neurosci"},
{"key":"B44","first-page":"1928","article-title":"\u201cAsynchronous methods for deep reinforcement learning,\u201d","volume-title":"Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research","author":"Mnih","year":"2016"},{"key":"B45","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1038\/s41562-017-0180-8","article-title":"The successor representation in human reinforcement learning","volume":"1","author":"Momennejad","year":"2017","journal-title":"Nat. Hum. Behav"},{"key":"B46","article-title":"Learning model-based strategies in simple environments with hierarchical q-networks","author":"Muyesser","year":"2018","journal-title":"arXiv preprint arXiv:1801.06689"},{"key":"B47","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1037\/\/0033-295X.87.6.532","article-title":"A model for pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli","volume":"87","author":"Pearce","year":"1980","journal-title":"Psychol. Rev"},
{"key":"B48","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1038\/nature12112","article-title":"Hippocampal place-cell sequences depict future paths to remembered goals","volume":"497","author":"Pfeiffer","year":"2013","journal-title":"Nature"},{"year":"2025","author":"Qu","key":"B49"},{"key":"B50","article-title":"\u201cImagination-augmented agents for deep reinforcement learning,\u201d","author":"Racani\u00e9re","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B51","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"B52","doi-asserted-by":"publisher","first-page":"e1005768","DOI":"10.1371\/journal.pcbi.1005768","article-title":"Predictive representations can link model-based reinforcement learning to model-free mechanisms","volume":"13","author":"Russek","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"B53","doi-asserted-by":"publisher","first-page":"114879","DOI":"10.1016\/j.enbuild.2024.114879","article-title":"Dyna-pinn: physics-informed deep dyna-q reinforcement learning for intelligent control of building heating system in low-diversity training data regimes","volume":"324","author":"Saeed","year":"2024","journal-title":"Energy Build"},{"volume-title":"Real-time digital twin with reinforcement learning for industrial manipulator applications","year":"2024","author":"Samaylal","key":"B54"},{"key":"B55","article-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"B56","first-page":"1889","article-title":"\u201cTrust region policy optimization,\u201d","author":"Schulman","year":"2015","journal-title":"Proceedings of the 32nd International Conference on Machine Learning, Vol. 37"},
{"key":"B57","doi-asserted-by":"publisher","DOI":"10.1101\/2025.02.14.638024","article-title":"Dual process impairments in reinforcement learning and working memory systems underlie learning deficits in physiological anxiety","author":"Senta","year":"2025","journal-title":"bioRxiv"},{"key":"B58","doi-asserted-by":"publisher","first-page":"1643","DOI":"10.1038\/nn.4650","article-title":"The hippocampus as a predictive map","volume":"20","author":"Stachenfeld","year":"2017","journal-title":"Nat. Neurosci"},{"key":"B59","first-page":"216","article-title":"\u201cIntegrated architectures for learning, planning, and reacting based on approximating dynamic programming,\u201d","volume-title":"Proceedings of the Seventh International Conference","author":"Sutton","year":"1990"},{"key":"B60","first-page":"171","article-title":"\u201cAdapting bias by gradient descent: an incremental version of delta-bar-delta,\u201d","volume-title":"AAAI","author":"Sutton","year":"1992"},{"volume-title":"Reinforcement Learning: An Introduction","year":"2018","author":"Sutton","key":"B61"},{"key":"B62","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1037\/h0061626","article-title":"Cognitive maps in rats and men","volume":"55","author":"Tolman","year":"1948","journal-title":"Psychol. Rev"},
{"key":"B63","article-title":"\u201cAttention is all you need,\u201d","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B64","first-page":"22536","article-title":"\u201cTowards evaluating adaptivity of model-based reinforcement learning methods,\u201d","volume-title":"International Conference on Machine Learning","author":"Wan","year":"2022"},{"key":"B65","article-title":"Learning to reinforcement learn","author":"Wang","year":"2016","journal-title":"arXiv preprint arXiv:1611.05763"},{"key":"B66","doi-asserted-by":"publisher","first-page":"6459","DOI":"10.1523\/JNEUROSCI.3414-13.2014","article-title":"Hippocampal replay captures the unique topological structure of a novel environment","volume":"34","author":"Wu","year":"2014","journal-title":"J. Neurosci"}],
"container-title":["Frontiers in Computational Neuroscience"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fncom.2025.1559915\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T23:42:49Z","timestamp":1743118969000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fncom.2025.1559915\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,27]]},"references-count":66,"alternative-id":["10.3389\/fncom.2025.1559915"],"URL":"https:\/\/doi.org\/10.3389\/fncom.2025.1559915","relation":{},"ISSN":["1662-5188"],"issn-type":[{"type":"electronic","value":"1662-5188"}],"subject":[],"published":{"date-parts":[[2025,3,27]]},"article-number":"1559915"}}