{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T07:31:53Z","timestamp":1772263913928,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008552","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000}}],"reference-count":54,"publisher":"Public Library of Science (PLoS)","issue":"1","license":[{"start":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T00:00:00Z","timestamp":1609977600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Dual-reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective-planning, model-based (MB) system. This architecture raises the question of the degree to which, when devising a plan, an MB controller takes account of influences from its MF counterpart. We present evidence that such a sophisticated, self-reflective MB planner incorporates an anticipation of the influences its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with an MB system taking account of its MF propensities. Thus, in the task, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains, including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1008552","type":"journal-article","created":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T23:47:46Z","timestamp":1610063266000},"page":"e1008552","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":10,"title":["Model based planners reflect on their model-free propensities"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7641-2402","authenticated-orcid":true,"given":"Rani","family":"Moran","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mehdi","family":"Keramati","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raymond J.","family":"Dolan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2021,1,7]]},"reference":[{"key":"pcbi.1008552.ref001","volume-title":"Stevens\u2019 Handbook of Experimental Psychology","author":"A Dickinson","year":"2002"},{"key":"pcbi.1008552.ref002","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1038\/nn1560","article-title":"Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control","volume":"8","author":"ND Daw","year":"2005","journal-title":"Nat Neurosci"},{"key":"pcbi.1008552.ref003","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1038\/npp.2009.131","article-title":"Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action","volume":"35","author":"BW Balleine","year":"2010","journal-title":"Neuropsychopharmacology"},{"key":"pcbi.1008552.ref004","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.neuron.2013.09.007","article-title":"Goals and habits in the brain","volume":"80","author":"RJ Dolan","year":"2013","journal-title":"Neuron"},{"key":"pcbi.1008552.ref005","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1016\/S0893-6080(99)00046-5","article-title":"What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?","volume":"12","author":"K Doya","year":"1999","journal-title":"Neural Netw"},{"key":"pcbi.1008552.ref006","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1111\/j.1460-9568.2005.04218.x","article-title":"The role of the dorsomedial striatum in instrumental conditioning","volume":"22","author":"HH Yin","year":"2005","journal-title":"Eur J Neurosci"},{"key":"pcbi.1008552.ref007","doi-asserted-by":"crossref","first-page":"1204","DOI":"10.1016\/j.neuron.2011.02.027","article-title":"Model-based influences on humans\u2019 choices and striatal prediction errors","volume":"69","author":"ND Daw","year":"2011","journal-title":"Neuron"},{"key":"pcbi.1008552.ref008","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1037\/a0030844","article-title":"Retrospective revaluation in sequential decision making: A tale of two systems","volume":"143","author":"SJ Gershman","year":"2014","journal-title":"J Exp Psychol Gen"},{"key":"pcbi.1008552.ref009","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1016\/j.neuron.2010.04.016","article-title":"States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning","volume":"66","author":"J Gl\u00e4scher","year":"2010","journal-title":"Neuron"},{"key":"pcbi.1008552.ref010","doi-asserted-by":"crossref","first-page":"4019","DOI":"10.1523\/JNEUROSCI.0564-07.2007","article-title":"Determining the neural substrates of goal-directed learning in the human brain","volume":"27","author":"V Valentin V","year":"2007","journal-title":"J Neurosci"},{"key":"pcbi.1008552.ref011","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.neuron.2013.08.009","article-title":"Disruption of Dorsolateral Prefrontal Cortex Decreases Model-Based in Favor of Model-free Control in Humans","volume":"80","author":"P Smittenaar","year":"2013","journal-title":"Neuron"},{"key":"pcbi.1008552.ref012","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1093\/cercor\/13.4.400","article-title":"Coordination of Actions and Habits in the Medial Prefrontal Cortex of Rats","volume":"13","author":"S Killcross","year":"2003","journal-title":"Cereb Cortex"},{"key":"pcbi.1008552.ref013","doi-asserted-by":"crossref","first-page":"13817","DOI":"10.1073\/pnas.1506367112","article-title":"Habitual control of goal selection in humans","volume":"112","author":"F Cushman","year":"2015","journal-title":"Proc Natl Acad Sci"},{"key":"pcbi.1008552.ref014","doi-asserted-by":"crossref","first-page":"750","DOI":"10.1038\/s41467-019-08662-8","article-title":"Retrospective model-based inference guides model-free credit assignment","volume":"10","author":"R Moran","year":"2019","journal-title":"Nat Commun"},{"key":"pcbi.1008552.ref015","doi-asserted-by":"crossref","first-page":"15871","DOI":"10.1073\/pnas.1821647116","article-title":"Credit assignment to state-independent task representations and its relationship with model-based decision making","volume":"116","author":"N Shahar","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1008552.ref016","doi-asserted-by":"crossref","first-page":"e1006803","DOI":"10.1371\/journal.pcbi.1006803","article-title":"Improving the reliability of model-based decision-making estimates in the two-stage decision task with reaction-times and drift-diffusion modeling","volume":"15","author":"N Shahar","year":"2019","journal-title":"PLOS Comput Biol"},{"key":"pcbi.1008552.ref017","volume-title":"Proc Natl Acad Sci U S A","author":"R Moran"},{"key":"pcbi.1008552.ref018","first-page":"322","author":"RS Sutton","year":"1998","journal-title":"Reinforcement learning : an introduction"},{"key":"pcbi.1008552.ref019","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1126\/science.275.5306.1593","article-title":"A neural substrate of prediction and reward","volume":"275","author":"W Schultz","year":"1997","journal-title":"Science"},{"key":"pcbi.1008552.ref020","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1038\/nn.3981","article-title":"Model-based choices involve prospective neural activity","volume":"18","author":"BB Doll","year":"2015","journal-title":"Nat Neurosci"},{"key":"pcbi.1008552.ref021","volume-title":"Dynamic programming","author":"R Bellman","year":"2003"},{"key":"pcbi.1008552.ref022","first-page":"739","volume-title":"Stevens\u2019 Handbook of Experimental Psychology","author":"RM Shiffrin","year":"1998","edition":"2"},{"key":"pcbi.1008552.ref023","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1207\/s15516709cog2703_8","article-title":"Controlled & automatic processing: Behavior, theory, and biological mechanisms","volume":"27","author":"W Schneider","year":"2003","journal-title":"Cogn Sci"},{"key":"pcbi.1008552.ref024","volume-title":"Learning from delayed rewards","author":"CJC Watkins","year":"1989"},{"key":"pcbi.1008552.ref025","first-page":"64","article-title":"A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement","volume":"21","author":"RA Rescorla","year":"1972","journal-title":"Class Cond II Curr Res Theory"},{"key":"pcbi.1008552.ref026","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1016\/j.tics.2017.03.011","article-title":"The Importance of Falsification in Computational Cognitive Modeling","volume":"21","author":"S Palminteri","year":"2017","journal-title":"Trends Cogn Sci"},{"key":"pcbi.1008552.ref027","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/j.cogpsych.2015.01.005","article-title":"Old processes, new perspectives: Familiarity is correlated with (not independent of) recollection and is more (not equally) variable for targets than for lures","volume":"79","author":"R Moran","year":"2015","journal-title":"Cogn Psychol"},{"key":"pcbi.1008552.ref028","doi-asserted-by":"crossref","first-page":"20941","DOI":"10.1073\/pnas.1312011110","article-title":"Working-memory capacity protects model-based learning from stress","volume":"110","author":"AR Otto","year":"2013","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1008552.ref029","first-page":"7","article-title":"Speed\/accuracy trade-off between the habitual and the goal-directed processes","author":"M Keramati","year":"2011","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1008552.ref030","first-page":"216","article-title":"Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming","volume":"1990","author":"RS Sutton","year":"1990","journal-title":"Machine Learning Proceedings"},{"key":"pcbi.1008552.ref031","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1016\/j.neuron.2013.11.028","article-title":"Neural Computations Underlying Arbitration between Model-Based and Model-free Learning","volume":"81","author":"S Wan Lee","year":"2014","journal-title":"Neuron"},{"key":"pcbi.1008552.ref032","first-page":"751","article-title":"The Curse of Planning: Dissecting Multiple Reinforcement-Learning Systems by Taxing the Central Executive","volume":"24","author":"AR Otto","year":"2013"},{"key":"pcbi.1008552.ref033","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.psyneuen.2014.12.017","article-title":"The interaction of acute and chronic stress impairs model-based behavioral control","volume":"53","author":"C Radenbach","year":"2015","journal-title":"Psychoneuroendocrinology"},{"key":"pcbi.1008552.ref034","doi-asserted-by":"crossref","DOI":"10.1093\/acprof:oso\/9780199646739.001.0001","volume-title":"Foundations of metacognition","author":"M. J. Beran","year":"2012"},{"key":"pcbi.1008552.ref035","volume-title":"Metacognition: Knowing about knowing","author":"J Metcalfe","year":"1996"},{"key":"pcbi.1008552.ref036","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511808098","volume-title":"Heuristics and Biases. Heuristics and Biases","author":"T Gilovich","year":"2002"},{"key":"pcbi.1008552.ref037","first-page":"1","article-title":"Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources","author":"F Lieder","year":"2019","journal-title":"Behav Brain Sci"},{"key":"pcbi.1008552.ref038","doi-asserted-by":"crossref","first-page":"12868","DOI":"10.1073\/pnas.1609094113","article-title":"Adaptive integration of habits into depth-limited planning defines a habitual-goal\u2013directed spectrum","volume":"113","author":"M Keramati","year":"2016","journal-title":"Proc Natl Acad Sci"},{"key":"pcbi.1008552.ref039","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1016\/j.neunet.2006.03.002","article-title":"The misbehavior of value and the discipline of the will","volume":"19","author":"P Dayan","year":"2006","journal-title":"Neural Networks"},{"key":"pcbi.1008552.ref040","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1016\/j.tics.2004.10.003","article-title":"Conflict monitoring and anterior cingulate cortex: an update","volume":"8","author":"MM Botvinick","year":"2004","journal-title":"Trends Cogn Sci"},{"key":"pcbi.1008552.ref041","article-title":"Animal Spirits: Affective and Deliberative Processes in Economic Behavior","author":"GF Loewenstein","year":"2004","journal-title":"SSRN Electron J"},{"key":"pcbi.1008552.ref042","doi-asserted-by":"crossref","first-page":"1449","DOI":"10.1257\/aer.96.5.1449","article-title":"A Dual-Self Model of Impulse Control","volume":"96","author":"D Fudenberg","year":"2006","journal-title":"Am Econ Rev"},{"key":"pcbi.1008552.ref043","doi-asserted-by":"crossref","first-page":"1558","DOI":"10.1257\/0002828043052222","article-title":"Addiction and Cue-Triggered Decision Processes","volume":"94","author":"BD Bernheim","year":"2004","journal-title":"Am Econ Rev"},{"key":"pcbi.1008552.ref044","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0162246","article-title":"Plans, habits, and theory of mind","volume":"11","author":"SJ Gershman","year":"2016","journal-title":"PLoS One"},{"key":"pcbi.1008552.ref045","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1037\/h0024835","article-title":"Self-perception: An alternative interpretation of cognitive dissonance phenomena","volume":"74","author":"DJ Bem","year":"1967","journal-title":"Psychol Rev"},{"key":"pcbi.1008552.ref046","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1111\/j.1749-6632.1992.tb25984.x","article-title":"Classical conditioning in drug-dependent humans","volume":"654","author":"CP O\u2019Brien","year":"1992","journal-title":"Ann N Y Acad Sci"},{"key":"pcbi.1008552.ref047","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1007\/s00213-013-3120-y","article-title":"The reinstatement model of drug relapse: recent neurobiological findings, emerging research topics, and translational research","volume":"229","author":"JM Bossert","year":"2013","journal-title":"Psychopharmacology (Berl)"},{"key":"pcbi.1008552.ref048","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1016\/j.tics.2014.09.003","article-title":"Impaired self-awareness in human addiction: deficient attribution of personal relevance","volume":"18","author":"SJ Moeller","year":"2014","journal-title":"Trends Cogn Sci"},{"key":"pcbi.1008552.ref049","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1016\/j.neuron.2013.05.028","article-title":"Restricting temptations: Neural mechanisms of precommitment","volume":"79","author":"MJ Crockett","year":"2013","journal-title":"Neuron"},{"key":"pcbi.1008552.ref050","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1111\/1467-9280.00441","article-title":"Procrastination, Deadlines, and Performance: Self-Control by Precommitment","volume":"13","author":"D Ariely","year":"2002","journal-title":"Psychol Sci"},{"key":"pcbi.1008552.ref051","doi-asserted-by":"crossref","first-page":"4296","DOI":"10.1152\/jn.00024.2007","article-title":"Object Category Structure in Response Patterns of Neuronal Population in Monkey Inferior Temporal Cortex","volume":"97","author":"R Kiani","year":"2007","journal-title":"J Neurophysiol"},{"key":"pcbi.1008552.ref052","doi-asserted-by":"crossref","first-page":"1126","DOI":"10.1016\/j.neuron.2008.10.043","article-title":"Matching categorical object representations in inferior temporal cortex of man and monkey","volume":"60","author":"N Kriegeskorte","year":"2008","journal-title":"Neuron"},{"key":"pcbi.1008552.ref053","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1016\/j.jmp.2019.03.007","article-title":"Biases in estimating the balance between model-free and model-based learning systems due to model misspecification","volume":"91","author":"A Toyama","year":"2019","journal-title":"J Math Psychol"},{"key":"pcbi.1008552.ref054","first-page":"1","article-title":"Habits without values","author":"J. M Kevin","year":"2018","journal-title":"Psychol Rev"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008552","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008552","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T19:42:53Z","timestamp":1611171773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008552"}},"subtitle":[],"editor":[{"given":"Alireza","family":"Soltani","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,1,7]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1,7]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008552","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1008552","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,7]]}}}