{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T02:09:41Z","timestamp":1774058981422,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009070","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,6,15]],"date-time":"2021-06-15T00:00:00Z","timestamp":1623715200000}}],"reference-count":98,"publisher":"Public Library of Science (PLoS)","issue":"6","license":[{"start":{"date-parts":[[2021,6,3]],"date-time":"2021-06-03T00:00:00Z","timestamp":1622678400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["CRSII2 147636 (Sinergia)"],"award-info":[{"award-number":["CRSII2 147636 (Sinergia)"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["CRSII2 147636 (Sinergia)"],"award-info":[{"award-number":["CRSII2 147636 (Sinergia)"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["200020 184615"],"award-info":[{"award-number":["200020 184615"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["785907 (Human Brain Project, SGA2)"],"award-info":[{"award-number":["785907 (Human Brain Project, SGA2)"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["785907 (Human Brain Project, SGA2)"],"award-info":[{"award-number":["785907 (Human Brain Project, SGA2)"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009070","type":"journal-article","created":{"date-parts":[[2021,6,3]],"date-time":"2021-06-03T14:18:50Z","timestamp":1622729930000},"page":"e1009070","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":43,"title":["Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making"],"prefix":"10.1371","volume":"17","author":[{"given":"He A.","family":"Xu","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7870-8602","authenticated-orcid":true,"given":"Alireza","family":"Modirshanechi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5274-144X","authenticated-orcid":true,"given":"Marco P.","family":"Lehmann","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4344-2189","authenticated-orcid":true,"given":"Wulfram","family":"Gerstner","sequence":"additional","affiliation":[]},{"given":"Michael H.","family":"Herzog","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2021,6,3]]},"reference":[{"issue":"5306","key":"pcbi.1009070.ref001","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1126\/science.275.5306.1593","article-title":"A neural substrate of prediction and reward","volume":"275","author":"W Schultz","year":"1997","journal-title":"Science"},{"issue":"2","key":"pcbi.1009070.ref002","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/S0896-6273(03)00169-7","article-title":"Temporal difference models and reward-related learning in the human brain","volume":"38","author":"JP O\u2019Doherty","year":"2003","journal-title":"Neuron"},{"issue":"12","key":"pcbi.1009070.ref003","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1038\/nn1560","article-title":"Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control","volume":"8","author":"ND Daw","year":"2005","journal-title":"Nature neuroscience"},{"issue":"7106","key":"pcbi.1009070.ref004","doi-asserted-by":"crossref","first-page":"1042","DOI":"10.1038\/nature05051","article-title":"Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans","volume":"442","author":"M Pessiglione","year":"2006","journal-title":"Nature"},{"issue":"4","key":"pcbi.1009070.ref005","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1016\/j.neuron.2010.04.016","article-title":"States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning","volume":"66","author":"J Gl\u00e4scher","year":"2010","journal-title":"Neuron"},{"issue":"6","key":"pcbi.1009070.ref006","doi-asserted-by":"crossref","first-page":"1204","DOI":"10.1016\/j.neuron.2011.02.027","article-title":"Model-based influences on humans\u2019 choices and striatal prediction errors","volume":"69","author":"ND Daw","year":"2011","journal-title":"Neuron"},{"issue":"5","key":"pcbi.1009070.ref007","doi-asserted-by":"crossref","first-page":"786","DOI":"10.1038\/nn.3068","article-title":"Mapping value based planning and extensively trained choice in the human brain","volume":"15","author":"K Wunderlich","year":"2012","journal-title":"Nature neuroscience"},{"key":"pcbi.1009070.ref008","doi-asserted-by":"crossref","first-page":"e47463","DOI":"10.7554\/eLife.47463","article-title":"One-shot learning and behavioral eligibility traces in sequential decision making","volume":"8","author":"MP Lehmann","year":"2019","journal-title":"Elife"},{"issue":"3","key":"pcbi.1009070.ref009","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1111\/tops.12138","article-title":"Novelty and inductive generalization in human reinforcement learning","volume":"7","author":"SJ Gershman","year":"2015","journal-title":"Topics in cognitive science"},{"key":"pcbi.1009070.ref010","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.conb.2019.08.004","article-title":"Visual novelty, curiosity, and intrinsic reward in machine learning and the brain","volume":"58","author":"A Jaegle","year":"2019","journal-title":"Current Opinion in Neurobiology"},{"key":"pcbi.1009070.ref011","unstructured":"Singh S, Lewis RL, Barto AG. Where do rewards come from. In: Proceedings of the annual conference of the cognitive science society. Cognitive Science Society; 2009. p. 2601\u20132606."},{"issue":"3","key":"pcbi.1009070.ref012","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1109\/TAMD.2010.2056368","article-title":"Formal theory of creativity, fun, and intrinsic motivation (1990\u20132010)","volume":"2","author":"J Schmidhuber","year":"2010","journal-title":"IEEE Transactions on Autonomous Mental Development"},{"key":"pcbi.1009070.ref013","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/j.cobeha.2016.04.005","article-title":"Reinforcement learning with Marr","volume":"11","author":"Y Niv","year":"2016","journal-title":"Current opinion in behavioral sciences"},{"issue":"11","key":"pcbi.1009070.ref014","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1016\/j.tics.2013.09.001","article-title":"Information-seeking, curiosity, and attention: computational and neural mechanisms","volume":"17","author":"J Gottlieb","year":"2013","journal-title":"Trends in cognitive sciences"},{"issue":"3","key":"pcbi.1009070.ref015","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1037\/rev0000175","article-title":"Reconciling novelty and complexity through a rational analysis of curiosity","volume":"127","author":"R Dubey","year":"2019","journal-title":"Psychological Review"},{"key":"pcbi.1009070.ref016","first-page":"1281","volume-title":"Advances in neural information processing systems","author":"N Chentanez","year":"2005"},{"key":"pcbi.1009070.ref017","first-page":"1471","volume-title":"Advances in Neural Information Processing Systems","author":"M Bellemare","year":"2016"},{"key":"pcbi.1009070.ref018","doi-asserted-by":"crossref","unstructured":"Martin J, Narayanan SS, Everitt T, Hutter M. Count-based exploration in feature space for reinforcement learning. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press; 2017. p. 2471\u20132478.","DOI":"10.24963\/ijcai.2017\/344"},{"key":"pcbi.1009070.ref019","volume-title":"Reinforcement learning: An introduction","author":"RS Sutton","year":"2018"},{"issue":"7","key":"pcbi.1009070.ref020","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.1038\/nn.3130","article-title":"Rational regulation of learning dynamics by pupil-linked arousal systems","volume":"15","author":"MR Nassar","year":"2012","journal-title":"Nature neuroscience"},{"issue":"1","key":"pcbi.1009070.ref021","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.neuroimage.2012.04.050","article-title":"Evidence for neural encoding of Bayesian surprise in human somatosensation","volume":"62","author":"D Ostwald","year":"2012","journal-title":"NeuroImage"},{"key":"pcbi.1009070.ref022","doi-asserted-by":"crossref","first-page":"e41541","DOI":"10.7554\/eLife.41541","article-title":"Brain signatures of a multiscale process of sequence learning in humans","volume":"8","author":"M Maheu","year":"2019","journal-title":"Elife"},{"key":"pcbi.1009070.ref023","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1016\/j.neuroimage.2019.04.028","article-title":"Trial-by-trial surprise-decoding model for visual and auditory binary oddball tasks","volume":"196","author":"A Modirshanechi","year":"2019","journal-title":"NeuroImage"},{"issue":"37","key":"pcbi.1009070.ref024","doi-asserted-by":"crossref","first-page":"12366","DOI":"10.1523\/JNEUROSCI.0822-10.2010","article-title":"An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment","volume":"30","author":"MR Nassar","year":"2010","journal-title":"Journal of Neuroscience"},{"issue":"9","key":"pcbi.1009070.ref025","doi-asserted-by":"crossref","first-page":"1214","DOI":"10.1038\/nn1954","article-title":"Learning the value of information in an uncertain world","volume":"10","author":"TE Behrens","year":"2007","journal-title":"Nature neuroscience"},{"issue":"4","key":"pcbi.1009070.ref026","doi-asserted-by":"crossref","first-page":"e1006972","DOI":"10.1371\/journal.pcbi.1006972","article-title":"Confidence resets reveal hierarchical adaptive learning in humans","volume":"15","author":"M Heilbron","year":"2019","journal-title":"PLoS computational biology"},{"issue":"10","key":"pcbi.1009070.ref027","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1038\/s41583-019-0180-y","article-title":"Adaptive learning under expected and unexpected uncertainty","volume":"20","author":"A Soltani","year":"2019","journal-title":"Nature Reviews Neuroscience"},{"issue":"1","key":"pcbi.1009070.ref028","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1162\/neco_a_01025","article-title":"Balancing new against old information: the role of puzzlement surprise in learning","volume":"30","author":"M Faraji","year":"2018","journal-title":"Neural computation"},{"issue":"2","key":"pcbi.1009070.ref029","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1162\/neco_a_01352","article-title":"Learning in Volatile Environments with the Bayes Factor Surprise","volume":"33","author":"V Liakoni","year":"2021","journal-title":"Neural Computation"},{"key":"pcbi.1009070.ref030","first-page":"1","volume-title":"Nature Human Behaviour","author":"C Findling","year":"2020"},{"issue":"4","key":"pcbi.1009070.ref031","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1016\/j.neuron.2005.04.026","article-title":"Uncertainty, neuromodulation, and attention","volume":"46","author":"AJ Yu","year":"2005","journal-title":"Neuron"},{"issue":"6204","key":"pcbi.1009070.ref032","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1126\/science.1255514","article-title":"A critical time window for dopamine actions on the structural plasticity of dendritic spines","volume":"345","author":"S Yagishita","year":"2014","journal-title":"Science"},{"key":"pcbi.1009070.ref033","doi-asserted-by":"crossref","DOI":"10.3389\/fncir.2018.00053","article-title":"Eligibility traces and plasticity on behavioral time scales: experimental support of neohebbian three-factor learning rules","volume":"12","author":"W Gerstner","year":"2018","journal-title":"Frontiers in neural circuits"},{"key":"pcbi.1009070.ref034","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/9027.001.0001","volume-title":"Inside jokes: Using humor to reverse-engineer the mind","author":"MM Hurley","year":"2011"},{"key":"pcbi.1009070.ref035","doi-asserted-by":"crossref","first-page":"907","DOI":"10.3389\/fpsyg.2013.00907","article-title":"Novelty or surprise?","volume":"4","author":"A Barto","year":"2013","journal-title":"Frontiers in psychology"},{"key":"pcbi.1009070.ref036","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-29075-6","volume-title":"Novelty, information and surprise","author":"G Palm","year":"2012"},{"key":"pcbi.1009070.ref037","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.cobeha.2020.07.008","article-title":"Understanding exploration in humans and machines by formalizing the function of curiosity","volume":"35","author":"R Dubey","year":"2020","journal-title":"Current Opinion in Behavioral Sciences"},{"issue":"47","key":"pcbi.1009070.ref038","doi-asserted-by":"crossref","first-page":"12539","DOI":"10.1523\/JNEUROSCI.2925-08.2008","article-title":"Trial-by-trial fluctuations in the event-related electroencephalogram reflect dynamic changes in the degree of surprise","volume":"28","author":"RB Mars","year":"2008","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1009070.ref039","doi-asserted-by":"crossref","unstructured":"Gijsen S, Grundei M, Lange RT, Ostwald D, Blankenburg F. Neural surprise in somatosensory Bayesian learning. BioRxiv. 2020.","DOI":"10.1101\/2020.06.18.158915"},{"issue":"10","key":"pcbi.1009070.ref040","doi-asserted-by":"crossref","first-page":"836","DOI":"10.1016\/j.tics.2019.07.012","article-title":"Where does value come from?","volume":"23","author":"K Juechems","year":"2019","journal-title":"Trends in cognitive sciences"},{"issue":"4","key":"pcbi.1009070.ref041","doi-asserted-by":"crossref","first-page":"e1006713","DOI":"10.1371\/journal.pcbi.1006713","article-title":"Learning and forgetting using reinforced Bayesian change detection","volume":"15","author":"V Moens","year":"2019","journal-title":"PLoS computational biology"},{"key":"pcbi.1009070.ref042","unstructured":"Achiam J, Sastry S. Surprise-based intrinsic motivation for deep reinforcement learning. arXiv preprint arXiv:170301732. 2017."},{"key":"pcbi.1009070.ref043","unstructured":"Burda Y, Edwards H, Pathak D, Storkey A, Darrell T, Efros AA. Large-Scale Study of Curiosity-Driven Learning. In: International Conference on Learning Representations; 2018."},{"key":"pcbi.1009070.ref044","doi-asserted-by":"crossref","first-page":"312","DOI":"10.3389\/fpsyg.2017.00312","article-title":"What to choose next? a paradigm for testing human sequential decision making","volume":"8","author":"EM Tartaglia","year":"2017","journal-title":"Frontiers in psychology"},{"key":"pcbi.1009070.ref045","unstructured":"Oxford English Dictionary. \u201cnovelty, n. and adj.\u201d.;. Available from: https:\/\/www.oed.com\/view\/Entry\/128781."},{"key":"pcbi.1009070.ref046","article-title":"A mathematical theory of communication","volume":"20","author":"C Shannon","year":"1948","journal-title":"Bell System Technical Journal 27: 379-423 and 623\u2013656"},{"key":"pcbi.1009070.ref047","unstructured":"Tribus M. Thermostatics and thermodynamics: an introduction to energy, information and states of matter, with engineering applications. van Nostrand; 1961."},{"key":"pcbi.1009070.ref048","unstructured":"Oxford English Dictionary. \u201csurprise, n.\u201d.;. Available from: https:\/\/www.oed.com\/view\/Entry\/194999."},{"key":"pcbi.1009070.ref049","first-page":"1873","volume-title":"Advances in neural information processing systems","author":"AJ Yu","year":"2009"},{"issue":"12","key":"pcbi.1009070.ref050","doi-asserted-by":"crossref","first-page":"e1005260","DOI":"10.1371\/journal.pcbi.1005260","article-title":"Human inferences about sequences: A minimal transition probability model","volume":"12","author":"F Meyniel","year":"2016","journal-title":"PLoS computational biology"},{"key":"pcbi.1009070.ref051","doi-asserted-by":"crossref","unstructured":"Markovic D, Stojic H, Schwoebel S, Kiebel SJ. An empirical evaluation of active inference in multi-armed bandits. arXiv preprint arXiv:210108699. 2021.","DOI":"10.1016\/j.neunet.2021.08.018"},{"issue":"4","key":"pcbi.1009070.ref052","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1016\/j.neuroimage.2009.03.025","article-title":"Bayesian model selection for group studies","volume":"46","author":"KE Stephan","year":"2009","journal-title":"Neuroimage"},{"key":"pcbi.1009070.ref053","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1016\/j.neuroimage.2013.08.065","article-title":"Bayesian model selection for group studies\u2014revisited","volume":"84","author":"L Rigoux","year":"2014","journal-title":"Neuroimage"},{"key":"pcbi.1009070.ref054","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.cobeha.2016.04.003","article-title":"Taming the beast: extracting generalizable knowledge from computational models of cognition","volume":"11","author":"MR Nassar","year":"2016","journal-title":"Current opinion in behavioral sciences"},{"key":"pcbi.1009070.ref055","doi-asserted-by":"crossref","first-page":"e49547","DOI":"10.7554\/eLife.49547","article-title":"Ten simple rules for the computational modeling of behavioral data","volume":"8","author":"RC Wilson","year":"2019","journal-title":"Elife"},{"key":"pcbi.1009070.ref056","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1016\/j.neuroimage.2014.11.007","article-title":"A computational analysis of the neural bases of Bayesian inference","volume":"106","author":"A Kolossa","year":"2015","journal-title":"Neuroimage"},{"issue":"8","key":"pcbi.1009070.ref057","doi-asserted-by":"crossref","first-page":"1870","DOI":"10.1016\/j.neubiorev.2012.05.008","article-title":"Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice","volume":"36","author":"MM Walsh","year":"2012","journal-title":"Neuroscience & Biobehavioral Reviews"},{"issue":"4","key":"pcbi.1009070.ref058","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1037\/0033-295X.109.4.679","article-title":"The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity","volume":"109","author":"CB Holroyd","year":"2002","journal-title":"Psychological review"},{"issue":"2","key":"pcbi.1009070.ref059","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1038\/nrn2787","article-title":"The free-energy principle: a unified brain theory?","volume":"11","author":"K Friston","year":"2010","journal-title":"Nature reviews neuroscience"},{"issue":"1","key":"pcbi.1009070.ref060","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1162\/NECO_a_00912","article-title":"Active inference: a process theory","volume":"29","author":"K Friston","year":"2017","journal-title":"Neural computation"},{"key":"pcbi.1009070.ref061","unstructured":"Storck J, Hochreiter S, Schmidhuber J. Reinforcement driven information acquisition in non-deterministic environments. In: Proceedings of the international conference on artificial neural networks, Paris. vol. 2. Citeseer; 1995. p. 159\u2013164."},{"key":"pcbi.1009070.ref062","first-page":"547","volume-title":"Advances in neural information processing systems","author":"L Itti","year":"2006"},{"key":"pcbi.1009070.ref063","doi-asserted-by":"crossref","unstructured":"Schmidhuber J. Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In: Workshop on anticipatory behavior in adaptive learning systems. Springer; 2008. p. 48\u201376.","DOI":"10.1007\/978-3-642-02565-5_4"},{"key":"pcbi.1009070.ref064","first-page":"1","volume-title":"Information, Coding and Mathematics","author":"P Baldi","year":"2002"},{"issue":"11","key":"pcbi.1009070.ref065","doi-asserted-by":"crossref","first-page":"e1003939","DOI":"10.1371\/journal.pcbi.1003939","article-title":"Statistical computations underlying the dynamics of memory updating","volume":"10","author":"SJ Gershman","year":"2014","journal-title":"PLoS computational biology"},{"key":"pcbi.1009070.ref066","doi-asserted-by":"crossref","first-page":"e23763","DOI":"10.7554\/eLife.23763","article-title":"The computational nature of memory modification","volume":"6","author":"SJ Gershman","year":"2017","journal-title":"Elife"},{"key":"pcbi.1009070.ref067","doi-asserted-by":"crossref","first-page":"85","DOI":"10.3389\/fncir.2015.00085","article-title":"Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules","volume":"9","author":"N Fr\u00e9maux","year":"2016","journal-title":"Frontiers in neural circuits"},{"issue":"12","key":"pcbi.1009070.ref068","doi-asserted-by":"crossref","first-page":"e1004648","DOI":"10.1371\/journal.pcbi.1004648","article-title":"Simple plans or sophisticated habits? State, transition and learning interactions in the two-step task","volume":"11","author":"T Akam","year":"2015","journal-title":"PLoS computational biology"},{"key":"pcbi.1009070.ref069","volume-title":"Thinking, fast and slow","author":"D Kahneman","year":"2011"},{"issue":"10","key":"pcbi.1009070.ref070","doi-asserted-by":"crossref","first-page":"3098","DOI":"10.1073\/pnas.1414219112","article-title":"Interplay of approximate planning strategies","volume":"112","author":"QJ Huys","year":"2015","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"8","key":"pcbi.1009070.ref071","doi-asserted-by":"crossref","first-page":"e1005090","DOI":"10.1371\/journal.pcbi.1005090","article-title":"When does model-based control pay off?","volume":"12","author":"W Kool","year":"2016","journal-title":"PLoS computational biology"},{"key":"pcbi.1009070.ref072","first-page":"1","article-title":"Humans primarily use model-based inference in the two-stage task","author":"CF da Silva","year":"2020","journal-title":"Nature Human Behaviour"},{"issue":"5","key":"pcbi.1009070.ref073","doi-asserted-by":"crossref","first-page":"1249","DOI":"10.1016\/j.cell.2020.10.024","article-title":"The Tolman-Eichenbaum machine: Unifying space and relational memory through generalization in the hippocampal formation","volume":"183","author":"JC Whittington","year":"2020","journal-title":"Cell"},{"key":"pcbi.1009070.ref074","first-page":"1","volume-title":"Computational Brain & Behavior","author":"CM Wu","year":"2020"},{"key":"pcbi.1009070.ref075","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/j.conb.2018.11.003","article-title":"The algorithmic architecture of exploration in the human brain","volume":"55","author":"E Schulz","year":"2019","journal-title":"Current opinion in neurobiology"},{"issue":"1481","key":"pcbi.1009070.ref076","doi-asserted-by":"crossref","first-page":"933","DOI":"10.1098\/rstb.2007.2098","article-title":"Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration","volume":"362","author":"JD Cohen","year":"2007","journal-title":"Philosophical Transactions of the Royal Society B: Biological Sciences"},{"issue":"6","key":"pcbi.1009070.ref077","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1038\/s41562-019-0589-3","article-title":"Diverse motives for human curiosity","volume":"3","author":"K Kobayashi","year":"2019","journal-title":"Nature human behaviour"},{"issue":"12","key":"pcbi.1009070.ref078","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1038\/s41583-018-0078-0","article-title":"Towards a neuroscience of active sampling and curiosity","volume":"19","author":"J Gottlieb","year":"2018","journal-title":"Nature Reviews Neuroscience"},{"issue":"21","key":"pcbi.1009070.ref079","doi-asserted-by":"crossref","first-page":"8145","DOI":"10.1523\/JNEUROSCI.2978-14.2015","article-title":"Reinforcement learning in multidimensional environments relies on attention mechanisms","volume":"35","author":"Y Niv","year":"2015","journal-title":"Journal of Neuroscience"},{"issue":"6","key":"pcbi.1009070.ref080","doi-asserted-by":"crossref","first-page":"1600","DOI":"10.1016\/j.cell.2020.11.013","article-title":"A unified framework for dopamine signals across timescales","volume":"183","author":"HR Kim","year":"2020","journal-title":"Cell"},{"key":"pcbi.1009070.ref081","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.conb.2020.08.014","article-title":"Dopamine signals as temporal difference errors: recent advances","volume":"67","author":"CK Starkweather","year":"2021","journal-title":"Current Opinion in Neurobiology"},{"issue":"1","key":"pcbi.1009070.ref082","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1152\/jn.1998.80.1.1","article-title":"Predictive reward signal of dopamine neurons","volume":"80","author":"W Schultz","year":"1998","journal-title":"Journal of neurophysiology"},{"issue":"2","key":"pcbi.1009070.ref083","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1016\/S0006-8993(97)00265-5","article-title":"Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat","volume":"759","author":"JC Horvitz","year":"1997","journal-title":"Brain research"},{"issue":"4-6","key":"pcbi.1009070.ref084","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1016\/S0893-6080(02)00048-5","article-title":"Dopamine: generalization and bonuses","volume":"15","author":"S Kakade","year":"2002","journal-title":"Neural Networks"},{"issue":"1","key":"pcbi.1009070.ref085","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.neuron.2020.01.012","article-title":"Cue-Evoked Dopamine Promotes Conditioned Responding during Learning","volume":"106","author":"J Morrens","year":"2020","journal-title":"Neuron"},{"issue":"4","key":"pcbi.1009070.ref086","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1163\/156856897X00357","article-title":"The psychophysics toolbox","volume":"10","author":"DH Brainard","year":"1997","journal-title":"Spatial vision"},{"issue":"1","key":"pcbi.1009070.ref087","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.jneumeth.2003.10.009","article-title":"EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis","volume":"134","author":"A Delorme","year":"2004","journal-title":"Journal of neuroscience methods"},{"issue":"1","key":"pcbi.1009070.ref088","doi-asserted-by":"crossref","first-page":"tgaa034","DOI":"10.1093\/texcom\/tgaa034","article-title":"Brain networks sensitive to object novelty, value, and their combination","volume":"1","author":"A Ghazizadeh","year":"2020","journal-title":"Cerebral Cortex Communications"},{"key":"pcbi.1009070.ref089","unstructured":"Van Seijen H, Sutton RS. Efficient planning in MDPs by small backups. In: Proc. 30th Int. Conf. Mach. Learn.; 2013. p. 1\u20133."},{"key":"pcbi.1009070.ref090","unstructured":"Brea J. Is prioritized sweeping the better episodic control? arXiv preprint arXiv:171106677. 2017."},{"issue":"1","key":"pcbi.1009070.ref091","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1287\/mksc.4.1.20","article-title":"A Bayesian cross-validated likelihood method for comparing alternative specifications of quantitative models","volume":"4","author":"RT Rust","year":"1985","journal-title":"Marketing Science"},{"issue":"2","key":"pcbi.1009070.ref092","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1093\/biomet\/asz077","article-title":"On the marginal likelihood and cross-validation","volume":"107","author":"E Fong","year":"2020","journal-title":"Biometrika"},{"key":"pcbi.1009070.ref093","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781316576533","volume-title":"Computer age statistical inference","author":"B Efron","year":"2016"},{"issue":"1","key":"pcbi.1009070.ref094","article-title":"Trial-by-trial data analysis using computational models","volume":"23","author":"ND Daw","year":"2011","journal-title":"Decision making, affect, and learning: Attention and performance XXIII"},{"issue":"3","key":"pcbi.1009070.ref095","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1016\/j.conb.2011.04.001","article-title":"Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit","volume":"21","author":"M Ito","year":"2011","journal-title":"Current opinion in neurobiology"},{"issue":"1","key":"pcbi.1009070.ref096","doi-asserted-by":"crossref","first-page":"e1003441","DOI":"10.1371\/journal.pcbi.1003441","article-title":"VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data","volume":"10","author":"J Daunizeau","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1009070.ref097","doi-asserted-by":"crossref","DOI":"10.1002\/0470013192.bsa526","volume-title":"R-Squared, Adjusted R-Squared","author":"J Miles","year":"2005"},{"issue":"6","key":"pcbi.1009070.ref098","doi-asserted-by":"crossref","first-page":"e176","DOI":"10.1371\/journal.pbio.0020176","article-title":"Electroencephalographic brain dynamics following manually responded visual targets","volume":"2","author":"S Makeig","year":"2004","journal-title":"PLoS Biol"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009070","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,6,15]],"date-time":"2021-06-15T00:00:00Z","timestamp":1623715200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009070","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,4]],"date-time":"2023-11-04T09:30:24Z","timestamp":1699090224000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009070"}},"subtitle":[],"editor":[{"given":"Samuel J.","family":"Gershman","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,6,3]]},"references-count":98,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2021,6,3]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009070","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.09.24.311084","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,3]]}}}