{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"name":"bioRxiv"}],"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T11:37:21Z","timestamp":1768477041987,"version":"3.49.0"},"posted":{"date-parts":[[2017,4,11]]},"group-title":"Neuroscience","reference-count":64,"publisher":"openRxiv","license":[{"start":{"date-parts":[[2017,4,11]],"date-time":"2017-04-11T00:00:00Z","timestamp":1491868800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.biorxiv.org\/about\/FAQ#license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"accepted":{"date-parts":[[2020,5,22]]},"abstract":"<jats:title>Summary<\/jats:title>\n                <jats:p>The anterior cingulate cortex (ACC) is implicated in learning the value of actions, but it remains poorly understood whether and how it contributes to model-based mechanisms that use action-state predictions and afford behavioural flexibility. To isolate these mechanisms, we developed a multi-step decision task for mice in which both action-state transition probabilities and reward probabilities changed over time. Calcium imaging revealed ramps of choice-selective neuronal activity, followed by an evolving representation of the state reached and trial outcome, with different neuronal populations representing reward in different states. ACC neurons represented the current action-state transition structure, whether state transitions were expected or surprising, and the predicted state given chosen action. Optogenetic inhibition of ACC blocked the influence of action-state transitions on subsequent choice, without affecting the influence of rewards. These data support a role for ACC in model-based reinforcement learning, specifically in using action-state transitions to guide subsequent choice.<\/jats:p>\n                <jats:sec>\n                  <jats:title>Highlights<\/jats:title>\n                  <jats:list list-type=\"bullet\">\n                    <jats:list-item>\n                      <jats:p>A novel two-step task disambiguates model-based and model-free RL in mice.<\/jats:p>\n                    <\/jats:list-item>\n                    <jats:list-item>\n                      <jats:p>ACC represents all trial events, reward representation is contextualised by state.<\/jats:p>\n                    <\/jats:list-item>\n                    <jats:list-item>\n                      <jats:p>ACC represents action-state transition structure, predicted states, and surprise.<\/jats:p>\n                    <\/jats:list-item>\n                    <jats:list-item>\n                      <jats:p>Inhibiting ACC impedes action-state transitions from influencing subsequent choice.<\/jats:p>\n                    <\/jats:list-item>\n                  <\/jats:list>\n                <\/jats:sec>","DOI":"10.1101\/126292","type":"posted-content","created":{"date-parts":[[2017,4,12]],"date-time":"2017-04-12T01:10:19Z","timestamp":1491959419000},"source":"Crossref","is-referenced-by-count":10,"title":["Anterior cingulate cortex represents action-state predictions and causally mediates model-based reinforcement learning in a two-step decision task"],"prefix":"10.64898","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1810-0494","authenticated-orcid":false,"given":"Thomas","family":"Akam","sequence":"first","affiliation":[]},{"given":"Ines","family":"Rodrigues-Vaz","sequence":"additional","affiliation":[]},{"given":"Ivo","family":"Marcelo","sequence":"additional","affiliation":[]},{"given":"Xiangyu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Michael","family":"Pereira","sequence":"additional","affiliation":[]},{"given":"Rodrigo Freire","family":"Oliveira","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Dayan","sequence":"additional","affiliation":[]},{"given":"Rui M.","family":"Costa","sequence":"additional","affiliation":[]}],"member":"54368","reference":[{"key":"2024080316404226000_126292v2.1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2013.10.018"},{"key":"2024080316404226000_126292v2.2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1004648"},{"key":"2024080316404226000_126292v2.3","doi-asserted-by":"publisher","DOI":"10.1016\/S0028-3908(98)00033-1"},{"key":"2024080316404226000_126292v2.4","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","year":"1995","journal-title":"J. R. Stat. Soc. Ser. B Methodol"},{"key":"2024080316404226000_126292v2.5","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.3864-11.2012"},{"key":"2024080316404226000_126292v2.6","doi-asserted-by":"publisher","DOI":"10.1038\/nn.3752"},{"key":"2024080316404226000_126292v2.7","doi-asserted-by":"publisher","DOI":"10.1038\/nn1560"},{"key":"2024080316404226000_126292v2.8","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2011.02.027"},{"key":"2024080316404226000_126292v2.9","doi-asserted-by":"crossref","unstructured":"Dezfouli, A. , and Balleine, B.W. (2017). Learning the structure of the world: The adaptive nature of state-space and action representations in multi-stage decision-making. BioRxiv 211664.","DOI":"10.1101\/211664"},{"key":"2024080316404226000_126292v2.10","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2013.09.007"},{"key":"2024080316404226000_126292v2.11","doi-asserted-by":"publisher","DOI":"10.1038\/nn.3981"},{"key":"2024080316404226000_126292v2.12","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.1901-15.2016"},{"key":"2024080316404226000_126292v2.13","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1004463"},{"key":"2024080316404226000_126292v2.14","doi-asserted-by":"publisher","DOI":"10.1016\/j.conb.2010.02.008"},{"key":"2024080316404226000_126292v2.15","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.1694"},{"key":"2024080316404226000_126292v2.16","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.11305"},{"key":"2024080316404226000_126292v2.17","doi-asserted-by":"publisher","DOI":"10.1152\/jn.90629.2008"},{"key":"2024080316404226000_126292v2.18","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.2219-18.2018"},{"key":"2024080316404226000_126292v2.19","doi-asserted-by":"publisher","DOI":"10.1152\/jn.00634.2002"},{"key":"2024080316404226000_126292v2.20","doi-asserted-by":"crossref","unstructured":"Hasz, B.M. , and Redish, A.D. (2018). Deliberation and Procedural Automation on a Two-Step Task for Rats. Front. Integr. Neurosci. 12.","DOI":"10.3389\/fnint.2018.00030"},{"key":"2024080316404226000_126292v2.21","first-page":"149","article-title":"Dorsal Anterior Cingulate Cortex: A Bottom-Up View. Annu. Rev","volume":"39","year":"2016","journal-title":"Neurosci"},{"key":"2024080316404226000_126292v2.22","doi-asserted-by":"publisher","DOI":"10.1111\/j.1460-9568.2012.08073.x"},{"key":"2024080316404226000_126292v2.23","doi-asserted-by":"crossref","unstructured":"Hintiryan, H. , Foster, N.N. , Bowman, I. , Bay, M. , Song, M.Y. , Gou, L. , Yamashita, S. , Bienkowski, M.S. , Zingg, B. , Zhu, M. , et al. (2016). The mouse cortico-striatal projectome. Nat. Neurosci.","DOI":"10.1038\/nn.4332"},{"key":"2024080316404226000_126292v2.24","doi-asserted-by":"crossref","first-page":"116834","DOI":"10.1016\/j.neuroimage.2020.116834","article-title":"Goal-oriented and habitual decisions: Neural signatures of model-based and model-free learning","volume":"215","year":"2020","journal-title":"NeuroImage"},{"key":"2024080316404226000_126292v2.25","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1002028"},{"key":"2024080316404226000_126292v2.26","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1002410"},{"key":"2024080316404226000_126292v2.27","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.6157-08.2009"},{"key":"2024080316404226000_126292v2.28","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.1962-14.2015"},{"key":"2024080316404226000_126292v2.29","doi-asserted-by":"publisher","DOI":"10.1126\/science.1087847"},{"key":"2024080316404226000_126292v2.30","doi-asserted-by":"publisher","DOI":"10.1126\/science.1226518"},{"key":"2024080316404226000_126292v2.31","doi-asserted-by":"publisher","DOI":"10.1038\/nn1724"},{"key":"2024080316404226000_126292v2.32","doi-asserted-by":"publisher","DOI":"10.1038\/nn.2961"},{"key":"2024080316404226000_126292v2.33","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1002055"},{"key":"2024080316404226000_126292v2.34","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-13889-6"},{"key":"2024080316404226000_126292v2.35","doi-asserted-by":"crossref","first-page":"e1005090","DOI":"10.1371\/journal.pcbi.1005090","article-title":"When Does Model-Based Control Pay Off?","volume":"12","year":"2016","journal-title":"PLOS Comput Biol"},{"key":"2024080316404226000_126292v2.36","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2013.11.028"},{"key":"2024080316404226000_126292v2.37","doi-asserted-by":"crossref","unstructured":"Lockwood, P. , Klein-Flugge, M. , Abdurahman, A. , and Crockett, M. (2019). Neural signatures of model-free learning when avoiding harm to self and other. BioRxiv 718106.","DOI":"10.1101\/718106"},{"key":"2024080316404226000_126292v2.38","doi-asserted-by":"publisher","DOI":"10.1126\/science.1084204"},{"key":"2024080316404226000_126292v2.39","doi-asserted-by":"publisher","DOI":"10.1038\/nn.4613"},{"key":"2024080316404226000_126292v2.40","doi-asserted-by":"crossref","unstructured":"Miller, K.J. , Shenhav, A. , and Ludvig, E.A. (2019). Habits without values. Psychol. Rev. 292\u2013311.","DOI":"10.1037\/rev0000120"},{"key":"2024080316404226000_126292v2.41","doi-asserted-by":"crossref","unstructured":"Miranda, B. , Malalasekera, W.M.N. , Behrens, T.E. , Dayan, P. , and Kennerley, S.W. (2019). Combined model-free and model-sensitive reinforcement learning in non-human primates. BioRxiv 836007.","DOI":"10.1101\/836007"},{"key":"2024080316404226000_126292v2.42","doi-asserted-by":"publisher","DOI":"10.1038\/nature13186"},{"key":"2024080316404226000_126292v2.43","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1305373110"},{"key":"2024080316404226000_126292v2.44","doi-asserted-by":"publisher","DOI":"10.1177\/0956797612463080"},{"key":"2024080316404226000_126292v2.45","doi-asserted-by":"crossref","unstructured":"Pachitariu, M. , Steinmetz, N. , Kadir, S. , Carandini, M. , and Harris, K.D. (2016). Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. BioRxiv 061481.","DOI":"10.1101\/061481"},{"key":"2024080316404226000_126292v2.46","unstructured":"Paxinos, G. , and Franklin, K.B. (2007). The mouse brain in stereotaxic coordinates -3rd Edition (Academic Press)."},{"key":"2024080316404226000_126292v2.47","doi-asserted-by":"publisher","DOI":"10.1038\/nn1756"},{"key":"2024080316404226000_126292v2.48","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.3541-08.2008"},{"key":"2024080316404226000_126292v2.49","doi-asserted-by":"publisher","DOI":"10.1038\/nn2066"},{"key":"2024080316404226000_126292v2.50","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1005768"},{"key":"2024080316404226000_126292v2.51","doi-asserted-by":"publisher","DOI":"10.1159\/000362840"},{"key":"2024080316404226000_126292v2.52","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1821647116"},{"key":"2024080316404226000_126292v2.53","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.4647-10.2011"},{"key":"2024080316404226000_126292v2.54","doi-asserted-by":"crossref","unstructured":"Smittenaar, P. , FitzGerald, T.H.B. , Romei, V. , Wright, N.D. , and Dolan, R.J. (2013). Disruption of Dorsolateral Prefrontal Cortex Decreases Model-Based in Favor of Model-free Control in Humans. Neuron.","DOI":"10.1016\/j.neuron.2013.08.009"},{"key":"2024080316404226000_126292v2.55","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2010.03.033"},{"key":"2024080316404226000_126292v2.56","doi-asserted-by":"crossref","unstructured":"Sutton, R.S. , and Barto, A.G. (1998). Reinforcement learning: An introduction (The MIT press).","DOI":"10.1109\/TNN.1998.712192"},{"key":"2024080316404226000_126292v2.57","doi-asserted-by":"crossref","unstructured":"Thorndike, E.L. (1911). Animal intelligence: Experimental studies.","DOI":"10.5962\/bhl.title.55072"},{"key":"2024080316404226000_126292v2.58","doi-asserted-by":"publisher","DOI":"10.1007\/s00429-012-0493-3"},{"key":"2024080316404226000_126292v2.59","doi-asserted-by":"publisher","DOI":"10.1038\/mp.2014.44"},{"key":"2024080316404226000_126292v2.60","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.23-16-06475.2003"},{"key":"2024080316404226000_126292v2.61","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2012.03.042"},{"key":"2024080316404226000_126292v2.62","doi-asserted-by":"publisher","DOI":"10.1111\/j.1460-9568.2005.04219.x"},{"key":"2024080316404226000_126292v2.63","doi-asserted-by":"publisher","DOI":"10.1111\/j.1460-9568.2005.04218.x"},{"key":"2024080316404226000_126292v2.64","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.28728"}],"container-title":[],"original-title":[],"link":[{"URL":"https:\/\/syndication.highwire.org\/content\/doi\/10.1101\/126292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T22:23:36Z","timestamp":1768429416000},"score":1,"resource":{"primary":{"URL":"http:\/\/biorxiv.org\/lookup\/doi\/10.1101\/126292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,4,11]]},"references-count":64,"URL":"https:\/\/doi.org\/10.1101\/126292","relation":{},"subject":[],"published":{"date-parts":[[2017,4,11]]},"subtype":"preprint"}}