{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T10:11:22Z","timestamp":1767175882018,"version":"build-2238731810"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1011385","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T00:00:00Z","timestamp":1693872000000}}],"reference-count":107,"publisher":"Public Library of Science (PLoS)","issue":"8","license":[{"start":{"date-parts":[[2023,8,18]],"date-time":"2023-08-18T00:00:00Z","timestamp":1692316800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000027","name":"National Institute on Alcohol Abuse and Alcoholism","doi-asserted-by":"publisher","award":["R01AA016022"],"award-info":[{"award-number":["R01AA016022"]}],"id":[{"id":"10.13039\/100000027","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["23120007"],"award-info":[{"award-number":["23120007"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["16K21738"],"award-info":[{"award-number":["16K21738"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["16H06561"],"award-info":[{"award-number":["16H06561"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of 
Science","doi-asserted-by":"publisher","award":["16H06563"],"award-info":[{"award-number":["16H06563"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is to update the G and N matrices utilizing the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. 
Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight as to the role of direct- and indirect-pathway striatal neurons.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1011385","type":"journal-article","created":{"date-parts":[[2023,8,18]],"date-time":"2023-08-18T13:38:04Z","timestamp":1692365884000},"page":"e1011385","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":4,"title":["Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks"],"prefix":"10.1371","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4711-2344","authenticated-orcid":true,"given":"Kim T.","family":"Blackwell","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2446-6820","authenticated-orcid":true,"given":"Kenji","family":"Doya","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2023,8,18]]},"reference":[{"key":"pcbi.1011385.ref001","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1152\/jn.00519.2001","article-title":"Corticostriatal combinatorics: the implications of corticostriatal axonal arborizations","author":"T Zheng","year":"2002","journal-title":"J.Neurophysiol"},{"key":"pcbi.1011385.ref002","doi-asserted-by":"crossref","first-page":"4722","DOI":"10.1523\/JNEUROSCI.18-12-04722.1998","article-title":"Connectivity and convergence of single corticostriatal axons","author":"AE 
Kincaid","year":"1998","journal-title":"J.Neurosci"},{"key":"pcbi.1011385.ref003","doi-asserted-by":"crossref","first-page":"2027","DOI":"10.1152\/jn.00115.2013","article-title":"Sensitivity to theta-burst timing permits LTP in dorsal striatal adult brain slice","volume":"110","author":"SL Hawes","year":"2013","journal-title":"JNeurophysiol"},{"key":"pcbi.1011385.ref004","doi-asserted-by":"crossref","first-page":"2435","DOI":"10.1523\/JNEUROSCI.4402-07.2008","article-title":"Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity","author":"V Pawlak","year":"2008","journal-title":"J.Neurosci"},{"key":"pcbi.1011385.ref005","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1152\/jn.2001.85.1.117","article-title":"Dopamine D-1\/D-5 receptor activation is required for long-term potentiation in the rat neostriatum in vitro","volume":"85","author":"Wickens JR Kerr JNDN","year":"2001","journal-title":"JNeurophysiol"},{"key":"pcbi.1011385.ref006","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1038\/1124","article-title":"Dopamine neurons report an error in the temporal prediction of reward during learning","author":"JR Hollerman","year":"1998","journal-title":"Nat.Neurosci"},{"key":"pcbi.1011385.ref007","doi-asserted-by":"crossref","first-page":"244","DOI":"10.3389\/fpsyg.2017.00244","article-title":"The dopamine prediction error: Contributions to associative models of reward learning.","volume":"8","author":"HM Nasser","year":"2017","journal-title":"Frontiers in Psychology"},{"key":"pcbi.1011385.ref008","doi-asserted-by":"crossref","first-page":"1281","DOI":"10.1038\/nn.3188","article-title":"Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value","volume":"15","author":"LH Tai","year":"2012","journal-title":"Nature 
Neuroscience"},{"key":"pcbi.1011385.ref009","doi-asserted-by":"crossref","first-page":"1302","DOI":"10.1016\/j.neuron.2018.08.002","article-title":"Monitoring and Updating of Action Selection for Goal-Directed Behavior through the Striatal Direct and Indirect Pathways","volume":"99","author":"S Nonomura","year":"2018","journal-title":"Neuron"},{"key":"pcbi.1011385.ref010","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1152\/jn.00925.2015","article-title":"Habit formation coincides with shifts in reinforcement representations in the sensorimotor striatum.","volume":"115","author":"KS Smith","year":"2016","journal-title":"JNeurophysiol"},{"key":"pcbi.1011385.ref011","doi-asserted-by":"crossref","first-page":"3499","DOI":"10.1523\/JNEUROSCI.1962-14.2015","article-title":"Distinct Neural Representation in the Dorsolateral, Dorsomedial, and Ventral Parts of the Striatum during Fixed- and Free-Choice Tasks","volume":"35","author":"M Ito","year":"2015","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1011385.ref012","volume-title":"Reinforcement Learning: An Introduction","author":"The MIT Press","year":"1998","edition":"2"},{"key":"pcbi.1011385.ref013","doi-asserted-by":"crossref","first-page":"1180","DOI":"10.1111\/j.1460-9568.2012.08025.x","article-title":"Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats","volume":"35","author":"A Funamizu","year":"2012","journal-title":"European Journal of Neuroscience"},{"key":"pcbi.1011385.ref014","doi-asserted-by":"crossref","first-page":"784","DOI":"10.1037\/0033-295X.114.3.784","article-title":"Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling.","volume":"114","author":"AD Redish","year":"2007","journal-title":"Psychological 
review."},{"key":"pcbi.1011385.ref015","doi-asserted-by":"crossref","first-page":"e46050","DOI":"10.7554\/eLife.46050","article-title":"Distinct roles of striatal direct and indirect pathways in value-based decision making.","volume":"8","author":"S Kwak","year":"2019","journal-title":"eLife"},{"key":"pcbi.1011385.ref016","doi-asserted-by":"crossref","first-page":"1337","DOI":"10.1126\/science.1115270","article-title":"Representation of action-specific reward values in the striatum","author":"K Samejima","year":"2005","journal-title":"Science"},{"key":"pcbi.1011385.ref017","doi-asserted-by":"crossref","first-page":"848","DOI":"10.1126\/science.1160575","article-title":"Dichotomous dopaminergic control of striatal synaptic plasticity","volume":"321","author":"W Shen","year":"2008","journal-title":"Science"},{"key":"pcbi.1011385.ref018","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1126\/science.1255514","article-title":"A critical time window for dopamine actions on the structural plasticity of dendritic spines","volume":"345","author":"S Yagishita","year":"2014","journal-title":"Science"},{"key":"pcbi.1011385.ref019","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1038\/s41586-020-2115-1","article-title":"Dopamine D2 receptors in discrimination learning and spine enlargement","volume":"579","author":"Y Iino","year":"2020","journal-title":"Nature"},{"key":"pcbi.1011385.ref020","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1146\/annurev-neuro-061010-113641","article-title":"Modulation of striatal projection systems by dopamine","volume":"34","author":"CR Gerfen","year":"2011","journal-title":"Annual review of neuroscience"},{"key":"pcbi.1011385.ref021","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1016\/j.cell.2016.06.032","article-title":"Complementary Contributions of Striatal Projection Pathways to Action Initiation and Execution","volume":"166","author":"F 
Tecuapetla","year":"2016","journal-title":"Cell"},{"key":"pcbi.1011385.ref022","doi-asserted-by":"crossref","first-page":"622","DOI":"10.1038\/nature09159","article-title":"Regulation of parkinsonian motor behaviours by optogenetic control of basal ganglia circuitry","volume":"466","author":"V. Kravitz A","year":"2010","journal-title":"Nature"},{"key":"pcbi.1011385.ref023","doi-asserted-by":"crossref","first-page":"10535","DOI":"10.1523\/JNEUROSCI.4415-14.2015","article-title":"Multimodal Plasticity in Dorsal Striatum While Learning a Lateralized Navigation Task.","volume":"35","author":"SL Hawes","year":"2015","journal-title":"JNeurosci"},{"key":"pcbi.1011385.ref024","first-page":"333","article-title":"Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill.","volume":"12","author":"HH Yin","year":"2009","journal-title":"NatNeurosci"},{"key":"pcbi.1011385.ref025","doi-asserted-by":"crossref","first-page":"9196","DOI":"10.1523\/JNEUROSCI.0313-14.2014","article-title":"The acquisition of goal-directed actions generates opposing plasticity in direct and indirect pathways in dorsomedial striatum.","volume":"34","author":"Q Shan","year":"2014","journal-title":"JNeurosci"},{"key":"pcbi.1011385.ref026","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1037\/a0037015","article-title":"Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive.","volume":"121","author":"A Collins","year":"2014","journal-title":"Psychological review."},{"key":"pcbi.1011385.ref027","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1016\/j.neuron.2011.11.015","article-title":"RGS4 Is Required for Dopaminergic Control of Striatal LTD and Susceptibility to Parkinsonian Motor Deficits","volume":"73","author":"TN 
Lerner","year":"2012","journal-title":"Neuron"},{"key":"pcbi.1011385.ref028","doi-asserted-by":"crossref","first-page":"e1002034","DOI":"10.1371\/journal.pbio.1002034","article-title":"A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface","author":"KN Gurney","year":"2015","journal-title":"PLoS.Biol"},{"key":"pcbi.1011385.ref029","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.tins.2006.12.003","article-title":"Space, time and dopamine","volume":"30","author":"GW Arbuthnott","year":"2007","journal-title":"Trends in Neurosciences"},{"key":"pcbi.1011385.ref030","doi-asserted-by":"crossref","first-page":"14273","DOI":"10.1523\/JNEUROSCI.1894-10.2010","article-title":"Influence of phasic and tonic dopamine release on receptor activation","volume":"30","author":"JK Dreyer","year":"2010","journal-title":"JNeurosci"},{"key":"pcbi.1011385.ref031","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1016\/j.neuron.2012.03.017","article-title":"Article Whole-Brain Mapping of Direct Inputs to Midbrain Dopamine Neurons","volume":"74","author":"M Watabe-Uchida","year":"2012","journal-title":"Neuron"},{"key":"pcbi.1011385.ref032","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1111\/j.1460-9568.2010.07564.x","article-title":"Exclusive and common targets of neostriatofugal projections of rat striosome neurons: A single neuron-tracing study using a viral vector","volume":"33","author":"F Fujiyama","year":"2011","journal-title":"European Journal of Neuroscience"},{"key":"pcbi.1011385.ref033","doi-asserted-by":"crossref","first-page":"11318","DOI":"10.1073\/pnas.1613337113","article-title":"Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons","volume":"113","author":"JR Crittenden","year":"2016","journal-title":"Proceedings of the National Academy of Sciences of the United States of 
America"},{"key":"pcbi.1011385.ref034","doi-asserted-by":"crossref","first-page":"6770","DOI":"10.1038\/s41598-019-43245-z","article-title":"Dopamine blockade impairs the exploration-exploitation trade-off in rats.","volume":"9","author":"F Cinotti","year":"2019","journal-title":"Scientific reports."},{"key":"pcbi.1011385.ref035","doi-asserted-by":"crossref","first-page":"9","DOI":"10.3389\/fnins.2012.00009","article-title":"Dopaminergic Control of the Exploration-Exploitation Trade-Off via the Basal Ganglia.","volume":"6","author":"M Humphries","year":"2012","journal-title":"Frontiers in neuroscience"},{"key":"pcbi.1011385.ref036","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.neuropsychologia.2018.10.009","article-title":"Dopaminergic genes are associated with both directed and random exploration.","volume":"120","author":"SJ Gershman","year":"2018","journal-title":"Neuropsychologia"},{"key":"pcbi.1011385.ref037","doi-asserted-by":"crossref","first-page":"e51260","DOI":"10.7554\/eLife.51260","article-title":"Dopaminergic modulation of the exploration\/exploitation trade-off in human decision-making.","volume":"9","author":"K Chakroun","year":"2020","journal-title":"eLife"},{"key":"pcbi.1011385.ref038","doi-asserted-by":"crossref","first-page":"66","DOI":"10.3389\/fnana.2017.00066","article-title":"Distinct Functions of the Primate Putamen Direct and Indirect Pathways in Adaptive Outcome-Based Action Selection","volume":"11","author":"Y Ueda","year":"2017","journal-title":"Frontiers in neuroanatomy."},{"key":"pcbi.1011385.ref039","doi-asserted-by":"crossref","first-page":"17","DOI":"10.3389\/fnbeh.2018.00017","article-title":"The Winding Road to Relapse: Forging a New Understanding of Cue-Induced Reinstatement Models and Their Associated Neural Mechanisms.","volume":"12","author":"MD Namba","year":"2018","journal-title":"Frontiers in Behavioral 
Neuroscience"},{"key":"pcbi.1011385.ref040","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/bs.pbr.2015.08.004","article-title":"Animal models of drug relapse and craving: From drug priming-induced reinstatement to incubation of craving after voluntary abstinence","volume":"224","author":"M Venniro","year":"2016","journal-title":"Progress in Brain Research"},{"key":"pcbi.1011385.ref041","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.nlm.2004.04.004","article-title":"The influence of NMDA receptors in the dorsomedial striatum on response reversal learning","volume":"82","author":"CA Palencia","year":"2004","journal-title":"Neurobiology of Learning and Memory"},{"key":"pcbi.1011385.ref042","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.bbr.2010.02.017","article-title":"Selective lesions of the dorsomedial striatum impair serial spatial reversal learning in rats","volume":"210","author":"A Casta\u00f1\u00e9","year":"2010","journal-title":"Behavioural Brain Research"},{"key":"pcbi.1011385.ref043","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1038\/nn.4173","article-title":"Mesolimbic dopamine signals the value of work","volume":"19","author":"AA Hamid","year":"2015","journal-title":"Nature Neuroscience"},{"key":"pcbi.1011385.ref044","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.cell.2018.06.012","article-title":"Optogenetic Editing Reveals the Hierarchical Organization of Learned Action Sequences","volume":"174","author":"CE Geddes","year":"2018","journal-title":"Cell"},{"key":"pcbi.1011385.ref045","doi-asserted-by":"crossref","first-page":"e1004567","DOI":"10.1371\/journal.pcbi.1004567","article-title":"A Unifying Probabilistic View of Associative Learning.","volume":"11","author":"SJ Gershman","year":"2015","journal-title":"PLoS Comput Biol."},{"key":"pcbi.1011385.ref046","doi-asserted-by":"crossref","first-page":"e23763","DOI":"10.7554\/eLife.23763","article-title":"The computational 
nature of memory modification","volume":"6","author":"S Gershman","year":"2017","journal-title":"eLife"},{"key":"pcbi.1011385.ref047","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1196\/annals.1401.013","article-title":"The contribution of the medial prefrontal cortex, orbitofrontal cortex, and dorsomedial striatum to behavioral flexibility","volume":"1121","author":"ME Ragozzino","year":"2007","journal-title":"Annals of the New York Academy of Sciences"},{"key":"pcbi.1011385.ref048","doi-asserted-by":"crossref","first-page":"482","DOI":"10.1038\/nature12077","article-title":"Corticostriatal neurons in auditory cortex drive decisions during auditory discrimination","volume":"497","author":"P Znamenskiy","year":"2013","journal-title":"Nature"},{"key":"pcbi.1011385.ref049","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1038\/s41386-020-0612-4","article-title":"Dorsal and ventral striatal dopamine D1 and D2 receptors differentially modulate distinct phases of serial visual reversal learning","volume":"45","author":"J Sala-Bayo","year":"2020","journal-title":"Neuropsychopharmacology"},{"key":"pcbi.1011385.ref050","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1037\/a0024403","article-title":"Effects of D-cycloserine on the extinction of appetitive operant learning","volume":"125","author":"D Vurbic","year":"2011","journal-title":"Behavioral Neuroscience"},{"key":"pcbi.1011385.ref051","doi-asserted-by":"crossref","first-page":"107483","DOI":"10.1016\/j.nlm.2021.107483","article-title":"General Pavlovian-instrumental transfer tests reveal selective inhibition of the response type\u2013whether Pavlovian or instrumental\u2013performed during extinction","volume":"183","author":"V Laurent","year":"2021","journal-title":"Neurobiology of Learning and Memory"},{"key":"pcbi.1011385.ref052","doi-asserted-by":"crossref","first-page":"13421","DOI":"10.1523\/JNEUROSCI.1969-12.2012","article-title":"Striatal indirect pathway contributes to selection 
accuracy of learned motor actions","volume":"32","author":"K Nishizawa","year":"2012","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1011385.ref053","doi-asserted-by":"crossref","first-page":"e1007465","DOI":"10.1371\/journal.pcbi.1007465","article-title":"Modeling the effects of motivation on choice and learning in the basal ganglia.","volume":"16","author":"MMH van Swieten","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1011385.ref054","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1016\/j.tins.2012.04.009","article-title":"Reinforcement learning: computing the temporal difference of values via distinct corticostriatal pathways","volume":"35","author":"K Morita","year":"2012","journal-title":"Trends in Neurosciences"},{"key":"pcbi.1011385.ref055","first-page":"6","article-title":"Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats.","author":"K Lloyd","year":"2012","journal-title":"Front Comput Neurosci."},{"key":"pcbi.1011385.ref056","doi-asserted-by":"crossref","first-page":"e1005062","DOI":"10.1371\/journal.pcbi.1005062","article-title":"Learning Reward Uncertainty in the Basal Ganglia.","volume":"12","author":"JG Mikhael","year":"2016","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1011385.ref057","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1016\/j.neuron.2010.05.011","article-title":"Distinct Roles of Synaptic Transmission in Direct and Indirect Striatal Pathways to Reward and Aversive Behavior","volume":"66","author":"T Hikida","year":"2010","journal-title":"Neuron"},{"key":"pcbi.1011385.ref058","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.neures.2016.01.004","article-title":"Neural mechanisms of the nucleus accumbens circuit in reward and aversive learning","volume":"108","author":"T Hikida","year":"2016","journal-title":"Neuroscience 
Research"},{"key":"pcbi.1011385.ref059","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.neunet.2020.12.001","article-title":"Modular deep reinforcement learning from reward and punishment for robot navigation.","volume":"135","author":"J Wang","year":"2021","journal-title":"Neural Netw."},{"key":"pcbi.1011385.ref060","doi-asserted-by":"crossref","first-page":"111","DOI":"10.3389\/fncir.2018.00111","article-title":"A dual role hypothesis of the cortico-basal-Ganglia pathways: Opponency and temporal difference through dopamine and adenosine.","volume":"12","author":"K Morita","year":"2019","journal-title":"Frontiers in Neural Circuits"},{"key":"pcbi.1011385.ref061","doi-asserted-by":"crossref","first-page":"2139","DOI":"10.1523\/JNEUROSCI.1313-19.2019","article-title":"Complementary Control over Habits and Behavioral Vigor by Phasic Activity in the Dorsolateral Striatum","volume":"40","author":"ACG Crego","year":"2020","journal-title":"J Neurosci"},{"key":"pcbi.1011385.ref062","first-page":"464","article-title":"The role of the basal ganglia in habit formation.","volume":"7","author":"HH Yin","year":"2006","journal-title":"NatRevNeurosci."},{"key":"pcbi.1011385.ref063","first-page":"43","article-title":"The integrative function of the basal ganglia in instrumental conditioning.","volume":"199","author":"BW Balleine","year":"2009","journal-title":"BehavBrain Res"},{"key":"pcbi.1011385.ref064","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1016\/j.neuron.2014.10.045","article-title":"Essential role of presynaptic NMDA receptors in activity-dependent BDNF secretion and corticostriatal LTP","volume":"84","author":"H Park","year":"2014","journal-title":"Neuron"},{"key":"pcbi.1011385.ref065","doi-asserted-by":"crossref","first-page":"e69748","DOI":"10.7554\/eLife.69748","article-title":"Sex differences in learning from exploration.","volume":"10","author":"CS 
Chen","year":"2021","journal-title":"eLife"},{"key":"pcbi.1011385.ref066","doi-asserted-by":"crossref","first-page":"107169","DOI":"10.1016\/j.nlm.2020.107169","article-title":"Chemogenetic inhibition in the dorsal striatum reveals regional specificity of direct and indirect pathway control of action sequencing","volume":"169","author":"E Garr","year":"2020","journal-title":"Neurobiology of Learning and Memory"},{"key":"pcbi.1011385.ref067","doi-asserted-by":"crossref","first-page":"104245","DOI":"10.1016\/j.isci.2022.104245","article-title":"Striatal direct pathway neurons play leading roles in accelerating rotarod motor skill learning.","volume":"25","author":"B Liang","year":"2022","journal-title":"iScience"},{"key":"pcbi.1011385.ref068","doi-asserted-by":"crossref","first-page":"5723","DOI":"10.1073\/pnas.75.11.5723","article-title":"Histochemically distinct compartments in the striatum of human, monkeys, and cat demonstrated by acetylthiocholinesterase staining","volume":"75","author":"AM Graybiel","year":"1978","journal-title":"Proc Natl Acad Sci USA"},{"key":"pcbi.1011385.ref069","doi-asserted-by":"crossref","first-page":"1544","DOI":"10.1038\/s41593-019-0470-8","article-title":"Learning task-state representations","volume":"22","author":"Y. 
Niv","year":"2019","journal-title":"Nat Neurosci"},{"key":"pcbi.1011385.ref070","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1111\/ejn.13829","article-title":"Integrated anatomical and physiological mapping of striatal afferent projections","volume":"49","author":"K Choi","year":"2019","journal-title":"European Journal of Neuroscience"},{"key":"pcbi.1011385.ref071","doi-asserted-by":"crossref","first-page":"7143","DOI":"10.1523\/JNEUROSCI.3336-17.2018","article-title":"Neural Computations Underlying Causal Structure Learning","volume":"38","author":"MS Tomov","year":"2018","journal-title":"J Neurosci"},{"key":"pcbi.1011385.ref072","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1016\/j.cognition.2016.04.002","article-title":"Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning","volume":"152","author":"AGE Collins","year":"2016","journal-title":"Cognition"},{"key":"pcbi.1011385.ref073","doi-asserted-by":"crossref","first-page":"18049","DOI":"10.1073\/pnas.2001348117","article-title":"Fast spiking interneuron activity in primate striatum tracks learning of attention cues","volume":"117","author":"KB Boroujeni","year":"2020","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"key":"pcbi.1011385.ref074","doi-asserted-by":"crossref","first-page":"1062","DOI":"10.1038\/nn.2342","article-title":"Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation","volume":"12","author":"MJ Frank","year":"2009","journal-title":"Nat Neurosci"},{"key":"pcbi.1011385.ref075","doi-asserted-by":"crossref","first-page":"933","DOI":"10.1098\/rstb.2007.2098","article-title":"Should I stay or should I go? 
How the human brain manages the trade-off between exploitation and exploration","volume":"362","author":"JD Cohen","year":"2007","journal-title":"Phil Trans R Soc B"},{"key":"pcbi.1011385.ref076","doi-asserted-by":"crossref","first-page":"2575","DOI":"10.1093\/cercor\/bhr332","article-title":"Frontal Theta Reflects Uncertainty and Unexpectedness during Exploration and Exploitation","volume":"22","author":"JF Cavanagh","year":"2012","journal-title":"Cerebral Cortex"},{"key":"pcbi.1011385.ref077","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.cognition.2017.12.014","article-title":"Deconstructing the human algorithms for exploration.","volume":"173","author":"SJ Gershman","year":"2018","journal-title":"Cognition"},{"key":"pcbi.1011385.ref078","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/j.conb.2018.11.003","article-title":"The algorithmic architecture of exploration in the human brain","volume":"55","author":"E Schulz","year":"2019","journal-title":"Current Opinion in Neurobiology"},{"key":"pcbi.1011385.ref079","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1016\/S0893-6080(02)00056-4","article-title":"Control of exploitation\u2013exploration meta-parameter in reinforcement learning.","volume":"15","author":"S Ishii","year":"2002","journal-title":"Neural Networks."},{"key":"pcbi.1011385.ref080","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1016\/S0306-4522(01)00231-7","article-title":"Neostriatal and globus pallidus stimulation induced inhibitory postsynaptic potentials in entopeduncular neurons in rat brain slice preparations","volume":"105","author":"H. 
Kita","year":"2001","journal-title":"Neuroscience"},{"key":"pcbi.1011385.ref081","doi-asserted-by":"crossref","first-page":"639082","DOI":"10.3389\/fncel.2021.639082","article-title":"Endocannabinoids and Dopamine Balance Basal Ganglia Output.","volume":"15","author":"L Gorodetski","year":"2021","journal-title":"Frontiers in Cellular Neuroscience"},{"key":"pcbi.1011385.ref082","doi-asserted-by":"crossref","first-page":"7177","DOI":"10.1523\/JNEUROSCI.0639-17.2017","article-title":"Dopaminergic Modulation of Synaptic Integration and Firing Patterns in the Rat Entopeduncular Nucleus","author":"H Lavian","year":"2017","journal-title":"J.Neurosci"},{"key":"pcbi.1011385.ref083","doi-asserted-by":"crossref","first-page":"9353","DOI":"10.1523\/JNEUROSCI.5796-12.2013","article-title":"GABAergic circuits control spike-timing-dependent plasticity","author":"V Paille","year":"2013","journal-title":"J.Neurosci"},{"key":"pcbi.1011385.ref084","doi-asserted-by":"crossref","first-page":"789502","DOI":"10.1155\/2015\/789502","article-title":"Dopaminergic Modulation of Striatal Inhibitory Transmission and Long-Term Plasticity.","volume":"2015","author":"E Nieto Mendoza","year":"2015","journal-title":"Neural Plast."},{"key":"pcbi.1011385.ref085","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1113\/jphysiol.2007.144501","article-title":"Cell-specific spike-timing-dependent plasticity in GABAergic and cholinergic interneurons in corticostriatal rat brain slices","author":"E Fino","year":"2008","journal-title":"J.Physiol"},{"key":"pcbi.1011385.ref086","doi-asserted-by":"crossref","first-page":"744","DOI":"10.1016\/j.neuroscience.2009.03.015","article-title":"Asymmetric spike-timing dependent plasticity of striatal nitric oxide-synthase interneurons","volume":"160","author":"E Fino","year":"2009","journal-title":"Neuroscience"},{"key":"pcbi.1011385.ref087","doi-asserted-by":"crossref","first-page":"116","DOI":"10.3389\/fncel.2015.00116","article-title":"Potentiation of NMDA 
receptor-mediated transmission in striatal cholinergic interneurons","volume":"9","author":"MJ Oswald","year":"2015","journal-title":"Frontiers in Cellular Neuroscience"},{"key":"pcbi.1011385.ref088","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1101\/lm.1439909","article-title":"Diversity in long-term synaptic plasticity at inhibitory synapses of striatal spiny neurons.","volume":"16","author":"PE Rueda-Orozco","year":"2009","journal-title":"Learning & Memory."},{"key":"pcbi.1011385.ref089","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1126\/science.283.5401.549","article-title":"The Role of Locus Coeruleus in the Regulation of Cognitive Performance","volume":"283","author":"M Usher","year":"1999","journal-title":"Science"},{"key":"pcbi.1011385.ref090","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1016\/S0893-6080(02)00044-8","article-title":"Metalearning and neuromodulation","volume":"15","author":"K. Doya","year":"2002","journal-title":"Neural Networks"},{"key":"pcbi.1011385.ref091","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1146\/annurev.neuro.28.061604.135709","article-title":"An Integrative Theory Of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance","volume":"28","author":"G Aston-Jones","year":"2005","journal-title":"Annu Rev Neurosci"},{"key":"pcbi.1011385.ref092","doi-asserted-by":"crossref","first-page":"1312","DOI":"10.1016\/j.neuron.2016.04.043","article-title":"Endocannabinoid Modulation of Orbitostriatal Circuits Gates Habit Formation","volume":"90","author":"CM Gremel","year":"2016","journal-title":"Neuron"},{"key":"pcbi.1011385.ref093","doi-asserted-by":"crossref","first-page":"e65764","DOI":"10.7554\/eLife.65764","article-title":"Specialized coding patterns among dorsomedial prefrontal neuronal ensembles predict conditioned reward seeking.","volume":"10","author":"RI 
Grant","year":"2021","journal-title":"eLife"},{"key":"pcbi.1011385.ref094","doi-asserted-by":"crossref","first-page":"8771","DOI":"10.1523\/JNEUROSCI.23-25-08771.2003","article-title":"Dissociable contributions of the orbitofrontal and infralimbic cortex to pavlovian autoshaping and discrimination reversal learning: Further evidence for the functional heterogeneity of the rodent frontal cortex","volume":"23","author":"Y Chudasama","year":"2003","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1011385.ref095","doi-asserted-by":"crossref","first-page":"1996","DOI":"10.1523\/JNEUROSCI.3366-15.2016","article-title":"Multifaceted contributions by different regions of the orbitofrontal and medial prefrontal cortex to probabilistic reversal learning","volume":"36","author":"GL Dalton","year":"2016","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1011385.ref096","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1016\/j.neuroscience.2016.03.034","article-title":"Orbitofrontal cortex reflects changes in response\u2013outcome contingencies during probabilistic reversal learning","volume":"345","author":"LR Amodeo","year":"2017","journal-title":"Neuroscience"},{"key":"pcbi.1011385.ref097","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1016\/j.nlm.2011.05.001","article-title":"Differential role of the hippocampus in response-outcome and context-outcome learning: Evidence from selective satiation procedures","volume":"96","author":"AC Reichelt","year":"2011","journal-title":"Neurobiology of Learning and Memory"},{"key":"pcbi.1011385.ref098","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/S0166-4328(02)00104-3","article-title":"Attenuation of context-specific inhibition on reversal learning of a stimulus-response task in rats with neurotoxic hippocampal damage","volume":"136","author":"RJ McDonald","year":"2002","journal-title":"Behavioural Brain 
Research"},{"key":"pcbi.1011385.ref099","doi-asserted-by":"crossref","first-page":"12176","DOI":"10.1523\/JNEUROSCI.3761-07.2007","article-title":"Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point","volume":"27","author":"A Johnson","year":"2007","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1011385.ref100","doi-asserted-by":"crossref","first-page":"a021808","DOI":"10.1101\/cshperspect.a021808","article-title":"Place cells, grid cells, and memory","volume":"7","author":"MB Moser","year":"2015","journal-title":"Cold Spring Harbor Perspectives in Biology"},{"key":"pcbi.1011385.ref101","doi-asserted-by":"crossref","first-page":"128","DOI":"10.3389\/fncom.2016.00128","article-title":"Computational properties of the hippocampus increase the efficiency of goal-directed foraging through hierarchical reinforcement learning","volume":"10","author":"E Chalmers","year":"2016","journal-title":"Frontiers in Computational Neuroscience"},{"key":"pcbi.1011385.ref102","doi-asserted-by":"crossref","first-page":"e1005768","DOI":"10.1371\/journal.pcbi.1005768","article-title":"Predictive representations can link model-based reinforcement learning to model-free mechanisms","volume":"13","author":"EM Russek","year":"2017","journal-title":"PLoS Computational Biology"},{"key":"pcbi.1011385.ref103","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1016\/S0893-6080(99)00046-5","article-title":"What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?","volume":"12","author":"K. 
Doya","year":"1999","journal-title":"Neural Networks."},{"key":"pcbi.1011385.ref104","doi-asserted-by":"crossref","first-page":"31378","DOI":"10.1038\/srep31378","article-title":"Model-based action planning involves cortico-cerebellar and basal ganglia networks.","volume":"6","author":"ASR Fermin","year":"2016","journal-title":"Sci Rep"},{"key":"pcbi.1011385.ref105","doi-asserted-by":"crossref","first-page":"e1007331","DOI":"10.1371\/journal.pcbi.1007331","article-title":"A flexible and generalizable model of online latent-state learning.","volume":"15","author":"AL Cochran","year":"2019","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1011385.ref106","doi-asserted-by":"crossref","first-page":"8161","DOI":"10.1523\/JNEUROSCI.1554-07.2007","article-title":"The Role of the Dorsal Striatum in Reward and Decision-Making","volume":"27","author":"BW Balleine","year":"2007","journal-title":"Journal of Neuroscience"},{"key":"pcbi.1011385.ref107","article-title":"Hierarchical control of goal-directed action in the cortical-basal ganglia network.","author":"BW Balleine","year":"2015","journal-title":"Current Opinion in Behavioral Sciences."}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1011385","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T00:00:00Z","timestamp":1693872000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011385","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T13:40:05Z","timestamp":1693921205000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011385"}},"subtitle":[],"editor":[{"given":"Ming 
Bo","family":"Cai","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2023,8,18]]},"references-count":107,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,8,18]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1011385","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,18]]}}}