{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T08:14:49Z","timestamp":1772698489747,"version":"3.50.1"},"reference-count":61,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T00:00:00Z","timestamp":1772668800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Balancing exploration and exploitation remains a fundamental challenge in reliable mobile robot control, as conventional policies often converge on suboptimal behaviors. Inspired by the brain's division of labor for adaptive control, we propose SpikeAEC, a fully spiking, neuromodulated Actor-Explorer-Critic architecture designed to address this dilemma online within a closed-loop system. SpikeAEC comprises three specialized subnetworks operating in parallel: the Actor, inspired by the basal ganglia, proposes exploitative actions; the Explorer, modeled after the ACC-GPe-STN pathway, generates adaptive exploratory actions gated by a vigilance signal modulated by the accumulated global temporal-difference (TD) error; and the Critic, based on the ventral striatum, computes the TD error. The final action is selected by a separate, TAN-based Arbitrator, which probabilistically chooses between the Actor's and Explorer's action proposals according to recent performance and the TD error. These subnetworks are coupled through a unified three-factor learning framework that uses the TD signal and phasic neuromodulators (acetylcholine and dopamine) from the Arbitrator to drive pathway-specific synaptic plasticity. This online plasticity enhances the quality of action proposals and accelerates policy refinement. In simulation, SpikeAEC outperforms leading brain-inspired methods by converging 24% faster, reducing trajectory length by 18%, and increasing cumulative reward by over 5% against the top-performing baseline, all while maintaining consistency with established neurophysiological principles.<\/jats:p>","DOI":"10.3389\/fnbot.2026.1757795","type":"journal-article","created":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T07:29:53Z","timestamp":1772695793000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SpikeAEC: a neuromodulation-based spiking controller for explore-exploit balancing in mobile robots"],"prefix":"10.3389","volume":"20","author":[{"given":"Canyang","family":"Liu","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, Nanjing University of Information Science and Technology","place":["Nanjing, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yichen","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Nanjing University of Information Science and Technology","place":["Nanjing, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongqi","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Nanjing University of Information Science and Technology","place":["Nanjing, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Buqin","family":"Su","sequence":"additional","affiliation":[{"name":"School of Computing, Nanjing University of Information Science and Technology","place":["Nanjing, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2026,3,5]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"5215","DOI":"10.1109\/TNNLS.2021.3069683","article-title":"Spike-timing-dependent plasticity with activation-dependent scaling for receptive fields development","volume":"33","author":"Bia\u0142as","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B2","doi-asserted-by":"publisher","first-page":"18","DOI":"10.3389\/fnbot.2019.00018","article-title":"Supervised learning in SNN via reward-modulated spike-timing-dependent plasticity for a target reaching vehicle","volume":"13","author":"Bing","year":"","journal-title":"Front. Neurorobot"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793774","article-title":"\u201cEnd to end learning of a multi-layered SNN based on r-STDP for a target tracking snake-like robot,\u201d","author":"Bing","year":"","journal-title":"IEEE International Conference on Robotics and Automation"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460482","article-title":"\u201cEnd to end learning of spiking neural network based on r-STDP for a lane keeping vehicle,\u201d","author":"Bing","year":"2018","journal-title":"IEEE International Conference on Robotics and Automation"},{"key":"B5","doi-asserted-by":"publisher","first-page":"5169","DOI":"10.1038\/s41467-025-60462-5","article-title":"Distinct spatially organized striatum-wide acetylcholine dynamics for the learning and extinction of Pavlovian associations","volume":"16","author":"Bouabid","year":"2025","journal-title":"Nat. Commun"},{"key":"B6","doi-asserted-by":"publisher","first-page":"2300132","DOI":"10.1002\/aisy.202300132","article-title":"Bioinspired spike-based hippocampus and posterior parietal cortex models for robot navigation and environment pseudomapping","volume":"5","author":"Casanueva-Morato","year":"2023","journal-title":"Adv. Intell. Syst"},{"key":"B7","doi-asserted-by":"publisher","first-page":"577","DOI":"10.1038\/s41586-023-06492-9","article-title":"Dopamine and glutamate regulate striatal acetylcholine in decision-making","volume":"621","author":"Chantranupong","year":"2023","journal-title":"Nature"},{"key":"B8","doi-asserted-by":"publisher","first-page":"2881","DOI":"10.1109\/TNNLS.2024.3352653","article-title":"Fully spiking actor network with intralayer connections for reinforcement learning","volume":"36","author":"Chen","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B9","first-page":"7645","article-title":"\u201cSurrogate module learning: Reduce the gradient error accumulation in training spiking neural networks,\u201d","volume-title":"International Conference on Machine Learning","author":"Deng","year":"2023"},{"key":"B10","doi-asserted-by":"publisher","first-page":"13446","DOI":"10.1109\/TNNLS.2024.3481887","article-title":"Historical decision-making regularized maximum entropy reinforcement learning","volume":"36","author":"Dong","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1468","DOI":"10.1162\/neco.2007.19.6.1468","article-title":"Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity","volume":"19","author":"Florian","year":"2007","journal-title":"Neural Comput"},{"key":"B12","doi-asserted-by":"publisher","first-page":"16320","DOI":"10.1109\/TITS.2025.3579147","article-title":"An adaptive deep reinforcement learning framework for auv attack-defense games","volume":"26","author":"Gan","year":"2025","journal-title":"IEEE Trans. Intell. Transport. Syst"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1657","DOI":"10.1109\/TCSII.2021.3117699","article-title":"A low-cost fpga implementation of spiking extreme learning machine with on-chip reward-modulated STDP learning","volume":"69","author":"He","year":"2021","journal-title":"IEEE Trans. Circ. Syst. II"},{"key":"B14","doi-asserted-by":"publisher","first-page":"7530","DOI":"10.1038\/s41467-022-35121-8","article-title":"Dynamic control of decision and movement speed in the human basal ganglia","volume":"13","author":"Herz","year":"2022","journal-title":"Nat. Commun"},{"key":"B15","doi-asserted-by":"publisher","first-page":"3876","DOI":"10.1109\/LRA.2019.2928765","article-title":"Vision-based estimation of driving energy for planetary rovers using deep learning and terramechanics","volume":"4","author":"Higa","year":"2019","journal-title":"IEEE Robot. Autom. Lett"},{"key":"B16","doi-asserted-by":"publisher","first-page":"3202","DOI":"10.1038\/s41467-022-30827-1","article-title":"Norepinephrine potentiates and serotonin depresses visual cortical responses by transforming eligibility traces","volume":"13","author":"Hong","year":"2022","journal-title":"Nat. Commun"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1569","DOI":"10.1109\/TNN.2003.820440","article-title":"Simple model of spiking neurons","volume":"14","author":"Izhikevich","year":"2003","journal-title":"IEEE Trans. Neural Netw"},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/2526.001.0001","author":"Izhikevich","year":"2007","journal-title":"Dynamical Systems in Neuroscience"},{"key":"B19","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1016\/j.neucom.2010.03.001","article-title":"Ace (actor-critic-explorer) paradigm for reinforcement learning in basal ganglia: Highlighting the role of subthalamic and pallidal nuclei","volume":"74","author":"Joseph","year":"2010","journal-title":"Neurocomputing"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1124","DOI":"10.1162\/neco_a_01754","article-title":"Neural code translation with lif neuron microcircuits","volume":"37","author":"Karlsson","year":"2025","journal-title":"Neural Comput"},{"key":"B21","doi-asserted-by":"publisher","first-page":"6662","DOI":"10.1038\/s41467-022-34465-5","article-title":"Reward expectation extinction restructures and degrades ca1 spatial maps through loss of a dopaminergic reward proximity signal","volume":"13","author":"Krishnan","year":"2022","journal-title":"Nat. Commun"},{"key":"B22","doi-asserted-by":"publisher","first-page":"129916","DOI":"10.1016\/j.neucom.2025.129916","article-title":"DSQN: robust path planning of mobile robot based on deep spiking q-network","volume":"634","author":"Kumar","year":"2025","journal-title":"Neurocomputing"},{"key":"B23","doi-asserted-by":"publisher","first-page":"7827","DOI":"10.1038\/s41467-024-52290-w","article-title":"Optimal level of human intracranial theta activity for behavioral switching in the subthalamo-medio-prefrontal circuit","volume":"15","author":"Laquitaine","year":"2024","journal-title":"Nat. Commun"},{"key":"B24","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1007\/s13534-024-00436-6","article-title":"Brain-inspired learning rules for spiking neural network-based control: a tutorial","volume":"15","author":"Lee","year":"2024","journal-title":"Biomed. Eng. Lett"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1543","DOI":"10.1109\/TCSI.2021.3052885","article-title":"A fast and energy-efficient SNN processor with adaptive clock\/event-driven computation scheme and online learning","volume":"68","author":"Li","year":"2021","journal-title":"IEEE Trans. Circ. Syst. I"},{"key":"B26","doi-asserted-by":"publisher","first-page":"3264","DOI":"10.1162\/neco_a_01409","article-title":"Learning the synaptic and intrinsic membrane dynamics underlying working memory in spiking neural network models","volume":"33","author":"Li","year":"2021","journal-title":"Neural Comput"},{"key":"B27","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1016\/j.neucom.2021.06.027","article-title":"An autonomous learning mobile robot using biological reward modulate stdp","volume":"458","author":"Lu","year":"2021","journal-title":"Neurocomputing"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i13.29335","article-title":"\u201cDiscerning temporal difference learning,\u201d","author":"Ma","year":"2023","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"B29","doi-asserted-by":"publisher","first-page":"191","DOI":"10.3389\/fnins.2015.00191","article-title":"A spiking basal ganglia model of synchrony, exploration and decision making","volume":"9","author":"Mandali","year":"2015","journal-title":"Front. Neurosci"},{"key":"B30","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/j.patcog.2019.05.015","article-title":"Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks","volume":"94","author":"Mozafari","year":"2019","journal-title":"Pattern Recognit"},{"key":"B31","doi-asserted-by":"publisher","first-page":"5185","DOI":"10.1038\/s41467-021-25366-0","article-title":"Neural signatures of hyperdirect pathway activity in Parkinson's disease","volume":"12","author":"Oswal","year":"2021","journal-title":"Nat. Commun"},{"key":"B32","doi-asserted-by":"publisher","first-page":"eadi0591","DOI":"10.1126\/scirobotics.adi0591","article-title":"Fully neuromorphic vision and control for autonomous drone flight","volume":"9","author":"Paredes-Vall\u00e9s","year":"2024","journal-title":"Sci. Robot"},{"key":"B33","doi-asserted-by":"publisher","first-page":"129170","DOI":"10.1016\/j.neucom.2024.129170","article-title":"Modulated spike-time dependent plasticity (STDP)-based learning for spiking neural network (SNN): a review","volume":"618","author":"Rahman","year":"2025","journal-title":"Neurocomputing"},{"key":"B34","doi-asserted-by":"publisher","first-page":"1296","DOI":"10.1038\/s41467-022-28950-0","article-title":"Coincidence of cholinergic pauses, dopaminergic activation and depolarisation of spiny projection neurons drives synaptic plasticity in the striatum","volume":"13","author":"Reynolds","year":"2022","journal-title":"Nat. Commun"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6214","article-title":"\u201cMulti-agent actor-critic with hierarchical graph attention network,\u201d","author":"Ryu","year":"2020","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"B36","doi-asserted-by":"publisher","first-page":"e65459","DOI":"10.7554\/eLife.65459","article-title":"Spike frequency adaptation supports network computations on temporally dispersed information","volume":"10","author":"Salaj","year":"2021","journal-title":"Elife"},{"key":"B37","doi-asserted-by":"publisher","first-page":"2247","DOI":"10.1038\/s41467-022-29870-9","article-title":"Chalcogenide optomemristors for multi-factor neuromorphic computation","volume":"13","author":"Sarwat","year":"2022","journal-title":"Nat. Commun"},{"key":"B38","doi-asserted-by":"publisher","first-page":"1183321","DOI":"10.3389\/fnins.2023.1183321","article-title":"Meta-spikepropamine: learning to learn with synaptic plasticity in spiking neural networks","volume":"17","author":"Schmidgall","year":"","journal-title":"Front. Neurosci"},{"key":"B39","doi-asserted-by":"publisher","DOI":"10.1145\/3589737.3605971","article-title":"\u201cSynaptic motor adaptation: a three-factor learning rule for adaptive robotic control in spiking neural networks,\u201d","author":"Schmidgall","year":"","journal-title":"Proceedings of the 2023 International Conference on Neuromorphic Systems"},{"key":"B40","doi-asserted-by":"publisher","first-page":"7924","DOI":"10.1038\/s41467-022-35601-x","article-title":"Continuous cholinergic-dopaminergic updating in the nucleus accumbens underlies approaches to reward-predicting cues","volume":"13","author":"Skirzewski","year":"2022","journal-title":"Nat. Commun"},{"key":"B41","doi-asserted-by":"publisher","first-page":"1876","DOI":"10.1016\/j.neuron.2021.03.028","article-title":"The anterior cingulate cortex directs exploration of alternative strategies","volume":"109","author":"Tervo","year":"2021","journal-title":"Neuron"},{"key":"B42","doi-asserted-by":"publisher","first-page":"939","DOI":"10.1162\/neco_a_01487","article-title":"Parameter identification problem in the hodgkin-huxley model","volume":"34","author":"Valle","year":"2022","journal-title":"Neural Comput"},{"key":"B43","doi-asserted-by":"publisher","DOI":"10.1109\/RADAR54928.2023.10371008","article-title":"\u201cCollision avoidance navigation with radar and spiking reinforcement learning,\u201d","author":"Van Damme","year":"2023","journal-title":"Radar"},{"key":"B44","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i11.17200","article-title":"\u201cExpected eligibility traces,\u201d","author":"Van Hasselt","year":"2021","journal-title":"AAAI Conference on Artificial Intelligence"},{"key":"B45","doi-asserted-by":"publisher","first-page":"7747","DOI":"10.1109\/LRA.2024.3421188","article-title":"Deep reinforcement learning with dynamic graphs for adaptive informative path planning","volume":"9","author":"Vashisth","year":"2024","journal-title":"IEEE Robot. Autom. Lett"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cogsys.2019.09.006","article-title":"Mobile robot navigation with the combination of supervised learning in cerebellum and reward-based learning in basal ganglia","volume":"59","author":"Wang","year":"2020","journal-title":"Cogn. Syst. Res"},{"key":"B47","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1007\/978-3-319-49685-6_12","author":"Wang","year":"2016","journal-title":"A Spiking Neural Network Based Autonomous Reinforcement Learning Model and Its Application in Decision Making"},{"key":"B48","doi-asserted-by":"publisher","first-page":"807","DOI":"10.1109\/TBCAS.2022.3191004","article-title":"A high-accuracy and energy-efficient cordic based izhikevich neuron with error suppression and compensation","volume":"16","author":"Wang","year":"2022","journal-title":"IEEE Trans. Biomed. Circuits Syst"},{"key":"B49","doi-asserted-by":"publisher","first-page":"3925","DOI":"10.1038\/ncomms4925","article-title":"Modulation of dopamine release in the striatum by physiologically relevant levels of nicotine","volume":"5","author":"Wang","year":"2014","journal-title":"Nat. Commun"},{"key":"B50","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.fmre.2024.01.014","article-title":"Modeling the modulation of beta oscillations in the basal ganglia by dual-target optogenetic stimulation","volume":"5","author":"Wang","year":"2025","journal-title":"Fundam. Res"},{"key":"B51","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1007\/s12264-020-00615-2","article-title":"Role of the anterior cingulate cortex in translational pain research","volume":"37","author":"Xiao","year":"2021","journal-title":"Neurosci. Bull"},{"key":"B52","doi-asserted-by":"publisher","first-page":"4398","DOI":"10.1109\/TNNLS.2021.3057070","article-title":"Cerebellumorphic: large-scale neuromorphic model and architecture for supervised motor learning","volume":"33","author":"Yang","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B53","doi-asserted-by":"publisher","first-page":"30648","DOI":"10.1038\/s41598-024-77779-8","article-title":"Exploring spiking neural networks for deep reinforcement learning in robotic tasks","volume":"14","author":"Zanatta","year":"2024","journal-title":"Sci. Rep"},{"key":"B54","doi-asserted-by":"publisher","first-page":"290","DOI":"10.1109\/TCDS.2017.2649564","article-title":"A basal ganglia network centric reinforcement learning model and its application in unmanned aerial vehicle","volume":"10","author":"Zeng","year":"2018","journal-title":"IEEE Trans. Cogn. Dev. Syst"},{"key":"B55","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1007\/s11424-024-3414-7","article-title":"A bio-inspired integration model of basal ganglia and cerebellum for motion learning of a musculoskeletal robot","volume":"37","author":"Zhang","year":"","journal-title":"J. Syst. Sci. Compl"},{"key":"B56","doi-asserted-by":"publisher","first-page":"120255","DOI":"10.1016\/j.ins.2024.120255","article-title":"Explorer-actor-critic: Better actors for deep reinforcement learning","volume":"662","author":"Zhang","year":"","journal-title":"Inf. Sci"},{"key":"B57","doi-asserted-by":"publisher","first-page":"667","DOI":"10.1016\/j.apsb.2023.12.005","article-title":"Targeting camp in d1-msns in the nucleus accumbens, a new rapid antidepressant strategy","volume":"14","author":"Zhang","year":"2023","journal-title":"Acta Pharm. Sinica B"},{"key":"B58","doi-asserted-by":"publisher","first-page":"100611","DOI":"10.1016\/j.patter.2022.100611","article-title":"Nature-inspired self-organizing collision avoidance for drone swarm based on reward-modulated spiking neural network","volume":"3","author":"Zhao","year":"2022","journal-title":"Patterns"},{"key":"B59","doi-asserted-by":"publisher","first-page":"2127","DOI":"10.1007\/s11571-023-10046-0","article-title":"A tan-dopamine interaction mechanism based computational model of basal ganglia in action selection","volume":"18","author":"Zhu","year":"2024","journal-title":"Cogn. Neurodyn"},{"key":"B60","doi-asserted-by":"publisher","first-page":"e163266","DOI":"10.1172\/JCI163266","article-title":"Nucleus accumbens d1\/d2 circuits control opioid withdrawal symptoms in mice","volume":"133","author":"Zhu","year":"2023","journal-title":"J. Clin. Invest"},{"key":"B61","doi-asserted-by":"publisher","DOI":"10.1109\/IROS55552.2023.10342044","article-title":"\u201cAn energy-efficient lane-keeping system using 3d lidar based on spiking neural network,\u201d","author":"Zhuang","year":"2023","journal-title":"IEEE\/RJS International Conference on Intelligent RObots and Systems"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2026.1757795\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T07:29:55Z","timestamp":1772695795000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2026.1757795\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,5]]},"references-count":61,"alternative-id":["10.3389\/fnbot.2026.1757795"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2026.1757795","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,5]]},"article-number":"1757795"}}