{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,10]],"date-time":"2026-05-10T10:15:19Z","timestamp":1778408119793,"version":"3.51.4"},"reference-count":57,"publisher":"MIT Press","issue":"4","license":[{"start":{"date-parts":[[2021,3,7]],"date-time":"2021-03-07T00:00:00Z","timestamp":1615075200000},"content-version":"vor","delay-in-days":65,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,3,26]]},"abstract":"<jats:p>Brains process information in spiking neural networks. Their intricate connections shape the diverse functions these networks perform. Yet how network connectivity relates to function is poorly understood, and the functional capabilities of models of spiking networks are still rudimentary. The lack of both theoretical insight and practical algorithms to find the necessary connectivity poses a major impediment to both studying information processing in the brain and building efficient neuromorphic hardware systems. The training algorithms that solve this problem for artificial neural networks typically rely on gradient descent. But doing so in spiking networks has remained challenging due to the nondifferentiable nonlinearity of spikes. To avoid this issue, one can employ surrogate gradients to discover the required connectivity. However, the choice of a surrogate is not unique, raising the question of how its implementation influences the effectiveness of the method. Here, we use numerical simulations to systematically study how essential design parameters of surrogate gradients affect learning performance on a range of classification problems. We show that surrogate gradient learning is robust to different shapes of underlying surrogate derivatives, but the choice of the derivative's scale can substantially affect learning performance. 
When we combine surrogate gradients with suitable activity regularization techniques, spiking networks perform robust information processing at the sparse activity limit. Our study provides a systematic account of the remarkable robustness of surrogate gradient learning and serves as a practical guide to model functional spiking neural networks.<\/jats:p>","DOI":"10.1162\/neco_a_01367","type":"journal-article","created":{"date-parts":[[2021,1,29]],"date-time":"2021-01-29T18:13:43Z","timestamp":1611944023000},"page":"899-925","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":202,"title":["The Remarkable Robustness of Surrogate Gradient Learning for Instilling Complex Function in Spiking Neural Networks"],"prefix":"10.1162","volume":"33","author":[{"given":"Friedemann","family":"Zenke","sequence":"first","affiliation":[{"name":"Centre for Neural Circuits and Behaviour, University of Oxford, Oxford OX1 3SR, U.K., and Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland, friedemann.zenke@fmi.ch"}]},{"given":"Tim P.","family":"Vogels","sequence":"additional","affiliation":[{"name":"Centre for Neural Circuits and Behaviour, University of Oxford, Oxford OX1 3SR, U.K., and Institute for Science and Technology, 3400 Klosterneuburg, Austria, tim.vogels@ist.ac.at"}]}],"member":"281","published-online":{"date-parts":[[2021,3,26]]},"reference":[{"key":"2021041320044225400_B1","first-page":"7243","article-title":"A low power, fully event-based gesture recognition system","author":"Amir","year":"2017","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"2021041320044225400_B2","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.conb.2019.01.007","article-title":"Analyzing biological and artificial neural networks: Challenges with opportunities for 
synergy?","volume":"55","author":"Barrett","year":"2019","journal-title":"Current Opinion in Neurobiology"},{"key":"2021041320044225400_B3","first-page":"795","volume-title":"Advances in neural information processing systems","author":"Bellec","year":"2018"},{"key":"2021041320044225400_B4","author":"Bellec","year":"2019"},{"issue":"2","key":"2021041320044225400_B5","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/MCSE.2017.33","article-title":"A neuromorph's prospectus","volume":"19","author":"Boahen","year":"2017","journal-title":"Comput. Sci. Eng."},{"key":"2021041320044225400_B6","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/978-3-642-21735-7_8","article-title":"Error-backpropagation in networks of fractionally predictive spiking neurons","author":"Bohte","year":"2011","journal-title":"Artificial Neural Networks and Machine Learning\u2014ICANN 2011"},{"key":"2021041320044225400_B7","author":"Cramer","year":"2020","journal-title":"Training spiking multi-layer networks with surrogate gradients on an analog neuromorphic substrate."},{"key":"2021041320044225400_B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TNNLS.2020.3044364","article-title":"The Heidelberg spiking data sets for the systematic evaluation of spiking neural networks","author":"Cramer","year":"2020","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"6203","key":"2021041320044225400_B9","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1038\/337129a0","article-title":"The recent excitement about neural networks","volume":"337","author":"Crick","year":"1989","journal-title":"Nature"},{"key":"2021041320044225400_B10","author":"Cueva","year":"2019","journal-title":"Low dimensional dynamics for working memory and time 
encoding."},{"issue":"41","key":"2021041320044225400_B11","doi-asserted-by":"crossref","first-page":"11441","DOI":"10.1073\/pnas.1604850113","volume":"113","author":"Esser","year":"2016","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"2021041320044225400_B12","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781107447615","author":"Gerstner","year":"2014","journal-title":"Neuronal dynamics: From single neurons to networks and models of cognition"},{"issue":"6277","key":"2021041320044225400_B13","doi-asserted-by":"crossref","DOI":"10.1126\/science.aab4113","article-title":"Spiking neurons can discover predictive features by aggregate-label learning","volume":"351","author":"G\u00fctig","year":"2016","journal-title":"Science"},{"issue":"3","key":"2021041320044225400_B14","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1038\/nn1643","article-title":"The tempotron: A neuron that learns spike timing-based decisions","volume":"9","author":"G\u00fctig","year":"2006","journal-title":"Nat. Neurosci."},{"key":"2021041320044225400_B15","first-page":"1026","article-title":"Delving deep into rectifiers: Surpassing human-level performance on imagenet classification","author":"He","journal-title":"Proceedings of the IEEE International Conference on Computer Vision"},{"issue":"2","key":"2021041320044225400_B16","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1142\/S0218488598000094","article-title":"The vanishing gradient problem during learning recurrent neural nets and problem solutions","volume":"6","author":"Hochreiter","year":"1998","journal-title":"Int. J. Unc. Fuzz. Knowl. 
Based Syst."},{"key":"2021041320044225400_B17","author":"Huang","year":"2001","journal-title":"Spoken language processing: A guide to theory, algorithm and system development"},{"key":"2021041320044225400_B18","first-page":"1440","volume-title":"Advances in neural information processing systems","author":"Huh","year":"2018"},{"key":"2021041320044225400_B19","author":"Hunsberger","year":"2015","journal-title":"Spiking deep networks with LIF neurons"},{"key":"2021041320044225400_B20","author":"Kingma","year":"2014","journal-title":"Adam: A method for stochastic optimization"},{"issue":"7553","key":"2021041320044225400_B21","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2021041320044225400_B22","author":"LeCun","year":"1998","journal-title":"The MNIST database of handwritten digits"},{"key":"2021041320044225400_B23","doi-asserted-by":"crossref","DOI":"10.3389\/fnins.2016.00508","article-title":"Training deep spiking neural networks using backpropagation","volume":"10","author":"Lee","year":"2016","journal-title":"Front. Neurosci."},{"key":"2021041320044225400_B24","author":"Maheswaranathan","year":"2018","journal-title":"Deep learning models reveal internal structure and diverse computations in the retina under natural scenes"},{"issue":"7474","key":"2021041320044225400_B25","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1038\/nature12742","article-title":"Context-dependent computation by recurrent dynamics in prefrontal cortex","volume":"503","author":"Mante","year":"2013","journal-title":"Nature"},{"key":"2021041320044225400_B26","doi-asserted-by":"crossref","DOI":"10.3389\/fncom.2016.00131","article-title":"Representational distance learning for deep neural networks","volume":"10","author":"McClure","year":"2016","journal-title":"Front. Comput. 
Neurosci."},{"key":"2021041320044225400_B27","first-page":"1369","volume-title":"Advances in neural information processing systems","author":"McIntosh","year":"2016"},{"key":"2021041320044225400_B28","author":"Michaels","year":"2019","journal-title":"A neural network model of flexible grasp movement generation."},{"key":"2021041320044225400_B29","author":"Mishkin","year":"2016","journal-title":"All you need is a good init."},{"issue":"7","key":"2021041320044225400_B30","first-page":"3227","article-title":"Supervised learning based on temporal coding in spiking neural networks","volume":"29","author":"Mostafa","year":"2018","journal-title":"Trans. Neural Netw. Learn. Syst."},{"key":"2021041320044225400_B31","article-title":"Local online learning in recurrent networks with random feedback","volume":"8","author":"Murray","journal-title":"eLife"},{"key":"2021041320044225400_B32","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1016\/j.isci.2018.06.010","article-title":"Data and power efficient intelligence with neuromorphic learning machines","volume":"5","author":"Neftci","year":"2018","journal-title":"iScience"},{"issue":"6","key":"2021041320044225400_B33","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1109\/MSP.2019.2931595","article-title":"Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks","volume":"36","author":"Neftci","year":"2019","journal-title":"IEEE Signal Process. 
Mag."},{"key":"2021041320044225400_B34","article-title":"Converting static image datasets to spiking neuromorphic datasets using saccades","volume":"9","author":"Orchard","year":"2015","journal-title":"Front. Neurosci."},{"key":"2021041320044225400_B35","first-page":"8026","article-title":"PyTorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Advances in neural information processing systems"},{"key":"2021041320044225400_B36","doi-asserted-by":"crossref","DOI":"10.3389\/fnins.2018.00774","article-title":"Deep learning with spiking neurons: Opportunities and challenges","volume":"12","author":"Pfeiffer","year":"2018","journal-title":"Front. Neurosci."},{"issue":"e38242","key":"2021041320044225400_B37","article-title":"\u201cArtiphysiology\u201d reveals V4-like shape tuning in a deep network trained for image classification","volume":"7","author":"Pospisil","year":"2018","journal-title":"eLife"},{"issue":"11","key":"2021041320044225400_B38","doi-asserted-by":"crossref","first-page":"1761","DOI":"10.1038\/s41593-019-0520-2","volume":"22","author":"Richards","year":"2019","journal-title":"Nat. 
Neurosci."},{"issue":"7784","key":"2021041320044225400_B39","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1038\/s41586-019-1677-2","article-title":"Towards spike-based machine intelligence with neuromorphic computing","volume":"575","author":"Roy","year":"2019","journal-title":"Nature"},{"key":"2021041320044225400_B40","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1109\/ISCAS.2010.5536970","article-title":"A wafer-scale neuromorphic hardware system for large-scale neural modeling","author":"Schemmel","year":"2010","journal-title":"Proceedings of 2010 IEEE International Symposium on Circuits and Systems"},{"key":"2021041320044225400_B41","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","article-title":"Deep learning in neural networks: An overview","volume":"61","author":"Schmidhuber","year":"2015","journal-title":"Neural Netw."},{"key":"2021041320044225400_B42","first-page":"1419","volume-title":"Advances in neural information processing systems","author":"Shrestha","year":"2018"},{"key":"2021041320044225400_B43","author":"Sterling","year":"2017","journal-title":"Principles of neural design"},{"issue":"12","key":"2021041320044225400_B44","doi-asserted-by":"crossref","first-page":"1774","DOI":"10.1038\/s41593-018-0276-0","article-title":"Motor primitives in space and time via targeted gain modulation in cortical networks","volume":"21","author":"Stroud","year":"2018","journal-title":"Nature Neuroscience"},{"issue":"3","key":"2021041320044225400_B45","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1162\/NECO_a_00409","article-title":"Opening the black box: Low-dimensional dynamics in high-dimensional recurrent neural networks","volume":"25","author":"Sussillo","year":"2012","journal-title":"Neural Comput."},{"key":"2021041320044225400_B46","first-page":"8535","volume-title":"Advances in neural information processing 
systems","author":"Tanaka","year":"2019"},{"issue":"46","key":"2021041320044225400_B47","doi-asserted-by":"crossref","first-page":"10786","DOI":"10.1523\/JNEUROSCI.3508-05.2005","article-title":"Signal propagation and logic gating in networks of integrate-and-fire neurons","volume":"25","author":"Vogels","year":"2005","journal-title":"J. Neurosci."},{"issue":"1","key":"2021041320044225400_B48","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1038\/s41593-017-0028-6","volume":"21","author":"Wang","year":"2018","journal-title":"Nat. Neurosci."},{"key":"2021041320044225400_B49","author":"Warden","year":"2018","journal-title":"Speech commands: A dataset for limited-vocabulary speech recognition"},{"issue":"2","key":"2021041320044225400_B50","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1162\/neco.1989.1.2.270","article-title":"A learning algorithm for continually running fully recurrent neural networks","volume":"1","author":"Williams","year":"1989","journal-title":"Neural Computation"},{"key":"2021041320044225400_B51","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/j.conb.2018.12.009","article-title":"Bridging large-scale neuronal recordings and large-scale network models using dimensionality reduction","author":"Williamson","year":"2019","journal-title":"Current Opinion in Neurobiology"},{"issue":"6","key":"2021041320044225400_B52","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1038\/s42256-020-0187-0","article-title":"Deep learning incorporating biologically inspired neural dynamics and in-memory computing","volume":"2","author":"Wo\u017aniak","year":"2020","journal-title":"Nature Machine Intelligence"},{"issue":"3","key":"2021041320044225400_B53","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1038\/nn.4244","article-title":"Using goal-driven deep learning models to understand sensory cortex","volume":"19","author":"Yamins","year":"2016","journal-title":"Nat. 
Neurosci."},{"issue":"23","key":"2021041320044225400_B54","doi-asserted-by":"crossref","first-page":"8619","DOI":"10.1073\/pnas.1403112111","article-title":"Performance-optimized hierarchical models predict neural responses in higher visual cortex","volume":"111","author":"Yamins","year":"2014","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"2021041320044225400_B55","author":"Zenke","year":"2019","journal-title":"SpyTorch"},{"issue":"6","key":"2021041320044225400_B56","doi-asserted-by":"crossref","first-page":"1514","DOI":"10.1162\/neco_a_01086","article-title":"SuperSpike: Supervised learning in multilayer spiking neural networks","volume":"30","author":"Zenke","year":"2018","journal-title":"Neural Comput."},{"key":"2021041320044225400_B57","author":"Zimmer","year":"2019","journal-title":"Technical report: Supervised training of convolutional spiking neural networks with PyTorch"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/neco\/article-pdf\/33\/4\/899\/1902294\/neco_a_01367.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/neco\/article-pdf\/33\/4\/899\/1902294\/neco_a_01367.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,14]],"date-time":"2021-04-14T20:18:40Z","timestamp":1618431520000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/33\/4\/899\/97482\/The-Remarkable-Robustness-of-Surrogate-Gradient"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":57,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,3,26]]},"published-print":{"date-parts":[[2021,3,26]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01367","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.06.29.176925","asserted-by":"object"}]},"ISSN":["0899-
7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}