{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:49:04Z","timestamp":1760161744049,"version":"3.37.3"},"reference-count":30,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2020,4,28]],"date-time":"2020-04-28T00:00:00Z","timestamp":1588032000000},"content-version":"vor","delay-in-days":58,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,4,28]],"date-time":"2020-04-28T00:00:00Z","timestamp":1588032000000},"content-version":"tdm","delay-in-days":58,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/N035437\/1"],"award-info":[{"award-number":["EP\/N035437\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Artificial neural network training with gradient descent can be destabilized by \u2018bad batches\u2019 with high losses. This is often problematic for training with small batch sizes, high order loss functions or unstably high learning rates. To stabilize learning, we have developed adaptive learning rate clipping (ALRC) to limit backpropagated losses to a number of standard deviations above their running means. ALRC is designed to complement existing learning algorithms: Our algorithm is computationally inexpensive, can be applied to any loss function or batch size, is robust to hyperparameter choices and does not affect backpropagated gradient distributions. 
Experiments with CIFAR-10 supersampling show that ALRC decreases errors for unstable mean quartic error training while stable mean squared error training is unaffected. We also show that ALRC decreases unstable mean squared errors for scanning transmission electron microscopy supersampling and partial scan completion. Our source code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Jeffrey-Ede\/ALRC\" xlink:type=\"simple\">https:\/\/github.com\/Jeffrey-Ede\/ALRC<\/jats:ext-link>.<\/jats:p>","DOI":"10.1088\/2632-2153\/ab81e2","type":"journal-article","created":{"date-parts":[[2020,3,20]],"date-time":"2020-03-20T22:30:39Z","timestamp":1584743439000},"page":"015011","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Adaptive learning rate clipping stabilizes learning"],"prefix":"10.1088","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9358-5364","authenticated-orcid":false,"given":"Jeffrey M","family":"Ede","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard","family":"Beanland","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"266","published-online":{"date-parts":[[2020,4,28]]},"reference":[{"article-title":"An overview of gradient descent optimization algorithms","year":"2016","author":"Ruder","key":"mlstab81e2bib1"},{"article-title":"Stochastic gradient descent optimizes over-parameterized deep ReLU networks","year":"2018","author":"Zou","key":"mlstab81e2bib2"},{"key":"mlstab81e2bib3","first-page":"pp 487","article-title":"Catastrophic forgetting: still a problem for DNNs","author":"Pf\u00fclb","year":"2018"},{"article-title":"Deep learning for pedestrians: backpropagation in 
CNNs","year":"2018","author":"Bou\u00e9","key":"mlstab81e2bib4"},{"key":"mlstab81e2bib5","doi-asserted-by":"crossref","DOI":"10.3934\/mfc.2018008","article-title":"How convolutional neural network see the world-A survey of convolutional neural network visualization methods","author":"Qin","year":"2018"},{"key":"mlstab81e2bib6","first-page":"pp 3856","article-title":"Dynamic routing between capsules","author":"Sabour","year":"2017"},{"article-title":"On the difficulty of training recurrent neural networks","year":"2012","author":"Bengio","key":"mlstab81e2bib7"},{"article-title":"Statistical language models based on neural networks","year":"2012","author":"Mikolov","key":"mlstab81e2bib8"},{"key":"mlstab81e2bib9","doi-asserted-by":"crossref","first-page":"pp 73","DOI":"10.1214\/aoms\/1177703732","article-title":"Robust estimation of a location parameter","author":"Huber","year":"1964","journal-title":"The Annals of Mathematical Statistics"},{"article-title":"An alternative probabilistic interpretation of the Huber loss","year":"2019","author":"Meyer","key":"mlstab81e2bib10"},{"article-title":"Batch normalization accelerating deep network training by reducing internal covariate shift","year":"2015","author":"Ioffe","key":"mlstab81e2bib11"},{"volume":"vol 55","year":"2014","author":"Krizhevsky","key":"mlstab81e2bib12"},{"year":"2009","author":"Krizhevsky","key":"mlstab81e2bib13"},{"key":"mlstab81e2bib14","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1109\/MSP.2017.2739299","article-title":"Convolutional neural networks for inverse problems in imaging: A review","volume":"34","author":"McCann","year":"2017","journal-title":"IEEE Signal Process. 
Mag."},{"key":"mlstab81e2bib15","first-page":"pp 1097","article-title":"ImageNet classification with deep convolutional neural networks","author":"Krizhevsky","year":"2012"},{"key":"mlstab81e2bib16","first-page":"pp 807","article-title":"Rectified linear units improve restricted Boltzmann machines","author":"Nair","year":"2010"},{"key":"mlstab81e2bib17","first-page":"pp 249","article-title":"Understanding the difficulty of training deep feedforward neural networks","author":"Glorot","year":"2010"},{"article-title":"ADAM: A method for stochastic optimization","year":"2014","author":"Kingma","key":"mlstab81e2bib18"},{"year":"2019","author":"Ede","key":"mlstab81e2bib19"},{"key":"mlstab81e2bib20","doi-asserted-by":"crossref","DOI":"10.1038\/s41598-020-65261-0","article-title":"Partial scanning transmission electron microscopy with deep learning","author":"Ede","year":"2020"},{"key":"mlstab81e2bib21","first-page":"pp 1","article-title":"Going deeper with convolutions","author":"Szegedy","year":"2015"},{"key":"mlstab81e2bib22","first-page":"pp 2818","article-title":"Rethinking the inception architecture for computer vision","author":"Szegedy","year":"2016"},{"key":"mlstab81e2bib23","first-page":"pp 901","article-title":"Weight normalization: A simple reparameterization to accelerate training of deep neural networks","author":"Salimans","year":"2016"},{"key":"mlstab81e2bib24","first-page":"pp 2160","article-title":"Norm matters: efficient and accurate normalization schemes in deep networks","author":"Hoffer","year":"2018"},{"article-title":"Rethinking atrous convolution for semantic image segmentation","year":"2017","author":"Chen","key":"mlstab81e2bib25"},{"key":"mlstab81e2bib26","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-Level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"mlstab81e2bib27","first-page":"265","article-title":"Tensor flow: 
A system for large-scale machine learning.","volume":"16","author":"Abadi","year":"2016","journal-title":"OSDI"},{"year":"2020","author":"Ede","key":"mlstab81e2bib28"},{"year":"2020","author":"Ede","key":"mlstab81e2bib29"},{"key":"mlstab81e2bib30","first-page":"pp 770","article-title":"Deep residual learning for image recognition","author":"He","year":"2016"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-ch
ecking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2","content-type":"text\/html","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T15:33:15Z","timestamp":1645025595000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab81e2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,1]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,4,28]]},"published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ab81e2","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2020,3,1]]},"assertion":[{"value":"Adaptive learning rate clipping stabilizes learning","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2020 The Author(s). 
Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2019-12-20","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2020-03-20","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2020-04-28","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}