{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T12:20:53Z","timestamp":1762950053297,"version":"3.45.0"},"reference-count":34,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T00:00:00Z","timestamp":1762905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T00:00:00Z","timestamp":1762905600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"crossref","award":["101021526"],"award-info":[{"award-number":["101021526"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,12,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Like many optimization algorithms, stochastic variational inference is sensitive to the choice of the learning rate. If the learning rate is too small, the optimization process may be slow, and the algorithm might get stuck in local optima. On the other hand, if the learning rate is too large, the algorithm may oscillate or diverge, failing to converge to a solution. Adaptive learning rate methods such as Adam, AdaMax, Adagrad, or root mean square propagation automatically adjust the learning rate based on the history of gradients. Nevertheless, if the base learning rate is too large, the variational parameters might still oscillate around the optimal solution. With learning rate schedules, the learning rate can be reduced gradually to mitigate this problem. 
However, the amount at which the learning rate should be decreased in each iteration is not known\n                    <jats:italic>a priori<\/jats:italic>\n                    , which can significantly impact the performance of the optimization. In this work, we propose a method to decay the learning rate based on the history of the variational parameters. We use an empirical measure to quantify the amount of oscillations against the progress of the variational parameters to adapt the learning rate. The approach requires little memory and is computationally efficient. We demonstrate in various numerical examples that our method reduces the sensitivity of the optimization performance to the learning rate and that it can also be used in combination with other adaptive learning rate methods.\n                  <\/jats:p>","DOI":"10.1088\/2632-2153\/ae19cc","type":"journal-article","created":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:53:12Z","timestamp":1761864792000},"page":"045041","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Dynamic learning rate decay for stochastic variational inference"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2908-2951","authenticated-orcid":true,"given":"Maximilian","family":"Dinkel","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6336-6718","authenticated-orcid":true,"given":"Gil","family":"Robalo Rei","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7419-3384","authenticated-orcid":false,"given":"Wolfgang A","family":"Wall","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,11,12]]},"reference":[{"key":"mlstae19ccbib1","first-page":"1303","type":"journal-article","article-title":"Stochastic variational inference","volume":"14","author":"Hoffman","year":"2013","journal-title":"J. Mach. Learn. 
Res."},{"key":"mlstae19ccbib2","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","type":"journal-article","article-title":"Variational inference: a review for statisticians","volume":"112","author":"Blei","year":"2017","journal-title":"J. Am. Stat. Assoc."},{"key":"mlstae19ccbib3","first-page":"pp 298","type":"conference-proceedings","article-title":"An adaptive learning rate for stochastic variational inference","volume":"vol 28","author":"Ranganath","year":"2013"},{"key":"mlstae19ccbib4","doi-asserted-by":"publisher","first-page":"pp 9","DOI":"10.1007\/3-540-49430-8_2","type":"book","article-title":"Efficient BackProp","author":"LeCun","year":"1998"},{"article-title":"Adam: a method for stochastic optimization","year":"2014","author":"Kingma","key":"mlstae19ccbib5","type":"preprint"},{"key":"mlstae19ccbib6","first-page":"2121","type":"journal-article","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi","year":"2011","journal-title":"J. Mach. Learn. Res."},{"year":"2012","author":"Tieleman","key":"mlstae19ccbib7","type":"book"},{"key":"mlstae19ccbib8","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1162\/089976698300017746","type":"journal-article","article-title":"Natural gradient works efficiently in learning","volume":"10","author":"Amari","year":"1998","journal-title":"Neural Comput."},{"article-title":"Queens: an open-source python framework for solver-independent analyses of large-scale computational models","year":"2025","author":"Biehler","key":"mlstae19ccbib9","type":"preprint"},{"year":"2006","author":"Bishop","key":"mlstae19ccbib10","type":"book"},{"key":"mlstae19ccbib11","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1214\/aoms\/1177729586","type":"journal-article","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann. Math. 
Stat."},{"key":"mlstae19ccbib12","first-page":"pp 814","type":"conference-proceedings","article-title":"Black box variational inference","volume":"vol 33","author":"Ranganath","year":"2014"},{"key":"mlstae19ccbib13","first-page":"1","type":"journal-article","article-title":"Monte Carlo gradient estimation in machine learning","volume":"21","author":"Mohamed","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"mlstae19ccbib14","first-page":"pp 81","type":"book","article-title":"Rao-Blackwellisation of Sampling Schemes","volume":"vol 83","author":"Casella","year":"1996"},{"edition":"4th edn","year":"2006","author":"Ross","key":"mlstae19ccbib15","type":"book"},{"key":"mlstae19ccbib16","first-page":"pp 1363","type":"conference-proceedings","article-title":"Variational Bayesian inference with stochastic search","author":"Paisley","year":"2012"},{"article-title":"Auto-encoding variational bayes","year":"2022","author":"Kingma","key":"mlstae19ccbib17","type":"preprint"},{"key":"mlstae19ccbib18","type":"conference-proceedings","article-title":"Variational dropout and the local reparameterization trick","volume":"vol 28","author":"Kingma","year":"2015"},{"key":"mlstae19ccbib19","type":"conference-proceedings","article-title":"Sticking the landing: simple, lower-variance gradient estimators for variational inference","volume":"vol 30","author":"Roeder","year":"2017"},{"article-title":"ADADELTA: an adaptive learning rate method","year":"2012","author":"Zeiler","key":"mlstae19ccbib20","type":"preprint"},{"article-title":"Breast cancer Wisconsin (Diagnostic)","year":"1995","author":"Wolberg","key":"mlstae19ccbib21","type":"other"},{"key":"mlstae19ccbib22","doi-asserted-by":"publisher","first-page":"2087","DOI":"10.5555\/2627435.2670318","type":"journal-article","article-title":"Parallel MCMC with generalized elliptical slice sampling","volume":"15","author":"Nishihara","year":"2014","journal-title":"J. Mach. Learn. 
Res."},{"year":"1992","author":"Aeberhard","key":"mlstae19ccbib23","type":"other"},{"key":"mlstae19ccbib24","doi-asserted-by":"publisher","first-page":"pp 223","DOI":"10.1137\/16M1080173","type":"book","article-title":"Optimization Methods for Large-Scale Machine Learning","volume":"vol 60","author":"Bottou","year":"2018"},{"key":"mlstae19ccbib25","type":"conference-proceedings","article-title":"Using statistics to automate stochastic optimization","volume":"vol 32","author":"Lang","year":"2019"},{"article-title":"Statistical adaptive stochastic gradient methods","year":"2020","author":"Zhang","key":"mlstae19ccbib26","type":"other"},{"key":"mlstae19ccbib27","first-page":"pp 234","type":"conference-proceedings","article-title":"Efficient gradient-free variational inference using policy search","volume":"vol 80","author":"Arenz","year":"2018"},{"key":"mlstae19ccbib28","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/0304-3835(94)90099-X","type":"journal-article","article-title":"Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates","volume":"77","author":"Wolberg","year":"1994","journal-title":"Comput. Appl. Early Detect. 
Staging Cancer"},{"key":"mlstae19ccbib29","doi-asserted-by":"publisher","first-page":"761","DOI":"10.1093\/biomet\/asp053","type":"journal-article","article-title":"Sinh-arcsinh distributions","volume":"96","author":"Jones","year":"2009","journal-title":"Biometrika"},{"key":"mlstae19ccbib30","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6420\/ad5eb4","type":"journal-article","article-title":"Solving bayesian inverse problems with expensive likelihoods using constrained Gaussian processes and active learning","volume":"40","author":"Dinkel","year":"2024","journal-title":"Inverse Problems"},{"year":"2003","author":"Ghanem","key":"mlstae19ccbib31","type":"book"},{"key":"mlstae19ccbib32","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1109\/TASL.2008.919072","type":"journal-article","article-title":"On the importance of the Pearson correlation coefficient in noise reduction","volume":"16","author":"Benesty","year":"2008","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"mlstae19ccbib33","first-page":"pp 59","type":"book","article-title":"Thirteen Ways to Look at the Correlation Coefficient","volume":"vol 42","author":"Rodgers","year":"1988"},{"article-title":"The matrix reference manual","year":"2020","author":"Brookes","key":"mlstae19ccbib34","type":"other"}],"container-title":["Machine Learning: Science and 
Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T12:18:29Z","timestamp":1762949909000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae19cc"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,12]]},"references-count":34,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,11,12]]},"published-print":{"date-parts":[[2025,12,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ae19cc","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value
":"2632-2153"}],"subject":[],"published":{"date-parts":[[2025,11,12]]},"assertion":[{"value":"Dynamic learning rate decay for stochastic variational inference","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-12-20","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-10-30","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-11-12","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}