{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T16:54:41Z","timestamp":1755795281751,"version":"3.44.0"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T00:00:00Z","timestamp":1739750400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,14]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Learning to optimize (L2O) is a technique that uses neural networks to learn optimization algorithms automatically. While it holds promise for diverse optimization problems, achieving consistently ideal results remains a challenge. Typically, L2O through a parameterized optimization method (i.e. \u201coptimizer\u201d) learns from training samples and generalizes to test tasks with the same distribution. However, the new test tasks usually have some deviation from the training set distribution. In this case, the generic L2O methods may not produce good optimization results. Thus, we introduce a step-size control mechanism based on the generic L2O to solve the common problem of insufficient control of the iteration amplitude in L2O and adopt different update strategies for various optimization problems to adapt to complex optimization scenarios. Additionally, we also innovatively use the gated recurrent unit network as the core model of the optimizer to achieve better optimization results. 
Finally, the experimental outcomes from numerical simulations and real-world datasets show that our proposed methods are significantly better than other optimization algorithms.<\/jats:p>","DOI":"10.1093\/comjnl\/bxaf012","type":"journal-article","created":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T11:44:31Z","timestamp":1739792671000},"page":"908-925","source":"Crossref","is-referenced-by-count":0,"title":["Learning to optimize based on rate decay"],"prefix":"10.1093","volume":"68","author":[{"given":"Wenmin","family":"Ma","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics , Fujian Normal University, Fuzhou 350117, Fujian,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,2,17]]},"reference":[{"key":"2025081702465953400_ref1","doi-asserted-by":"publisher","first-page":"3668","DOI":"10.1109\/TCYB.2019.2950779","article-title":"A survey of optimization methods from a machine learning perspective","volume":"50","author":"Sun","year":"2019","journal-title":"IEEE Trans Cybern"},{"key":"2025081702465953400_ref2","article-title":"Reformer: The efficient transformer","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR)","author":"Kitaev","year":"2020"},{"key":"2025081702465953400_ref3","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025081702465953400_ref4","first-page":"10665","article-title":"Adahessian: An adaptive second order optimizer for machine learning","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Yao","year":"2021"},{"key":"2025081702465953400_ref5","first-page":"464","article-title":"Cyclical learning rates for training neural networks","volume-title":"Proceedings of the IEEE Winter Conference on Applications 
of Computer Vision (WACV)","author":"Smith","year":"2017"},{"key":"2025081702465953400_ref6","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann Math Stat"},{"key":"2025081702465953400_ref7","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2025081702465953400_ref8","first-page":"26","article-title":"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman","year":"2012","journal-title":"COURSERA: Neural Networks for Machine Learning"},{"key":"2025081702465953400_ref9","article-title":"Adam: A method for stochastic optimization","volume-title":"Conference Track Proceedings of the 3rd International Conference on Learning Representations (ICLR)","author":"Kingma","year":"2015"},{"key":"2025081702465953400_ref10","article-title":"Learning to learn by gradient descent by gradient descent","volume":"29","author":"Andrychowicz","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025081702465953400_ref11","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2025081702465953400_ref12","first-page":"2247","article-title":"Learning gradient descent: Better generalization and longer horizons","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Lv","year":"2017"},{"key":"2025081702465953400_ref13","first-page":"3751","article-title":"Learned optimizers that scale and generalize","volume-title":"Proceedings of the 34th International Conference on Machine Learning 
(ICML)","author":"Wichrowska","year":"2017"},{"key":"2025081702465953400_ref14","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1207\/s15516709cog1402_1","article-title":"Finding structure in time","volume":"14","author":"Elman","year":"1990","journal-title":"Cognit Sci"},{"key":"2025081702465953400_ref15","first-page":"1","article-title":"Learning to optimize: A primer and a benchmark","volume":"23","author":"Chen","year":"2022","journal-title":"J Mach Learn Res"},{"key":"2025081702465953400_ref16","first-page":"2146","article-title":"Towards constituting mathematical structures for learning to optimize","volume-title":"Proceedings of the 40th International Conference on Machine Learning (ICML)","author":"Liu","year":"2023"},{"key":"2025081702465953400_ref17","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.3115\/v1\/D14-1179","article-title":"Learning phrase representations using RNN encoder\u2013decoder for statistical machine translation","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Cho","year":"2014"},{"article-title":"An overview of gradient descent optimization algorithm","year":"2018","author":"Ruder","key":"2025081702465953400_ref18"},{"key":"2025081702465953400_ref19","first-page":"4556","article-title":"Understanding and correcting pathologies in the training of learned optimizers","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Metz","year":"2019"},{"key":"2025081702465953400_ref20","doi-asserted-by":"crossref","DOI":"10.1017\/9781009160865","volume-title":"Large-Scale Convex Optimization: Algorithms & Analyses Via Monotone Operators","author":"Ryu","year":"2022"},{"volume-title":"Minimization Methods for Non-differentiable 
Functions","year":"2012","author":"Shor","key":"2025081702465953400_ref21"},{"key":"2025081702465953400_ref22","doi-asserted-by":"publisher","first-page":"877","DOI":"10.1137\/0314056","article-title":"Monotone operators and the proximal point algorithm","volume":"14","author":"Tyrrell","year":"1976","journal-title":"SIAM J Control Optim"},{"key":"2025081702465953400_ref23","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1007\/978-3-642-35289-8_26","article-title":"Practical recommendations for gradient-based training of deep architectures","volume-title":"Neural networks: Tricks of the trade: Second edition","author":"Bengio","year":"2012"},{"key":"2025081702465953400_ref24","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1093\/comjnl\/3.3.175","article-title":"An automatic method for finding the greatest or least value of a function","volume":"3","author":"Rosenbrock","year":"1960","journal-title":"Comput J"},{"key":"2025081702465953400_ref25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1140\/epjc\/s10052-021-08950-y","article-title":"Evolutionary algorithms for hyperparameter optimization in machine learning for application in high energy physics","volume":"81","author":"Tani","year":"2021","journal-title":"Eur Phys J C"},{"key":"2025081702465953400_ref26","article-title":"M-l2o: Towards generalizable learning-to-optimize by test-time fast self-adaptation","volume-title":"Proceedings of the 11th Conference on Learning Representations (ICLR)","author":"Yang","year":"2023"},{"key":"2025081702465953400_ref27","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Trans Neural Netw"},{"volume-title":"Breast Cancer Wisconsin 
(Diagnostic)","year":"1993","author":"Wolberg","key":"2025081702465953400_ref28"},{"key":"2025081702465953400_ref29","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/0095-0696(78)90006-2","article-title":"Hedonic housing prices and the demand for clean air","volume":"5","author":"Jr","year":"1978","journal-title":"J Environ Econ Manage"},{"key":"2025081702465953400_ref30","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1007\/978-3-319-19425-7_17","article-title":"Introduction to survival analysis","volume-title":"Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis","author":"Harrell","year":"2015"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/8\/908\/61932351\/bxaf012.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/8\/908\/61932351\/bxaf012.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,17]],"date-time":"2025-08-17T06:47:27Z","timestamp":1755413247000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/8\/908\/8019597"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,17]]},"references-count":30,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,2,17]]},"published-print":{"date-parts":[[2025,8,14]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxaf012","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2025,8]]},"published":{"date-parts":[[2025,2,17]]}}}