{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T15:29:24Z","timestamp":1781537364989,"version":"3.54.5"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T00:00:00Z","timestamp":1743033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["6933939"],"award-info":[{"award-number":["6933939"]}]},{"name":"Berkeley","award":["6934982"],"award-info":[{"award-number":["6934982"]}]},{"DOI":"10.13039\/100006831","name":"Air Force","doi-asserted-by":"crossref","award":["6943130"],"award-info":[{"award-number":["6943130"]}],"id":[{"id":"10.13039\/100006831","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2025,4,30]]},"abstract":"<jats:p>\n            Can we accelerate the convergence of gradient descent without changing the algorithm\u2014just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed\n            <jats:italic>Silver Stepsize Schedule<\/jats:italic>\n            optimizes strongly convex functions in\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\kappa ^{\\log _{\\rho } 2} \\approx \\kappa ^{0.7864}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            iterations, where\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\rho =1+\\sqrt {2}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            is the silver ratio and \u03ba is the condition number. This is intermediate between the textbook unaccelerated rate \u03ba and the accelerated rate\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\kappa ^{1\/2}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous partially accelerated rate\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\varepsilon ^{-\\log _{\\rho } 2} \\approx \\varepsilon ^{-0.7864}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            . We conjecture and provide partial evidence that these rates are optimal among all stepsize schedules.\n          <\/jats:p>\n          <jats:p>\n            The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic of period\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\kappa ^{\\log _{\\rho } 2}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            . This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).\n          <\/jats:p>\n          <jats:p>\n            The core algorithmic intuition is\n            <jats:italic>hedging<\/jats:italic>\n            between individually suboptimal strategies\u2014short steps and long steps\u2014since bad cases for the former are good cases for the latter, and vice versa. Properly combining these stepsizes yields faster convergence due to the misalignment of worst-case functions. The key challenge in proving this speedup is enforcing long-range consistency conditions along the algorithm\u2019s trajectory. We do this by developing a technique that recursively glues constraints from different portions of the trajectory, thus removing a key stumbling block in previous analyses of optimization algorithms. More broadly, we believe that the concepts of hedging and multi-step descent have the potential to be powerful algorithmic paradigms in a variety of contexts in optimization and beyond.\n          <\/jats:p>\n          <jats:p>This article publishes and extends the first author\u2019s 2018 master\u2019s thesis (advised by the second author)\u2014which established for the first time that judiciously choosing stepsizes can enable acceleration in convex optimization. Prior to this thesis, the only such result was for the special case of quadratic optimization, due to Young in 1953.<\/jats:p>","DOI":"10.1145\/3708502","type":"journal-article","created":{"date-parts":[[2024,12,13]],"date-time":"2024-12-13T10:52:25Z","timestamp":1734087145000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule"],"prefix":"10.1145","volume":"72","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7367-0097","authenticated-orcid":false,"given":"Jason M.","family":"Altschuler","sequence":"first","affiliation":[{"name":"Statistics and Data Science, University of Pennsylvania, Philadelphia, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1132-8477","authenticated-orcid":false,"given":"Pablo A.","family":"Parrilo","sequence":"additional","affiliation":[{"name":"EECS, Massachusetts Institute of Technology, Cambridge, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,3,27]]},"reference":[{"key":"e_1_3_4_2_1","first-page":"87","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Agarwal Naman","year":"2021","unstructured":"Naman Agarwal, Surbhi Goel, and Cyril Zhang. 2021. Acceleration via fractal learning rate schedules. In Proceedings of the International Conference on Machine Learning. 87\u201399."},{"key":"e_1_3_4_3_1","article-title":"Optimal black-box reductions between optimization objectives","volume":"29","author":"Allen-Zhu Zeyuan","year":"2016","unstructured":"Zeyuan Allen-Zhu and Elad Hazan. 2016. Optimal black-box reductions between optimization objectives. Advances in Neural Information Processing Systems 29 (2016), 1\u20139.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_4_1","article-title":"Linear coupling: An ultimate unification of gradient and mirror descent","author":"Allen-Zhu Zeyuan","year":"2014","unstructured":"Zeyuan Allen-Zhu and Lorenzo Orecchia. 2014. Linear coupling: An ultimate unification of gradient and mirror descent. arXiv preprint arXiv:1407.1537 (2014).","journal-title":"arXiv preprint arXiv:1407.1537"},{"key":"e_1_3_4_5_1","volume-title":"Greed, Hedging, and Acceleration in Convex Optimization","author":"Altschuler Jason M.","year":"2018","unstructured":"Jason M. Altschuler. 2018. Greed, Hedging, and Acceleration in Convex Optimization. Master\u2019s Thesis. Massachusetts Institute of Technology."},{"key":"e_1_3_4_6_1","article-title":"Acceleration by Stepsize Hedging II: Silver Stepsize Schedule for smooth convex optimization","author":"Altschuler Jason M.","year":"2023","unstructured":"Jason M. Altschuler and Pablo A. Parrilo. 2023. Acceleration by Stepsize Hedging II: Silver Stepsize Schedule for smooth convex optimization. arXiv preprint arXiv:2309.16530 (2023).","journal-title":"arXiv preprint arXiv:2309.16530"},{"issue":"4","key":"e_1_3_4_7_1","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1145\/321296.321305","article-title":"Iterative procedures for nonlinear integral equations","volume":"12","author":"Anderson Donald G.","year":"1965","unstructured":"Donald G. Anderson. 1965. Iterative procedures for nonlinear integral equations. Journal of the ACM 12, 4 (1965), 547\u2013560.","journal-title":"Journal of the ACM"},{"issue":"1","key":"e_1_3_4_8_1","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s10107-022-01903-7","article-title":"Principled analyses and design of first-order methods with inexact proximal operators","volume":"201","author":"Barr\u00e9 Mathieu","year":"2023","unstructured":"Mathieu Barr\u00e9, Adrien B. Taylor, and Francis Bach. 2023. Principled analyses and design of first-order methods with inexact proximal operators. Mathematical Programming 201, 1-2 (2023), 185\u2013230.","journal-title":"Mathematical Programming"},{"issue":"1","key":"e_1_3_4_9_1","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1093\/imanum\/8.1.141","article-title":"Two-point step size gradient methods","volume":"8","author":"Barzilai Jonathan","year":"1988","unstructured":"Jonathan Barzilai and Jonathan M. Borwein. 1988. Two-point step size gradient methods. IMA Journal of Numerical Analysis 8, 1 (1988), 141\u2013148.","journal-title":"IMA Journal of Numerical Analysis"},{"key":"e_1_3_4_10_1","series-title":"Algorithms and Computation in Mathematics","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-662-05355-3","volume-title":"Algorithms in Real Algebraic Geometry","author":"Basu Saugata","year":"2003","unstructured":"Saugata Basu, Richard M. Pollack, and Marie-Fran\u00e7oise Roy. 2003. Algorithms in Real Algebraic Geometry. Algorithms and Computation in Mathematics, Vol. 10. Springer-Verlag, Berlin, Germany."},{"issue":"1","key":"e_1_3_4_11_1","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1137\/080716542","article-title":"A fast iterative shrinkage-thresholding algorithm for linear inverse problems","volume":"2","author":"Beck Amir","year":"2009","unstructured":"Amir Beck and Marc Teboulle. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2, 1 (2009), 183\u2013202.","journal-title":"SIAM Journal on Imaging Sciences"},{"key":"e_1_3_4_12_1","volume-title":"Nonlinear Programming","author":"Bertsekas Dimitri","year":"1999","unstructured":"Dimitri Bertsekas. 1999. Nonlinear Programming. Athena Scientific."},{"key":"e_1_3_4_13_1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511804441","volume-title":"Convex Optimization","author":"Boyd Stephen","year":"2004","unstructured":"Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press."},{"issue":"3","key":"e_1_3_4_14_1","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1561\/2200000050","article-title":"Convex optimization: Algorithms and complexity","volume":"8","author":"Bubeck S\u00e9bastien","year":"2015","unstructured":"S\u00e9bastien Bubeck. 2015. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning 8, 3-4 (2015), 231\u2013357.","journal-title":"Foundations and Trends in Machine Learning"},{"key":"e_1_3_4_15_1","article-title":"A geometric alternative to Nesterov\u2019s accelerated gradient descent","author":"Bubeck S\u00e9bastien","year":"2015","unstructured":"S\u00e9bastien Bubeck, Yin Tat Lee, and Mohit Singh. 2015. A geometric alternative to Nesterov\u2019s accelerated gradient descent. arXiv preprint arXiv:1506.08187 (2015).","journal-title":"arXiv preprint arXiv:1506.08187"},{"issue":"1847","key":"e_1_3_4_16_1","first-page":"536","article-title":"M\u00e9thode g\u00e9n\u00e9rale pour la r\u00e9solution des systemes d\u2019\u00e9quations simultan\u00e9es","volume":"25","author":"Cauchy Augustin","year":"1847","unstructured":"Augustin Cauchy. 1847. M\u00e9thode g\u00e9n\u00e9rale pour la r\u00e9solution des systemes d\u2019\u00e9quations simultan\u00e9es. Comptes Rendus de l\u2019Academie des Paris 25, 1847 (1847), 536\u2013538.","journal-title":"Comptes Rendus de l\u2019Academie des Paris"},{"key":"e_1_3_4_17_1","volume-title":"Quantifier Elimination and Cylindrical Algebraic Decomposition","author":"Caviness Bob F.","year":"2012","unstructured":"Bob F. Caviness and Jeremy R. Johnson. 2012. Quantifier Elimination and Cylindrical Algebraic Decomposition. Springer Science & Business Media."},{"key":"e_1_3_4_18_1","article-title":"Relative Lipschitzness in extragradient methods and a direct recipe for acceleration","author":"Cohen Michael B.","year":"2020","unstructured":"Michael B. Cohen, Aaron Sidford, and Kevin Tian. 2020. Relative Lipschitzness in extragradient methods and a direct recipe for acceleration. arXiv preprint arXiv:2011.06572 (2020).","journal-title":"arXiv preprint arXiv:2011.06572"},{"key":"e_1_3_4_19_1","volume-title":"Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra","author":"Cox David","year":"2013","unstructured":"David Cox, John Little, and Donal O\u2019Shea. 2013. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer Science & Business Media."},{"key":"e_1_3_4_20_1","article-title":"A robust accelerated optimization algorithm for strongly convex functions","author":"Cyrus Saman","year":"2017","unstructured":"Saman Cyrus, Bin Hu, Bryan Van Scoy, and Laurent Lessard. 2017. A robust accelerated optimization algorithm for strongly convex functions. arXiv preprint arXiv:1710.04753 (2017).","journal-title":"arXiv preprint arXiv:1710.04753"},{"key":"e_1_3_4_21_1","volume-title":"Performance Estimation of the Gradient Method with Fixed Arbitrary Step Sizes","author":"Daccache Antoine","year":"2019","unstructured":"Antoine Daccache. 2019. Performance Estimation of the Gradient Method with Fixed Arbitrary Step Sizes. Ph.D. Dissertation. Universit\u00e9 Catholique de Louvain."},{"key":"e_1_3_4_22_1","first-page":"567","article-title":"Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods","author":"Gupta Shuvomoy Das","year":"2023","unstructured":"Shuvomoy Das Gupta, Bart Van Parys, and Ernest K. Ryu. 2023. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming 204 (2023), 567\u2013639.","journal-title":"Mathematical Programming"},{"key":"e_1_3_4_23_1","article-title":"Nonlinear conjugate gradient methods: Worst-case convergence rates via computer-assisted analyses","author":"Gupta Shuvomoy Das","year":"2023","unstructured":"Shuvomoy Das Gupta, Robert M. Freund, Xu Andy Sun, and Adrien Taylor. 2023. Nonlinear conjugate gradient methods: Worst-case convergence rates via computer-assisted analyses. arXiv preprint arXiv:2301.01530 (2023).","journal-title":"arXiv preprint arXiv:2301.01530"},{"issue":"1","key":"e_1_3_4_24_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2400000036","article-title":"Acceleration methods","volume":"5","author":"d\u2019Aspremont Alexandre","year":"2021","unstructured":"Alexandre d\u2019Aspremont, Damien Scieur, and Adrien Taylor. 2021. Acceleration methods. Foundations and Trends in Optimization 5, 1-2 (2021), 1\u2013245.","journal-title":"Foundations and Trends in Optimization"},{"key":"e_1_3_4_25_1","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1007\/s11590-016-1087-4","article-title":"On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions","volume":"11","author":"Klerk Etienne De","year":"2017","unstructured":"Etienne De Klerk, Fran\u00e7ois Glineur, and Adrien B. Taylor. 2017. On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optimization Letters 11 (2017), 1185\u20131199.","journal-title":"Optimization Letters"},{"issue":"3","key":"e_1_3_4_26_1","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1137\/19M1281368","article-title":"Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation","volume":"30","author":"Klerk Etienne De","year":"2020","unstructured":"Etienne De Klerk, Francois Glineur, and Adrien B. Taylor. 2020. Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization 30, 3 (2020), 2053\u20132082.","journal-title":"SIAM Journal on Optimization"},{"key":"e_1_3_4_27_1","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/s10107-013-0677-5","article-title":"First-order methods of smooth convex optimization with inexact oracle","volume":"146","author":"Devolder Olivier","year":"2014","unstructured":"Olivier Devolder, Fran\u00e7ois Glineur, and Yurii Nesterov. 2014. First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146 (2014), 37\u201375.","journal-title":"Mathematical Programming"},{"issue":"1","key":"e_1_3_4_28_1","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1137\/20M1322716","article-title":"Generalized momentum-based methods: A Hamiltonian perspective","volume":"31","author":"Diakonikolas Jelena","year":"2021","unstructured":"Jelena Diakonikolas and Michael I. Jordan. 2021. Generalized momentum-based methods: A Hamiltonian perspective. SIAM Journal on Optimization 31, 1 (2021), 915\u2013944.","journal-title":"SIAM Journal on Optimization"},{"key":"e_1_3_4_29_1","article-title":"Accelerated extra-gradient descent: A novel accelerated first-order method","author":"Diakonikolas Jelena","year":"2017","unstructured":"Jelena Diakonikolas and Lorenzo Orecchia. 2017. Accelerated extra-gradient descent: A novel accelerated first-order method. arXiv preprint arXiv:1706.04680 (2017).","journal-title":"arXiv preprint arXiv:1706.04680"},{"issue":"3","key":"e_1_3_4_30_1","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1137\/S0036144596305582","article-title":"From potential theory to matrix iterations in six steps","volume":"40","author":"Driscoll Tobin A.","year":"1998","unstructured":"Tobin A. Driscoll, Kim-Chuan Toh, and Lloyd N. Trefethen. 1998. From potential theory to matrix iterations in six steps. SIAM Review 40, 3 (1998), 547\u2013578.","journal-title":"SIAM Review"},{"key":"e_1_3_4_31_1","article-title":"Efficient first-order methods for convex minimization: A constructive approach","author":"Drori Yoel","year":"2018","unstructured":"Yoel Drori and Adrien B. Taylor. 2018. Efficient first-order methods for convex minimization: A constructive approach. arXiv preprint arXiv:1803.05676 (2018).","journal-title":"arXiv preprint arXiv:1803.05676"},{"key":"e_1_3_4_32_1","doi-asserted-by":"crossref","first-page":"101590","DOI":"10.1016\/j.jco.2021.101590","article-title":"On the oracle complexity of smooth strongly convex minimization","volume":"68","author":"Drori Yoel","year":"2022","unstructured":"Yoel Drori and Adrien B. Taylor. 2022. On the oracle complexity of smooth strongly convex minimization. Journal of Complexity 68 (2022), 101590.","journal-title":"Journal of Complexity"},{"issue":"1","key":"e_1_3_4_33_1","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1007\/s10107-013-0653-0","article-title":"Performance of first-order methods for smooth convex minimization: A novel approach","volume":"145","author":"Drori Yoel","year":"2014","unstructured":"Yoel Drori and Marc Teboulle. 2014. Performance of first-order methods for smooth convex minimization: A novel approach. Mathematical Programming 145, 1-2 (2014), 451\u2013482.","journal-title":"Mathematical Programming"},{"issue":"1","key":"e_1_3_4_34_1","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1137\/16M1072528","article-title":"An optimal first order method based on optimal quadratic averaging","volume":"28","author":"Drusvyatskiy Dmitriy","year":"2018","unstructured":"Dmitriy Drusvyatskiy, Maryam Fazel, and Scott Roy. 2018. An optimal first order method based on optimal quadratic averaging. SIAM Journal on Optimization 28, 1 (2018), 251\u2013271.","journal-title":"SIAM Journal on Optimization"},{"key":"e_1_3_4_35_1","volume-title":"Worst-Case Functions for the Gradient Method with Fixed Variable Step Sizes","author":"Eloi Diego","year":"2022","unstructured":"Diego Eloi. 2022. Worst-Case Functions for the Gradient Method with Fixed Variable Step Sizes. Ph.D. Dissertation. Universit\u00e9 Catholique de Louvain."},{"key":"e_1_3_4_36_1","unstructured":"GitHub. n.d.Acceleration by Stepsize Hedging\u2014Verification of Identities Computer Algebra Script. Retrieved December 15 2024 from https:\/\/jasonaltschuler.github.io\/AccelerationByStepsizeHedging"},{"key":"e_1_3_4_37_1","first-page":"3028","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Goujaud Baptiste","year":"2022","unstructured":"Baptiste Goujaud, Damien Scieur, Aymeric Dieuleveut, Adrien B. Taylor, and Fabian Pedregosa. 2022. Super-acceleration with cyclical step-sizes. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 3028\u20133065."},{"key":"e_1_3_4_38_1","article-title":"Provably faster gradient descent via long steps","author":"Grimmer Benjamin","year":"2023","unstructured":"Benjamin Grimmer. 2023. Provably faster gradient descent via long steps. arXiv preprint arXiv:2307.06324 (2023).","journal-title":"arXiv preprint arXiv:2307.06324"},{"key":"e_1_3_4_39_1","article-title":"Accelerated gradient descent via long steps","author":"Grimmer Benjamin","year":"2023","unstructured":"Benjamin Grimmer, Kevin Shu, and Alex L. Wang. 2023. Accelerated gradient descent via long steps. arXiv preprint arXiv:2309.09961 (2023).","journal-title":"arXiv preprint arXiv:2309.09961"},{"issue":"3","key":"e_1_3_4_40_1","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1561\/2400000013","article-title":"Introduction to online convex optimization","volume":"2","author":"Hazan Elad","year":"2016","unstructured":"Elad Hazan. 2016. Introduction to online convex optimization. Foundations and Trends in Optimization 2, 3-4 (2016), 157\u2013325.","journal-title":"Foundations and Trends in Optimization"},{"issue":"6","key":"e_1_3_4_41_1","doi-asserted-by":"crossref","first-page":"409","DOI":"10.6028\/jres.049.044","article-title":"Methods of conjugate gradients for solving linear systems","volume":"49","year":"1952","unstructured":"Magnus R. Hestenes and Eduard Steifel. 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, 6 (1952), 409\u2013436.","journal-title":"Journal of Research of the National Bureau of Standards"},{"key":"e_1_3_4_42_1","article-title":"Analysis of approximate stochastic gradient using quadratic constraints and sequential semidefinite programs","author":"Hu Bin","year":"2017","unstructured":"Bin Hu, Peter Seiler, and Laurent Lessard. 2017. Analysis of approximate stochastic gradient using quadratic constraints and sequential semidefinite programs. arXiv preprint arXiv:1711.00987 (2017).","journal-title":"arXiv preprint arXiv:1711.00987"},{"issue":"2","key":"e_1_3_4_43_1","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/s10208-015-9290-8","article-title":"Steepest descent method with random step lengths","volume":"17","author":"Kalousek Zden\u011bk","year":"2017","unstructured":"Zden\u011bk Kalousek. 2017. Steepest descent method with random step lengths. Foundations of Computational Mathematics 17, 2 (2017), 359\u2013422.","journal-title":"Foundations of Computational Mathematics"},{"key":"e_1_3_4_44_1","first-page":"2431","volume-title":"Proceedings of the Conference on Learning Theory","author":"Kelner Jonathan","year":"2022","unstructured":"Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, and Honglin Yuan. 2022. Big-Step-Little-Step: Efficient gradient methods for objectives with multiple scales. In Proceedings of the Conference on Learning Theory. 2431\u20132540."},{"issue":"1","key":"e_1_3_4_45_1","first-page":"81","article-title":"Optimized first-order methods for smooth convex minimization","volume":"159","author":"Kim Donghwan","year":"2016","unstructured":"Donghwan Kim and Jeffrey A. Fessler. 2016. Optimized first-order methods for smooth convex minimization. Mathematical Programming 159, 1-2 (2016), 81\u2013107.","journal-title":"Mathematical Programming"},{"issue":"1","key":"e_1_3_4_46_1","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s10957-016-1018-7","article-title":"On the convergence analysis of the optimized gradient method","volume":"172","author":"Kim Donghwan","year":"2017","unstructured":"Donghwan Kim and Jeffrey A. Fessler. 2017. On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications 172, 1 (2017), 187\u2013205.","journal-title":"Journal of Optimization Theory and Applications"},{"issue":"1","key":"e_1_3_4_47_1","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1007\/s10957-020-01770-2","article-title":"Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions","volume":"188","author":"Kim Donghwan","year":"2021","unstructured":"Donghwan Kim and Jeffrey A. Fessler. 2021. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Journal of Optimization Theory and Applications 188, 1 (2021), 192\u2013219.","journal-title":"Journal of Optimization Theory and Applications"},{"key":"e_1_3_4_48_1","article-title":"Time-reversed dissipation induces duality between minimizing gradient norm and function value","volume":"36","author":"Kim Jaeyeon","year":"2024","unstructured":"Jaeyeon Kim, Asuman Ozdaglar, Chanwoo Park, and Ernest Ryu. 2024. Time-reversed dissipation induces duality between minimizing gradient norm and function value. Advances in Neural Information Processing Systems 36 (2024), 1\u201352.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_49_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-70628-1","volume-title":"Computational Commutative Algebra. 1","author":"Kreuzer M.","year":"2000","unstructured":"M. Kreuzer and L. Robbiano. 2000. Computational Commutative Algebra. 1. Springer-Verlag, Berlin."},{"issue":"2","key":"e_1_3_4_50_1","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/0041-5553(71)90169-8","article-title":"Ordering of the iterative parameters in the cyclical Chebyshev iterative method","volume":"11","author":"Lebedev V. I.","year":"1971","unstructured":"V. I. Lebedev and S. A. Finogenov. 1971. Ordering of the iterative parameters in the cyclical Chebyshev iterative method. U.S.S.R. Computational Mathematics and Mathematical Physics 11, 2 (1971), 155\u2013170.","journal-title":"U.S.S.R. Computational Mathematics and Mathematical Physics"},{"issue":"1","key":"e_1_3_4_51_1","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1137\/15M1009597","article-title":"Analysis and design of optimization algorithms via integral quadratic constraints","volume":"26","author":"Lessard Laurent","year":"2016","unstructured":"Laurent Lessard, Benjamin Recht, and Andrew Packard. 2016. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26, 1 (2016), 57\u201395.","journal-title":"SIAM Journal on Optimization"},{"key":"e_1_3_4_52_1","volume-title":"Linear and Nonlinear Programming","author":"Luenberger David G.","year":"1984","unstructured":"David G. Luenberger and Yinyu Ye. 1984. Linear and Nonlinear Programming. Vol. 2. Springer."},{"key":"e_1_3_4_53_1","article-title":"A systematic approach to Lyapunov analyses of continuous-time models in convex optimization","author":"Moucer C\u00e9line","year":"2023","unstructured":"C\u00e9line Moucer, Adrien B. Taylor, and Francis Bach. 2023. A systematic approach to Lyapunov analyses of continuous-time models in convex optimization. SIAM Journal on Optimization 33, 3 (2023). 1558\u20131586.","journal-title":"SIAM Journal on Optimization"},{"key":"e_1_3_4_54_1","volume-title":"Problem Complexity and Method Efficiency in Optimization","author":"Nemirovskii Arkadii","year":"1983","unstructured":"Arkadii Nemirovskii and David Borisovich Yudin. 1983. Problem Complexity and Method Efficiency in Optimization. Wiley."},{"issue":"2","key":"e_1_3_4_55_1","first-page":"372","article-title":"A method of solving a convex programming problem with convergence rate  \\({O}(1\/k^2)\\)","volume":"27","author":"Nesterov Yurii E.","year":"1983","unstructured":"Yurii E. Nesterov. 1983. A method of solving a convex programming problem with convergence rate \\({O}(1\/k^2)\\) . Soviet Mathematics Doklady 27, 2 (1983), 372\u2013376.","journal-title":"Soviet Mathematics Doklady"},{"key":"e_1_3_4_56_1","volume-title":"Introductory Lectures on Convex Optimization: A Basic Course","author":"Nesterov Yurii E.","year":"1998","unstructured":"Yurii E. Nesterov. 1998. Introductory Lectures on Convex Optimization: A Basic Course. Vol. 87. Springer Science & Business Media."},{"key":"e_1_3_4_57_1","doi-asserted-by":"crossref","DOI":"10.1007\/b98874","volume-title":"Numerical Optimization","author":"Nocedal Jorge","year":"1999","unstructured":"Jorge Nocedal and Stephen J. Wright. 1999. Numerical Optimization. Springer."},{"key":"e_1_3_4_58_1","doi-asserted-by":"crossref","first-page":"1645","DOI":"10.1109\/LSP.2021.3101131","article-title":"Provable super-convergence with a large cyclical learning rate","volume":"28","author":"Oymak Samet","year":"2021","unstructured":"Samet Oymak. 2021. Provable super-convergence with a large cyclical learning rate. IEEE Signal Processing Letters 28 (2021), 1645\u20131649.","journal-title":"IEEE Signal Processing Letters"},{"key":"e_1_3_4_59_1","article-title":"The Nesterov-Spokoiny Acceleration:  \\(o (1\/k^2)\\)  convergence without proximal operations","author":"Peng Weibin","year":"2023","unstructured":"Weibin Peng and Tianyu Wang. 2023. The Nesterov-Spokoiny Acceleration: \\(o (1\/k^2)\\) convergence without proximal operations. arXiv preprint arXiv:2308.14314 (2023).","journal-title":"arXiv preprint arXiv:2308.14314"},{"issue":"5","key":"e_1_3_4_60_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0041-5553(64)90137-5","article-title":"Some methods of speeding up the convergence of iteration methods","volume":"4","author":"Polyak Boris T.","year":"1964","unstructured":"Boris T. Polyak. 1964. Some methods of speeding up the convergence of iteration methods. U.S.S.R. Computational Mathematics and Mathematical Physics 4, 5 (1964), 1\u201317.","journal-title":"U.S.S.R. Computational Mathematics and Mathematical Physics"},{"key":"e_1_3_4_61_1","volume-title":"Introduction to Optimization","author":"Polyak Boris T.","year":"1987","unstructured":"Boris T. Polyak. 1987. Introduction to Optimization. Optimization Software Inc."},{"issue":"3","key":"e_1_3_4_62_1","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1007\/s10589-010-9319-5","article-title":"Gradient algorithms for quadratic optimization with fast convergence rates","volume":"50","author":"Pronzato Luc","year":"2011","unstructured":"Luc Pronzato and Anatoly Zhigljavsky. 2011. Gradient algorithms for quadratic optimization with fast convergence rates. Computational Optimization and Applications 50, 3 (2011), 597\u2013617.","journal-title":"Computational Optimization and Applications"},{"key":"e_1_3_4_63_1","volume-title":"An Introduction to the Approximation of Functions","author":"Rivlin Theodore J.","year":"1981","unstructured":"Theodore J. Rivlin. 1981. An Introduction to the Approximation of Functions. Courier Corporation."},{"key":"e_1_3_4_64_1","doi-asserted-by":"crossref","DOI":"10.1515\/9781400873173","volume-title":"Convex Analysis","author":"Rockafellar Ralph Tyrell","year":"1970","unstructured":"Ralph Tyrell Rockafellar. 1970. Convex Analysis. Princeton University Press."},{"issue":"3","key":"e_1_3_4_65_1","doi-asserted-by":"crossref","first-page":"2251","DOI":"10.1137\/19M1304854","article-title":"Operator splitting performance estimation: Tight contraction factors and optimal parameter selection","volume":"30","author":"Ryu Ernest K.","year":"2020","unstructured":"Ernest K. Ryu, Adrien B. Taylor, Carolina Bergeling, and Pontus Giselsson. 2020. Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization 30, 3 (2020), 2251\u20132271.","journal-title":"SIAM Journal on Optimization"},{"key":"e_1_3_4_66_1","first-page":"79","article-title":"Understanding the acceleration phenomenon via high-resolution differential equations","author":"Shi Bin","year":"2021","unstructured":"Bin Shi, Simon S. Du, Michael I. Jordan, and Weijie J. Su. 2021. Understanding the acceleration phenomenon via high-resolution differential equations. Mathematical Programming 195 (2021), 79\u2013148.","journal-title":"Mathematical Programming"},{"key":"e_1_3_4_67_1","article-title":"Acceleration via symplectic discretization of high-resolution differential equations","volume":"32","author":"Shi Bin","year":"2019","unstructured":"Bin Shi, Simon S. Du, Weijie Su, and Michael I. Jordan. 2019. Acceleration via symplectic discretization of high-resolution differential equations. Advances in Neural Information Processing Systems 32 (2019), 1\u20139.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_68_1","first-page":"464","volume-title":"Proceedings of the IEEE Winter Conference on Applications of Computer Vision","author":"Smith Leslie N.","year":"2017","unstructured":"Leslie N. Smith. 2017. Cyclical learning rates for training neural networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision. IEEE, 464\u2013472."},{"key":"e_1_3_4_69_1","first-page":"369","volume-title":"Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications","author":"Smith Leslie N.","year":"2019","unstructured":"Leslie N. Smith and Nicholay Topin. 2019. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Vol. 11006. SPIE, 369\u2013386."},{"issue":"1","key":"e_1_3_4_70_1","first-page":"5312","article-title":"A differential equation for modeling Nesterov\u2019s accelerated gradient method: Theory and insights","volume":"17","author":"Su Weijie","year":"2016","unstructured":"Weijie Su, Stephen Boyd, and Emmanuel J. Cand\u00e8s. 2016. A differential equation for modeling Nesterov\u2019s accelerated gradient method: Theory and insights. Journal of Machine Learning Research 17, 1 (2016), 5312\u20135354.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"e_1_3_4_71_1","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1007\/s10107-022-01839-y","article-title":"An optimal gradient method for smooth strongly convex minimization","volume":"199","author":"Taylor Adrien","year":"2023","unstructured":"Adrien Taylor and Yoel Drori. 2023. An optimal gradient method for smooth strongly convex minimization. Mathematical Programming 199, 1-2 (2023), 557\u2013594.","journal-title":"Mathematical Programming"},{"key":"e_1_3_4_72_1","volume-title":"Convex Interpolation and Performance Estimation of First-Order Methods for Convex Optimization.","author":"Taylor Adrien B.","year":"2017","unstructured":"Adrien B. Taylor. 2017. Convex Interpolation and Performance Estimation of First-Order Methods for Convex Optimization.Ph.D. Dissertation. Catholic University of Louvain, Louvain-la-Neuve, Belgium."},{"issue":"1","key":"e_1_3_4_73_1","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1007\/s10107-016-1009-3","article-title":"Smooth strongly convex interpolation and exact worst-case performance of first-order methods","volume":"161","author":"Taylor Adrien B.","year":"2017","unstructured":"Adrien B. Taylor, Julien M. Hendrickx, and Fran\u00e7ois Glineur. 2017. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming 161, 1-2 (2017), 307\u2013345.","journal-title":"Mathematical Programming"},{"key":"e_1_3_4_74_1","unstructured":"Paul Tseng. 2008. On Accelerated Proximal Gradient Methods for Convex-Concave Optimization. Retrieved December 15 2024 from http:\/\/www.math.washington.edu\/tseng\/papers\/apgm.pdf"},{"issue":"1","key":"e_1_3_4_75_1","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1109\/LCSYS.2017.2722406","article-title":"The fastest known globally convergent first-order method for minimizing strongly convex functions","volume":"2","author":"Scoy Bryan Van","year":"2018","unstructured":"Bryan Van Scoy, Randy A. Freeman, and Kevin M. Lynch. 2018. The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters 2, 1 (2018), 49\u201354.","journal-title":"IEEE Control Systems Letters"},{"key":"e_1_3_4_76_1","series-title":"Springer Series in Computational Mathematics","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-05156-2","volume-title":"Matrix Iterative Analysis","author":"Varga Richard S.","year":"2000","unstructured":"Richard S. Varga. 2000. Matrix Iterative Analysis. Springer Series in Computational Mathematics, Vol. 27. Springer."},{"issue":"47","key":"e_1_3_4_77_1","first-page":"E7351\u2013E7358","article-title":"A variational perspective on accelerated methods in optimization","volume":"113","author":"Wibisono Andre","year":"2016","unstructured":"Andre Wibisono, Ashia C. Wilson, and Michael I. Jordan. 2016. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences 113, 47 (2016), E7351\u2013E7358.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"e_1_3_4_78_1","article-title":"A Lyapunov analysis of momentum methods in optimization","author":"Wilson Ashia C.","year":"2016","unstructured":"Ashia C. Wilson, Benjamin Recht, and Michael I. Jordan. 2016. A Lyapunov analysis of momentum methods in optimization. arXiv preprint arXiv:1611.02635 (2016).","journal-title":"arXiv preprint arXiv:1611.02635"},{"issue":"1","key":"e_1_3_4_79_1","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1002\/sapm1953321243","article-title":"On Richardson\u2019s method for solving linear systems with positive definite matrices","volume":"32","author":"Young David","year":"1953","unstructured":"David Young. 1953. On Richardson\u2019s method for solving linear systems with positive definite matrices. Journal of Mathematics and Physics 32, 1-4 (1953), 243\u2013255.","journal-title":"Journal of Mathematics and Physics"},{"issue":"6","key":"e_1_3_4_80_1","doi-asserted-by":"crossref","first-page":"1047","DOI":"10.1007\/s11590-012-0491-7","article-title":"An asymptotically optimal gradient algorithm for quadratic optimization with low computational cost","volume":"7","author":"Zhigljavsky Anatoly","year":"2013","unstructured":"Anatoly Zhigljavsky, Luc Pronzato, and Elena Bukina. 2013. An asymptotically optimal gradient algorithm for quadratic optimization with low computational cost. Optimization Letters 7, 6 (2013), 1047\u20131059.","journal-title":"Optimization Letters"}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708502","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3708502","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:54Z","timestamp":1750295394000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708502"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,27]]},"references-count":79,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,4,30]]}},"alternative-id":["10.1145\/3708502"],"URL":"https:\/\/doi.org\/10.1145\/3708502","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"value":"0004-5411","type":"print"},{"value":"1557-735X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,27]]},"assertion":[{"value":"2024-02-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}