{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T14:37:27Z","timestamp":1778337447766,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":66,"publisher":"ACM","funder":[{"name":"NSF (National Science Foundation)","award":["2019844, 2007668"],"award-info":[{"award-number":["2019844, 2007668"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,15]]},"DOI":"10.1145\/3717823.3718308","type":"proceedings-article","created":{"date-parts":[[2025,6,15]],"date-time":"2025-06-15T23:34:42Z","timestamp":1750030482000},"page":"2225-2236","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Improved Complexity for Smooth Nonconvex Optimization: A Two-Level Online Learning Approach with Quasi-Newton Methods"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2352-1549","authenticated-orcid":false,"given":"Ruichen","family":"Jiang","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6603-0091","authenticated-orcid":false,"given":"Aryan","family":"Mokhtari","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8515-5661","authenticated-orcid":false,"given":"Francisco","family":"Patitucci","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,15]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Naman Agarwal Zeyuan Allen-Zhu Brian Bullins Elad Hazan and Tengyu Ma."},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 1195-1199","unstructured":"2017. 
Finding approximate local minima faster than gradient descent. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 1195-1199."},{"key":"e_1_3_2_1_3_1","unstructured":"Zeyuan Allen-Zhu and Yuanzhi Li. 2018. Neon2: finding local minima via first-order oracles. Advances in Neural Information Processing Systems 31."},{"key":"e_1_3_2_1_4_1","series-title":"SIAM journal on imaging sciences, 2, 1, 183-202","volume-title":"A fast iterative shrinkage-thresholding algorithm for linear inverse problems","author":"Beck Amir","unstructured":"Amir Beck and Marc Teboulle. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2, 1, 183-202."},{"key":"e_1_3_2_1_5_1","volume-title":"The Thirty Sixth Annual Conference on Learning Theory, 4696-4736","author":"Blanchard Moise","year":"2023","unstructured":"Moise Blanchard, Junhui Zhang, and Patrick Jaillet. 2023. Quadratic memory is necessary for optimal query complexity in convex optimization: center-of-mass is Pareto-optimal. In The Thirty Sixth Annual Conference on Learning Theory, 4696-4736."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1970-0279993-0"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1093\/imamat\/12.3.223"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/21M1397854"},{"key":"e_1_3_2_1_9_1","volume-title":"Conference on Learning Theory. PMLR, 940-960","author":"Bubeck S\u00e9bastien","year":"2020","unstructured":"S\u00e9bastien Bubeck and Dan Mikulincer. 2020. How to trap a gradient flow. In Conference on Learning Theory. 
PMLR, 940-960."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/0724077"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1137\/20M1321759"},{"key":"e_1_3_2_1_12_1","volume-title":"International conference on machine learning, 654-663","author":"Carmon Yair","year":"2017","unstructured":"Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. 2017. \u201cConvex until proven guilty\u201d: dimension-free acceleration of gradient descent on nonconvex functions. In International conference on machine learning, 654-663."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/17M1114296"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-019-01406-y"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-019-01431-x"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS57990.2023.00086"},{"key":"e_1_3_2_1_17_1","volume-title":"Nicholas IM Gould, and Ph L Toint","author":"Conn Andrew R","year":"1991","unstructured":"Andrew R Conn, Nicholas IM Gould, and Ph L Toint. 1991. Convergence of quasi-Newton matrices generated by the symmetric rank one update. Mathematical programming, 50, 1, 177-195."},{"key":"e_1_3_2_1_18_1","volume-title":"Nicholas IM Gould, and Philippe L Toint","author":"Conn Andrew R","year":"2000","unstructured":"Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. 2000. Trust region methods. SIAM."},{"key":"e_1_3_2_1_19_1","volume-title":"International Conference on Machine Learning. PMLR, 6643-6670","author":"Cutkosky Ashok","year":"2023","unstructured":"Ashok Cutkosky, Harsh Mehta, and Francesco Orabona. 2023. Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion. In International Conference on Machine Learning. 
PMLR, 6643-6670."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1052623401383455"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"W. C. Davidon. 1959. Variable metric method for minimization. Technical Report ANL-5990. Argonne National Laboratory Argonne IL.","DOI":"10.2172\/4252678"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Roger Fletcher. 1970. A new approach to variable metric algorithms. The computer journal 13 3 317-322.","DOI":"10.1093\/comjnl\/13.3.317"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Roger Fletcher. 1994. An overview of unconstrained optimization. Algorithms for continuous optimization: The state of the art 109-143.","DOI":"10.1007\/978-94-009-0369-2_5"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/6.2.163"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Donald Goldfarb. 1970. A family of variable-metric methods derived by variational means. Mathematics of computation 24 109 23-26.","DOI":"10.1090\/S0025-5718-1970-0258249-6"},{"key":"e_1_3_2_1_26_1","volume-title":"Conference On Learning Theory. PMLR, 1451-1454","author":"Hinder Oliver","year":"2018","unstructured":"Oliver Hinder. 2018. Cutting plane methods can be extended into nonconvex optimization. In Conference On Learning Theory. PMLR, 1451-1454."},{"key":"e_1_3_2_1_27_1","unstructured":"Andrew Jacobsen and Ashok Cutkosky. 2022. Parameter-free mirror descent."},{"key":"e_1_3_2_1_28_1","volume-title":"Conference on Learning Theory. PMLR, 4160-4211","author":"In","unstructured":"In Conference on Learning Theory. PMLR, 4160-4211."},{"key":"e_1_3_2_1_29_1","volume-title":"International Conference on Machine Learning. PMLR, 14590-14630","author":"Jacobsen Andrew","year":"2023","unstructured":"Andrew Jacobsen and Ashok Cutkosky. 2023. Unconstrained online learning with unbounded losses. In International Conference on Machine Learning. 
PMLR, 14590-14630."},{"key":"e_1_3_2_1_30_1","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems.","author":"Jacobsen Andrew","year":"2024","unstructured":"Andrew Jacobsen and Francesco Orabona. 2024. An equivalence between static and dynamic regret minimization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_1_31_1","volume-title":"The Thirty Sixth Annual Conference on Learning Theory. PMLR","author":"Jiang Ruichen","year":"2023","unstructured":"Ruichen Jiang, Qiujiang Jin, and Aryan Mokhtari. 2023. Online learning guided curvature approximation: a quasi-Newton method with global non-asymptotic superlinear convergence. In The Thirty Sixth Annual Conference on Learning Theory. PMLR, 1962-1992."},{"key":"e_1_3_2_1_32_1","unstructured":"Ruichen Jiang and Aryan Mokhtari. 2023. Accelerated quasi-Newton proximal extragradient: faster rate for smooth convex optimization. Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Ruichen Jiang Aryan Mokhtari and Francisco Patitucci. 2024. Improved complexity for smooth nonconvex optimization: a two-level online learning approach with quasi-Newton methods. arXiv preprint arXiv:2412.02175.","DOI":"10.1145\/3717823.3718308"},{"key":"e_1_3_2_1_34_1","volume-title":"Conference On Learning Theory. PMLR, 1042-1085","author":"Jin Chi","year":"2018","unstructured":"Chi Jin, Praneeth Netrapalli, and Michael I Jordan. 2018. Accelerated gradient descent escapes saddle points faster than gradient descent. In Conference On Learning Theory. PMLR, 1042-1085."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2019.11.015"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1137\/0803001"},{"key":"e_1_3_2_1_37_1","unstructured":"Jaeyeon Kim Asuman Ozdaglar Chanwoo Park and Ernest Ryu. 2023. 
Time-reversed dissipation induces duality between minimizing gradient norm and function value. Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1137\/0613066"},{"key":"e_1_3_2_1_39_1","first-page":"11999","article-title":"A geometric structure of acceleration and its role in making gradients small fast","volume":"34","author":"Lee Jongmin","year":"2021","unstructured":"Jongmin Lee, Chanwoo Park, and Ernest Ryu. 2021. A geometric structure of acceleration and its role in making gradients small fast. Advances in Neural Information Processing Systems, 34, 11999-12012.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-0427(00)00540-9"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1052623499354242"},{"key":"e_1_3_2_1_42_1","volume-title":"International Conference on Machine Learning. PMLR, 12901-12916","author":"Li Huan","year":"2022","unstructured":"Huan Li and Zhouchen Lin. 2022. Restarted nonconvex accelerated gradient descent: no more polylogarithmic factor in the O(\u03b5^{\u22127\/4}) complexity. In International Conference on Machine Learning. PMLR, 12901-12916."},{"key":"e_1_3_2_1_43_1","unstructured":"Huan Li and Zhouchen Lin. 2023. Restarted nonconvex accelerated gradient descent: no more polylogarithmic factor in the O(\u03b5^{\u22127\/4}) complexity."},{"key":"e_1_3_2_1_44_1","unstructured":"Journal of Machine Learning Research 24 157 1-37."},{"key":"e_1_3_2_1_45_1","volume-title":"Conference on Learning Theory. PMLR, 3635-3684","author":"Luo Haipeng","year":"2022","unstructured":"Haipeng Luo, Mengxiao Zhang, Peng Zhao, and Zhi-Hua Zhou. 2022. Corralling a larger band of bandits: a case study on switching regret for linear bandits. In Conference on Learning Theory. 
PMLR, 3635-3684."},{"key":"e_1_3_2_1_46_1","unstructured":"Annie Marsden Vatsal Sharan Aaron Sidford and Gregory Valiant. 2022."},{"key":"e_1_3_2_1_47_1","volume-title":"Conference on Learning Theory. PMLR, 2390-2430","author":"Efficient","unstructured":"Efficient convex optimization requires superlinear memory. In Conference on Learning Theory. PMLR, 2390-2430."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1137\/22M1540934"},{"key":"e_1_3_2_1_49_1","first-page":"1","article-title":"Universal heavy-ball method for nonconvex optimization under H\u00f6lder continuous hessians","author":"Marumo Naoki","year":"2024","unstructured":"Naoki Marumo and Akiko Takeda. 2024. Universal heavy-ball method for nonconvex optimization under H\u00f6lder continuous hessians. Mathematical Programming, 1-29.","journal-title":"Mathematical Programming"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-003-0421-7"},{"key":"e_1_3_2_1_51_1","volume-title":"Conference on Learning Theory. PMLR, 5314-5390","author":"Mhammedi Zakaria","year":"2022","unstructured":"Zakaria Mhammedi. 2022. Efficient projection-free online convex optimization with membership oracle. In Conference on Learning Theory. PMLR, 5314-5390."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Yurii Nesterov and Boris T Polyak. 2006. Cubic regularization of Newton method and its global performance. Mathematical programming 108 1 177-205.","DOI":"10.1007\/s10107-006-0706-8"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M1065197"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Jorge Nocedal. 1992. Theory of algorithms for unconstrained optimization.","DOI":"10.1017\/S0962492900002270"},{"key":"e_1_3_2_1_55_1","unstructured":"Acta numerica 1 199-242."},{"key":"e_1_3_2_1_56_1","volume-title":"Nonlinear Programming (SIAM-AMS Proceedings).","author":"Powell M. J. D.","year":"1976","unstructured":"M. J. D. Powell. 
1976. Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In Nonlinear Programming (SIAM-AMS Proceedings). Vol. IX. Society for Industrial and Applied Mathematics, Philadelphia."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1093\/imamat\/7.1.21"},{"key":"e_1_3_2_1_58_1","volume-title":"Conference on Learning Theory. PMLR, 993-1019","author":"Rakhlin Alexander","year":"2013","unstructured":"Alexander Rakhlin and Karthik Sridharan. 2013. Online learning with predictable sequences. In Conference on Learning Theory. PMLR, 993-1019."},{"key":"e_1_3_2_1_59_1","unstructured":"Cl\u00e9ment W Royer Michael O'Neill and Stephen J Wright. 2020. A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization."},{"key":"e_1_3_2_1_60_1","unstructured":"Mathematical Programming 180 451-488."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1137\/17M1134329"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"crossref","unstructured":"David F Shanno. 1970. Conditioning of quasi-Newton methods for function minimization. Mathematics of computation 24 111 647-656.","DOI":"10.1090\/S0025-5718-1970-0274029-X"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1137\/0803004"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11590-016-1070-0"},{"key":"e_1_3_2_1_65_1","unstructured":"Yi Xu Rong Jin and Tianbao Yang. 2017. Neon+: accelerated gradient methods for extracting negative curvature for non-convex optimization. arXiv preprint arXiv:1712.01033."},{"key":"e_1_3_2_1_66_1","volume-title":"International Conference on Machine Learning. PMLR, 26085-26115","author":"Zhang Zhiyu","year":"2022","unstructured":"Zhiyu Zhang, Ashok Cutkosky, and Ioannis Paschalidis. 2022. PDE-based optimal strategy for unconstrained online learning. In International Conference on Machine Learning. 
PMLR, 26085-26115."}],"event":{"name":"STOC '25: 57th Annual ACM Symposium on Theory of Computing","location":"Prague Czechia","acronym":"STOC '25","sponsor":["SIGACT ACM Special Interest Group on Algorithms and Computation Theory"]},"container-title":["Proceedings of the 57th Annual ACM Symposium on Theory of Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3717823.3718308","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T15:50:22Z","timestamp":1750693822000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3717823.3718308"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,15]]},"references-count":66,"alternative-id":["10.1145\/3717823.3718308","10.1145\/3717823"],"URL":"https:\/\/doi.org\/10.1145\/3717823.3718308","relation":{},"subject":[],"published":{"date-parts":[[2025,6,15]]},"assertion":[{"value":"2025-06-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}