{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:19:44Z","timestamp":1750220384638,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":58,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,14]],"date-time":"2021-08-14T00:00:00Z","timestamp":1628899200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["IIS-2008208, IIS-1934600, IIS-1938167, IIS-1955151"],"award-info":[{"award-number":["IIS-2008208, IIS-1934600, IIS-1938167, IIS-1955151"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,14]]},"DOI":"10.1145\/3447548.3467287","type":"proceedings-article","created":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T06:13:10Z","timestamp":1628748790000},"page":"1530-1540","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["A Stagewise Hyperparameter Scheduler to Improve Generalization"],"prefix":"10.1145","author":[{"given":"Jianhui","family":"Sun","sequence":"first","affiliation":[{"name":"University of Virginia, Charlottesville, VA, USA"}]},{"given":"Ying","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}]},{"given":"Guangxu","family":"Xun","sequence":"additional","affiliation":[{"name":"University of Virginia, Charlottesville, VA, USA"}]},{"given":"Aidong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Virginia, Charlottesville, VA, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2021,8,14]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . 2016 . TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) . USENIX Association, Savannah, GA, 265--283. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/abadi Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, Savannah, GA, 265--283. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/abadi"},{"volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8522--8531","author":"An W.","key":"e_1_3_2_2_2_1","unstructured":"W. An , H. Wang , Q. Sun , J. Xu , Q. Dai , and L. Zhang . 2018. A PID Controller Approach for Stochastic Optimization of Deep Networks . In 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8522--8531 . W. An, H. Wang, Q. Sun, J. Xu, Q. Dai, and L. Zhang. 2018. 
A PID Controller Approach for Stochastic Optimization of Deep Networks. In 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8522--8531."},{"key":"e_1_3_2_2_3_1","series-title":"SIAM Journal on Optimization","volume-title":"Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions","author":"Aybat Necdet","year":"2020","unstructured":"Necdet Aybat , Alireza Fallah , Mert G\u00fcrb\u00fczbalaban , and Asuman Ozdaglar . 2020. Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions . SIAM Journal on Optimization , Vol. 30 (01 2020 ), 717--751. https:\/\/doi.org\/10.1137\/19M1244925 10.1137\/19M1244925 Necdet Aybat, Alireza Fallah, Mert G\u00fcrb\u00fczbalaban, and Asuman Ozdaglar. 2020. Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions. SIAM Journal on Optimization, Vol. 30 (01 2020), 717--751. https:\/\/doi.org\/10.1137\/19M1244925"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Yoshua Bengio. 2012. Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade.  Yoshua Bengio. 2012. Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade.","DOI":"10.1007\/978-3-642-35289-8_26"},{"key":"e_1_3_2_2_5_1","article-title":"Random Search for Hyper-Parameter Optimization","volume":"13","author":"Bergstra James","year":"2012","unstructured":"James Bergstra and Yoshua Bengio . 2012 . Random Search for Hyper-Parameter Optimization . J. Mach. Learn. Res. , Vol. 13 , null (Feb. 2012), 281--305. James Bergstra and Yoshua Bengio. 2012. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res., Vol. 13, null (Feb. 2012), 281--305.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_2_6_1","unstructured":"L. Bottou Frank E. Curtis and J. Nocedal. 2018. Optimization Methods for Large-Scale Machine Learning. ArXiv Vol. abs\/1606.04838 (2018).  L. 
Bottou Frank E. Curtis and J. Nocedal. 2018. Optimization Methods for Large-Scale Machine Learning. ArXiv Vol. abs\/1606.04838 (2018)."},{"key":"e_1_3_2_2_7_1","unstructured":"B. Can Mert G\u00fcrb\u00fczbalaban and Lingjiong Zhu. 2019. Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances. In ICML.  B. Can Mert G\u00fcrb\u00fczbalaban and Lingjiong Zhu. 2019. Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances. In ICML."},{"volume-title":"Advances in Neural Information Processing Systems 32. Curran Associates","author":"Defazio Aaron","key":"e_1_3_2_2_8_1","unstructured":"Aaron Defazio . 2019. On the Curved Geometry of Accelerated Optimization . In Advances in Neural Information Processing Systems 32. Curran Associates , Inc ., 1766--1775. http:\/\/papers.nips.cc\/paper\/8453-on-the-curved-geometry-of-accelerated-optimization.pdf Aaron Defazio. 2019. On the Curved Geometry of Accelerated Optimization. In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 1766--1775. http:\/\/papers.nips.cc\/paper\/8453-on-the-curved-geometry-of-accelerated-optimization.pdf"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2021068"},{"key":"e_1_3_2_2_10_1","unstructured":"R. Ge Sham M. Kakade R. Kidambi and Praneeth Netrapalli. 2019. The Step Decay Schedule: A Near Optimal Geometrically Decaying Learning Rate Procedure. In NeurIPS.  R. Ge Sham M. Kakade R. Kidambi and Praneeth Netrapalli. 2019. The Step Decay Schedule: A Near Optimal Geometrically Decaying Learning Rate Procedure. In NeurIPS."},{"volume-title":"Advances in Neural Information Processing Systems 32. Curran Associates","author":"Gitman Igor","key":"e_1_3_2_2_11_1","unstructured":"Igor Gitman , Hunter Lang , Pengchuan Zhang , and Lin Xiao . 2019. Understanding the Role of Momentum in Stochastic Gradient Methods . In Advances in Neural Information Processing Systems 32. 
Curran Associates , Inc ., 9633--9643. http:\/\/papers.nips.cc\/paper\/9158-understanding-the-role-of-momentum-in-stochastic-gradient-methods.pdf Igor Gitman, Hunter Lang, Pengchuan Zhang, and Lin Xiao. 2019. Understanding the Role of Momentum in Stochastic Gradient Methods. In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 9633--9643. http:\/\/papers.nips.cc\/paper\/9158-understanding-the-role-of-momentum-in-stochastic-gradient-methods.pdf"},{"key":"e_1_3_2_2_12_1","volume-title":"Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR","author":"Goyal Priya","year":"2017","unstructured":"Priya Goyal , Piotr Doll\u00e1r , Ross B. Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , and Kaiming He. 2017. Accurate , Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR , Vol. abs\/ 1706 .02677 ( 2017 ). arxiv: 1706.02677 http:\/\/arxiv.org\/abs\/1706.02677 Priya Goyal, Piotr Doll\u00e1r, Ross B. Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR, Vol. abs\/1706.02677 (2017). arxiv: 1706.02677 http:\/\/arxiv.org\/abs\/1706.02677"},{"key":"e_1_3_2_2_13_1","volume-title":"A Primer on PAC-Bayesian Learning. ArXiv","author":"Guedj Benjamin","year":"2019","unstructured":"Benjamin Guedj . 2019. A Primer on PAC-Bayesian Learning. ArXiv , Vol. abs\/ 1901 .05353 ( 2019 ). Benjamin Guedj. 2019. A Primer on PAC-Bayesian Learning. ArXiv, Vol. abs\/1901.05353 (2019)."},{"volume-title":"Advances in Neural Information Processing Systems 32. Curran Associates","author":"He Fengxiang","key":"e_1_3_2_2_14_1","unstructured":"Fengxiang He , Tongliang Liu , and Dacheng Tao . 2019. Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence . In Advances in Neural Information Processing Systems 32. Curran Associates , Inc ., 1143--1152. 
http:\/\/papers.nips.cc\/paper\/8398-control-batch-size-and-learning-rate-to-generalize-well-theoretical-and-empirical-evidence.pdf Fengxiang He, Tongliang Liu, and Dacheng Tao. 2019. Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence. In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 1143--1152. http:\/\/papers.nips.cc\/paper\/8398-control-batch-size-and-learning-rate-to-generalize-well-theoretical-and-empirical-evidence.pdf"},{"key":"e_1_3_2_2_15_1","volume-title":"Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , X. Zhang , Shaoqing Ren , and Jian Sun . 2016 a. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 770--778. Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. 2016a. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 770--778."},{"key":"e_1_3_2_2_16_1","volume-title":"Identity Mappings in Deep Residual Networks. ArXiv","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , X. Zhang , Shaoqing Ren , and Jian Sun . 2016b. Identity Mappings in Deep Residual Networks. ArXiv , Vol. abs\/ 1603 .05027 ( 2016 ). Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. 2016b. Identity Mappings in Deep Residual Networks. ArXiv, Vol. abs\/1603.05027 (2016)."},{"key":"e_1_3_2_2_17_1","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17)","author":"Hoffer Elad","year":"2017","unstructured":"Elad Hoffer , Itay Hubara , and Daniel Soudry . 2017 . Train Longer, Generalize Better: Closing the Generalization Gap in Large Batch Training of Neural Networks . In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17) . 
Curran Associates Inc., Red Hook, NY, USA, 1729--1739. Elad Hoffer, Itay Hubara, and Daniel Soudry. 2017. Train Longer, Generalize Better: Closing the Generalization Gap in Large Batch Training of Neural Networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 1729--1739."},{"key":"e_1_3_2_2_18_1","volume-title":"Weinberger","author":"Huang Gao","year":"2017","unstructured":"Gao Huang , Yixuan Li , Geoff Pleiss , Zhuang Liu , John E. Hopcroft , and Kilian Q . Weinberger . 2017 . Snapshot Ensembles : Train 1, get M for free. CoRR , Vol. abs\/ 1704 .00109 (2017). arxiv: 1704.00109 http:\/\/arxiv.org\/abs\/1704.00109 Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, and Kilian Q. Weinberger. 2017. Snapshot Ensembles: Train 1, get M for free. CoRR, Vol. abs\/1704.00109 (2017). arxiv: 1704.00109 http:\/\/arxiv.org\/abs\/1704.00109"},{"key":"e_1_3_2_2_19_1","unstructured":"Stanis\u0142aw Jastrz\u0119bski Zac Kenton Devansh Arpit Nicolas Ballas Asja Fischer Amos Storkey and Yoshua Bengio. 2018. Three factors influencing minima in SGD. https:\/\/openreview.net\/forum?id=rJma2bZCW  Stanis\u0142aw Jastrz\u0119bski Zac Kenton Devansh Arpit Nicolas Ballas Asja Fischer Amos Storkey and Yoshua Bengio. 2018. Three factors influencing minima in SGD. https:\/\/openreview.net\/forum?id=rJma2bZCW"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330977"},{"key":"e_1_3_2_2_21_1","volume-title":"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. CoRR","author":"Keskar Nitish Shirish","year":"2016","unstructured":"Nitish Shirish Keskar , Dheevatsa Mudigere , Jorge Nocedal , Mikhail Smelyanskiy , and Ping Tak Peter Tang . 2016. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. CoRR , Vol. abs\/ 1609 .04836 ( 2016 ). 
arxiv: 1609.04836 http:\/\/arxiv.org\/abs\/1609.04836 Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. CoRR, Vol. abs\/1609.04836 (2016). arxiv: 1609.04836 http:\/\/arxiv.org\/abs\/1609.04836"},{"key":"e_1_3_2_2_22_1","volume-title":"Improving Generalization Performance by Switching from Adam to SGD. CoRR","author":"Keskar Nitish Shirish","year":"2017","unstructured":"Nitish Shirish Keskar and Richard Socher . 2017. Improving Generalization Performance by Switching from Adam to SGD. CoRR , Vol. abs\/ 1712 .07628 ( 2017 ). arxiv: 1712.07628 http:\/\/arxiv.org\/abs\/1712.07628 Nitish Shirish Keskar and Richard Socher. 2017. Improving Generalization Performance by Switching from Adam to SGD. CoRR, Vol. abs\/1712.07628 (2017). arxiv: 1712.07628 http:\/\/arxiv.org\/abs\/1712.07628"},{"key":"e_1_3_2_2_23_1","volume-title":"Kakade","author":"Kidambi Rahul","year":"2018","unstructured":"Rahul Kidambi , Praneeth Netrapalli , Prateek Jain , and Sham M . Kakade . 2018 . On the insufficiency of existing momentum schemes for Stochastic Optimization. CoRR , Vol. abs\/ 1803 .05591 (2018). arxiv: 1803.05591 http:\/\/arxiv.org\/abs\/1803.05591 Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, and Sham M. Kakade. 2018. On the insufficiency of existing momentum schemes for Stochastic Optimization. CoRR, Vol. abs\/1803.05591 (2018). arxiv: 1803.05591 http:\/\/arxiv.org\/abs\/1803.05591"},{"key":"e_1_3_2_2_24_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba . 2015 . Adam : A Method for Stochastic Optimization. CoRR , Vol. abs\/ 1412 .6980 (2015). Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR, Vol. abs\/1412.6980 (2015)."},{"key":"e_1_3_2_2_25_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. 
ImageNet Classification with Deep Convolutional Neural Networks. (2012) 1097--1105. http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf  Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. (2012) 1097--1105. http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf"},{"key":"e_1_3_2_2_26_1","unstructured":"A. Kulunchakov and J. Mairal. 2019. Estimate Sequences for Variance-Reduced Stochastic Composite Optimization. In ICML.  A. Kulunchakov and J. Mairal. 2019. Estimate Sequences for Variance-Reduced Stochastic Composite Optimization. In ICML."},{"key":"e_1_3_2_2_27_1","volume-title":"Oberman","author":"Laborde M.","year":"2020","unstructured":"M. Laborde and Adam M . Oberman . 2020 . A Lyapunov analysis for accelerated gradient methods: from deterministic to stochastic case. In AISTATS. M. Laborde and Adam M. Oberman. 2020. A Lyapunov analysis for accelerated gradient methods: from deterministic to stochastic case. In AISTATS."},{"key":"e_1_3_2_2_28_1","series-title":"SIAM Journal on Optimization","volume-title":"Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints","author":"Lessard Laurent","year":"2014","unstructured":"Laurent Lessard , Benjamin Recht , and Andrew Packard . 2014. Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints . SIAM Journal on Optimization , Vol. 26 (08 2014 ). https:\/\/doi.org\/10.1137\/15M1009597 10.1137\/15M1009597 Laurent Lessard, Benjamin Recht, and Andrew Packard. 2014. Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints. SIAM Journal on Optimization, Vol. 26 (08 2014). https:\/\/doi.org\/10.1137\/15M1009597"},{"key":"e_1_3_2_2_29_1","unstructured":"Chaoyue Liu Libin Zhu and Mikhail Belkin. 2020 b. 
Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning. arxiv: cs.LG\/2003.00307  Chaoyue Liu Libin Zhu and Mikhail Belkin. 2020 b. Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning. arxiv: cs.LG\/2003.00307"},{"key":"e_1_3_2_2_30_1","unstructured":"Yanli Liu Yuan Gao and Wotao Yin. 2020 a. An Improved Analysis of Stochastic Gradient Descent with Momentum. arxiv: math.OC\/2007.07989  Yanli Liu Yuan Gao and Wotao Yin. 2020 a. An Improved Analysis of Stochastic Gradient Descent with Momentum. arxiv: math.OC\/2007.07989"},{"key":"e_1_3_2_2_31_1","unstructured":"Ben London. 2017. A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent. In NIPS.  Ben London. 2017. A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent. In NIPS."},{"key":"e_1_3_2_2_32_1","volume-title":"SGDR: Stochastic Gradient Descent with Restarts. CoRR","author":"Loshchilov Ilya","year":"2016","unstructured":"Ilya Loshchilov and Frank Hutter . 2016 . SGDR: Stochastic Gradient Descent with Restarts. CoRR , Vol. abs\/ 1608 .03983 (2016). arxiv: 1608.03983 http:\/\/arxiv.org\/abs\/1608.03983 Ilya Loshchilov and Frank Hutter. 2016. SGDR: Stochastic Gradient Descent with Restarts. CoRR, Vol. abs\/1608.03983 (2016). arxiv: 1608.03983 http:\/\/arxiv.org\/abs\/1608.03983"},{"key":"e_1_3_2_2_33_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=S1fUpoR5FQ","author":"Ma Jerry","year":"2019","unstructured":"Jerry Ma and Denis Yarats . 2019 . Quasi-hyperbolic momentum and Adam for deep learning . In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=S1fUpoR5FQ Jerry Ma and Denis Yarats. 2019. Quasi-hyperbolic momentum and Adam for deep learning. In International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=S1fUpoR5FQ"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33011069"},{"key":"e_1_3_2_2_35_1","first-page":"1","article-title":"Stochastic Gradient Descent as Approximate Bayesian Inference","volume":"18","author":"Mandt Stephan","year":"2017","unstructured":"Stephan Mandt , Matthew D. Hoffman , and David M. Blei . 2017 . Stochastic Gradient Descent as Approximate Bayesian Inference . J. Mach. Learn. Res. , Vol. 18 , 1 (Jan. 2017), 4873--4907. Stephan Mandt, Matthew D. Hoffman, and David M. Blei. 2017. Stochastic Gradient Descent as Approximate Bayesian Inference. J. Mach. Learn. Res., Vol. 18, 1 (Jan. 2017), 4873--4907.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279989"},{"key":"e_1_3_2_2_37_1","unstructured":"Y. Nesterov. 1983. A method for solving the convex programming problem with convergence rate O(1\/k 2).  Y. Nesterov. 1983. A method for solving the convex programming problem with convergence rate O(1\/k 2)."},{"volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","key":"e_1_3_2_2_38_1","unstructured":"Adam Paszke , S. Gross , Francisco Massa , A. Lerer , J. Bradbury , G. Chanan , T. Killeen , Z. Lin , N. Gimelshein , L. Antiga , Alban Desmaison , Andreas K\u00f6pf , E. Yang , Zach DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , B. Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019. PyTorch: An Imperative Style , High-Performance Deep Learning Library . In NeurIPS. Adam Paszke, S. Gross, Francisco Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, Alban Desmaison, Andreas K\u00f6pf, E. Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, B. Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. 
In NeurIPS."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/0041-5553(64)90137-5"},{"key":"e_1_3_2_2_40_1","volume-title":"An overview of gradient descent optimization algorithms. ArXiv","author":"Ruder Sebastian","year":"2016","unstructured":"Sebastian Ruder . 2016. An overview of gradient descent optimization algorithms. ArXiv , Vol. abs\/ 1609 .04747 ( 2016 ). Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. ArXiv, Vol. abs\/1609.04747 (2016)."},{"key":"e_1_3_2_2_41_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition . In 3rd International Conference on Learning Representations, ICLR 2015 , San Diego, CA , USA, May 7-9, 2015, Conference Track Proceedings . http:\/\/arxiv.org\/abs\/1409.1556 Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_2_2_42_1","volume-title":"Cyclical Learning Rates for Training Neural Networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 464--472","author":"Smith L. N.","year":"2017","unstructured":"L. N. Smith . 2017 . Cyclical Learning Rates for Training Neural Networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 464--472 . https:\/\/doi.org\/10.1109\/WACV.2017.58 10.1109\/WACV.2017.58 L. N. Smith. 2017. Cyclical Learning Rates for Training Neural Networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 464--472. 
https:\/\/doi.org\/10.1109\/WACV.2017.58"},{"key":"e_1_3_2_2_43_1","volume-title":"Smith and Nicholay Topin","author":"Leslie","year":"2017","unstructured":"Leslie N. Smith and Nicholay Topin . 2017 . Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates. CoRR , Vol. abs\/ 1708 .07120 (2017). arxiv: 1708.07120 http:\/\/arxiv.org\/abs\/1708.07120 Leslie N. Smith and Nicholay Topin. 2017. Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates. CoRR, Vol. abs\/1708.07120 (2017). arxiv: 1708.07120 http:\/\/arxiv.org\/abs\/1708.07120"},{"key":"e_1_3_2_2_44_1","volume-title":"Le","author":"Smith Sam","year":"2018","unstructured":"Sam Smith and Quoc V . Le . 2018 . A Bayesian Perspective on Generalization and Stochastic Gradient Descent . https:\/\/openreview.net\/pdf?id=BJij4yg0Z Sam Smith and Quoc V. Le. 2018. A Bayesian Perspective on Generalization and Stochastic Gradient Descent. https:\/\/openreview.net\/pdf?id=BJij4yg0Z"},{"volume-title":"Increase the Batch Size. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=B1Yy1BxCZ","author":"Smith Samuel L.","key":"e_1_3_2_2_45_1","unstructured":"Samuel L. Smith , Pieter-Jan Kindermans , and Quoc V. Le . 2018. Don't Decay the Learning Rate , Increase the Batch Size. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=B1Yy1BxCZ Samuel L. Smith, Pieter-Jan Kindermans, and Quoc V. Le. 2018. Don't Decay the Learning Rate, Increase the Batch Size. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=B1Yy1BxCZ"},{"key":"e_1_3_2_2_46_1","volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems -","volume":"2","author":"Snoek Jasper","unstructured":"Jasper Snoek , Hugo Larochelle , and Ryan P. Adams . 2012. Practical Bayesian Optimization of Machine Learning Algorithms . 
In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'12). Curran Associates Inc., Red Hook, NY, USA, 2951--2959. Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'12). Curran Associates Inc., Red Hook, NY, USA, 2951--2959."},{"key":"e_1_3_2_2_47_1","volume-title":"Recurrent Imputation for Multivariate Time Series with Missing Values. In 2019 IEEE International Conference on Healthcare Informatics, ICHI 2019","author":"Suo Qiuling","year":"2019","unstructured":"Qiuling Suo , Liuyi Yao , Guangxu Xun , Jianhui Sun , and Aidong Zhang . 2019 . Recurrent Imputation for Multivariate Time Series with Missing Values. In 2019 IEEE International Conference on Healthcare Informatics, ICHI 2019 , Xi'an, China , June 10-13, 2019. IEEE, 1--3. https:\/\/doi.org\/10.1109\/ICHI.2019.8904638 10.1109\/ICHI.2019.8904638 Qiuling Suo, Liuyi Yao, Guangxu Xun, Jianhui Sun, and Aidong Zhang. 2019. Recurrent Imputation for Multivariate Time Series with Missing Values. In 2019 IEEE International Conference on Healthcare Informatics, ICHI 2019, Xi'an, China, June 10-13, 2019. IEEE, 1--3. https:\/\/doi.org\/10.1109\/ICHI.2019.8904638"},{"key":"e_1_3_2_2_48_1","volume-title":"GLIMA: Global and Local Time Series Imputation with Multi-directional Attention Learning. In IEEE International Conference on Big Data, Big Data 2020","author":"Suo Qiuling","year":"2020","unstructured":"Qiuling Suo , Weida Zhong , Guangxu Xun , Jianhui Sun , Changyou Chen , and Aidong Zhang . 2020 . GLIMA: Global and Local Time Series Imputation with Multi-directional Attention Learning. In IEEE International Conference on Big Data, Big Data 2020 , Atlanta, GA, USA , December 10-13, 2020. IEEE, 798--807. 
https:\/\/doi.org\/10.1109\/BigData50022.2020.9378408 10.1109\/BigData50022.2020.9378408 Qiuling Suo, Weida Zhong, Guangxu Xun, Jianhui Sun, Changyou Chen, and Aidong Zhang. 2020. GLIMA: Global and Local Time Series Imputation with Multi-directional Attention Learning. In IEEE International Conference on Big Data, Big Data 2020, Atlanta, GA, USA, December 10-13, 2020. IEEE, 798--807. https:\/\/doi.org\/10.1109\/BigData50022.2020.9378408"},{"key":"e_1_3_2_2_49_1","volume-title":"Proceedings of the 30th International Conference on International Conference on Machine Learning -","volume":"28","author":"Sutskever Ilya","year":"2013","unstructured":"Ilya Sutskever , James Martens , George Dahl , and Geoffrey Hinton . 2013 . On the Importance of Initialization and Momentum in Deep Learning . In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML'13). III-1139-III-1147. Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. 2013. On the Importance of Initialization and Momentum in Deep Learning. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML'13). III-1139-III-1147."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCSYS.2017.2722406"},{"key":"e_1_3_2_2_51_1","unstructured":"Sharan Vaswani F. Bach and M. Schmidt. 2019. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron. ArXiv Vol. abs\/1810.07288 (2019).  Sharan Vaswani F. Bach and M. Schmidt. 2019. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron. ArXiv Vol. abs\/1810.07288 (2019)."},{"volume-title":"Advances in Neural Information Processing Systems 30. Curran Associates","author":"Wilson Ashia C","key":"e_1_3_2_2_52_1","unstructured":"Ashia C Wilson , Rebecca Roelofs , Mitchell Stern , Nati Srebro , and Benjamin Recht . 2017. 
The Marginal Value of Adaptive Gradient Methods in Machine Learning . In Advances in Neural Information Processing Systems 30. Curran Associates , Inc ., 4148--4158. http:\/\/papers.nips.cc\/paper\/7003-the-marginal-value-of-adaptive-gradient-methods-in-machine-learning.pdf Ashia C Wilson, Rebecca Roelofs, Mitchell Stern, Nati Srebro, and Benjamin Recht. 2017. The Marginal Value of Adaptive Gradient Methods in Machine Learning. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 4148--4158. http:\/\/papers.nips.cc\/paper\/7003-the-marginal-value-of-adaptive-gradient-methods-in-machine-learning.pdf"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3380980"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403151"},{"key":"e_1_3_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz142"},{"key":"e_1_3_2_2_56_1","doi-asserted-by":"crossref","unstructured":"Yan Yan Tianbao Yang Zhe Li Qihang Lin and Yi Yang. 2018. A Unified Analysis of Stochastic Momentum Methods for Deep Learning. In IJCAI. 2955--2961. https:\/\/doi.org\/10.24963\/ijcai.2018\/410    10.24963\/ijcai.2018\nYan Yan Tianbao Yang Zhe Li Qihang Lin and Yi Yang. 2018. A Unified Analysis of Stochastic Momentum Methods for Deep Learning. In IJCAI. 2955--2961. https:\/\/doi.org\/10.24963\/ijcai.2018\/410","DOI":"10.24963\/ijcai.2018\/410"},{"key":"e_1_3_2_2_57_1","unstructured":"Jian Zhang and Ioannis Mitliagkas. 2018. YellowFin and the Art of Momentum Tuning. arxiv: stat.ML\/1706.03471  Jian Zhang and Ioannis Mitliagkas. 2018. YellowFin and the Art of Momentum Tuning. arxiv: stat.ML\/1706.03471"},{"key":"e_1_3_2_2_58_1","volume-title":"Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. In ITCS.","author":"Zhu Z.","year":"2017","unstructured":"Z. Zhu and L. Orecchia . 2017 . Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. In ITCS. Z. Zhu and L. Orecchia. 2017. 
Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. In ITCS."}],"event":{"name":"KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"location":"Virtual Event Singapore","acronym":"KDD '21"},"container-title":["Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467287","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3447548.3467287","content-type":"text\/html","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447548.3467287","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447548.3467287","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:28Z","timestamp":1750191508000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467287"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,14]]},"references-count":58,"alternative-id":["10.1145\/3447548.3467287","10.1145\/3447548"],"URL":"https:\/\/doi.org\/10.1145\/3447548.3467287","relation":{},"subject":[],"published":{"date-parts":[[2021,8,14]]},"assertion":[{"value":"2021-08-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}