{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:25:23Z","timestamp":1773797123733,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,18]],"date-time":"2020-10-18T00:00:00Z","timestamp":1602979200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,19]]},"DOI":"10.1145\/3412815.3416891","type":"proceedings-article","created":{"date-parts":[[2020,10,15]],"date-time":"2020-10-15T23:37:54Z","timestamp":1602805074000},"page":"119-128","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Toward Communication Efficient Adaptive Gradient Method"],"prefix":"10.1145","author":[{"given":"Xiangyi","family":"Chen","sequence":"first","affiliation":[{"name":"Baidu Research, Bellevue, WA, USA"}]},{"given":"Xiaoyun","family":"Li","sequence":"additional","affiliation":[{"name":"Baidu Research, Bellevue, WA, USA"}]},{"given":"Ping","family":"Li","sequence":"additional","affiliation":[{"name":"Baidu Research, Bellevue, WA, USA"}]}],"member":"320","published-online":{"date-parts":[[2020,10,18]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"102","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Agarwal Naman","year":"2019","unstructured":"Naman Agarwal , Brian Bullins , Xinyi Chen , Elad Hazan , Karan Singh , Cyril Zhang , and Yi Zhang . Efficient full-matrix adaptive regularization . In Proceedings of the 36th International Conference on Machine Learning (ICML) , pages 102 -- 110 , Long Beach, CA , 2019 . Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, and Yi Zhang. Efficient full-matrix adaptive regularization. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 102--110, Long Beach, CA, 2019."},{"key":"e_1_3_2_1_2_1","first-page":"1709","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Alistarh Dan","year":"2017","unstructured":"Dan Alistarh , Demjan Grubic , Jerry Li , Ryota Tomioka , and Milan Vojnovic . Qsgd : Communication-efficient sgd via gradient quantization and encoding . In Advances in Neural Information Processing Systems (NIPS) , pages 1709 -- 1720 , Long Beach , 2017 . Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. Qsgd: Communication-efficient sgd via gradient quantization and encoding. In Advances in Neural Information Processing Systems (NIPS), pages 1709--1720, Long Beach, 2017."},{"key":"e_1_3_2_1_3_1","first-page":"559","volume-title":"Proceedings of the 35th International Conference on Machine Learning (ICML)","author":"Bernstein Jeremy","year":"2018","unstructured":"Jeremy Bernstein , Yu-Xiang Wang , Kamyar Azizzadenesheli , and Animashree Anandkumar . SIGNSGD : compressed optimisation for non-convex problems . In Proceedings of the 35th International Conference on Machine Learning (ICML) , pages 559 -- 568 , Stockholmsm\"a ssan, Stockholm, Sweden , 2018 . Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar. SIGNSGD: compressed optimisation for non-convex problems. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 559--568, Stockholmsm\"a ssan, Stockholm, Sweden, 2018."},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR)","author":"Chen Xiangyi","year":"2019","unstructured":"Chen, Liu, Sun, and Hong]chen2018convergence Xiangyi Chen , Sijia Liu , Ruoyu Sun , and Mingyi Hong . On the convergence of A class of adam-type algorithms for non-convex optimization . In Proceedings of the 7th International Conference on Learning Representations (ICLR) , New Orleans, LA , 2019 a . Chen, Liu, Sun, and Hong]chen2018convergenceXiangyi Chen, Sijia Liu, Ruoyu Sun, and Mingyi Hong. On the convergence of A class of adam-type algorithms for non-convex optimization. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, 2019 a ."},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR)","author":"Chen Zaiyi","year":"2019","unstructured":"Chen, Yuan, Yi, Zhou, Chen, and Yang]chen2018universal Zaiyi Chen , Zhuoning Yuan , Jinfeng Yi , Bowen Zhou , Enhong Chen , and Tianbao Yang . Universal stagewise learning for non-convex problems with convergence on averaged solutions . In Proceedings of the 7th International Conference on Learning Representations (ICLR) , New Orleans, LA , 2019 b . Chen, Yuan, Yi, Zhou, Chen, and Yang]chen2018universalZaiyi Chen, Zhuoning Yuan, Jinfeng Yi, Bowen Zhou, Enhong Chen, and Tianbao Yang. Universal stagewise learning for non-convex problems with convergence on averaged solutions. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, 2019 b ."},{"key":"e_1_3_2_1_7_1","volume-title":"Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12 (Jul): 2121--2159","author":"Duchi John","year":"2011","unstructured":"John Duchi , Elad Hazan , and Yoram Singer . Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12 (Jul): 2121--2159 , 2011 . John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12 (Jul): 2121--2159, 2011."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330651"},{"key":"e_1_3_2_1_9_1","volume-title":"Letter recognition using holland-style adaptive classifiers. Machine learning, 6 (2): 161--182","author":"Frey Peter W","year":"1991","unstructured":"Peter W Frey and David J Slate . Letter recognition using holland-style adaptive classifiers. Machine learning, 6 (2): 161--182 , 1991 . Peter W Frey and David J Slate. Letter recognition using holland-style adaptive classifiers. Machine learning, 6 (2): 161--182, 1991."},{"key":"e_1_3_2_1_10_1","first-page":"2545","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Haddadpour Farzin","year":"2019","unstructured":"Farzin Haddadpour , Mohammad Mahdi Kamani , Mehrdad Mahdavi , and Viveck R. Cadambe . Trading redundancy for communication: Speeding up distributed SGD for non-convex optimization . In Proceedings of the 36th International Conference on Machine Learning (ICML) , pages 2545 -- 2554 , Long Beach, CA , 2019 . Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, and Viveck R. Cadambe. Trading redundancy for communication: Speeding up distributed SGD for non-convex optimization. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 2545--2554, Long Beach, CA, 2019."},{"key":"e_1_3_2_1_11_1","volume-title":"Fedsketch: Communication-efficient and private federated learning via sketching. arXiv preprint arXiv:2008.04975","author":"Haddadpour Farzin","year":"2020","unstructured":"Farzin Haddadpour , Belhal Karimi , Ping Li , and Xiaoyun Li . Fedsketch: Communication-efficient and private federated learning via sketching. arXiv preprint arXiv:2008.04975 , 2020 . Farzin Haddadpour, Belhal Karimi, Ping Li, and Xiaoyun Li. Fedsketch: Communication-efficient and private federated learning via sketching. arXiv preprint arXiv:2008.04975, 2020."},{"key":"e_1_3_2_1_12_1","volume-title":"Improving generalization performance by switching from adam to sgd. arXiv preprint arXiv:1712.07628","author":"Keskar Nitish Shirish","year":"2017","unstructured":"Nitish Shirish Keskar and Richard Socher . Improving generalization performance by switching from adam to sgd. arXiv preprint arXiv:1712.07628 , 2017 . Nitish Shirish Keskar and Richard Socher. Improving generalization performance by switching from adam to sgd. arXiv preprint arXiv:1712.07628, 2017."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR)","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization . In Proceedings of the 3rd International Conference on Learning Representations (ICLR) , San Diego, CA , 2015 . Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, 2015."},{"key":"e_1_3_2_1_14_1","unstructured":"\u1ef3 etal(2016)Konevc n\u1ef3 McMahan Yu Richt\u00e1rik Suresh and Bacon]konevcny2016federatedJakub Konevc n\u1ef3 H Brendan McMahan Felix X Yu Peter Richt\u00e1rik Ananda Theertha Suresh and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 2016.  \u1ef3 et al.(2016)Konevc n\u1ef3 McMahan Yu Richt\u00e1rik Suresh and Bacon]konevcny2016federatedJakub Konevc n\u1ef3 H Brendan McMahan Felix X Yu Peter Richt\u00e1rik Ananda Theertha Suresh and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 2016."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},{"key":"e_1_3_2_1_16_1","first-page":"983","volume-title":"The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)","author":"Li Xiaoyu","year":"2019","unstructured":"Xiaoyu Li and Francesco Orabona . On the convergence of stochastic gradient descent with adaptive stepsizes . In The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 983 -- 992 , Naha, Okinawa, Japan , 2019 . Xiaoyu Li and Francesco Orabona. On the convergence of stochastic gradient descent with adaptive stepsizes. In The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), pages 983--992, Naha, Okinawa, Japan, 2019."},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR)","author":"Lin Yujun","year":"2018","unstructured":"Yujun Lin , Song Han , Huizi Mao , Yu Wang , and Bill Dally . Deep gradient compression: Reducing the communication bandwidth for distributed training . In Proceedings of the 6th International Conference on Learning Representations (ICLR) , Vancouver, Canada , 2018 . Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. Deep gradient compression: Reducing the communication bandwidth for distributed training. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, 2018."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR)","author":"Luo Liangchen","year":"2019","unstructured":"Liangchen Luo , Yuanhao Xiong , Yan Liu , and Xu Sun . Adaptive gradient methods with dynamic bound of learning rate . In Proceedings of the 7th International Conference on Learning Representations (ICLR) , New Orleans, LA , 2019 . Liangchen Luo, Yuanhao Xiong, Yan Liu, and Xu Sun. Adaptive gradient methods with dynamic bound of learning rate. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, 2019."},{"key":"e_1_3_2_1_19_1","first-page":"1273","volume-title":"Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS)","author":"McMahan Brendan","year":"2017","unstructured":"Brendan McMahan , Eider Moore , Daniel Ramage , Seth Hampson , and Blaise Ag\u00fc era y Arcas . Communication-efficient learning of deep networks from decentralized data . In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 1273 -- 1282 , Fort Lauderdale, FL , 2017 . Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Ag\u00fc era y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1273--1282, Fort Lauderdale, FL, 2017."},{"key":"e_1_3_2_1_20_1","first-page":"693","volume-title":"Advances in neural information processing systems (NIPS)","author":"Recht Benjamin","year":"2011","unstructured":"Benjamin Recht , Christopher Re , Stephen Wright , and Feng Niu . Hogwild: A lock-free approach to parallelizing stochastic gradient descent . In Advances in neural information processing systems (NIPS) , pages 693 -- 701 , 2011 . Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in neural information processing systems (NIPS), pages 693--701, 2011."},{"key":"e_1_3_2_1_21_1","volume-title":"Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295","author":"Reddi Sashank","year":"2020","unstructured":"\u1ef3, Kumar, and McMahan]reddi2020adaptive Sashank Reddi , Zachary Charles , Manzil Zaheer , Zachary Garrett , Keith Rush , Jakub Konevc n\u1ef3 , Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295 , 2020 . \u1ef3, Kumar, and McMahan]reddi2020adaptiveSashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konevc n\u1ef3, Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295, 2020."},{"key":"e_1_3_2_1_22_1","volume-title":"On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237","author":"Reddi Sashank J","year":"2019","unstructured":"Sashank J Reddi , Satyen Kale , and Sanjiv Kumar . On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237 , 2019 . Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237, 2019."},{"key":"e_1_3_2_1_23_1","first-page":"2021","volume-title":"The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)","author":"Reisizadeh Amirhossein","year":"2020","unstructured":"Amirhossein Reisizadeh , Aryan Mokhtari , Hamed Hassani , Ali Jadbabaie , and Ramtin Pedarsani . Fedpaq : A communication-efficient federated learning method with periodic averaging and quantization . In The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 2021 -- 2031 , Palermo, Sicily, Italy , 2020 . Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ali Jadbabaie, and Ramtin Pedarsani. Fedpaq: A communication-efficient federated learning method with periodic averaging and quantization. In The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), pages 2021--2031, Palermo, Sicily, Italy, 2020."},{"key":"e_1_3_2_1_24_1","volume-title":"Empirical analysis of the hessian of over-parametrized neural networks. arXiv preprint arXiv:1706.04454","author":"Sagun Levent","year":"2017","unstructured":"Levent Sagun , Utku Evci , V Ugur Guney , Yann Dauphin , and Leon Bottou . Empirical analysis of the hessian of over-parametrized neural networks. arXiv preprint arXiv:1706.04454 , 2017 . Levent Sagun, Utku Evci, V Ugur Guney, Yann Dauphin, and Leon Bottou. Empirical analysis of the hessian of over-parametrized neural networks. arXiv preprint arXiv:1706.04454, 2017."},{"key":"e_1_3_2_1_25_1","first-page":"5956","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Staib Matthew","year":"2019","unstructured":"Matthew Staib , Sashank J. Reddi , Satyen Kale , Sanjiv Kumar , and Suvrit Sra . Escaping saddle points with adaptive gradient methods . In Proceedings of the 36th International Conference on Machine Learning (ICML) , pages 5956 -- 5965 , Long Beach, CA , 2019 . Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, and Suvrit Sra. Escaping saddle points with adaptive gradient methods. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 5956--5965, Long Beach, CA, 2019."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR)","author":"Stich Sebastian U.","year":"2019","unstructured":"Sebastian U. Stich . Local SGD converges fast and communicates little . In Proceedings of the 7th International Conference on Learning Representations (ICLR) , New Orleans, LA , 2019 . Sebastian U. Stich. Local SGD converges fast and communicates little. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, 2019."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380080"},{"key":"e_1_3_2_1_28_1","first-page":"1306","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Wangni Jianqiao","year":"2018","unstructured":"Jianqiao Wangni , Jialei Wang , Ji Liu , and Tong Zhang . Gradient sparsification for communication-efficient distributed optimization . In Advances in Neural Information Processing Systems (NeurIPS) , pages 1306 -- 1316 , Montr\u00e9 al, Canada , 2018 . Jianqiao Wangni, Jialei Wang, Ji Liu, and Tong Zhang. Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems (NeurIPS), pages 1306--1316, Montr\u00e9 al, Canada, 2018."},{"key":"e_1_3_2_1_29_1","first-page":"6677","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Ward Rachel","year":"2019","unstructured":"Rachel Ward , Xiaoxia Wu , and L\u00e9 on Bottou . Adagrad stepsizes: sharp convergence over nonconvex landscapes . In Proceedings of the 36th International Conference on Machine Learning (ICML) , pages 6677 -- 6686 , Long Beach, CA , 2019 . Rachel Ward, Xiaoxia Wu, and L\u00e9 on Bottou. Adagrad stepsizes: sharp convergence over nonconvex landscapes. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 6677--6686, Long Beach, CA, 2019."},{"key":"e_1_3_2_1_30_1","first-page":"1509","volume-title":"Advances in neural information processing systems (NIPS)","author":"Wen Wei","year":"2017","unstructured":"Wei Wen , Cong Xu , Feng Yan , Chunpeng Wu , Yandan Wang , Yiran Chen , and Hai Li . Terngrad: Ternary gradients to reduce communication in distributed deep learning . In Advances in neural information processing systems (NIPS) , pages 1509 -- 1519 , Long Beach , CA , 2017 . Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Terngrad: Ternary gradients to reduce communication in distributed deep learning. In Advances in neural information processing systems (NIPS), pages 1509--1519, Long Beach, CA, 2017."},{"key":"e_1_3_2_1_31_1","volume-title":"Local adaalter: Communication-efficient stochastic gradient descent with adaptive learning rates. arXiv preprint arXiv:1911.09030","author":"Xie Cong","year":"2019","unstructured":"Cong Xie , Oluwasanmi Koyejo , Indranil Gupta , and Haibin Lin . Local adaalter: Communication-efficient stochastic gradient descent with adaptive learning rates. arXiv preprint arXiv:1911.09030 , 2019 . Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, and Haibin Lin. Local adaalter: Communication-efficient stochastic gradient descent with adaptive learning rates. arXiv preprint arXiv:1911.09030, 2019."},{"key":"e_1_3_2_1_32_1","volume-title":"Asynchronous parallel adaptive stochastic gradient methods. arXiv preprint arXiv:2002.09095","author":"Xu Yangyang","year":"2020","unstructured":"Yangyang Xu , Colin Sutcher-Shepard , Yibo Xu , and Jie Chen . Asynchronous parallel adaptive stochastic gradient methods. arXiv preprint arXiv:2002.09095 , 2020 . Yangyang Xu, Colin Sutcher-Shepard, Yibo Xu, and Jie Chen. Asynchronous parallel adaptive stochastic gradient methods. arXiv preprint arXiv:2002.09095, 2020."},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa","author":"You Yang","year":"2020","unstructured":"Yang You , Jing Li , Sashank Reddi , Jonathan Hseu , Sanjiv Kumar , Srinadh Bhojanapalli , Xiaodan Song , James Demmel , Kurt Keutzer , and Cho-Jui Hsieh . Large batch optimization for deep learning: Training bert in 76 minutes . In Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa , Ethiopia , 2020 . Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, and Cho-Jui Hsieh. Large batch optimization for deep learning: Training bert in 76 minutes. In Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 2020."},{"key":"e_1_3_2_1_34_1","first-page":"7184","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Yu Hao","year":"2019","unstructured":"Hao Yu , Rong Jin , and Sen Yang . On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization . In Proceedings of the 36th International Conference on Machine Learning (ICML) , pages 7184 -- 7193 , Long Beach, CA , 2019 . Hao Yu, Rong Jin, and Sen Yang. On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 7184--7193, Long Beach, CA, 2019."},{"key":"e_1_3_2_1_35_1","first-page":"9815","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Zaheer Manzil","year":"2018","unstructured":"Manzil Zaheer , Sashank J. Reddi , Devendra Singh Sachan , Satyen Kale , and Sanjiv Kumar . Adaptive methods for nonconvex optimization . In Advances in Neural Information Processing Systems (NeurIPS) , pages 9815 -- 9825 , Montr\u00e9 al, Canada , 2018 . Manzil Zaheer, Sashank J. Reddi, Devendra Singh Sachan, Satyen Kale, and Sanjiv Kumar. Adaptive methods for nonconvex optimization. In Advances in Neural Information Processing Systems (NeurIPS), pages 9815--9825, Montr\u00e9 al, Canada, 2018."},{"key":"e_1_3_2_1_36_1","volume-title":"Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701","author":"Zeiler Matthew D","year":"2012","unstructured":"Matthew D Zeiler . Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 , 2012 . Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012."},{"key":"e_1_3_2_1_37_1","volume-title":"Proceedings of the 3rd Conference on Third Conference on Machine Learning and Systems (MLSys)","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao , Deping Xie , Ronglai Jia , Yulei Qian , Ruiquan Ding , Mingming Sun , and Ping Li . Distributed hierarchical gpu parameter server for massive scale deep learning ads systems . In Proceedings of the 3rd Conference on Third Conference on Machine Learning and Systems (MLSys) , Huston, TX , 2020 . Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. Distributed hierarchical gpu parameter server for massive scale deep learning ads systems. In Proceedings of the 3rd Conference on Third Conference on Machine Learning and Systems (MLSys), Huston, TX, 2020."},{"key":"e_1_3_2_1_38_1","volume-title":"On the convergence of adaptive gradient methods for nonconvex optimization. arXiv preprint arXiv:1808.05671","author":"Zhou Dongruo","year":"2018","unstructured":"Dongruo Zhou , Yiqi Tang , Ziyan Yang , Yuan Cao , and Quanquan Gu . On the convergence of adaptive gradient methods for nonconvex optimization. arXiv preprint arXiv:1808.05671 , 2018 . Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, and Quanquan Gu. On the convergence of adaptive gradient methods for nonconvex optimization. arXiv preprint arXiv:1808.05671, 2018."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/447"},{"key":"e_1_3_2_1_40_1","first-page":"2595","volume-title":"Advances in neural information processing systems (NIPS)","author":"Zinkevich Martin","year":"2010","unstructured":"Martin Zinkevich , Markus Weimer , Lihong Li , and Alex J Smola . Parallelized stochastic gradient descent . In Advances in neural information processing systems (NIPS) , pages 2595 -- 2603 , Vancouver , Canada , 2010 . Martin Zinkevich, Markus Weimer, Lihong Li, and Alex J Smola. Parallelized stochastic gradient descent. In Advances in neural information processing systems (NIPS), pages 2595--2603, Vancouver, Canada, 2010."},{"key":"e_1_3_2_1_41_1","volume-title":"On the convergence of adagrad with momentum for training deep neural networks. arXiv preprint arXiv:1808.03408, 2 (3): 5","author":"Zou Fangyu","year":"2018","unstructured":"Fangyu Zou and Li Shen . On the convergence of adagrad with momentum for training deep neural networks. arXiv preprint arXiv:1808.03408, 2 (3): 5 , 2018 . Fangyu Zou and Li Shen. On the convergence of adagrad with momentum for training deep neural networks. arXiv preprint arXiv:1808.03408, 2 (3): 5, 2018."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01138"}],"event":{"name":"FODS '20: ACM-IMS Foundations of Data Science Conference","location":"Virtual Event USA","acronym":"FODS '20"},"container-title":["Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3412815.3416891","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3412815.3416891","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:25:02Z","timestamp":1750195502000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3412815.3416891"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,18]]},"references-count":41,"alternative-id":["10.1145\/3412815.3416891","10.1145\/3412815"],"URL":"https:\/\/doi.org\/10.1145\/3412815.3416891","relation":{},"subject":[],"published":{"date-parts":[[2020,10,18]]},"assertion":[{"value":"2020-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}