{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:35Z","timestamp":1750220315304,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,11,20]],"date-time":"2021-11-20T00:00:00Z","timestamp":1637366400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"QBITS Fund"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,11,20]]},"DOI":"10.1145\/3505711.3505715","type":"proceedings-article","created":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T02:20:17Z","timestamp":1648520417000},"page":"23-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["On Large-Batch Training of Residual Networks with SignSGD"],"prefix":"10.1145","author":[{"given":"Alex","family":"Xavier","sequence":"first","affiliation":[{"name":"CODEGEN QBiTS Lab, University of Moratuwa,Sri Lanka, Sri Lanka and Computer Science and Engineering, University of Moratuwa,Sri Lanka, Sri Lanka"}]},{"given":"Dumindu","family":"Tissera","sequence":"additional","affiliation":[{"name":"CODEGEN QBiTS Lab, University of Moratuwa,Sri Lanka, Sri Lanka"}]},{"given":"Rukshan","family":"Wijesinghe","sequence":"additional","affiliation":[{"name":"CODEGEN QBiTS Lab, University of Moratuwa,Sri Lanka, Sri Lanka"}]},{"given":"Kasun","family":"Vithanage","sequence":"additional","affiliation":[{"name":"CODEGEN QBiTS Lab, University of Moratuwa,Sri Lanka, Sri Lanka"}]},{"given":"Ranga","family":"Rodrigo","sequence":"additional","affiliation":[{"name":"CODEGEN QBiTS Lab, University of Moratuwa,Sri Lanka, Sri Lanka"}]},{"given":"Subha","family":"Fernando","sequence":"additional","affiliation":[{"name":"CODEGEN QBiTS Lab, University of Moratuwa,Sri Lanka, Sri Lanka"}]},{"given":"Sanath","family":"Jayasena","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, University of Moratuwa,Sri Lanka, Sri Lanka"}]}],"member":"320","published-online":{"date-parts":[[2022,3,28]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"International Conference on Neural Information Processing Systems (NIPS). 2680\u20132691","author":"Allen-Zhu Zeyuan","year":"2018","unstructured":"Zeyuan Allen-Zhu . 2018 . Natasha 2: Faster Non-Convex Optimization Than SGD . In International Conference on Neural Information Processing Systems (NIPS). 2680\u20132691 . Zeyuan Allen-Zhu. 2018. Natasha 2: Faster Non-Convex Optimization Than SGD. In International Conference on Neural Information Processing Systems (NIPS). 2680\u20132691."},{"key":"e_1_3_2_1_2_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 342\u2013350","author":"Balduzzi David","year":"2017","unstructured":"David Balduzzi , Marcus Frean , Lennox Leary , JP Lewis , Kurt Wan-Duo Ma , and Brian McWilliams . 2017 . The Shattered Gradients Problem: If resnets are the answer, then what is the question? . In International Conference on Machine Learning (ICML). PMLR, 342\u2013350 . David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. 2017. The Shattered Gradients Problem: If resnets are the answer, then what is the question?. In International Conference on Machine Learning (ICML). PMLR, 342\u2013350."},{"key":"e_1_3_2_1_3_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 560\u2013569","author":"Bernstein Jeremy","year":"2018","unstructured":"Jeremy Bernstein , Yu-Xiang Wang , Kamyar Azizzadenesheli , and Animashree Anandkumar . 2018 . signSGD: Compressed Optimisation for Non-Convex Problems . In International Conference on Machine Learning (ICML). PMLR, 560\u2013569 . Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar. 2018. signSGD: Compressed Optimisation for Non-Convex Problems. In International Conference on Machine Learning (ICML). PMLR, 560\u2013569."},{"key":"e_1_3_2_1_4_1","unstructured":"Sara Botelho-Andrade Peter\u00a0G Casazza Desai Cheng and Tin Tran. 2017. The Exact Constant for the \u21131 \u2212 \u21132 Norm Inequality. arXiv preprint arXiv:1707.00631(2017).  Sara Botelho-Andrade Peter\u00a0G Casazza Desai Cheng and Tin Tran. 2017. The Exact Constant for the \u21131 \u2212 \u21132 Norm Inequality. arXiv preprint arXiv:1707.00631(2017)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M1080173"},{"volume-title":"Convex Optimization","author":"Boyd Stephen","key":"e_1_3_2_1_6_1","unstructured":"Stephen Boyd , Stephen\u00a0 P Boyd , and Lieven Vandenberghe . 2004. Convex Optimization . Cambridge university Press . Stephen Boyd, Stephen\u00a0P Boyd, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge university Press."},{"key":"e_1_3_2_1_7_1","volume-title":"Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 249\u2013256","author":"Glorot Xavier","year":"2010","unstructured":"Xavier Glorot and Yoshua Bengio . 2010 . Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 249\u2013256 . Xavier Glorot and Yoshua Bengio. 2010. Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 249\u2013256."},{"key":"e_1_3_2_1_8_1","unstructured":"Priya Goyal Piotr Doll\u00e1r Ross Girshick Pieter Noordhuis Lukasz Wesolowski Aapo Kyrola Andrew Tulloch Yangqing Jia and Kaiming He. 2017. Accurate Large Minibatch SGD: Training Imagenet in 1 Hour. arXiv preprint arXiv:1706.02677.  Priya Goyal Piotr Doll\u00e1r Ross Girshick Pieter Noordhuis Lukasz Wesolowski Aapo Kyrola Andrew Tulloch Yangqing Jia and Kaiming He. 2017. Accurate Large Minibatch SGD: Training Imagenet in 1 Hour. arXiv preprint arXiv:1706.02677."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_10_1","unstructured":"Alex Hern\u00e1ndez-Garc\u00eda and Peter K\u00f6nig. 2018. Do Deep Nets Really Need Weight Decay and Dropout?arXiv preprint arXiv:1802.07042(2018).  Alex Hern\u00e1ndez-Garc\u00eda and Peter K\u00f6nig. 2018. Do Deep Nets Really Need Weight Decay and Dropout?arXiv preprint arXiv:1802.07042(2018)."},{"volume-title":"Train Longer","author":"Hoffer Elad","key":"e_1_3_2_1_11_1","unstructured":"Elad Hoffer , Itay Hubara , and Daniel Soudry . 2017. Train Longer , Generalize Better : Closing the Generalization Gap in Large Batch Training of Neural Networks. In Advances in Neural Information Processing Systems (NIPS) . 1731\u20131741. Elad Hoffer, Itay Hubara, and Daniel Soudry. 2017. Train Longer, Generalize Better: Closing the Generalization Gap in Large Batch Training of Neural Networks. In Advances in Neural Information Processing Systems (NIPS). 1731\u20131741."},{"key":"e_1_3_2_1_12_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 448\u2013456","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy . 2015 . Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . In International Conference on Machine Learning (ICML). PMLR, 448\u2013456 . Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning (ICML). PMLR, 448\u2013456."},{"key":"e_1_3_2_1_13_1","unstructured":"Stanis\u0142aw Jastrz\u0119bski Zachary Kenton Devansh Arpit Nicolas Ballas Asja Fischer Yoshua Bengio and Amos Storkey. 2018. Three Factors Influencing Minima in SGD. In In Artificial Neural Networks and Machine Learning (ICANN).  Stanis\u0142aw Jastrz\u0119bski Zachary Kenton Devansh Arpit Nicolas Ballas Asja Fischer Yoshua Bengio and Amos Storkey. 2018. Three Factors Influencing Minima in SGD. In In Artificial Neural Networks and Machine Learning (ICANN)."},{"key":"e_1_3_2_1_14_1","volume-title":"Error Feedback Fixes SignSGD and Other Gradient Compression Schemes. In International Conference on Machine Learning (ICML). PMLR, 3252\u20133261","author":"Karimireddy Sai\u00a0Praneeth","year":"2019","unstructured":"Sai\u00a0Praneeth Karimireddy , Quentin Rebjock , Sebastian Stich , and Martin Jaggi . 2019 . Error Feedback Fixes SignSGD and Other Gradient Compression Schemes. In International Conference on Machine Learning (ICML). PMLR, 3252\u20133261 . Sai\u00a0Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, and Martin Jaggi. 2019. Error Feedback Fixes SignSGD and Other Gradient Compression Schemes. In International Conference on Machine Learning (ICML). PMLR, 3252\u20133261."},{"key":"e_1_3_2_1_15_1","volume-title":"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations (ICLR).","author":"Keskar Nitish\u00a0Shirish","year":"2017","unstructured":"Nitish\u00a0Shirish Keskar , Dheevatsa Mudigere , Jorge Nocedal , Mikhail Smelyanskiy , and Ping Tak\u00a0Peter Tang . 2017 . On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations (ICLR). Nitish\u00a0Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak\u00a0Peter Tang. 2017. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_16_1","unstructured":"Alex Krizhevsky. 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997(2014).  Alex Krizhevsky. 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997(2014)."},{"key":"e_1_3_2_1_17_1","unstructured":"Alex Krizhevsky Geoffrey Hinton 2009. Learning multiple layers of features from tiny images. (2009).  Alex Krizhevsky Geoffrey Hinton 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_2_1_18_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey\u00a0E Hinton. 2012. Imagenet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems (NIPS) Vol.\u00a025. 1097\u20131105.  Alex Krizhevsky Ilya Sutskever and Geoffrey\u00a0E Hinton. 2012. Imagenet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems (NIPS) Vol.\u00a025. 1097\u20131105."},{"volume-title":"Neural Networks: Tricks of the Trade","author":"LeCun A","key":"e_1_3_2_1_19_1","unstructured":"Yann\u00a0 A LeCun , L\u00e9on Bottou , Genevieve\u00a0 B Orr , and Klaus-Robert M\u00fcller . 2012. Efficient backprop . In Neural Networks: Tricks of the Trade . Springer , 9\u201348. Yann\u00a0A LeCun, L\u00e9on Bottou, Genevieve\u00a0B Orr, and Klaus-Robert M\u00fcller. 2012. Efficient backprop. In Neural Networks: Tricks of the Trade. Springer, 9\u201348."},{"key":"e_1_3_2_1_20_1","volume-title":"Visualizing the Loss Landscape of Neural Nets. In International Conference on Neural Information Processing Systems (NIPS). 6391\u20136401","author":"Li Hao","year":"2018","unstructured":"Hao Li , Zheng Xu , Gavin Taylor , Christoph Studer , and Tom Goldstein . 2018 . Visualizing the Loss Landscape of Neural Nets. In International Conference on Neural Information Processing Systems (NIPS). 6391\u20136401 . Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. 2018. Visualizing the Loss Landscape of Neural Nets. In International Conference on Neural Information Processing Systems (NIPS). 6391\u20136401."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICNN.1993.298623"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-274"},{"key":"e_1_3_2_1_23_1","volume-title":"Increase the Batch Size. In International Conference on Learning Representations (ICLR).","author":"Smith L","year":"2018","unstructured":"Samuel\u00a0 L Smith , Pieter-Jan Kindermans , Chris Ying , and Quoc\u00a0 V Le . 2018 . Don\u2019t Decay the Learning Rate , Increase the Batch Size. In International Conference on Learning Representations (ICLR). Samuel\u00a0L Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc\u00a0V Le. 2018. Don\u2019t Decay the Learning Rate, Increase the Batch Size. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_24_1","volume-title":"Atomo: Communication-Efficient Learning via Atomic Sparsification. In Advances in Neural Information Processing Systems (NIPS). 9850\u20139861.","author":"Wang Hongyi","year":"2018","unstructured":"Hongyi Wang , Scott Sievert , Zachary Charles , Shengchao Liu , Stephen Wright , and Dimitris Papailiopoulos . 2018 . Atomo: Communication-Efficient Learning via Atomic Sparsification. In Advances in Neural Information Processing Systems (NIPS). 9850\u20139861. Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, and Dimitris Papailiopoulos. 2018. Atomo: Communication-Efficient Learning via Atomic Sparsification. In Advances in Neural Information Processing Systems (NIPS). 9850\u20139861."},{"key":"e_1_3_2_1_25_1","volume-title":"International Conference on Neural Information Processing Systems (NIPS).","author":"Wen Wei","year":"2017","unstructured":"Wei Wen , Cong Xu , Feng Yan , Chunpeng Wu , Yandan Wang , Yiran Chen , and Hai Li . 2017 . Terngrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning . In International Conference on Neural Information Processing Systems (NIPS). Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2017. Terngrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. In International Conference on Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_26_1","volume-title":"An Empirical Study of Stochastic Gradient Descent with Structured Covariance Noise. In International Conference on Artificial Intelligence and Statistics (AISTATS. PMLR, 3621\u20133631","author":"Wen Yeming","year":"2020","unstructured":"Yeming Wen , Kevin Luk , Maxime Gazeau , Guodong Zhang , Harris Chan , and Jimmy Ba . 2020 . An Empirical Study of Stochastic Gradient Descent with Structured Covariance Noise. In International Conference on Artificial Intelligence and Statistics (AISTATS. PMLR, 3621\u20133631 . Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, and Jimmy Ba. 2020. An Empirical Study of Stochastic Gradient Descent with Structured Covariance Noise. In International Conference on Artificial Intelligence and Statistics (AISTATS. PMLR, 3621\u20133631."},{"key":"e_1_3_2_1_27_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 10367\u201310376","author":"Wu Jingfeng","year":"2020","unstructured":"Jingfeng Wu , Wenqing Hu , Haoyi Xiong , Jun Huan , Vladimir Braverman , and Zhanxing Zhu . 2020 . On the Noisy Gradient Descent that generalizes as sgd . In International Conference on Machine Learning (ICML). PMLR, 10367\u201310376 . Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, and Zhanxing Zhu. 2020. On the Noisy Gradient Descent that generalizes as sgd. In International Conference on Machine Learning (ICML). PMLR, 10367\u201310376."},{"key":"e_1_3_2_1_28_1","unstructured":"Yang You Igor Gitman and Boris Ginsburg. 2017. Large Batch Training of Convolutional Networks. arXiv preprint arXiv:1708.03888.  Yang You Igor Gitman and Boris Ginsburg. 2017. Large Batch Training of Convolutional Networks. arXiv preprint arXiv:1708.03888."},{"key":"e_1_3_2_1_29_1","volume-title":"Understanding Deep Learning Requires Rethinking Generalization. In International Conference on Learning Representations (ICLR).","author":"Zhang Chiyuan","year":"2017","unstructured":"Chiyuan Zhang , Samy Bengio , Moritz Hardt , Benjamin Recht , and Oriol Vinyals . 2017 . Understanding Deep Learning Requires Rethinking Generalization. In International Conference on Learning Representations (ICLR). Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. Understanding Deep Learning Requires Rethinking Generalization. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_30_1","volume-title":"International Conference on Learning Representations (ICLR). 7654\u20137663","author":"Zhu Zhanxing","year":"2019","unstructured":"Zhanxing Zhu , Jingfeng Wu , Bing Yu , Lei Wu , and Jinwen Ma . 2019 . The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects . In International Conference on Learning Representations (ICLR). 7654\u20137663 . Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, and Jinwen Ma. 2019. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects. In International Conference on Learning Representations (ICLR). 7654\u20137663."}],"event":{"name":"ICAAI 2021: 2021 the 5th International Conference on Advances in Artificial Intelligence","acronym":"ICAAI 2021","location":"Virtual Event United Kingdom"},"container-title":["2021 The 5th International Conference on Advances in Artificial Intelligence (ICAAI)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3505711.3505715","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3505711.3505715","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:49Z","timestamp":1750191109000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3505711.3505715"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,20]]},"references-count":30,"alternative-id":["10.1145\/3505711.3505715","10.1145\/3505711"],"URL":"https:\/\/doi.org\/10.1145\/3505711.3505715","relation":{},"subject":[],"published":{"date-parts":[[2021,11,20]]},"assertion":[{"value":"2022-03-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}