{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T05:57:21Z","timestamp":1672725441931},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGACT News"],"published-print":{"date-parts":[[2022,6,10]]},"abstract":"Machine learning models can match or surpass humans on specialized tasks such as image classification [20, 14], speech recognition [37], or complex games [39]. One key tool behind this progress has been a family of optimization methods which fall under the umbrella term of stochastic gradient descent (SGD) [35], which are by and large the method of choice for training large-scale machine learning models.","DOI":"10.1145\/3544979.3544991","type":"journal-article","created":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T07:52:51Z","timestamp":1658908371000},"page":"64-82","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Elastic Consistency"],"prefix":"10.1145","volume":"53","author":[{"given":"Dan","family":"Alistarh","sequence":"first","affiliation":[{"name":"IST Austria, Klosterneuburg, Austria"}]},{"given":"Ilia","family":"Markov","sequence":"additional","affiliation":[{"name":"IST Austria, Klosterneuburg, Austria"}]},{"given":"Giorgi","family":"Nadiradze","sequence":"additional","affiliation":[{"name":"IST Austria, Klosterneuburg, Austria"}]}],"member":"320","published-online":{"date-parts":[[2022,7,27]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"265","volume-title":"OSDI","volume":"16","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , : A system for large-scale machine learning 
. In OSDI , volume 16 , pages 265 -- 283 , 2016 . Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265--283, 2016."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1045"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3212734.3212763"},{"key":"e_1_2_1_4_1","first-page":"1709","volume-title":"NIPS","author":"Alistarh Dan","year":"2017","unstructured":"Dan Alistarh , Demjan Grubic , Jerry Li , Ryota Tomioka , and Milan Vojnovic . Qsgd : Communicationefficient sgd via gradient quantization and encoding . In NIPS , pages 1709 -- 1720 , 2017 . Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. Qsgd: Communicationefficient sgd via gradient quantization and encoding. In NIPS, pages 1709--1720, 2017."},{"key":"e_1_2_1_5_1","first-page":"5977","volume-title":"NIPS","author":"Alistarh Dan","year":"2018","unstructured":"Dan Alistarh , Torsten Hoefler , Mikael Johansson , Nikola Konstantinov , Sarit Khirirat , and C\u00e9dric Renggli . The convergence of sparsified gradient methods . In NIPS , pages 5977 -- 5987 , 2018 . Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, and C\u00e9dric Renggli. The convergence of sparsified gradient methods. In NIPS, pages 5977--5987, 2018."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/0471478210"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/59912"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1561\/9781601988614"},{"key":"e_1_2_1_9_1","first-page":"1531","volume-title":"NIPS","author":"Chaturapruek Sorathan","year":"2015","unstructured":"Sorathan Chaturapruek , John C Duchi , and Christopher R\u00e9 . 
Asynchronous stochastic convex optimization: the noise is in the noise and sgd don't care . In NIPS , pages 1531 -- 1539 , 2015 . Sorathan Chaturapruek, John C Duchi, and Christopher R\u00e9. Asynchronous stochastic convex optimization: the noise is in the noise and sgd don't care. In NIPS, pages 1531--1539, 2015."},{"key":"e_1_2_1_10_1","first-page":"571","volume-title":"OSDI","volume":"14","author":"Chilimbi Trishul M","year":"2014","unstructured":"Trishul M Chilimbi , Yutaka Suzue , Johnson Apacible , and Karthik Kalyanaraman . Project adam : Building an efficient and scalable deep learning training system . In OSDI , volume 14 , pages 571 -- 582 , 2014 . Trishul M Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In OSDI, volume 14, pages 571--582, 2014."},{"key":"e_1_2_1_11_1","first-page":"1223","volume-title":"NIPS","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Mark Mao , Andrew Senior , Paul Tucker , Ke Yang , Quoc V Le , Large scale distributed deep networks . In NIPS , pages 1223 -- 1231 , 2012 . Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large scale distributed deep networks. In NIPS, pages 1223--1231, 2012."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/120880811"},{"key":"e_1_2_1_13_1","volume-title":"large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677","author":"Goyal Priya","year":"2017","unstructured":"Priya Goyal , Piotr Doll\u00e1r , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , and Kaiming He. Accurate , large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 , 2017 . 
Priya Goyal, Piotr Doll\u00e1r, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_15_1","first-page":"1223","volume-title":"NIPS","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho , James Cipar , Henggang Cui , Seunghak Lee , Jin Kyu Kim , Phillip B Gibbons , Garth A Gibson , Greg Ganger , and Eric P Xing . More effective distributed ml via a stale synchronous parallel parameter server . In NIPS , pages 1223 -- 1231 , 2013 . Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. More effective distributed ml via a stale synchronous parallel parameter server. In NIPS, pages 1223--1231, 2013."},{"key":"e_1_2_1_16_1","volume-title":"Prioritybased parameter propagation for distributed dnn training. arXiv preprint arXiv:1905.03960","author":"Jayarajan Anand","year":"2019","unstructured":"Anand Jayarajan , Jinliang Wei , Garth Gibson , Alexandra Fedorova , and Gennady Pekhimenko . Prioritybased parameter propagation for distributed dnn training. arXiv preprint arXiv:1905.03960 , 2019 . Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, and Gennady Pekhimenko. Prioritybased parameter propagation for distributed dnn training. arXiv preprint arXiv:1905.03960, 2019."},{"key":"e_1_2_1_17_1","first-page":"3252","volume-title":"ICML","author":"Karimireddy Sai Praneeth","year":"2019","unstructured":"Sai Praneeth Karimireddy , Quentin Rebjock , Sebastian U. Stich , and Martin Jaggi . Error feedback fixes signsgd and other gradient compression schemes . In ICML , pages 3252 -- 3261 , 2019 . Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, and Martin Jaggi. 
Error feedback fixes signsgd and other gradient compression schemes. In ICML, pages 3252--3261, 2019."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4842-2766-4_12"},{"key":"e_1_2_1_19_1","volume-title":"Learning multiple layers of features from tiny images","author":"Krizhevsky Alex","year":"2009","unstructured":"Alex Krizhevsky and Geoffrey Hinton . Learning multiple layers of features from tiny images . 2009 . Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009."},{"key":"e_1_2_1_20_1","first-page":"1097","volume-title":"NIPS","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . Imagenet classification with deep convolutional neural networks . In NIPS , pages 1097 -- 1105 , 2012 . Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097--1105, 2012."},{"key":"e_1_2_1_21_1","first-page":"46","volume-title":"AISTATS","author":"Leblond R\u00e9mi","year":"2017","unstructured":"R\u00e9mi Leblond , Fabian Pedregosa , and Simon Lacoste-Julien . ASAGA : asynchronous parallel SAGA . In AISTATS , pages 46 -- 54 , 2017 . R\u00e9mi Leblond, Fabian Pedregosa, and Simon Lacoste-Julien. ASAGA: asynchronous parallel SAGA. In AISTATS, pages 46--54, 2017."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_2_1_23_1","first-page":"2737","volume-title":"NIPS","author":"Lian Xiangru","year":"2015","unstructured":"Xiangru Lian , Yijun Huang , Yuncheng Li , and Ji Liu . Asynchronous parallel stochastic gradient for nonconvex optimization . In NIPS , pages 2737 -- 2745 , 2015 . Xiangru Lian, Yijun Huang, Yuncheng Li, and Ji Liu. Asynchronous parallel stochastic gradient for nonconvex optimization. 
In NIPS, pages 2737--2745, 2015."},{"key":"e_1_2_1_24_1","first-page":"5330","volume-title":"NIPS","author":"Lian Xiangru","year":"2017","unstructured":"Xiangru Lian , Ce Zhang , Huan Zhang , Cho-Jui Hsieh , Wei Zhang , and Ji Liu . Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent . In NIPS , pages 5330 -- 5340 , 2017 . Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In NIPS, pages 5330--5340, 2017."},{"key":"e_1_2_1_25_1","volume-title":"Kumar Kshitij Patel, and Martin Jaggi. Don't use large mini-batches, use local sgd. arXiv preprint arXiv:1808.07217","author":"Lin Tao","year":"2018","unstructured":"Tao Lin , Sebastian U Stich , Kumar Kshitij Patel, and Martin Jaggi. Don't use large mini-batches, use local sgd. arXiv preprint arXiv:1808.07217 , 2018 . Tao Lin, Sebastian U Stich, Kumar Kshitij Patel, and Martin Jaggi. Don't use large mini-batches, use local sgd. arXiv preprint arXiv:1808.07217, 2018."},{"key":"e_1_2_1_26_1","volume-title":"ICLR","author":"Lin Yujun","year":"2018","unstructured":"Yujun Lin , Song Han , Huizi Mao , Yu Wang , and Bill Dally . Deep gradient compression: Reducing the communication bandwidth for distributed training . In ICLR , Poster , 2018 . Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. Deep gradient compression: Reducing the communication bandwidth for distributed training. In ICLR, Poster, 2018."},{"key":"e_1_2_1_27_1","volume-title":"Towards optimal convergence rate in decentralized stochastic training. ArXiv, abs\/2006.08085","author":"Lu Yucheng","year":"2020","unstructured":"Yucheng Lu , Z. Li , and C. D. Sa . Towards optimal convergence rate in decentralized stochastic training. ArXiv, abs\/2006.08085 , 2020 . Yucheng Lu, Z. Li, and C. D. Sa. 
Towards optimal convergence rate in decentralized stochastic training. ArXiv, abs\/2006.08085, 2020."},{"key":"e_1_2_1_28_1","article-title":"A unified analysis of weakly consistent parallel learning. arXiv preprint arXiv:2005.06706, 2020","author":"Lu Yucheng","year":"2020","unstructured":"Yucheng Lu , Jack Nash , and Christopher De Sa. Mixml : A unified analysis of weakly consistent parallel learning. arXiv preprint arXiv:2005.06706, 2020 . Yucheng Lu, Jack Nash, and Christopher De Sa. Mixml: A unified analysis of weakly consistent parallel learning. arXiv preprint arXiv:2005.06706, 2020."},{"key":"e_1_2_1_29_1","volume-title":"Elastic consistency: A general consistency model for distributed stochastic gradient descent. arXiv preprint arXiv:2001.05918","author":"Nadiradze Giorgi","year":"2020","unstructured":"Giorgi Nadiradze , Ilia Markov , Bapi Chatterjee , Vyacheslav Kungurtsev , and Dan Alistarh . Elastic consistency: A general consistency model for distributed stochastic gradient descent. arXiv preprint arXiv:2001.05918 , 2020 . Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, and Dan Alistarh. Elastic consistency: A general consistency model for distributed stochastic gradient descent. arXiv preprint arXiv:2001.05918, 2020."},{"key":"e_1_2_1_30_1","first-page":"2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference","author":"Nadiradze Giorgi","year":"2021","unstructured":"Giorgi Nadiradze , Ilia Markov , Bapi Chatterjee , Vyacheslav Kungurtsev , and Dan Alistarh . Elastic consistency : A practical consistency model for distributed stochastic gradient descent . In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference , pages 2 -- 9 , 2021 . 
Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, and Dan Alistarh. Elastic consistency: A practical consistency model for distributed stochastic gradient descent. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, pages 2--9, 2021."},{"key":"e_1_2_1_31_1","first-page":"3747","volume-title":"ICML","author":"Nguyen Lam M.","year":"2018","unstructured":"Lam M. Nguyen , Phuong Ha Nguyen , Marten van Dijk , Peter Richt\u00e1rik , Katya Scheinberg , and Martin Tak\u00e1c . SGD and hogwild! convergence without the bounded gradients assumption . In ICML , pages 3747 -- 3755 , 2018 . Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richt\u00e1rik, Katya Scheinberg, and Martin Tak\u00e1c. SGD and hogwild! convergence without the bounded gradients assumption. In ICML, pages 3747--3755, 2018."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359642"},{"key":"e_1_2_1_33_1","first-page":"5220","volume-title":"ICML","author":"Qiao Aurick","year":"2019","unstructured":"Aurick Qiao , Bryon Aragam , Bingjing Zhang , and Eric P. Xing . Fault tolerance in iterative-convergent machine learning . In ICML , pages 5220 -- 5230 , 2019 . Aurick Qiao, Bryon Aragam, Bingjing Zhang, and Eric P. Xing. Fault tolerance in iterative-convergent machine learning. In ICML, pages 5220--5230, 2019."},{"key":"e_1_2_1_34_1","first-page":"693","volume-title":"NIPS","author":"Recht Benjamin","year":"2011","unstructured":"Benjamin Recht , Christopher Re , Stephen Wright , and Feng Niu . Hogwild : A lock-free approach to parallelizing stochastic gradient descent . In NIPS , pages 693 -- 701 , 2011 . Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, pages 693--701, 2011."},{"key":"e_1_2_1_35_1","first-page":"400","volume-title":"A stochastic approximation method. 
The annals of mathematical statistics","author":"Robbins Herbert","year":"1951","unstructured":"Herbert Robbins and Sutton Monro . A stochastic approximation method. The annals of mathematical statistics , pages 400 -- 407 , 1951 . Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400--407, 1951."},{"key":"e_1_2_1_36_1","first-page":"2674","volume-title":"NIPS","author":"Sa Christopher De","year":"2015","unstructured":"Christopher De Sa , Ce Zhang , Kunle Olukotun , and Christopher R\u00e9 . Taming the wild: A unified analysis of hogwild-style algorithms . In NIPS , pages 2674 -- 2682 , 2015 . Christopher De Sa, Ce Zhang, Kunle Olukotun, and Christopher R\u00e9. Taming the wild: A unified analysis of hogwild-style algorithms. In NIPS, pages 2674--2682, 2015."},{"key":"e_1_2_1_37_1","first-page":"1058","volume-title":"INTERSPEECH","author":"Seide Frank","year":"2014","unstructured":"Frank Seide , Hao Fu , Jasha Droppo , Gang Li , and Dong Yu . 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns . In INTERSPEECH , pages 1058 -- 1062 , 2014 . Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In INTERSPEECH, pages 1058--1062, 2014."},{"key":"e_1_2_1_38_1","volume-title":"Horovod: fast and easy distributed deep learning in tensorflow. arXiv preprint arXiv:1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso . Horovod: fast and easy distributed deep learning in tensorflow. arXiv preprint arXiv:1802.05799 , 2018 . Alexander Sergeev and Mike Del Balso. Horovod: fast and easy distributed deep learning in tensorflow. arXiv preprint arXiv:1802.05799, 2018."},{"key":"e_1_2_1_39_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 
Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484--489","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J Maddison , Arthur Guez , Laurent Sifre , George Van Den Driessche , Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484--489 , 2016 . David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484--489, 2016."},{"key":"e_1_2_1_40_1","volume-title":"Local sgd converges fast and communicates little. arXiv preprint arXiv:1805.09767","author":"Stich Sebastian U","year":"2018","unstructured":"Sebastian U Stich . Local sgd converges fast and communicates little. arXiv preprint arXiv:1805.09767 , 2018 . Sebastian U Stich. Local sgd converges fast and communicates little. arXiv preprint arXiv:1805.09767, 2018."},{"key":"e_1_2_1_41_1","first-page":"4452","volume-title":"NIPS","author":"Stich Sebastian U.","year":"2018","unstructured":"Sebastian U. Stich , Jean-Baptiste Cordonnier , and Martin Jaggi . Sparsified SGD with memory . In NIPS , pages 4452 -- 4463 , 2018 . Sebastian U. Stich, Jean-Baptiste Cordonnier, and Martin Jaggi. Sparsified SGD with memory. In NIPS, pages 4452--4463, 2018."},{"key":"e_1_2_1_42_1","volume-title":"The error-feedback framework: Better rates for sgd with delayed gradients and compressed communication. arXiv preprint arXiv:1909.05350","author":"Stich Sebastian U","year":"2019","unstructured":"Sebastian U Stich and Sai Praneeth Karimireddy . The error-feedback framework: Better rates for sgd with delayed gradients and compressed communication. arXiv preprint arXiv:1909.05350 , 2019 . Sebastian U Stich and Sai Praneeth Karimireddy. 
The error-feedback framework: Better rates for sgd with delayed gradients and compressed communication. arXiv preprint arXiv:1909.05350, 2019."},{"key":"e_1_2_1_43_1","volume-title":"Sixteenth Annual Conference of the International Speech Communication Association, 2015","author":"Strom Nikko","year":"2015","unstructured":"Nikko Strom . Scalable distributed dnn training using commodity gpu cloud computing . In Sixteenth Annual Conference of the International Speech Communication Association, 2015 . Nikko Strom. Scalable distributed dnn training using commodity gpu cloud computing. In Sixteenth Annual Conference of the International Speech Communication Association, 2015."},{"key":"e_1_2_1_44_1","first-page":"32","article-title":"Practical low-rank gradient compression for distributed optimization","author":"Vogels Thijs","year":"2019","unstructured":"Thijs Vogels , Sai Praneeth Karimireddy , and Martin Jaggi . Powersgd : Practical low-rank gradient compression for distributed optimization . Advances in Neural Information Processing Systems , 32 , 2019 . Thijs Vogels, Sai Praneeth Karimireddy, and Martin Jaggi. Powersgd: Practical low-rank gradient compression for distributed optimization. Advances in Neural Information Processing Systems, 32, 2019.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_45_1","first-page":"31","article-title":"Communication-efficient learning via atomic sparsification","author":"Wang Hongyi","year":"2018","unstructured":"Hongyi Wang , Scott Sievert , Shengchao Liu , Zachary Charles , Dimitris Papailiopoulos , and Stephen Wright . Atomo : Communication-efficient learning via atomic sparsification . Advances in Neural Information Processing Systems , 31 , 2018 . Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, and Stephen Wright. 
Atomo: Communication-efficient learning via atomic sparsification. Advances in Neural Information Processing Systems, 31, 2018.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_46_1","volume-title":"Cooperative sgd: A unified framework for the design and analysis of communication-efficient sgd algorithms. arXiv preprint arXiv:1808.07576","author":"Wang Jianyu","year":"2018","unstructured":"Jianyu Wang and Gauri Joshi . Cooperative sgd: A unified framework for the design and analysis of communication-efficient sgd algorithms. arXiv preprint arXiv:1808.07576 , 2018 . Jianyu Wang and Gauri Joshi. Cooperative sgd: A unified framework for the design and analysis of communication-efficient sgd algorithms. arXiv preprint arXiv:1808.07576, 2018."},{"key":"e_1_2_1_47_1","first-page":"1306","volume-title":"NIPS","author":"Wangni Jianqiao","year":"2018","unstructured":"Jianqiao Wangni , Jialei Wang , Ji Liu , and Tong Zhang . Gradient sparsification for communicationefficient distributed optimization . In NIPS , pages 1306 -- 1316 , 2018 . Jianqiao Wangni, Jialei Wang, Ji Liu, and Tong Zhang. Gradient sparsification for communicationefficient distributed optimization. In NIPS, pages 1306--1316, 2018."},{"key":"e_1_2_1_48_1","first-page":"8496","volume-title":"Advances in neural information processing systems","author":"Woodworth Blake E","year":"2018","unstructured":"Blake E Woodworth , Jialei Wang , Adam Smith , Brendan McMahan , and Nati Srebro . Graph oracle models, lower bounds, and gaps for parallel stochastic optimization . In Advances in neural information processing systems , pages 8496 -- 8506 , 2018 . Blake E Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, and Nati Srebro. Graph oracle models, lower bounds, and gaps for parallel stochastic optimization. 
In Advances in neural information processing systems, pages 8496--8506, 2018."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2015.2472014"},{"key":"e_1_2_1_50_1","volume-title":"Wide residual networks. arXiv preprint arXiv:1605.07146","author":"Zagoruyko Sergey","year":"2016","unstructured":"Sergey Zagoruyko and Nikos Komodakis . Wide residual networks. arXiv preprint arXiv:1605.07146 , 2016 . Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016."},{"key":"e_1_2_1_51_1","first-page":"685","volume-title":"Advances in neural information processing systems","author":"Zhang Sixin","year":"2015","unstructured":"Sixin Zhang , Anna E Choromanska , and Yann LeCun . Deep learning with elastic averaging sgd . In Advances in neural information processing systems , pages 685 -- 693 , 2015 . Sixin Zhang, Anna E Choromanska, and Yann LeCun. Deep learning with elastic averaging sgd. In Advances in neural information processing systems, pages 685--693, 2015."}],"container-title":["ACM SIGACT News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3544979.3544991","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,2]],"date-time":"2023-01-02T11:19:25Z","timestamp":1672658365000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544979.3544991"}},"subtitle":["A Consistency Criterion for Distributed Optimization"],"short-title":[],"issued":{"date-parts":[[2022,6,10]]},"references-count":51,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,10]]}},"alternative-id":["10.1145\/3544979.3544991"],"URL":"http:\/\/dx.doi.org\/10.1145\/3544979.3544991","relation":{},"ISSN":["0163-5700"],"issn-type":[{"value":"0163-5700","type":"print"}],"subject":["General Materials 
Science"],"published":{"date-parts":[[2022,6,10]]},"assertion":[{"value":"2022-07-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}