{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,6]],"date-time":"2025-07-06T04:40:04Z","timestamp":1751776804173,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,7,23]],"date-time":"2018-07-23T00:00:00Z","timestamp":1532304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"EU Marie SkBodowska-Curie Grant Agreement","award":["665385"],"award-info":[{"award-number":["665385"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,7,23]]},"DOI":"10.1145\/3212734.3212763","type":"proceedings-article","created":{"date-parts":[[2018,7,31]],"date-time":"2018-07-31T16:28:33Z","timestamp":1533054513000},"page":"169-178","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory"],"prefix":"10.1145","author":[{"given":"Dan","family":"Alistarh","sequence":"first","affiliation":[{"name":"IST Austria, Klosterneuburg, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher","family":"De Sa","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikola","family":"Konstantinov","sequence":"additional","affiliation":[{"name":"IST Austria, Klosterneuburg, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,7,23]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Alekh Agarwal and John C Duchi . 2011. Distributed delayed stochastic optimization. In Advances in Neural Information Processing Systems. 873--881. Alekh Agarwal and John C Duchi . 2011. Distributed delayed stochastic optimization. In Advances in Neural Information Processing Systems. 873--881."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Hagit Attiya and Jennifer Welch . 2004. Distributed computing: fundamentals simulations and advanced topics. Vol. Vol. 19. John Wiley & Sons. Hagit Attiya and Jennifer Welch . 2004. Distributed computing: fundamentals simulations and advanced topics. Vol. Vol. 19. John Wiley & Sons.","DOI":"10.1002\/0471478210"},{"key":"e_1_3_2_1_3_1","unstructured":"Dimitri P Bertsekas and John N Tsitsiklis . 1989. Parallel and distributed computation: numerical methods. Vol. Vol. 23. Prentice hall Englewood Cliffs NJ. Dimitri P Bertsekas and John N Tsitsiklis . 1989. Parallel and distributed computation: numerical methods. Vol. Vol. 23. Prentice hall Englewood Cliffs NJ."},{"volume-title":"Proceedings of NIPS 2017","year":"2017","author":"Blanchard Peva","key":"e_1_3_2_1_4_1"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000050"},{"key":"e_1_3_2_1_6_1","unstructured":"Sorathan Chaturapruek John C Duchi and Christopher R\u00e9 . 2015. Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care Advances in Neural Information Processing Systems. 1531--1539. Sorathan Chaturapruek John C Duchi and Christopher R\u00e9 . 2015. Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care Advances in Neural Information Processing Systems. 1531--1539."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3154503"},{"key":"e_1_3_2_1_8_1","volume-title":"OSDI","volume":"14","author":"Chilimbi Trishul M","year":"2014"},{"key":"e_1_3_2_1_9_1","unstructured":"Christopher De Sa Kunle Olukotun and Christopher R\u00e9 . 2014. Global convergence of stochastic gradient descent for some non-convex matrix problems. arXiv preprint arXiv:1411.1134 (2014). Christopher De Sa Kunle Olukotun and Christopher R\u00e9 . 2014. Global convergence of stochastic gradient descent for some non-convex matrix problems. arXiv preprint arXiv:1411.1134 (2014)."},{"key":"e_1_3_2_1_10_1","unstructured":"Christopher M De Sa Ce Zhang Kunle Olukotun and Christopher R\u00e9 . 2015. Taming the Wild: A Unified Analysis of Hogwild-style Algorithms Advances in Neural Information Processing Systems. 2674--2682. Christopher M De Sa Ce Zhang Kunle Olukotun and Christopher R\u00e9 . 2015. Taming the Wild: A Unified Analysis of Hogwild-style Algorithms Advances in Neural Information Processing Systems. 2674--2682."},{"key":"e_1_3_2_1_11_1","unstructured":"Jeffrey Dean Greg Corrado Rajat Monga Kai Chen Matthieu Devin Mark Mao Andrew Senior Paul Tucker Ke Yang Quoc V Le et almbox. . 2012. Large scale distributed deep networks. In Advances in neural information processing systems. 1223--1231. Jeffrey Dean Greg Corrado Rajat Monga Kai Chen Matthieu Devin Mark Mao Andrew Senior Paul Tucker Ke Yang Quoc V Le et almbox. . 2012. Large scale distributed deep networks. In Advances in neural information processing systems. 1223--1231."},{"key":"e_1_3_2_1_12_1","unstructured":"John C Duchi Sorathan Chaturapruek and Christopher R\u00e9 . 2015. Asynchronous stochastic convex optimization. arXiv preprint arXiv:1508.00882 (2015). John C Duchi Sorathan Chaturapruek and Christopher R\u00e9 . 2015. Asynchronous stochastic convex optimization. arXiv preprint arXiv:1508.00882 (2015)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48653-5_14"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_15_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton . 2012. Imagenet classification with deep convolutional neural networks Advances in neural information processing systems. 1097--1105. Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton . 2012. Imagenet classification with deep convolutional neural networks Advances in neural information processing systems. 1097--1105."},{"key":"e_1_3_2_1_16_1","first-page":"2331","article-title":"Slow learners are fast","volume":"22","author":"Langford John","year":"2009","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_17_1","unstructured":"R\u00e9mi Leblond Fabian Pederegosa and Simon Lacoste-Julien . 2018. Improved asynchronous parallel optimization analysis for stochastic incremental methods. arXiv preprint arXiv:1801.03749 (2018). R\u00e9mi Leblond Fabian Pederegosa and Simon Lacoste-Julien . 2018. Improved asynchronous parallel optimization analysis for stochastic incremental methods. arXiv preprint arXiv:1801.03749 (2018)."},{"key":"e_1_3_2_1_18_1","unstructured":"Xiangru Lian Yijun Huang Yuncheng Li and Ji Liu . 2015. Asynchronous parallel stochastic gradient for nonconvex optimization Advances in Neural Information Processing Systems. 2737--2745. Xiangru Lian Yijun Huang Yuncheng Li and Ji Liu . 2015. Asynchronous parallel stochastic gradient for nonconvex optimization Advances in Neural Information Processing Systems. 2737--2745."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/140961134"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Ioannis Mitliagkas Ce Zhang Stefan Hadjis and Christopher R\u00e9 . 2016. Asynchrony begets momentum with an application to deep learning 54th Annual Allerton Conference on Communication Control and Computing Allerton 2016 Monticello IL USA September 27--30 2016. IEEE 997--1004. Ioannis Mitliagkas Ce Zhang Stefan Hadjis and Christopher R\u00e9 . 2016. Asynchrony begets momentum with an application to deep learning 54th Annual Allerton Conference on Communication Control and Computing Allerton 2016 Monticello IL USA September 27--30 2016. IEEE 997--1004.","DOI":"10.1109\/ALLERTON.2016.7852343"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775054.2694374"},{"key":"e_1_3_2_1_22_1","unstructured":"Lam M Nguyen Phuong Ha Nguyen Marten van Dijk Peter Richt\u00e1rik Katya Scheinberg and Martin Tak\u00e1vc . 2018. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. arXiv preprint arXiv:1802.03801 (2018). Lam M Nguyen Phuong Ha Nguyen Marten van Dijk Peter Richt\u00e1rik Katya Scheinberg and Martin Tak\u00e1vc . 2018. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. arXiv preprint arXiv:1802.03801 (2018)."},{"volume-title":"Hogwild: A lock-free approach to parallelizing stochastic gradient descent Advances in neural information processing systems. 693--701.","year":"2011","author":"Recht Benjamin","key":"e_1_3_2_1_23_1"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Herbert Robbins and Sutton Monro . 1951. A stochastic approximation method. The Annals of Mathematical Statistics (1951) 400--407. Herbert Robbins and Sutton Monro . 1951. A stochastic approximation method. The Annals of Mathematical Statistics (1951) 400--407.","DOI":"10.1214\/aoms\/1177729586"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"David E Rumelhart Geoffrey E Hinton and Ronald J Williams . 1986. Learning representations by back-propagating errors. nature Vol. 323 6088 (1986) 533. David E Rumelhart Geoffrey E Hinton and Ronald J Williams . 1986. Learning representations by back-propagating errors. nature Vol. 323 6088 (1986) 533.","DOI":"10.1038\/323533a0"},{"key":"e_1_3_2_1_26_1","unstructured":"C. M. De Sa C. Zhang K. Olukotun and C. Re . 2015. Taming the wild: A unified analysis of hogwild-style algorihms Advances in Neural Information Processing Systems. C. M. De Sa C. Zhang K. Olukotun and C. Re . 2015. Taming the wild: A unified analysis of hogwild-style algorihms Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Scott Sallinen Nadathur Satish Mikhail Smelyanskiy Samantika S Sury and Christopher R\u00e9 . 2016. High performance parallel stochastic gradient descent in shared memory Parallel and Distributed Processing Symposium 2016 IEEE International. IEEE 873--882. Scott Sallinen Nadathur Satish Mikhail Smelyanskiy Samantika S Sury and Christopher R\u00e9 . 2016. High performance parallel stochastic gradient descent in shared memory Parallel and Distributed Processing Symposium 2016 IEEE International. IEEE 873--882.","DOI":"10.1109\/IPDPS.2016.107"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"F. Seide H. Fu L. G. Jasha and D. Yu . 2014. 1-bit stochastic gradient descent and application to data-parallel distributed training of speech dnns. Interspeech (2014). F. Seide H. Fu L. G. Jasha and D. Yu . 2014. 1-bit stochastic gradient descent and application to data-parallel distributed training of speech dnns. Interspeech (2014).","DOI":"10.21437\/Interspeech.2014-274"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"David Silver Aja Huang Chris J Maddison Arthur Guez Laurent Sifre George Van Den Driessche Julian Schrittwieser Ioannis Antonoglou Veda Panneershelvam Marc Lanctot et almbox. . 2016. Mastering the game of Go with deep neural networks and tree search. nature Vol. 529 7587 (2016) 484--489. David Silver Aja Huang Chris J Maddison Arthur Guez Laurent Sifre George Van Den Driessche Julian Schrittwieser Ioannis Antonoglou Veda Panneershelvam Marc Lanctot et almbox. . 2016. Mastering the game of Go with deep neural networks and tree search. nature Vol. 529 7587 (2016) 484--489.","DOI":"10.1038\/nature16961"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Lili Su and Nitin Vaidya . 2016 a. Multi-agent optimization in the presence of byzantine adversaries: fundamental limits American Control Conference (ACC) 2016. IEEE 7183--7188. Lili Su and Nitin Vaidya . 2016 a. Multi-agent optimization in the presence of byzantine adversaries: fundamental limits American Control Conference (ACC) 2016. IEEE 7183--7188.","DOI":"10.1109\/ACC.2016.7526806"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Lili Su and Nitin H Vaidya . 2016 b. Asynchronous non-bayesian learning in the presence of crash failures International Symposium on Stabilization Safety and Security of Distributed Systems. Springer 352--367. Lili Su and Nitin H Vaidya . 2016 b. Asynchronous non-bayesian learning in the presence of crash failures International Symposium on Stabilization Safety and Security of Distributed Systems. Springer 352--367.","DOI":"10.1007\/978-3-319-49259-9_28"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2933057.2933105"},{"key":"e_1_3_2_1_33_1","unstructured":"Zhixiong Yang and Waheed U Bajwa . 2017. ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning. arXiv preprint arXiv:1708.08155 (2017). Zhixiong Yang and Waheed U Bajwa . 2017. ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning. arXiv preprint arXiv:1708.08155 (2017)."},{"key":"e_1_3_2_1_34_1","unstructured":"Jian Zhang Ioannis Mitliagkas and Christopher R\u00e9 . 2017. YellowFin and the Art of Momentum Tuning. arXiv preprint arXiv:1706.03471 (2017). Jian Zhang Ioannis Mitliagkas and Christopher R\u00e9 . 2017. YellowFin and the Art of Momentum Tuning. arXiv preprint arXiv:1706.03471 (2017)."},{"key":"e_1_3_2_1_35_1","unstructured":"Wei Zhang Suyog Gupta Xiangru Lian and Ji Liu . 2016. Staleness-aware async-SGD for distributed deep learning Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press 2350--2356. Wei Zhang Suyog Gupta Xiangru Lian and Ji Liu . 2016. Staleness-aware async-SGD for distributed deep learning Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press 2350--2356."},{"volume-title":"Asynchronous Stochastic Gradient Descent with Delay Compensation International Conference on Machine Learning. 4120--4129","year":"2017","author":"Zheng Shuxin","key":"e_1_3_2_1_36_1"}],"event":{"name":"PODC '18: ACM Symposium on Principles of Distributed Computing","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","SIGACT ACM Special Interest Group on Algorithms and Computation Theory"],"location":"Egham United Kingdom","acronym":"PODC '18"},"container-title":["Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3212734.3212763","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3212734.3212763","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,6]],"date-time":"2025-07-06T04:09:03Z","timestamp":1751774943000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3212734.3212763"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,23]]},"references-count":36,"alternative-id":["10.1145\/3212734.3212763","10.1145\/3212734"],"URL":"https:\/\/doi.org\/10.1145\/3212734.3212763","relation":{},"subject":[],"published":{"date-parts":[[2018,7,23]]},"assertion":[{"value":"2018-07-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}