{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T09:06:36Z","timestamp":1777626396893,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,8,17]],"date-time":"2020-08-17T00:00:00Z","timestamp":1597622400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Guangdong Natural Science Foundation of China","award":["2018B030312002"],"award-info":[{"award-number":["2018B030312002"]}]},{"name":"Guangdong Key Fields R&D Plan of China","award":["2019B020228001"],"award-info":[{"award-number":["2019B020228001"]}]},{"name":"National Natural Science Foundation of China","award":["U1801266, U1811461, U1711263"],"award-info":[{"award-number":["U1801266, U1811461, U1711263"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,8,17]]},"DOI":"10.1145\/3404397.3404401","type":"proceedings-article","created":{"date-parts":[[2020,8,9]],"date-time":"2020-08-09T03:54:26Z","timestamp":1596945266000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning"],"prefix":"10.1145","author":[{"given":"Zijie","family":"Yan","sequence":"first","affiliation":[{"name":"Sun Yat-sen University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Danyang","family":"Xiao","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mengqiang","family":"Chen","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jieying","family":"Zhou","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weigang","family":"Wu","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,8,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Alham\u00a0Fikri Aji and Kenneth Heafield. 2017. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021(2017)."},{"key":"e_1_3_2_1_2_1","unstructured":"Dan Alistarh Torsten Hoefler Mikael Johansson Nikola Konstantinov Sarit Khirirat and C\u00e9dric Renggli. 2018. The convergence of sparsified gradient methods. In Advances in Neural Information Processing Systems. 5973\u20135983."},{"key":"e_1_3_2_1_3_1","volume-title":"Qsgd: Randomized quantization for communication-optimal stochastic gradient descent. arXiv preprint arXiv:1610.02132(2016).","author":"Alistarh Dan","year":"2016"},{"key":"e_1_3_2_1_4_1","unstructured":"Saar Barkai Ido Hakimi and Assaf Schuster. 2019. Gap Aware Mitigation of Gradient Staleness. arXiv preprint arXiv:1909.10802(2019)."},{"key":"e_1_3_2_1_5_1","volume-title":"Thirty-Second AAAI Conference on Artificial Intelligence.","author":"Chen Chia-Yu","year":"2018"},{"key":"e_1_3_2_1_6_1","volume-title":"International conference on machine learning. 1337\u20131345","author":"Coates Adam","year":"2013"},{"key":"e_1_3_2_1_7_1","unstructured":"Jeffrey Dean Greg Corrado Rajat Monga Kai Chen Matthieu Devin Mark Mao Andrew Senior Paul Tucker Ke Yang Quoc\u00a0V Le 2012. Large scale distributed deep networks. In Advances in neural information processing systems. 1223\u20131231."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MLHPC.2016.004"},{"key":"e_1_3_2_1_10_1","unstructured":"Priya Goyal Piotr Doll\u00e1r Ross Girshick Pieter Noordhuis Lukasz Wesolowski Aapo Kyrola Andrew Tulloch Yangqing Jia and Kaiming He. 2017. Accurate large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677(2017)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0028"},{"key":"e_1_3_2_1_12_1","unstructured":"Elad Hoffer Itay Hubara and Daniel Soudry. 2017. Train longer generalize better: closing the generalization gap in large batch training of neural networks. In Advances in Neural Information Processing Systems. 1731\u20131741."},{"key":"e_1_3_2_1_13_1","unstructured":"Xianyan Jia Shutao Song Wei He Yangzihao Wang Haidong Rong Feihu Zhou Liqiang Xie Zhenyu Guo Yuanzhou Yang Liwei Yu 2018. Highly scalable deep learning training system with mixed-precision: Training imagenet in four minutes. arXiv preprint arXiv:1807.11205(2018)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2834892.2834893"},{"key":"e_1_3_2_1_15_1","unstructured":"Alex Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997(2014)."},{"key":"e_1_3_2_1_17_1","volume-title":"11th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 14). 583\u2013598.","author":"Li Mu"},{"key":"e_1_3_2_1_18_1","unstructured":"Yujun Lin Song Han Huizi Mao Yu Wang and William\u00a0J Dally. 2017. Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887(2017)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ALLERTON.2016.7852343"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1137\/0330046"},{"key":"e_1_3_2_1_21_1","volume-title":"Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in neural information processing systems. 693\u2013701.","author":"Recht Benjamin","year":"2011"},{"key":"e_1_3_2_1_22_1","volume-title":"Sparcml: High-performance sparse communication for machine learning. arXiv preprint arXiv:1802.08021(2018).","author":"Renggli C\u00e8dric","year":"2018"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-274"},{"key":"e_1_3_2_1_25_1","unstructured":"Sebastian\u00a0U Stich Jean-Baptiste Cordonnier and Martin Jaggi. 2018. Sparsified SGD with memory. In Advances in Neural Information Processing Systems. 4447\u20134458."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2015-354"},{"key":"e_1_3_2_1_27_1","unstructured":"Hanlin Tang Xiangru Lian Tong Zhang and Ji Liu. 2019. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression. arXiv preprint arXiv:1905.05957(2019)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.1986.1104412"},{"key":"e_1_3_2_1_29_1","unstructured":"Weiran Wang and Nathan Srebro. 2017. Stochastic nonconvex optimization with large minibatches. arXiv preprint arXiv:1709.08728(2017)."},{"key":"e_1_3_2_1_30_1","unstructured":"Jianqiao Wangni Jialei Wang Ji Liu and Tong Zhang. 2018. Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems. 1299\u20131309."},{"key":"e_1_3_2_1_31_1","volume-title":"Terngrad: Ternary gradients to reduce communication in distributed deep learning. In Advances in neural information processing systems. 1509\u20131519.","author":"Wen Wei","year":"2017"},{"key":"e_1_3_2_1_32_1","volume-title":"Scaling sgd batch size to 32k for imagenet training. arXiv preprint arXiv:1708.03888 6","author":"You Yang","year":"2017"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638950"}],"event":{"name":"ICPP '20: 49th International Conference on Parallel Processing","location":"Edmonton AB Canada","acronym":"ICPP '20"},"container-title":["49th International Conference on Parallel Processing - ICPP"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3404397.3404401","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3404397.3404401","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:42Z","timestamp":1750195902000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3404397.3404401"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,17]]},"references-count":32,"alternative-id":["10.1145\/3404397.3404401","10.1145\/3404397"],"URL":"https:\/\/doi.org\/10.1145\/3404397.3404401","relation":{},"subject":[],"published":{"date-parts":[[2020,8,17]]},"assertion":[{"value":"2020-08-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}