{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T09:42:02Z","timestamp":1775122922869,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,27]],"date-time":"2019-10-27T00:00:00Z","timestamp":1572134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,27]]},"DOI":"10.1145\/3341301.3359642","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T13:34:22Z","timestamp":1571664862000},"page":"16-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":255,"title":["A generic communication scheduler for distributed DNN training acceleration"],"prefix":"10.1145","author":[{"given":"Yanghua","family":"Peng","sequence":"first","affiliation":[{"name":"The University of Hong Kong and ByteDance Inc."}]},{"given":"Yibo","family":"Zhu","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Yangrui","family":"Chen","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}]},{"given":"Yixin","family":"Bao","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}]},{"given":"Bairen","family":"Yi","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Chang","family":"Lan","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Chuan","family":"Wu","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Chuanxiong","family":"Guo","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2019,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2019. ByteScheduler Appendix. https:\/\/www.dropbox.com\/s\/smoq6xd6pr7av81\/bytescheduler_appendix.pdf?dl=0.  2019. ByteScheduler Appendix. https:\/\/www.dropbox.com\/s\/smoq6xd6pr7av81\/bytescheduler_appendix.pdf?dl=0."},{"key":"e_1_3_2_1_2_1","unstructured":"2019. ByteScheduler Source Code. https:\/\/github.com\/bytedance\/byteps.  2019. ByteScheduler Source Code. https:\/\/github.com\/bytedance\/byteps."},{"key":"e_1_3_2_1_3_1","unstructured":"2019. MLPerf Training v0.6 Results. https:\/\/mlperf.org\/training-results-0-6\/.  2019. MLPerf Training v0.6 Results. https:\/\/mlperf.org\/training-results-0-6\/."},{"key":"e_1_3_2_1_4_1","unstructured":"2019. NVIDIA Collective Communications Library (NCCL). https:\/\/developer.nvidia.com\/nccl.  2019. NVIDIA Collective Communications Library (NCCL). https:\/\/developer.nvidia.com\/nccl."},{"key":"e_1_3_2_1_5_1","unstructured":"2019. TensorFlow Grapper. https:\/\/github.com\/tensorflow\/tensorflow\/tree\/master\/tensorflow\/core\/grappler.  2019. TensorFlow Grapper. https:\/\/github.com\/tensorflow\/tensorflow\/tree\/master\/tensorflow\/core\/grappler."},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , 2016 . TensorFlow: A System for Large-Scale Machine Learning . In Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI). Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_3_2_1_7_1","volume-title":"Sparse Communication for Distributed Gradient Descent. arXiv preprint arXiv:1704.05021","author":"Aji Alham Fikri","year":"2017","unstructured":"Alham Fikri Aji and Kenneth Heafield . 2017. Sparse Communication for Distributed Gradient Descent. arXiv preprint arXiv:1704.05021 ( 2017 ). Alham Fikri Aji and Kenneth Heafield. 2017. Sparse Communication for Distributed Gradient Descent. arXiv preprint arXiv:1704.05021 (2017)."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI).","author":"Alipourfard Omid","year":"2017","unstructured":"Omid Alipourfard , Hongqiang Harry Liu , Jianshu Chen , Shivaram Venkataraman , Minlan Yu , and Ming Zhang . 2017 . Cherrypick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics . In Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI). Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. Cherrypick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. In Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI)."},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS).","author":"Alistarh Dan","year":"2017","unstructured":"Dan Alistarh , Demjan Grubic , Jerry Li , Ryota Tomioka , and Milan Vojnovic . 2017 . QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding . In Proceedings of Advances in Neural Information Processing Systems (NIPS). Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. 2017. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. In Proceedings of Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236367.3236381"},{"key":"e_1_3_2_1_11_1","volume-title":"A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. arXiv preprint arXiv:1012.2599","author":"Brochu Eric","year":"2010","unstructured":"Eric Brochu , Vlad M Cora , and Nando De Freitas . 2010. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. arXiv preprint arXiv:1012.2599 ( 2010 ). Eric Brochu, Vlad M Cora, and Nando De Freitas. 2010. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. arXiv preprint arXiv:1012.2599 (2010)."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of NIPS Workshop on Machine Learning Systems.","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2016 . MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems . In Proceedings of NIPS Workshop on Machine Learning Systems. Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2016. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Proceedings of NIPS Workshop on Machine Learning Systems."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2619239.2626315"},{"key":"e_1_3_2_1_14_1","volume-title":"GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent. arXiv preprint arXiv:1803.05880","author":"Daily Jeff","year":"2018","unstructured":"Jeff Daily , Abhinav Vishnu , Charles Siegel , Thomas Warfel , and Vinay Amatya . 2018. GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent. arXiv preprint arXiv:1803.05880 ( 2018 ). Jeff Daily, Abhinav Vishnu, Charles Siegel, Thomas Warfel, and Vinay Amatya. 2018. GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent. arXiv preprint arXiv:1803.05880 (2018)."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS).","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Mark Mao , Andrew Senior , Paul Tucker , Ke Yang , Quoc V Le , 2012 . Large Scale Distributed Deep Networks . In Proceedings of Advances in Neural Information Processing Systems (NIPS). Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. 2012. Large Scale Distributed Deep Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687767"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI).","author":"Gu Juncheng","year":"2019","unstructured":"Juncheng Gu , Mosharaf Chowdhury , Kang G Shin , Yibo Zhu , Myeongjae Jeon , Junjie Qian , Hongqiang Liu , and Chuanxiong Guo . 2019 . Tiresias: A GPU Cluster Manager for Distributed Deep Learning . In Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI). Juncheng Gu, Mosharaf Chowdhury, Kang G Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo. 2019. Tiresias: A GPU Cluster Manager for Distributed Deep Learning. In Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI)."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of Systems and Machine Learning (SysML).","author":"Hashemi Sayed Hadi","year":"2019","unstructured":"Sayed Hadi Hashemi , Sangeetha Abdu Jyothi , and Roy H Campbell . 2019 . TicTac: Accelerating Distributed Deep Learning with Communication Scheduling . In Proceedings of Systems and Machine Learning (SysML). Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, and Roy H Campbell. 2019. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling. In Proceedings of Systems and Machine Learning (SysML)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_20_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS).","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho , James Cipar , Henggang Cui , Seunghak Lee , Jin Kyu Kim , Phillip B Gibbons , Garth A Gibson , Greg Ganger , and Eric P Xing . 2013 . More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server . In Proceedings of Advances in Neural Information Processing Systems (NIPS). Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. 2013. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server. In Proceedings of Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_21_1","volume-title":"Proceedings of Systems and Machine Learning (SysML).","author":"Jayarajan Anand","year":"2019","unstructured":"Anand Jayarajan , Jinliang Wei , Garth Gibson , Alexandra Fedorova , and Gennady Pekhimenko . 2019 . Priority-Based Parameter Propagation for Distributed DNN Training . In Proceedings of Systems and Machine Learning (SysML). Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, and Gennady Pekhimenko. 2019. Priority-Based Parameter Propagation for Distributed DNN Training. In Proceedings of Systems and Machine Learning (SysML)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS).","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . 2012 . ImageNet Classification with Deep Convolutional Neural Networks . In Proceedings of Advances in Neural Information Processing Systems (NIPS). Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},{"key":"e_1_3_2_1_25_1","volume-title":"International Journal of Parallel Programming","author":"Liu Jiuxing","year":"2004","unstructured":"Jiuxing Liu , Jiesheng Wu , and Dhabaleswar K Panda . 2004. High Performance RDMA-based MPI Implementation over InfiniBand . International Journal of Parallel Programming ( 2004 ). Jiuxing Liu, Jiesheng Wu, and Dhabaleswar K Panda. 2004. High Performance RDMA-based MPI Implementation over InfiniBand. International Journal of Parallel Programming (2004)."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).","author":"Mai Luo","year":"2015","unstructured":"Luo Mai , Chuntao Hong , and Paolo Costa . 2015 . Optimizing Network Performance in Distributed Machine Learning . In Proceedings of USENIX Workshop on Hot Topics in Cloud Computing (HotCloud). Luo Mai, Chuntao Hong, and Paolo Costa. 2015. Optimizing Network Performance in Distributed Machine Learning. In Proceedings of USENIX Workshop on Hot Topics in Cloud Computing (HotCloud)."},{"key":"e_1_3_2_1_27_1","volume-title":"Mllib: Machine Learning in Apache Spark. Journal of Machine Learning Research","author":"Meng Xiangrui","year":"2016","unstructured":"Xiangrui Meng , Joseph Bradley , Burak Yavuz , Evan Sparks , Shivaram Venkataraman , Davies Liu , Jeremy Freeman , DB Tsai , Manish Amde , Sean Owen , 2016 . Mllib: Machine Learning in Apache Spark. Journal of Machine Learning Research (2016). Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, et al. 2016. Mllib: Machine Learning in Apache Spark. Journal of Machine Learning Research (2016)."},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of NIPS Autodiff Workshop.","author":"Paszke Adam","year":"2017","unstructured":"Adam Paszke , Sam Gross , Soumith Chintala , Gregory Chanan , Edward Yang , Zachary DeVito , Zeming Lin , Alban Desmaison , Luca Antiga , and Adam Lerer . 2017 . Automatic Differentiation in PyTorch . In Proceedings of NIPS Autodiff Workshop. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In Proceedings of NIPS Autodiff Workshop."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190517"},{"key":"e_1_3_2_1_30_1","volume-title":"An Overview of Gradient Descent Optimization Algorithms. arXiv preprint arXiv:1609.04747","author":"Ruder Sebastian","year":"2016","unstructured":"Sebastian Ruder . 2016. An Overview of Gradient Descent Optimization Algorithms. arXiv preprint arXiv:1609.04747 ( 2016 ). Sebastian Ruder. 2016. An Overview of Gradient Descent Optimization Algorithms. arXiv preprint arXiv:1609.04747 (2016)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2945397"},{"key":"e_1_3_2_1_32_1","volume-title":"Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso . 2018 . Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018). Alexander Sergeev and Mike Del Balso. 2018. Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)."},{"key":"e_1_3_2_1_33_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS).","author":"Snoek Jasper","year":"2012","unstructured":"Jasper Snoek , Hugo Larochelle , and Ryan P Adams . 2012 . Practical Bayesian Optimization of Machine Learning Algorithms . In Proceedings of Advances in Neural Information Processing Systems (NIPS). Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. In Proceedings of Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2013.158"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303953"},{"key":"e_1_3_2_1_37_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS).","author":"Wen Wei","year":"2017","unstructured":"Wei Wen , Cong Xu , Feng Yan , Chunpeng Wu , Yandan Wang , Yiran Chen , and Hai Li . 2017 . Terngrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning . In Proceedings of Advances in Neural Information Processing Systems (NIPS). Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2017. Terngrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. In Proceedings of Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_1_38_1","unstructured":"Wikipedia. 2019. Monkey Patch. https:\/\/en.wikipedia.org\/wiki\/Monkey_patch.  Wikipedia. 2019. Monkey Patch. https:\/\/en.wikipedia.org\/wiki\/Monkey_patch."},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of USENIX Annual Technical Conference (USENIX ATC).","author":"Zhang Hao","year":"2017","unstructured":"Hao Zhang , Zeyu Zheng , Shizhen Xu , Wei Dai , Qirong Ho , Xiaodan Liang , Zhiting Hu , Jinliang Wei , Pengtao Xie , and Eric P Xing . 2017 . Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters . In Proceedings of USENIX Annual Technical Conference (USENIX ATC). Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P Xing. 2017. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. In Proceedings of USENIX Annual Technical Conference (USENIX ATC)."}],"event":{"name":"SOSP '19: ACM SIGOPS 27th Symposium on Operating Systems Principles","location":"Huntsville Ontario Canada","acronym":"SOSP '19","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","USENIX Assoc USENIX Assoc"]},"container-title":["Proceedings of the 27th ACM Symposium on Operating Systems Principles"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3341301.3359642","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3341301.3359642","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:12:56Z","timestamp":1750201976000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3341301.3359642"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,27]]},"references-count":39,"alternative-id":["10.1145\/3341301.3359642","10.1145\/3341301"],"URL":"https:\/\/doi.org\/10.1145\/3341301.3359642","relation":{},"subject":[],"published":{"date-parts":[[2019,10,27]]},"assertion":[{"value":"2019-10-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}