{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T09:41:57Z","timestamp":1775122917871,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61972158"],"award-info":[{"award-number":["61972158"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"publisher","award":["20511102802 & 18DZ2270800"],"award-info":[{"award-number":["20511102802 & 18DZ2270800"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007219","name":"Natural Science Foundation of Shanghai","doi-asserted-by":"publisher","award":["21ZR1419900"],"award-info":[{"award-number":["21ZR1419900"]}],"id":[{"id":"10.13039\/100007219","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3472467","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:46:04Z","timestamp":1633459564000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Prophet: Speeding up Distributed DNN Training with Predictable Communication Scheduling"],"prefix":"10.1145","author":[{"given":"Zhenwei","family":"Zhang","sequence":"first","affiliation":[{"name":"East 
China Normal University, China"}]},{"given":"Qiang","family":"Qi","sequence":"additional","affiliation":[{"name":"East China Normal University, China"}]},{"given":"Ruitao","family":"Shang","sequence":"additional","affiliation":[{"name":"East China Normal University, China"}]},{"given":"Li","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Louisiana at Lafayette, United States of America"}]},{"given":"Fei","family":"Xu","sequence":"additional","affiliation":[{"name":"East China Normal University, China"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[
{"key":"e_1_3_2_1_1_1","volume-title":"Proc.\u00a0of USENIX OSDI. 265\u2013283","author":"Barham Paul","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek\u00a0G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In Proc.\u00a0of USENIX OSDI. 265\u2013283."},
{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155446"},
{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737587"},
{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421299"},
{"key":"e_1_3_2_1_5_1","unstructured":"Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, and Rafal Jozefowicz. 2016. Revisiting Distributed Synchronous SGD. arXiv preprint arXiv:1604.00981 (2016)."},
{"key":"e_1_3_2_1_6_1","unstructured":"Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274 (2015)."},
{"key":"e_1_3_2_1_7_1","volume-title":"Proc.\u00a0of NIPS. 1223\u20131231","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc\u2019aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, 2012. Large Scale Distributed Deep Networks. In Proc.\u00a0of NIPS. 1223\u20131231."},
{"key":"e_1_3_2_1_8_1","volume-title":"Proc.\u00a0of MLSys.","author":"Hashemi Sayed\u00a0Hadi","year":"2019","unstructured":"Sayed\u00a0Hadi Hashemi, Sangeetha\u00a0Abdu Jyothi, and Roy\u00a0H Campbell. 2019. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling. In Proc.\u00a0of MLSys."},
{"key":"e_1_3_2_1_9_1","volume-title":"Proc.\u00a0of NIPS. 1223\u20131231","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin\u00a0Kyu Kim, Phillip\u00a0B Gibbons, Garth\u00a0A Gibson, Greg Ganger, and Eric\u00a0P Xing. 2013. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server. In Proc.\u00a0of NIPS. 1223\u20131231."},
{"key":"e_1_3_2_1_10_1","volume-title":"Proc.\u00a0of MLSys.","author":"Jayarajan Anand","year":"2019","unstructured":"Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, and Gennady Pekhimenko. 2019. Priority-based Parameter Propagation for Distributed DNN Training. In Proc.\u00a0of MLSys."},
{"key":"e_1_3_2_1_11_1","volume-title":"Proc.\u00a0of USENIX ATC. 947\u2013960","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In Proc.\u00a0of USENIX ATC. 947\u2013960."},
{"key":"e_1_3_2_1_12_1","volume-title":"Proc.\u00a0of USENIX OSDI. 463\u2013479","author":"Jiang Yimin","year":"2020","unstructured":"Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, and Chuanxiong Guo. 2020. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU\/CPU Clusters. In Proc.\u00a0of USENIX OSDI. 463\u2013479."},
{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MLHPC.2016.006"},
{"key":"e_1_3_2_1_14_1","volume-title":"Proc.\u00a0of NIPS. 1097\u20131105","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey\u00a0E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proc.\u00a0of NIPS. 1097\u20131105."},
{"key":"e_1_3_2_1_15_1","volume-title":"Proc.\u00a0of USENIX NSDI. 741\u2013761","author":"Lao ChonLam","year":"2021","unstructured":"ChonLam Lao, Yanfang Le, Kshiteej Mahajan, Yixi Chen, Wenfei Wu, Aditya Akella, and Michael Swift. 2021. ATP: In-network Aggregation for Multi-Tenant Learning. In Proc.\u00a0of USENIX NSDI. 741\u2013761."},
{"key":"e_1_3_2_1_16_1","volume-title":"Proc.\u00a0of NIPS. 2834\u20132842","author":"Lee Seunghak","year":"2014","unstructured":"Seunghak Lee, Jin\u00a0Kyu Kim, Xun Zheng, Qirong Ho, Garth\u00a0A Gibson, and Eric\u00a0P Xing. 2014. On Model Parallelization and Scheduling Strategies for Distributed Machine Learning. In Proc.\u00a0of NIPS. 2834\u20132842."},
{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},
{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3267840"},
{"key":"e_1_3_2_1_19_1","first-page":"105","article-title":"PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice","volume":"1","author":"Ma Yanjun","year":"2019","unstructured":"Yanjun Ma, Dianhai Yu, Tian Wu, and Haifeng Wang. 2019. PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice. Frontiers of Data and Computing 1, 1 (2019), 105\u2013115.","journal-title":"Frontiers of Data and Computing"},
{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2974843"},
{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359642"},
{"key":"e_1_3_2_1_22_1","volume-title":"Proc.\u00a0of USENIX NSDI. 785\u2013808","author":"Sapio Amedeo","year":"2021","unstructured":"Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan\u00a0RK Ports, and Peter Richt\u00e1rik. 2021. Scaling Distributed Machine Learning with In-Network Aggregation. In Proc.\u00a0of USENIX NSDI. 785\u2013808."},
{"key":"e_1_3_2_1_23_1","volume-title":"Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018).","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike\u00a0Del Balso. 2018. Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)."},
{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737367"},
{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421296"},
{"key":"e_1_3_2_1_26_1","unstructured":"Qiang Wang, Shaohuai Shi, Canhui Wang, and Xiaowen Chu. 2020. Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs. arXiv preprint arXiv:2002.10105 (2020)."},
{"key":"e_1_3_2_1_27_1","volume-title":"Proc.\u00a0of IEEE INFOCOM. 1678\u20131687","author":"Wang S.","unstructured":"S. Wang, D. Li, and J. Geng. 2020. Geryon: Accelerating Distributed CNN Training by Network-Level Flow Scheduling. In Proc.\u00a0of IEEE INFOCOM. 1678\u20131687."},
{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33015289"},
{"key":"e_1_3_2_1_29_1","volume-title":"Proc.\u00a0of USENIX ATC. 181\u2013193","author":"Zhang Hao","year":"2017","unstructured":"Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric\u00a0P. Xing. 2017. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. In Proc.\u00a0of USENIX ATC. 181\u2013193."},
{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2019.00150"}],
"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","location":"Lemont IL USA","acronym":"ICPP 2021"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472467","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3472467","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:11Z","timestamp":1750193291000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472467"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":30,"alternative-id":["10.1145\/3472456.3472467","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3472467","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}