{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T16:04:27Z","timestamp":1780589067962,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":64,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,11,14]],"date-time":"2022-11-14T00:00:00Z","timestamp":1668384000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,11,14]]},"DOI":"10.1145\/3563766.3564096","type":"proceedings-article","created":{"date-parts":[[2022,11,14]],"date-time":"2022-11-14T16:29:31Z","timestamp":1668443371000},"page":"93-100","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Efficient flow scheduling in distributed deep learning training with echelon formation"],"prefix":"10.1145","author":[{"given":"Rui","family":"Pan","sequence":"first","affiliation":[{"name":"Princeton University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yiming","family":"Lei","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jialong","family":"Li","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhiqiang","family":"Xie","sequence":"additional","affiliation":[{"name":"Stanford University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Binhang","family":"Yuan","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yiting","family":"Xia","sequence":"additional","affiliation":[{"name":"Max Planck Institute for 
Informatics"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,11,14]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2022. Echelon Formation. https:\/\/en.wikipedia.org\/wiki\/Echelon_formation."},{"key":"e_1_3_2_2_2_1","unstructured":"2022. EchelonFlow technical report. https:\/\/anonymous.4open.science\/r\/EchelonFlow_report."},{"key":"e_1_3_2_2_3_1","unstructured":"2022. FairScale. https:\/\/github.com\/facebookresearch\/fairscale."},{"key":"e_1_3_2_2_4_1","unstructured":"2022. Gloo. https:\/\/github.com\/facebookincubator\/gloo."},{"key":"e_1_3_2_2_5_1","unstructured":"2022. NVIDIA Collective Communications Library (NCCL). https:\/\/developer.nvidia.com\/nccl."},{"key":"e_1_3_2_2_6_1","unstructured":"2022. NVIDIA Multi-Instance GPU User Guide. 
https:\/\/docs.nvidia.com\/datacenter\/tesla\/pdf\/NVIDIA_MIG_User_Guide.pdf."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230569"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2534169.2486031"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2017.2669216"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155446"},{"key":"e_1_3_2_2_11_1","first-page":"241","article-title":"Blueconnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy","volume":"1","author":"Cho Minsik","year":"2019","unstructured":"Minsik Cho, Ulrich Finkler, David Kung, and Hillery Hunter. 2019. Blueconnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy. Proceedings of Machine Learning and Systems 1 (2019), 241--251.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2390231.2390237"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2829988.2787480"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2619239.2626315"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503590"},{"key":"e_1_3_2_2_17_1","volume-title":"Altruistic Scheduling in Multi-Resource Clusters. In 12th USENIX symposium on operating systems design and implementation (OSDI 16)","author":"Grandl Robert","year":"2016","unstructured":"Robert Grandl, Mosharaf Chowdhury, Aditya Akella, and Ganesh Ananthanarayanan. 2016. Altruistic Scheduling in Multi-Resource Clusters. 
In 12th USENIX symposium on operating systems design and implementation (OSDI 16). 65--80."},{"key":"e_1_3_2_2_18_1","volume-title":"GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Grandl Robert","year":"2016","unstructured":"Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, and Janardhan Kulkarni. 2016. GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 81--97."},{"key":"e_1_3_2_2_19_1","first-page":"418","article-title":"Tictac: Accelerating distributed deep learning with communication scheduling","volume":"1","author":"Hashemi Sayed Hadi","year":"2019","unstructured":"Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, and Roy Campbell. 2019. Tictac: Accelerating distributed deep learning with communication scheduling. 
Proceedings of Machine Learning and Systems 1 (2019), 418--430.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2377677.2377710"},{"key":"e_1_3_2_2_21_1","volume-title":"Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems 32","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, et al. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_3_2_2_23_1","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Jajoo Akshay","year":"2019","unstructured":"Akshay Jajoo, Y Charlie Hu, and Xiaojun Lin. 2019. Your coflow has many flows: sampling them for fun and speed. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 833--848."},{"key":"e_1_3_2_2_24_1","first-page":"132","article-title":"Priority-based parameter propagation for distributed DNN training","volume":"1","author":"Jayarajan Anand","year":"2019","unstructured":"Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, and Gennady Pekhimenko. 2019. Priority-based parameter propagation for distributed DNN training. 
Proceedings of Machine Learning and Systems 1 (2019), 132--145.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_2_25_1","volume-title":"Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 947--960."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/3488766.3488792"},{"key":"e_1_3_2_2_27_1","volume-title":"Towards An Application Objective-Aware Network Interface. In 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20)","author":"Jyothi Sangeetha Abdu","year":"2020","unstructured":"Sangeetha Abdu Jyothi, Sayed Hadi Hashemi, Roy Campbell, and Brighten Godfrey. 2020. Towards An Application Objective-Aware Network Interface. In 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20)."},{"key":"e_1_3_2_2_28_1","volume-title":"torchgpipe: On-the-fly pipeline parallelism for training giant models. 
arXiv preprint arXiv:2004.09910","author":"Kim Chiheon","year":"2020","unstructured":"Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, and Sungwoong Kim. 2020. torchgpipe: On-the-fly pipeline parallelism for training giant models. arXiv preprint arXiv:2004.09910 (2020)."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10723-015-9337-8"},{"key":"e_1_3_2_2_30_1","volume-title":"Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012), 1097--1105."},{"key":"e_1_3_2_2_31_1","volume-title":"ATP: In-network Aggregation for Multi-tenant Learning. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)","author":"Lao ChonLam","year":"2021","unstructured":"ChonLam Lao, Yanfang Le, Kshiteej Mahajan, Yixi Chen, Wenfei Wu, Aditya Akella, and Michael Swift. 2021. ATP: In-network Aggregation for Multi-tenant Learning. 
In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 741--761."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415530"},{"key":"e_1_3_2_2_34_1","volume-title":"Siphon: Expediting Inter-Datacenter Coflows in Wide-Area Data Analytics. In 2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Liu Shuhao","year":"2018","unstructured":"Shuhao Liu, Li Chen, and Baochun Li. 2018. Siphon: Expediting Inter-Datacenter Coflows in Wide-Area Data Analytics. In 2018 USENIX Annual Technical Conference (USENIX ATC 18). 507--518."},{"key":"e_1_3_2_2_35_1","volume-title":"Scheduling dependent coflows with guaranteed job completion time. In 2016 IEEE Trustcom\/BigDataSE\/ISPA","author":"Liu Yang","unstructured":"Yang Liu, Wenxin Li, Keqiu Li, Heng Qi, Xiaoyi Tao, and Sheng Chen. 2016. Scheduling dependent coflows with guaranteed job completion time. In 2016 IEEE Trustcom\/BigDataSE\/ISPA. IEEE, 2109--2115."},{"key":"e_1_3_2_2_36_1","first-page":"82","article-title":"Plink: Discovering and exploiting locality for accelerated distributed training on the public cloud","volume":"2","author":"Luo Liang","year":"2020","unstructured":"
Liang Luo, Peter West, Jacob Nelson, Arvind Krishnamurthy, and Luis Ceze. 2020. Plink: Discovering and exploiting locality for accelerated distributed training on the public cloud. Proceedings of Machine Learning and Systems 2 (2020), 82--97.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_2_37_1","volume-title":"7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15)","author":"Mai Luo","year":"2015","unstructured":"Luo Mai, Chuntao Hong, and Paolo Costa. 2015. Optimizing network performance in distributed machine learning. In 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15)."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_3_2_2_39_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Moritz Philipp","year":"2018","unstructured":"Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I Jordan, et al. 2018. Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 561--577."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_2_2_41_1","volume-title":"International Conference on Machine Learning. 
PMLR, 7937--7947","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-efficient pipeline-parallel dnn training. In International Conference on Machine Learning. PMLR, 7937--7947."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359642"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_2_2_46_1","volume-title":"Scaling Distributed Machine Learning with In-Network Aggregation. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)","author":"Sapio Amedeo","year":"2021","unstructured":"Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan Ports, and Peter Richtarik. 2021. Scaling Distributed Machine Learning with In-Network Aggregation. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 785--808."},{"key":"e_1_3_2_2_47_1","volume-title":"Horovod: fast and easy distributed deep learning in TensorFlow. arXiv preprint arXiv:1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso. 2018. 
Horovod: fast and easy distributed deep learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2021.3116133"},{"key":"e_1_3_2_2_49_1","volume-title":"Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019)."},{"key":"e_1_3_2_2_50_1","unstructured":"Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, et al. 2022. Using deepspeed and megatron to train megatron-turing nlg 530b, a large-scale generative language model. arXiv preprint arXiv:2201.11990 (2022)."},{"key":"e_1_3_2_2_51_1","volume-title":"Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 
3314--3320","author":"Sun Penghao","year":"2021","unstructured":"Penghao Sun, Zehua Guo, Junchao Wang, Junfei Li, Julong Lan, and Yuxiang Hu. 2021. Deepweave: Accelerating job completion time with deep reinforcement learning-based coflow scheduling. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 3314--3320."},{"key":"e_1_3_2_2_52_1","volume-title":"Serving DNN models with multi-instance gpus: A case of the reconfigurable machine scheduling problem. arXiv preprint arXiv:2109.11067","author":"Tan Cheng","year":"2021","unstructured":"Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, and Chuanxiong Guo. 2021. Serving DNN models with multi-instance gpus: A case of the reconfigurable machine scheduling problem. arXiv preprint arXiv:2109.11067 (2021)."},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2018.8486340"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421296"},{"key":"e_1_3_2_2_55_1","first-page":"56","article-title":"MPI: a standard message passing interface","volume":"12","author":"Walker David W","year":"1996","unstructured":"David W Walker and Jack J Dongarra. 1996. 
MPI: a standard message passing interface. Supercomputer 12 (1996), 56--68.","journal-title":"Supercomputer"},{"key":"e_1_3_2_2_56_1","first-page":"172","article-title":"Blink: Fast and generic collectives for distributed ml","volume":"2","author":"Wang Guanhua","year":"2020","unstructured":"Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Nikhil Devanur, Jorgen Thelin, and Ion Stoica. 2020. Blink: Fast and generic collectives for distributed ml. Proceedings of Machine Learning and Systems 2 (2020), 172--186.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CloudCom.2017.55"},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043164.2018443"},{"key":"e_1_3_2_2_59_1","volume-title":"9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12)","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A Fault-Tolerant abstraction for In-Memory cluster computing. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). 
15--28."},{"key":"e_1_3_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934872.2934880"},{"key":"e_1_3_2_2_61_1","volume-title":"2017 USENIX Annual Technical Conference (USENIX ATC 17)","author":"Zhang Hao","year":"2017","unstructured":"Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P Xing. 2017. Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). 181--193."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3405671.3405810"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472456.3472467"},{"key":"e_1_3_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2015.7218408"}],"event":{"name":"HotNets '22: The 21st ACM Workshop on Hot Topics in Networks","location":"Austin Texas","acronym":"HotNets '22","sponsor":["SIGCOMM ACM Special Interest Group on Data Communication"]},"container-title":["Proceedings of the 21st ACM Workshop on Hot Topics in 
Networks"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3563766.3564096","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3563766.3564096","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:50Z","timestamp":1750182530000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3563766.3564096"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,14]]},"references-count":64,"alternative-id":["10.1145\/3563766.3564096","10.1145\/3563766"],"URL":"https:\/\/doi.org\/10.1145\/3563766.3564096","relation":{},"subject":[],"published":{"date-parts":[[2022,11,14]]},"assertion":[{"value":"2022-11-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}