{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T12:08:33Z","timestamp":1763381313905,"version":"3.45.0"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62402025"],"award-info":[{"award-number":["62402025"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key Research and Development Program of China","award":["2022YFB2901300"],"award-info":[{"award-number":["2022YFB2901300"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,17]]},"DOI":"10.1145\/3772356.3772387","type":"proceedings-article","created":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T12:02:48Z","timestamp":1763380968000},"page":"168-175","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["JEEVES: The Valet Who Masters the Art of Cross-DC Training Scheduling"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7490-5568","authenticated-orcid":false,"given":"Haotian","family":"Deng","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2773-3742","authenticated-orcid":false,"given":"Xuebin","family":"Song","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5274-5512","authenticated-orcid":false,"given":"Menghao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3481-8447","authenticated-orcid":false,"given":"Yuan","family":"Yang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4847-4585","authenticated-orcid":false,"given":"Mingwei","family":"Xu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,11,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Dacoso GmbH. n.d.. Managed DCI \u2014 High Performance Connectivity Between Data Centers. Accessed: 2025-06-22."},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 4171\u20134186."},{"key":"e_1_3_2_1_3_1","unstructured":"Arthur Douillard Yanislav Donchev Keith Rush Satyen Kale Zachary Charles Zachary Garrett Gabriel Teston Dave Lacey Ross McIlroy Jiajun Shen et al. 2025. Streaming diloco with overlapping communication: Towards a distributed free lunch. arXiv preprint arXiv:2501.18512 (2025)."},{"key":"e_1_3_2_1_4_1","volume-title":"Diloco: Distributed low-communication training of language models. arXiv preprint arXiv:2311.08105","author":"Douillard Arthur","year":"2023","unstructured":"Arthur Douillard, Qixuan Feng, Andrei A Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, and Jiajun Shen. 2023. Diloco: Distributed low-communication training of language models. arXiv preprint arXiv:2311.08105 (2023)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441593"},{"key":"e_1_3_2_1_6_1","unstructured":"Aaron Grattafiori Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Alex Vaughan et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)."},{"key":"e_1_3_2_1_7_1","unstructured":"Gurobi Optimization LLC. 2024. Gurobi Optimizer Reference Manual. https:\/\/www.gurobi.com"},{"key":"e_1_3_2_1_8_1","volume-title":"Pipedream: Fast and efficient pipeline parallel dnn training. arXiv preprint arXiv:1806.03377","author":"Harlap Aaron","year":"2018","unstructured":"Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, and Phil Gibbons. 2018. Pipedream: Fast and efficient pipeline parallel dnn training. arXiv preprint arXiv:1806.03377 (2018)."},{"key":"e_1_3_2_1_9_1","volume-title":"The communication challenge for MPP: Intel Paragon and Meiko CS-2. Parallel computing 20, 3","author":"Hockney Roger W","year":"1994","unstructured":"Roger W Hockney. 1994. The communication challenge for MPP: Intel Paragon and Meiko CS-2. Parallel computing 20, 3 (1994), 389\u2013398."},{"key":"e_1_3_2_1_10_1","volume-title":"Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems 32","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, et al. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_1_11_1","volume-title":"21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)","author":"Jiang Ziheng","year":"2024","unstructured":"Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, et al. 2024. {MegaScale}: Scaling large language model training to more than 10,000 {GPUs}. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 745\u2013760."},{"key":"e_1_3_2_1_12_1","volume-title":"Scaling laws for neural language models. arXiv preprint arXiv:2001.08361","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020)."},{"key":"e_1_3_2_1_13_1","first-page":"48","article-title":"Breadth-first pipeline parallelism","volume":"5","author":"Lamy-Poirier Joel","year":"2023","unstructured":"Joel Lamy-Poirier. 2023. Breadth-first pipeline parallelism. Proceedings of Machine Learning and Systems 5 (2023), 48\u201367.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476145"},{"key":"e_1_3_2_1_15_1","unstructured":"Aixin Liu Bei Feng Bing Xue Bingxuan Wang Bochao Wu Chengda Lu Chenggang Zhao Chengqi Deng Chenyu Zhang Chong Ruan Damai Dai and .... 2024. DeepSeek-V3 Technical Report. arXiv preprint arXiv:2412.19437 (Dec. 2024). Technical Report."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607073"},{"key":"e_1_3_2_1_17_1","volume-title":"International Conference on Machine Learning. PMLR, 7937\u20137947","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-efficient pipeline-parallel dnn training. In International Conference on Machine Learning. PMLR, 7937\u20137947."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_2_1_19_1","unstructured":"NVIDIA. 2024. Turbocharge LLM Training across Long-Haul Data Center Networks with the NVIDIA NeMo Framework. https:\/\/developer.nvidia.com\/blog\/turbocharge-llm-training-across-long-haul-data-center-networks-with-nvidia-nemo-framework\/"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3387514.3405899"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3642970.3655843"},{"key":"e_1_3_2_1_23_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_24_1","volume-title":"Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation. arXiv preprint arXiv:2504.17672","author":"Zhu Ying","year":"2025","unstructured":"Ying Zhu, Yang Xu, Hongli Xu, Yunming Liao, Zhiwei Yao, and Liusheng Huang. 2025. Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation. arXiv preprint arXiv:2504.17672 (2025)."}],"event":{"name":"HotNets '25: 24th ACM Workshop on Hot Topics in Networks","location":"UMD Campus College Park MD USA","acronym":"HotNets '25","sponsor":["SIGCOMM ACM Special Interest Group on Data Communication"]},"container-title":["Proceedings of the 24th ACM Workshop on Hot Topics in Networks"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3772356.3772387","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T12:05:31Z","timestamp":1763381131000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3772356.3772387"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,17]]},"references-count":24,"alternative-id":["10.1145\/3772356.3772387","10.1145\/3772356"],"URL":"https:\/\/doi.org\/10.1145\/3772356.3772387","relation":{},"subject":[],"published":{"date-parts":[[2025,11,17]]},"assertion":[{"value":"2025-11-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}