{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:34:29Z","timestamp":1772724869063,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":76,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2024YFB4505800"],"award-info":[{"award-number":["2024YFB4505800"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62402411"],"award-info":[{"award-number":["62402411"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Provincial Project","award":["2023QN10X252"],"award-info":[{"award-number":["2023QN10X252"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,21]]},"DOI":"10.1145\/3695053.3731025","type":"proceedings-article","created":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T16:43:11Z","timestamp":1750437791000},"page":"498-513","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Chimera: Communication Fusion for Hybrid Parallelism in Large Language Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-9861-0469","authenticated-orcid":false,"given":"Le","family":"Qin","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6805-7669","authenticated-orcid":false,"given":"Junwei","family":"Cui","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6369-6389","authenticated-orcid":false,"given":"Weilin","family":"Cai","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4011-6668","authenticated-orcid":false,"given":"Jiayi","family":"Huang","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,20]]},"reference":[{"key":"e_1_3_3_3_2_2","unstructured":"Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katie Millican Malcolm Reynolds Roman Ring Eliza Rutherford Serkan Cabi Tengda Han Zhitao Gong Sina Samangooei Marianne Monteiro Jacob Menick Sebastian Borgeaud Andrew Brock Aida Nematzadeh Sahand Sharifzadeh Mikolaj Binkowski Ricardo Barreira Oriol Vinyals Andrew Zisserman and Karen Simonyan. 2022. Flamingo: A Visual Language Model for Few-Shot Learning. Advances in Neural Information Processing Systems 35 (2022) 23716\u201323736."},{"key":"e_1_3_3_3_3_2","series-title":"(SC \u201922)","first-page":"1","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis","author":"Aminabadi Reza\u00a0Yazdani","year":"2022","unstructured":"Reza\u00a0Yazdani Aminabadi, Samyam Rajbhandari, Ammar\u00a0Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Olatunji Ruwase, Shaden Smith, Minjia Zhang, Jeff Rasley, and Yuxiong He. 2022. DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis(SC \u201922). 
IEEE, 1\u201315."},{"key":"e_1_3_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640366"},{"key":"e_1_3_3_3_5_2","unstructured":"Shuai Bai Keqin Chen Xuejing Liu Jialin Wang Wenbin Ge Sibo Song Kai Dang Peng Wang Shijie Wang Jun Tang Humen Zhong Yuanzhi Zhu Mingkun Yang Zhaohai Li Jianqiang Wan Pengfei Wang Wei Ding Zheren Fu Yiheng Xu Jiabo Ye Xi Zhang Tianbao Xie Zesen Cheng Hang Zhang Zhibo Yang Haiyang Xu and Junyang Lin. 2025. Qwen2.5-VL Technical Report. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.13923 (2025)."},{"key":"e_1_3_3_3_6_2","unstructured":"Yoshua Bengio Geoffrey Hinton Andrew Yao Dawn Song Pieter Abbeel Yuval\u00a0Noah Harari Ya-Qin Zhang Lan Xue Shai Shalev-Shwartz Gillian Hadfield Jeff Clune Tegan Maharaj Frank Hutter At\u0131l\u0131m\u00a0G\u00fcne\u015f Baydin Sheila McIlraith Qiqi Gao Ashwin Acharya David Krueger Anca Dragan Philip Torr Stuart Russell Daniel Kahneman Jan Brauner and S\u00f6ren Mindermann. 2023. Managing AI Risks in an Era of Rapid Progress. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.17688 (2023)."},{"key":"e_1_3_3_3_7_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared\u00a0D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel Ziegler Jeffrey Wu Clemens Winter Chris Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models Are Few-Shot Learners. Advances in Neural Information Processing Systems 33 (2020) 1877\u20131901."},{"key":"e_1_3_3_3_8_2","unstructured":"Weilin Cai Juyong Jiang Le Qin Junwei Cui Sunghun Kim and Jiayi Huang. 2024. Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.05019 (2024)."},{"key":"e_1_3_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3620666.3651379"},{"key":"e_1_3_3_3_10_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique\u00a0Ponde de Oliveira\u00a0Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe\u00a0Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William\u00a0Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew\u00a0N. Carr Jan Leike Josh Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2107.03374 (2021)."},{"key":"e_1_3_3_3_11_2","volume-title":"Proceedings of the Twelfth International Conference on Learning Representations","author":"Cheng Daixuan","year":"2024","unstructured":"Daixuan Cheng, Shaohan Huang, and Furu Wei. 2024. Adapting Large Language Models via Reading Comprehension. In Proceedings of the Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071117"},{"key":"e_1_3_3_3_13_2","unstructured":"OpenAI Community. 2019. GPT2 Medium. https:\/\/huggingface.co\/gpt2-medium\/resolve\/main\/config.json. [Online; Accessed 21-Nov-2024]."},{"key":"e_1_3_3_3_14_2","unstructured":"DeepSeek-AI. 2024. DeepSeek-V3 Technical Report. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2412.19437 (2024)."},{"key":"e_1_3_3_3_15_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00056"},{"key":"e_1_3_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503221.3508418"},{"key":"e_1_3_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00023"},{"key":"e_1_3_3_3_19_2","unstructured":"Zongle Huang Shupei Fan Chen Tang Xinyuan Lin Shuwen Deng and Yongpan Liu. 2024. Hecaton: Training and Finetuning Large Language Models with Scalable Chiplet Systems. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.05784 (2024)."},{"key":"e_1_3_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731633"},{"key":"e_1_3_3_3_21_2","volume-title":"Proceedings of the Hot Chips 30 Symposium (HCS)","author":"Ishii Alex","year":"2018","unstructured":"Alex Ishii, Denis Foley, Eric Anderson, Bill Dally, Glenn Dearth, Larry Dennison, Mark Hummel, and John Schafer. 2018. NVSwitch and DGX-2: NVLink-Switching Chip and Scale-Up Compute Server. In Proceedings of the Hot Chips 30 Symposium (HCS)."},{"key":"e_1_3_3_3_22_2","first-page":"711","volume-title":"Proceedings of Machine Learning and Systems","volume":"3","author":"Ivanov Andrei","year":"2021","unstructured":"Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. 2021. 
Data Movement Is All You Need: A Case Study on Optimizing Transformers. In Proceedings of Machine Learning and Systems, Vol.\u00a03. 711\u2013732."},{"key":"e_1_3_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3662158.3662806"},{"key":"e_1_3_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/1810085.1810093"},{"key":"e_1_3_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507778"},{"key":"e_1_3_3_3_26_2","unstructured":"Albert\u00a0Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra\u00a0Singh Chaplot Diego de\u00a0las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier L\u00e9lio\u00a0Renard Lavaud Marie-Anne Lachaux Pierre Stock Teven\u00a0Le Scao Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William\u00a0El Sayed. 2023. Mistral 7B. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.06825 (2023)."},{"key":"e_1_3_3_3_27_2","first-page":"74","volume-title":"Proceedings of Machine Learning and Systems","volume":"6","author":"Jiang Chenyu","year":"2024","unstructured":"Chenyu Jiang, Ye Tian, Zhen Jia, Chuan Wu, Yida Wang, and Shuai Zheng. 2024. Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication. In Proceedings of Machine Learning and Systems, Vol.\u00a06. 74\u201386."},{"key":"e_1_3_3_3_28_2","unstructured":"Juyong Jiang Fan Wang Jiasi Shen Sungju Kim and Sunghun Kim. 2024. A Survey on Large Language Models for Code Generation. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.00515 (2024)."},{"key":"e_1_3_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2013.6557149"},{"key":"e_1_3_3_3_30_2","series-title":"(NSDI\u201924)","first-page":"745","volume-title":"Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation","author":"Jiang Ziheng","year":"2024","unstructured":"Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, and Xin Liu. 2024. MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs. In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation(NSDI\u201924). 745\u2013760."},{"key":"e_1_3_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589350"},{"key":"e_1_3_3_3_32_2","doi-asserted-by":"crossref","unstructured":"Norman\u00a0P Jouppi Doe\u00a0Hyun Yoon George Kurian Sheng Li Nishant Patil James Laudon Cliff Young and David Patterson. 2020. A Domain-Specific Supercomputer for Training Deep Neural Networks. Commun. ACM 63 7 (2020) 67\u201378.","DOI":"10.1145\/3360307"},{"key":"e_1_3_3_3_33_2","first-page":"341","volume-title":"Proceedings of Machine Learning and Systems","volume":"5","author":"Korthikanti Vijay\u00a0Anand","year":"2023","unstructured":"Vijay\u00a0Anand Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, and Bryan Catanzaro. 2023. Reducing Activation Recomputation in Large Transformer Models. In Proceedings of Machine Learning and Systems, Vol.\u00a05. 341\u2013353."},{"key":"e_1_3_3_3_34_2","unstructured":"Sameer Kumar and Norm Jouppi. 2020. Highly Available Data Parallel ML Training on Mesh Networks. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2011.03605 (2020)."},{"key":"e_1_3_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA57654.2024.00069"},{"key":"e_1_3_3_3_36_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Lepikhin Dmitry","year":"2021","unstructured":"Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2021. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_3_37_2","volume-title":"Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@NeurIPS 2023)","author":"Li Dacheng","year":"2023","unstructured":"Dacheng Li, Rulin Shao, Anze Xie, Eric\u00a0P Xing, Joseph\u00a0E Gonzalez, Ion Stoica, Xuezhe Ma, and Hao Zhang. 2023. LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers. In Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@NeurIPS 2023)."},{"key":"e_1_3_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.134"},{"key":"e_1_3_3_3_39_2","unstructured":"Heng Liao Bingyang Liu Xianping Chen Zhigang Guo Chuanning Cheng Jianbing Wang Xiangyu Chen Peng Dong Rui Meng Wenjie Liu Zhe Zhou Ziyang Zhang Yuhang Gai Cunle Qian Yi Xiong Zhongwu Cheng Jing Xia Yuli Ma Xi Chen Wenhua Du Shizhong Xiao Chungang Li Yong Qin Liudong Xiong Zhou Yu Lv Chen Lei Chen Buyun Wang Pei Wu Junen Gao Xiaochu Li Jian He Shizhuan Yan and Bill McColl. 2025. UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.20377 (2025)."},{"key":"e_1_3_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS55958.2022.9895479"},{"key":"e_1_3_3_3_41_2","volume-title":"Proceedings of the Twelfth International Conference on Learning Representations","author":"Liu Hao","year":"2024","unstructured":"Hao Liu, Matei Zaharia, and Pieter Abbeel. 2024. Ring Attention with Blockwise Transformers for Near-Infinite Context. In Proceedings of the Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISBI.2008.4541126"},{"key":"e_1_3_3_3_43_2","unstructured":"Microsoft. 2023. DeepSpeed. https:\/\/github.com\/microsoft\/DeepSpeed\/tree\/master\/deepspeed\/moe. [Online; Accessed 21-Nov-2024]."},{"key":"e_1_3_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_3_3_45_2","unstructured":"NVIDIA. 2020. NVIDIA A100 Tensor Core GPU Architecture. Volume 1.0: Whitepaper Part 1 2020 (2020) 82."},{"key":"e_1_3_3_3_46_2","unstructured":"NVIDIA. 2020. NVIDIA Collective Communication Library (NCCL) Documentation. https:\/\/docs.nvidia.com\/deeplearning\/nccl\/user-guide\/docs\/index.html. [Online; Accessed 22-Feb-2025]."},{"key":"e_1_3_3_3_47_2","unstructured":"NVIDIA. 2023. Megatron-LM. https:\/\/github.com\/NVIDIA\/Megatron-LM. [Online; Accessed 22-Feb-2025]."},{"key":"e_1_3_3_3_48_2","unstructured":"NVIDIA. 2024. NVIDIA DGX Platform. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-platform\/. 
[Online; Accessed 29-Jul-2024]."},{"key":"e_1_3_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA59077.2024.00019"},{"key":"e_1_3_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS52781.2021.9567250"},{"key":"e_1_3_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41406.2024.00094"},{"key":"e_1_3_3_3_52_2","series-title":"(FAST \u201925)","first-page":"155","volume-title":"Proceedings of the 23rd USENIX Conference on File and Storage Technologies","author":"Qin Ruoyu","year":"2025","unstructured":"Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, and Xinran Xu. 2025. Mooncake: Trading More Storage for Less Computation \u2014A KVCache-centric Architecture for Serving LLM Chatbot. In Proceedings of the 23rd USENIX Conference on File and Storage Technologies(FAST \u201925). 155\u2013170."},{"key":"e_1_3_3_3_53_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models Are Unsupervised Multitask Learners. OpenAI Blog 1 8 (2019)."},{"key":"e_1_3_3_3_54_2","first-page":"18332","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Rajbhandari Samyam","year":"2022","unstructured":"Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza\u00a0Yazdani Aminabadi, Ammar\u00a0Ahmad Awan, Jeff Rasley, and Yuxiong He. 2022. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. In Proceedings of the 39th International Conference on Machine Learning. 
PMLR, 18332\u201318346."},{"key":"e_1_3_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00049"},{"key":"e_1_3_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527382"},{"key":"e_1_3_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS48437.2020.00016"},{"key":"e_1_3_3_3_59_2","unstructured":"Ananda Samajdar Yuhao Zhu Paul Whatmough Matthew Mattina and Tushar Krishna. 2018. SCALE-Sim: Systolic CNN Accelerator Simulator. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1811.02883 (2018)."},{"key":"e_1_3_3_3_60_2","doi-asserted-by":"crossref","unstructured":"Robert\u00a0R Schaller. 1997. Moore\u2019s Law: Past Present and Future. IEEE Spectrum 34 6 (1997) 52\u201359.","DOI":"10.1109\/6.591665"},{"key":"e_1_3_3_3_61_2","series-title":"(NSDI \u201923)","first-page":"593","volume-title":"Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation","author":"Shah Aashaka","year":"2023","unstructured":"Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, and Rachee Singh. 2023. TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches. In Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation(NSDI \u201923). 593\u2013612."},{"key":"e_1_3_3_3_62_2","first-page":"296","volume-title":"Proceedings of Machine Learning and Systems","volume":"6","author":"Sheng Ying","year":"2024","unstructured":"Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph\u00a0E. Gonzalez, and Ion Stoica. 2024. S-LoRA: Serving Thousands of Concurrent LoRA Adapters. In Proceedings of Machine Learning and Systems, Vol.\u00a06. 
296\u2013311."},{"key":"e_1_3_3_3_63_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2019. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1909.08053 (2019)."},{"key":"e_1_3_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3577193.3593704"},{"key":"e_1_3_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS55958.2022.9895534"},{"key":"e_1_3_3_3_66_2","unstructured":"HPC-AI Tech. 2023. ColossalAI: Making Large AI Models Cheaper Faster and More Accessible. https:\/\/github.com\/hpcaitech\/ColossalAI. [Online; Accessed 22-Feb-2025]."},{"key":"e_1_3_3_3_67_2","doi-asserted-by":"crossref","unstructured":"Rajeev Thakur Rolf Rabenseifner and William Gropp. 2005. Optimization of Collective Communication Operations in MPICH. The International Journal of High Performance Computing Applications 19 1 (2005) 49\u201366.","DOI":"10.1177\/1094342005051521"},{"key":"e_1_3_3_3_68_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2302.13971 (2023)."},{"key":"e_1_3_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491101.3519665"},{"key":"e_1_3_3_3_70_2","unstructured":"Guanhua Wang Chengming Zhang Zheyu Shen Ang Li and Olatunji Ruwase. 2024. Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.15241 (2024)."},{"key":"e_1_3_3_3_71_2","series-title":"(ASPLOS \u201923)","first-page":"93","volume-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1","author":"Wang Shibo","year":"2023","unstructured":"Shibo Wang, Jinliang Wei, Amit Sabne, Andy Davis, Berkin Ilbeyi, Blake Hechtman, Dehao Chen, Karthik\u00a0Srinivasa Murthy, Marcello Maggioni, Qiao Zhang, Sameer Kumar, Tongfei Guo, Yuanzhong Xu, and Zongwei Zhou. 2023. Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1(ASPLOS \u201923). 93\u2013106."},{"key":"e_1_3_3_3_72_2","unstructured":"Lilian Weng. 2021. How to Train Really Large Models on Many GPUs? Lil\u2019Log (Sep 2021). https:\/\/lilianweng.github.io\/posts\/2021-09-25-train-large\/"},{"key":"e_1_3_3_3_73_2","unstructured":"Lilian Weng. 2023. Large Transformer Model Inference Optimization. Lil\u2019Log (Jan 2023). https:\/\/lilianweng.github.io\/posts\/2023-01-10-inference-optimization\/"},{"key":"e_1_3_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO61859.2024.00068"},{"key":"e_1_3_3_3_75_2","unstructured":"Chris Ying Sameer Kumar Dehao Chen Tao Wang and Youlong Cheng. 2018. Image Classification at Supercomputer Scale. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1811.06992 (2018)."},{"key":"e_1_3_3_3_76_2","series-title":"(OSDI \u201922)","first-page":"559","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation","author":"Zheng Lianmin","year":"2022","unstructured":"Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric\u00a0P. Xing, Joseph\u00a0E. Gonzalez, and Ion Stoica. 2022. 
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation(OSDI \u201922). 559\u2013578."},{"key":"e_1_3_3_3_77_2","series-title":"(OSDI \u201924)","first-page":"193","volume-title":"Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language Model Serving. In Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation(OSDI \u201924). 193\u2013210."}],"event":{"name":"ISCA '25: Proceedings of the 52nd Annual International Symposium on Computer Architecture","location":"Tokyo Japan","acronym":"ISCA '25","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 52nd Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3695053.3731025","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,21]],"date-time":"2025-06-21T11:01:24Z","timestamp":1750503684000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3695053.3731025"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,20]]},"references-count":76,"alternative-id":["10.1145\/3695053.3731025","10.1145\/3695053"],"URL":"https:\/\/doi.org\/10.1145\/3695053.3731025","relation":{},"subject":[],"published":{"date-parts":[[2025,6,20]]},"assertion":[{"value":"2025-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}