{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T21:31:28Z","timestamp":1765229488885,"version":"3.46.0"},"publisher-location":"New York, NY, USA","reference-count":74,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,12,15]]},"DOI":"10.1145\/3721462.3730950","type":"proceedings-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T19:56:49Z","timestamp":1765223809000},"page":"126-139","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["FreeRide: Harvesting Bubbles in Pipeline Parallelism"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7075-1764","authenticated-orcid":false,"given":"Jiashu","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4227-5596","authenticated-orcid":false,"given":"Zihan","family":"Pan","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1968-1180","authenticated-orcid":false,"given":"Molly Yiming","family":"Xu","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7036-0397","authenticated-orcid":false,"given":"Khuzaima","family":"Daudjee","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9706-6177","authenticated-orcid":false,"given":"Sihang","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]}],"member":"320","published-online":{"date-parts":[[2025,12,14]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Price of AWS G4 instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/g4\/","year":"2024","unstructured":"Amazon. Price of AWS G4 instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/g4\/, 2024."},{"key":"e_1_3_2_1_2_1","volume-title":"Price of AWS P4 instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/p4\/","year":"2024","unstructured":"Amazon. Price of AWS P4 instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/p4\/, 2024."},{"key":"e_1_3_2_1_3_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. PipeSwitch: Fast pipelined context switching for deep learning applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020. 10.5555\/3488766.3488794"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCC.2014.51"},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)","author":"Chen Quan","year":"2017","unstructured":"Quan Chen, Hailong Yang, Minyi Guo, Ram Srivatsa Kannan, Jason Mars, and Lingjia Tang. Prophet: Precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017. 10.1145\/3037697.3037700"},{"key":"e_1_3_2_1_6_1","volume-title":"USENIX Annual Technical Conference (ATC)","author":"Choi Sangjin","year":"2023","unstructured":"Sangjin Choi, Inhoe Koo, Jeongseob Ahn, Myeongjae Jeon, and Youngjin Kwon. EnvPipe: Performance-preserving DNN training framework for saving energy. In USENIX Annual Technical Conference (ATC), 2023. https:\/\/www.usenix.org\/conference\/atc23\/presentation\/choi."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/3648699.3648939"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_1_9_1","volume-title":"https:\/\/github.com\/microsoft\/DeepSpeed\/tree\/v0.12.2","author":"Deepspeed","year":"2023","unstructured":"DeepSpeed. Deepspeed 0.12.2. https:\/\/github.com\/microsoft\/DeepSpeed\/tree\/v0.12.2, 2023."},{"key":"e_1_3_2_1_10_1","volume-title":"Pipeline Parallelism in DeepSpeed. https:\/\/www.deepspeed.ai\/tutorials\/pipeline\/","year":"2023","unstructured":"DeepSpeed. Pipeline Parallelism in DeepSpeed. https:\/\/www.deepspeed.ai\/tutorials\/pipeline\/, 2023."},{"key":"e_1_3_2_1_11_1","volume-title":"Docker security. https:\/\/docs.docker.com\/engine\/security\/","year":"2024","unstructured":"Docker. Docker security. https:\/\/docs.docker.com\/engine\/security\/, 2024."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)","author":"Fan Shiqing","year":"2021","unstructured":"Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, and Wei Lin. DAPPLE: a pipelined data parallel approach for training large models. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2021. 10.1145\/3437801.3441593"},{"key":"e_1_3_2_1_13_1","volume-title":"8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2011","author":"Ghodsi Ali","year":"1972","unstructured":"Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2011. 10.5555\/1972457.1972490"},{"key":"e_1_3_2_1_14_1","unstructured":"GNU. Job control signals (the GNU C library). https:\/\/www.gnu.org\/software\/libc\/manual\/html_node\/Job-Control-Signals.html."},{"key":"e_1_3_2_1_15_1","volume-title":"The desperate hunt for the A.I. boom's most indispensable prize. https:\/\/www.nytimes.com\/2023\/08\/16\/technology\/ai-gpu-chips-shortage.html","author":"Griffith Erin","year":"2023","unstructured":"Erin Griffith. The desperate hunt for the A.I. boom's most indispensable prize. https:\/\/www.nytimes.com\/2023\/08\/16\/technology\/ai-gpu-chips-shortage.html, 2023."},{"key":"e_1_3_2_1_16_1","volume-title":"gRPC - a high performance, open source universal RPC framework. https:\/\/grpc.io\/","author":"RPC.","year":"2024","unstructured":"gRPC. gRPC - a high performance, open source universal RPC framework. https:\/\/grpc.io\/, 2024."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2024.3470074"},{"key":"e_1_3_2_1_18_1","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI)","author":"Han Mingcong","year":"2022","unstructured":"Mingcong Han, Hanze Zhang, Rong Chen, and Haibo Chen. Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2022. https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/han."},{"key":"e_1_3_2_1_19_1","first-page":"777","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Hu Qinghao","year":"2023","unstructured":"Qinghao Hu, Zhisheng Ye, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, and Tianwei Zhang. Hydro: Surrogate-Based hyperparameter tuning service in datacenters. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 757\u2013777, Boston, MA, July 2023. USENIX Association."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2502.05043"},{"key":"e_1_3_2_1_21_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, and zhifeng Chen. GPipe: Efficient training of giant neural networks using pipeline parallelism. In Advances in Neural Information Processing Systems (NeurIPS), 2019. 10.5555\/3454287.3454297"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA)","author":"Jouppi Norman P.","year":"2017","unstructured":"Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 2017. 10.1145\/3079856.3080246"},{"key":"e_1_3_2_1_23_1","volume-title":"nanoGPT: The simplest, fastest repository for training\/finetuning medium-sized GPTs. https:\/\/github.com\/karpathy\/nanoGPT","author":"Karpathy Andrej","year":"2024","unstructured":"Andrej Karpathy. nanoGPT: The simplest, fastest repository for training\/finetuning medium-sized GPTs. https:\/\/github.com\/karpathy\/nanoGPT, 2024."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2004.09910"},{"key":"e_1_3_2_1_25_1","volume-title":"IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2020","author":"Kim Taehun","year":"2020","unstructured":"Taehun Kim and Youngjoo Shin. GPU side-channel attacks are everywhere: A survey. In IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2020. 10.1109\/ICCE-Asia49877.2020.9276805"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_1_27_1","volume-title":"Pricing of Lambda. https:\/\/lambdalabs.com\/service\/gpu-cloud#pricing","year":"2024","unstructured":"Lambda. Pricing of Lambda. https:\/\/lambdalabs.com\/service\/gpu-cloud#pricing, 2024."},{"key":"e_1_3_2_1_28_1","volume-title":"OpenAI's GPT-3 language model: A technical overview. https:\/\/lambdalabs.com\/blog\/demystifying-gpt-3","author":"Li Chuan","year":"2020","unstructured":"Chuan Li. OpenAI's GPT-3 language model: A technical overview. https:\/\/lambdalabs.com\/blog\/demystifying-gpt-3, 2020."},{"key":"e_1_3_2_1_29_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","author":"Li Shigang","year":"2021","unstructured":"Shigang Li and Torsten Hoefler. Chimera: Efficiently training large-scale neural networks with bidirectional pipelines. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021. 10.1145\/3458817.3476145"},{"key":"e_1_3_2_1_30_1","volume-title":"2021 USENIX Annual Technical Conference (ATC)","author":"Lim Gangmuk","year":"2021","unstructured":"Gangmuk Lim, Jeongseob Ahn, Wencong Xiao, Youngjin Kwon, and Myeongjae Jeon. Zico: Efficient GPU memory sharing for concurrent DNN training. In 2021 USENIX Annual Technical Conference (ATC), 2021. https:\/\/www.usenix.org\/conference\/atc21\/presentation\/lim."},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the 13th Symposium on Cloud Computing (SoCC)","author":"Liu Haotian","year":"2022","unstructured":"Haotian Liu, Bo Tang, Jiashu Zhang, Yangshen Deng, Xiao Yan, Xinying Zheng, Qiaomu Shen, Dan Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Runzhe Jiang, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, and Zhenghai Luo. GHive: Accelerating analytical query processing in Apache Hive via CPU-GPU heterogeneous computing. In Proceedings of the 13th Symposium on Cloud Computing (SoCC), 2022. 10.1145\/3542929.3563503"},{"key":"e_1_3_2_1_32_1","volume-title":"IEEE Security and Privacy Workshops (SPW), 2019","author":"Liu Sihang","year":"2019","unstructured":"Sihang Liu, Yizhou Wei, Jianfeng Chi, Faysal Hossain Shezan, and Yuan Tian. Side channel attacks in computation offloading systems with GPU virtualization. In IEEE Security and Privacy Workshops (SPW), 2019. 10.1109\/SPW.2019.00037"},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)","author":"Liu Zihan","year":"2022","unstructured":"Zihan Liu, Jingwen Leng, Zhihui Zhang, Quan Chen, Chao Li, and Minyi Guo. VELTAIR: Towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022. 10.1145\/3503222.3507752"},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","author":"Liu Ziming","year":"2023","unstructured":"Ziming Liu, Shenggan Cheng, Haotian Zhou, and Yang You. Hanayo: Harnessing wave-like pipeline parallelism for enhanced large model training efficiency. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023. 10.1145\/3581784.3607073"},{"key":"e_1_3_2_1_35_1","volume-title":"MTIA v1: Meta's first-generation AI inference accelerator. https:\/\/ai.meta.com\/blog\/meta-training-inference-accelerator-AI-MTIA\/","year":"2023","unstructured":"Meta. MTIA v1: Meta's first-generation AI inference accelerator. https:\/\/ai.meta.com\/blog\/meta-training-inference-accelerator-AI-MTIA\/, 2023."},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS)","author":"Naghibijouybari Hoda","year":"2018","unstructured":"Hoda Naghibijouybari, Ajaya Neupane, Zhiyun Qian, and Nael Abu-Ghazaleh. Rendered insecure: GPU side channel attacks are practical. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018. 10.1145\/3243734.3243831"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2019.2944624"},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP)","author":"Narayanan Deepak","year":"2019","unstructured":"Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, and Matei Zaharia. PipeDream: Generalized pipeline parallelism for DNN training. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP), 2019. 10.1145\/3341301.3359646"},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (ICML), 2021","author":"Narayanan Deepak","year":"2006","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. Memory-efficient pipeline-parallel DNN training. In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021. 10.48550\/arXiv.2006.09503"},{"issue":"5","key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1145\/3727200.3727220","article-title":"Towards sustainable large language model serving","volume":"4","author":"Nguyen Sophia","year":"2024","unstructured":"Sophia Nguyen, Beihao Zhou, Yi Ding, and Sihang Liu. Towards sustainable large language model serving. ACM SIGENERGY Energy Informatics Review, 4(5):134\u2013140, 2024.","journal-title":"ACM SIGENERGY Energy Informatics Review"},{"key":"e_1_3_2_1_41_1","volume-title":"Image resize and watermarking example using nvJPEG. https:\/\/github.com\/NVIDIA\/CUDALibrarySamples\/tree\/master\/nvJPEG\/Image-Resize-WaterMark","year":"2019","unstructured":"Nvidia. Image resize and watermarking example using nvJPEG. https:\/\/github.com\/NVIDIA\/CUDALibrarySamples\/tree\/master\/nvJPEG\/Image-Resize-WaterMark, 2019."},{"key":"e_1_3_2_1_42_1","volume-title":"CUDA C programming guide. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/","year":"2024","unstructured":"Nvidia. CUDA C programming guide. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/, 2024."},{"key":"e_1_3_2_1_43_1","volume-title":"Multi-process service. https:\/\/docs.nvidia.com\/deploy\/mps\/index.html","year":"2024","unstructured":"Nvidia. Multi-process service. https:\/\/docs.nvidia.com\/deploy\/mps\/index.html, 2024."},{"key":"e_1_3_2_1_44_1","volume-title":"Nvidia multi-instance GPU memory protection. https:\/\/docs.nvidia.com\/deploy\/mps\/index.html#memory-protection","year":"2024","unstructured":"Nvidia. Nvidia multi-instance GPU memory protection. https:\/\/docs.nvidia.com\/deploy\/mps\/index.html#memory-protection, 2024."},{"key":"e_1_3_2_1_45_1","unstructured":"Nvidia. Nvidia multi-instance GPU user guide. http:\/\/docs.nvidia.com\/datacenter\/tesla\/mig-user-guide\/index.html 2024."},{"key":"e_1_3_2_1_46_1","volume-title":"Proceedings of Machine Learning and Systems (MLSys)","author":"Osawa Kazuki","year":"2023","unstructured":"Kazuki Osawa, Shigang Li, and Torsten Hoefler. PipeFisher: Efficient training of large language models using pipelining and fisher information matrices. In Proceedings of Machine Learning and Systems (MLSys), 2023. 10.48550\/arXiv.2211.14133"},{"key":"e_1_3_2_1_47_1","volume-title":"The PageRank citation ranking: Bring order to the web. https:\/\/www.cis.upenn.edu\/~mkearns\/teaching\/NetworkedLife\/pagerank.pdf","author":"Page Lawrence","year":"1998","unstructured":"Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bring order to the web. https:\/\/www.cis.upenn.edu\/~mkearns\/teaching\/NetworkedLife\/pagerank.pdf, 1998."},{"key":"e_1_3_2_1_48_1","volume-title":"G-Safe: Safe GPU sharing in multi-tenant environments. arXiv preprint arXiv:2401.09290","author":"Pavlidakis Manos","year":"2024","unstructured":"Manos Pavlidakis, Giorgos Vasiliadis, Stelios Mavridis, Anargyros Argyros, Antony Chazapis, and Angelos Bilas. G-Safe: Safe GPU sharing in multi-tenant environments. arXiv preprint arXiv:2401.09290, 2024. https:\/\/arxiv.org\/abs\/2401.09290."},{"key":"e_1_3_2_1_49_1","unstructured":"PyTorch. Models and pre-trained weights \u2014 Torchvision main documentation. https:\/\/pytorch.org\/vision\/main\/models.html."},{"key":"e_1_3_2_1_50_1","unstructured":"PyTorch. Pytorch profiler. https:\/\/pytorch.org\/tutorials\/recipes\/recipes\/profiler_recipe.html."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2401.10241"},{"key":"e_1_3_2_1_52_1","volume-title":"Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD)","author":"Rasley Jeff","year":"2020","unstructured":"Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2020. 10.1145\/3394486.3406703"},{"key":"e_1_3_2_1_53_1","volume-title":"Turing-NLG: A 17-billion-parameter language model by Microsoft. https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\/","author":"Rosset Corby","year":"2020","unstructured":"Corby Rosset. Turing-NLG: A 17-billion-parameter language model by Microsoft. https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\/, 2020."},{"key":"e_1_3_2_1_54_1","volume-title":"RunPod - The cloud built for AI. https:\/\/www.runpod.io\/","year":"2024","unstructured":"RunPod. RunPod - The cloud built for AI. https:\/\/www.runpod.io\/, 2024."},{"key":"e_1_3_2_1_55_1","first-page":"9","volume-title":"2023 IEEE High Performance Extreme Computing Conference (HPEC)","author":"Samsi Siddharth","unstructured":"Siddharth Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, and Vijay Gadepally. From words to watts: Benchmarking the energy costs of large language model inference. In 2023 IEEE High Performance Extreme Computing Conference (HPEC), pages 1\u20139. IEEE, 2023."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.08053"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","unstructured":"Shaden Smith Mostofa Patwary Brandon Norick Patrick LeGresley Samyam Rajbhandari Jared Casper Zhun Liu Shrimai Prabhumoye George Zerveas Vijay Korthikanti Elton Zhang Rewon Child Reza Yazdani Aminabadi Julie Bernauer Xia Song Mohammad Shoeybi Yuxiong He Michael Houston Saurabh Tiwary and Bryan Catanzaro. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B A large-scale generative language model. arXiv preprint arXiv:2201.11990 2022. 10.48550\/arXiv.2201.11990","DOI":"10.48550\/arXiv.2201.11990"},{"key":"e_1_3_2_1_58_1","volume-title":"Dynamollm: Designing llm inference clusters for performance and energy efficiency. arXiv preprint arXiv:2408.00741","author":"Stojkovic Jovan","year":"2024","unstructured":"Jovan Stojkovic, Chaojie Zhang, \u00cd\u00f1igo Goiri, Josep Torrellas, and Esha Choukse. Dynamollm: Designing llm inference clusters for performance and energy efficiency. arXiv preprint arXiv:2408.00741, 2024."},{"key":"e_1_3_2_1_59_1","volume-title":"Proceedings of the 4th Workshop on Machine Learning and Systems","author":"Strati Foteini","year":"2024","unstructured":"Foteini Strati, Paul Elvinger, Tolga Kerimoglu, and Ana Klimovic. ML training with cloud GPU shortages: Is cross-region the answer? In Proceedings of the 4th Workshop on Machine Learning and Systems, 2024. 10.1145\/3642970.3655843"},{"key":"e_1_3_2_1_60_1","volume-title":"Proceedings of the Nineteenth European Conference on Computer Systems (EuroSys)","author":"Strati Foteini","year":"2024","unstructured":"Foteini Strati, Xianzhe Ma, and Ana Klimovic. Orion: Interference-aware, finegrained GPU sharing for ml applications. In Proceedings of the Nineteenth European Conference on Computer Systems (EuroSys), 2024. 10.1145\/3627703.3629578"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.03791"},{"key":"e_1_3_2_1_62_1","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI)","author":"Thorpe John","year":"2023","unstructured":"John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, and Guoqing Harry Xu. Bamboo: Making preemptible instances resilient for affordable training of large DNNs. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2023. https:\/\/www.usenix.org\/conference\/nsdi23\/presentation\/thorpe."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2207.02852"},{"key":"e_1_3_2_1_64_1","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI)","author":"Wu Bingyang","year":"2023","unstructured":"Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, and Xin Jin. Transparent GPU sharing in container clouds for deep learning workloads. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2023. https:\/\/www.usenix.org\/conference\/nsdi23\/presentation\/wu."},{"key":"e_1_3_2_1_65_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI)","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. Gandiva: Introspective cluster scheduling for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018. 10.5555\/3291168.3291212"},{"key":"e_1_3_2_1_66_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. AntMan: Dynamic scaling on GPU clusters for deep learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020. 10.5555\/3488766.3488796"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3283450"},{"key":"e_1_3_2_1_68_1","volume-title":"Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics","author":"Yang Jaewon","year":"2012","unstructured":"Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, 2012. 10.1145\/2350190.2350193"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1804.05039"},{"key":"e_1_3_2_1_70_1","volume-title":"Proceedings of Machine Learning and Systems (MLSys), 2020","author":"Yu Peifeng","year":"2020","unstructured":"Peifeng Yu and Mosharaf Chowdhury. Fine-grained GPU sharing primitives for deep learning applications. In Proceedings of Machine Learning and Systems (MLSys), 2020. https:\/\/proceedings.mlsys.org\/paper_files\/paper\/2020\/hash\/d9cd83bc91b8c36a0c7c0fcca59228f2-Abstract.html."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona Diab Xian Li Xi Victoria Lin Todor Mihaylov Myle Ott Sam Shleifer Kurt Shuster Daniel Simig Punit Singh Koura Anjali Sridhar Tianlu Wang and Luke Zettlemoyer. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 2022. 10.48550\/arXiv.2205.01068","DOI":"10.48550\/arXiv.2205.01068"},{"key":"e_1_3_2_1_72_1","volume-title":"USENIX Annual Technical Conference (ATC)","author":"Zhang Wei","year":"2022","unstructured":"Wei Zhang, Binghao Chen, Zhenhua Han, Quan Chen, Peng Cheng, Fan Yang, Ran Shu, Yuqing Yang, and Minyi Guo. PilotFish: Harvesting free cycles of cloud gaming with deep learning training. In USENIX Annual Technical Conference (ATC), 2022. https:\/\/www.usenix.org\/conference\/atc22\/presentation\/zhang-wei."},{"key":"e_1_3_2_1_73_1","volume-title":"Nael Abu-Ghazaleh, Andres Marquez, and Kevin Barker. Beyond the bridge: Contention-based covert and side channel attacks on multi-GPU interconnect. arXiv preprint arXiv:2404.03877","author":"Zhang Yicheng","year":"2024","unstructured":"Yicheng Zhang, Ravan Nazaraliyev, Sankha Baran Dutta, Nael Abu-Ghazaleh, Andres Marquez, and Kevin Barker. Beyond the bridge: Contention-based covert and side channel attacks on multi-GPU interconnect. arXiv preprint arXiv:2404.03877, 2024. https:\/\/arxiv.org\/abs\/2404.03877v2."},{"key":"e_1_3_2_1_74_1","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI)","author":"Zheng Lianmin","year":"2022","unstructured":"Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, and Ion Stoica. Alpa: Automating inter- and intra-operator parallelism for distributed deep learning. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2022. https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/zheng-lianmin."}],"event":{"name":"MIDDLEWARE '25: 26th International Middleware Conference","location":"Vanderbilt University Nashville TN USA","acronym":"MIDDLEWARE '25","sponsor":["IFIP","Usenix"]},"container-title":["Proceedings of the 26th International Middleware Conference"],"original-title":[],"deposited":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T19:59:26Z","timestamp":1765223966000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721462.3730950"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,14]]},"references-count":74,"alternative-id":["10.1145\/3721462.3730950","10.1145\/3721462"],"URL":"https:\/\/doi.org\/10.1145\/3721462.3730950","relation":{},"subject":[],"published":{"date-parts":[[2025,12,14]]},"assertion":[{"value":"2025-12-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}