{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T23:39:41Z","timestamp":1768347581992,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":67,"publisher":"ACM","funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["CSR-2106634, CSR-2312785"],"award-info":[{"award-number":["CSR-2106634, CSR-2312785"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,19]]},"DOI":"10.1145\/3772052.3772236","type":"proceedings-article","created":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T16:19:00Z","timestamp":1768321140000},"page":"320-333","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-2592-8572","authenticated-orcid":false,"given":"Sabiha","family":"Afroz","sequence":"first","affiliation":[{"name":"Virginia Tech, Blacksburg, Virginia, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3228-6384","authenticated-orcid":false,"given":"Redwan Ibne Seraj","family":"Khan","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, Virginia, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4732-2707","authenticated-orcid":false,"given":"Hadeel","family":"Albahar","sequence":"additional","affiliation":[{"name":"Kuwait University, Kuwait City, 
Kuwait"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9223-8061","authenticated-orcid":false,"given":"Jingoo","family":"Han","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, Virginia, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0871-7263","authenticated-orcid":false,"given":"Ali R.","family":"Butt","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, Virginia, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2012. Pinned Host Memory. https:\/\/developer.nvidia.com\/blog\/how-optimize-data-transfers-cuda-cc\/."},{"key":"e_1_3_2_1_2_1","unstructured":"2017. NVIDIA Tesla V100 GPU architecture. http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf."},{"key":"e_1_3_2_1_3_1","unstructured":"2017. NVIDIA. Unified Memory for CUDA Beginners. https:\/\/developer.nvidia.com\/blog\/unified-memory-cuda-beginners\/."},{"key":"e_1_3_2_1_4_1","unstructured":"2025. DeepSpeed. https:\/\/github.com\/microsoft\/DeepSpeed\/tree\/master."},{"key":"e_1_3_2_1_5_1","unstructured":"2025. Fx Graph. https:\/\/pytorch.org\/docs\/stable\/fx.html."},{"key":"e_1_3_2_1_6_1","unstructured":"2025. GPUDirect Storage. https:\/\/developer.nvidia.com\/blog\/gpudirect-storage\/."},{"key":"e_1_3_2_1_7_1","unstructured":"2025. NVIDIA GH200 Grace Hopper Superchip. https:\/\/www.nvidia.com\/en-us\/data-center\/grace-hopper-superchip\/."},{"key":"e_1_3_2_1_8_1","unstructured":"2025. NVIDIA H100 GPU. https:\/\/www.nvidia.com\/en-eu\/data-center\/h100\/."},{"key":"e_1_3_2_1_9_1","unstructured":"2025. PyTorch. https:\/\/pytorch.org\/."},{"key":"e_1_3_2_1_10_1","unstructured":"2025. PyTorch Hook. 
https:\/\/pytorch.org\/docs\/stable\/generated\/torch.nn.Module.html."},{"key":"e_1_3_2_1_11_1","volume-title":"Florencia Leoni Aleman, and et al","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, and et al. 2023. Gpt-4 technical report. arXiv:2303.08774[cs.CV]"},{"key":"e_1_3_2_1_12_1","unstructured":"Ebtesam Almazrouei Hamza Alobeidli Abdulaziz Alshamsi Alessandro Cappelli Ruxandra Cojocaru M\u00e9rouane Debbah and \u00c9tienne Goffinet et al. 2023. The falcon series of open language models. arXiv:2311.16867 [cs.CV]"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Moiz Arif Avinash Maurya Sudharshan Vazhkudai and Bogdan Nicolae. 2025. Evaluating Expansion Memory for Optimizer State Offloading for Large Transformer Models. In HPAI4S'25: HPC for AI Foundation Models & LLMs for Science (co-located with IPDPS'25).","DOI":"10.1109\/IPDPSW66978.2025.00151"},{"key":"e_1_3_2_1_14_1","unstructured":"David Chappell. 2010. Introducing the windows azure platform. In David Chappell & Associates White Paper."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598082"},{"key":"e_1_3_2_1_16_1","volume-title":"2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 926\u2013939","author":"Choukse Esha","unstructured":"Esha Choukse, Michael B. Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, and Stephen W. Keckler. 2020. Buddy compression: Enabling larger memory for deep learning and hpc workloads on gpus. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 926\u2013939."},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of naacL-HLT. 2.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of naacL-HLT. 2."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 2024 ACM Symposium on Cloud Computing. 961\u2013976","author":"Fu Xinwei","unstructured":"Xinwei Fu, Zhen Zhang, Haozheng Fan, Guangtai Huang, Mohammad El-Shabani, and Randy Huang et al. 2024. Distributed training of large language models on aws trainium. In Proceedings of the 2024 ACM Symposium on Cloud Computing. 961\u2013976."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 450\u2013466","author":"Guo Cong","unstructured":"Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, Ziyu Huang, and Minyi Guo et al. 2024. Gmlake: Efficient and transparent gpu memory defragmentation for large-scale dnn training with virtual memory stitching. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 450\u2013466."},{"key":"e_1_3_2_1_20_1","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning. 1737\u20131746","author":"Gupta Suyog","year":"2015","unstructured":"Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep Learning with Limited Numerical Precision. In Proceedings of the 32nd International Conference on International Conference on Machine Learning. 1737\u20131746."},{"key":"e_1_3_2_1_21_1","volume-title":"Proceedings of the 56th Annual IEEE\/ACM International Symposium on Microarchitecture. 395\u2013410","author":"Haoyang Zhang","year":"2023","unstructured":"Zhang Haoyang, Zhou Yirui, Xue Yuqi, Liu Yiqi, and Huang Jian. 2023. G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations. In Proceedings of the 56th Annual IEEE\/ACM International Symposium on Microarchitecture. 
395\u2013410."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378465"},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 328\u2013339","author":"Howard Jeremy","year":"2018","unstructured":"Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Finetuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 328\u2013339."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378530"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 207\u2013221","author":"Jaehoon Jung","year":"2023","unstructured":"Jung Jaehoon, Kim Jinpyo, and Lee Jaejin. 2023. DeepUM: Tensor Migration and Prefetching in Unified Memory. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 207\u2013221."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00070"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA57654.2024.00034"},{"key":"e_1_3_2_1_28_1","volume-title":"Thunderserve: High-performance and cost-efficient llm serving in cloud environments. arXiv:2502.09334 [cs.CV]","author":"Jiang Youhe","year":"2025","unstructured":"Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Taiyi Wang, Bin Cui, and Ana Klimovic et al. 2025. Thunderserve: High-performance and cost-efficient llm serving in cloud environments. arXiv:2502.09334 [cs.CV]"},{"key":"e_1_3_2_1_29_1","volume-title":"ZeRO-Offload: Democratizing Billion-Scale Model Training. 
In 2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Jie Ren","year":"2021","unstructured":"Ren Jie, Rajbhandari Samyam, Aminabadi Reza Yazdani, Ruwase Olatunji, Yang Shuangyan, Zhang Minjia, Li Dong, and He Yuxiong. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 551\u2013564."},{"key":"e_1_3_2_1_30_1","volume-title":"Flash Neuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In 19th USENIX Conference on File and Storage Technologies (FAST 21)","author":"Jonghyun Bae","year":"2021","unstructured":"Bae Jonghyun, Lee Jongsung, Jin Yunho, Son Sam, Kim Shine, Jang Hakbeom, Ham Tae Jun, and Lee Jae W. 2021. Flash Neuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In 19th USENIX Conference on File and Storage Technologies (FAST 21). 387\u2013401."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926294"},{"key":"e_1_3_2_1_32_1","unstructured":"Kate Keahey Jason Anderson Zhuo Zhen Pierre Riteau Paul Ruth and Dan Stanzione et al. 2020. Lessons learned from the chameleon testbed. In 2020 USENIX annual technical conference (USENIX ATC 20). 219\u2013233."},{"key":"e_1_3_2_1_33_1","volume-title":"Rujia Wang, et al.","author":"Seraj Khan Redwan Ibne","year":"2024","unstructured":"Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renee St Amant, Rujia Wang, et al. 2024. Ensuring Fair LLM Serving Amid Diverse Applications. arXiv preprint arXiv:2411.15997 (2024)."},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the 2024 ACM Symposium on Cloud Computing. 52\u201368","author":"Seraj Khan Redwan Ibne","year":"2024","unstructured":"Redwan Ibne Seraj Khan, Arnab K Paul, Yue Cheng, Xun Steve Jian, and Ali R Butt. 2024. FedCaSe: Enhancing Federated Learning with Heterogeneity-aware Caching and Scheduling. 
In Proceedings of the 2024 ACM Symposium on Cloud Computing. 52\u201368."},{"key":"e_1_3_2_1_35_1","volume-title":"SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training. In 21st USENIX Conference on File and Storage Technologies (FAST 23)","author":"Seraj Khan Redwan Ibne","unstructured":"Redwan Ibne Seraj Khan, Ahmad Hossein Yazdani, Yuqi Fu, Arnab K. Paul, Bo Ji, Xun Jian, Yue Cheng, and Ali R. Butt. 2023. SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training. In 21st USENIX Conference on File and Storage Technologies (FAST 23). USENIX Association, Santa Clara, CA, 135\u2013152. https:\/\/www.usenix.org\/conference\/fast23\/presentation\/khan"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071024"},{"key":"e_1_3_2_1_37_1","volume-title":"Kingma and Ba Jimmy","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Ba Jimmy. 2015. Adam: A method for stochastic optimization. In ICLR."},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 29th Symposium on Operating Systems Principles. 611\u2013626","author":"Kwon Woosuk","unstructured":"Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, and Cody Hao Yu et al. 2023. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles. 611\u2013626."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00021"},{"key":"e_1_3_2_1_40_1","unstructured":"Tingfeng Lan Yusen Wu Bin Ma Zhaoyuan Su Rui Yang and Tekin Bicer et al. 2025. ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates. arXiv:2505.12242 [cs.CV]"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Xinyu Lian Masahiro Tanaka Olatunji Ruwase and Minjia Zhang. 2025. SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips. 
arXiv:2509.21271 [cs.CV]","DOI":"10.1145\/3760250.3762217"},{"key":"e_1_3_2_1_42_1","unstructured":"Yong-Cheng Liaw and Shuo-Han Chen. 2025. MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning. arXiv:2505.23254 [cs.CV]"},{"key":"e_1_3_2_1_43_1","volume":"201","author":"Mathew Sajee","unstructured":"Sajee Mathew and J. Varia. 2014. Overview of amazon web services. In Amazon Whitepapers 105. 22.","journal-title":"J. Varia."},{"key":"e_1_3_2_1_44_1","volume-title":"Erich Elsen, David Garcia, and Boris Ginsburg et al.","author":"Micikevicius Paulius","year":"2017","unstructured":"Paulius Micikevicius, Sharan Narang, Gregory Diamos, Jonah Alben, Erich Elsen, David Garcia, and Boris Ginsburg et al. 2017. Mixed precision training. arXiv:1710.03740 [cs.CV]"},{"key":"e_1_3_2_1_45_1","volume-title":"Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 386\u2013400","author":"Niu Wei","year":"2024","unstructured":"Wei Niu, Gagan Agrawal, and Bin Ren. 2024. SoD2: Statically Optimizing Dynamic Deep Neural Network Execution. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 386\u2013400."},{"key":"e_1_3_2_1_46_1","unstructured":"Xinglin Pan Wenxiang Lin Lin Zhang Shaohuai Shi Zhenheng Tang and Rui Wang et al. 2025. FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models. arXiv:2501.10714 [cs.CV]"},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 207\u2013222","author":"Patel Pratyush","unstructured":"Pratyush Patel, Esha Choukse, Chaojie Zhang, \u00cd\u00f1igo Goiri, Brijesh Warrier, and Nithish Mahalingam et al. 2024. Characterizing power management opportunities for llms in the cloud. 
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 207\u2013222."},{"key":"e_1_3_2_1_48_1","unstructured":"Bharadwaj Pudipeddi Maral Mesmakhosroshahi Jinwen Xi and Sujeeth Bharadwaj. 2020. Training large neural networks with constant memory using a new execution algorithm. arXiv:2002.05645 [cs.CV]"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00057"},{"key":"e_1_3_2_1_51_1","volume-title":"Enabling Large Dynamic Neural Network Training with Learning-based Memory Management. In IEEE International Symposium on High-Performance Computer Architecture (HPCA). 788\u2013802","author":"Ren Jie","unstructured":"Jie Ren, Dong Xu, and Shuangyan Yang et al. 2024. Enabling Large Dynamic Neural Network Training with Learning-based Memory Management. In IEEE International Symposium on High-Performance Computer Architecture (HPCA). 788\u2013802."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Agniswar Roy Abhik Banerjee and Navneet Bhardwaj. 2021. A study on Google Cloud Platform (GCP) and its security. In Machine Learning Techniques and Analytics for Cloud Security. 313\u2013338.","DOI":"10.1002\/9781119764113.ch15"},{"key":"e_1_3_2_1_53_1","volume-title":"Proceedings of the international conference for high performance computing, networking, storage and analysis. 1\u201314","author":"Samyam Rajbhandari","year":"2021","unstructured":"Rajbhandari Samyam, Ruwase Olatunji, Rasley Jeff, Smith Shaden, and He Yuxiong. 2021. Zero-infinity: Breaking the gpu memory wall for extreme scale deep learning. In Proceedings of the international conference for high performance computing, networking, storage and analysis. 1\u201314."},{"key":"e_1_3_2_1_54_1","volume-title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. 
(Nov.","author":"Scao Teven Le","year":"2023","unstructured":"Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, et al. 2023. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. (Nov. 2023). https:\/\/inria.hal.science\/hal-03850124 working paper or preprint."},{"key":"e_1_3_2_1_55_1","volume-title":"International Conference on Machine Learning. 31094\u201331116","author":"Sheng Ying","unstructured":"Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, and Beidi Chen et al. 2023. Flexgen: High-throughput generative inference of large language models with a single gpu. In International Conference on Machine Learning. 31094\u201331116."},{"key":"e_1_3_2_1_56_1","unstructured":"Aditya S. Shethiya. 2025. Deploying AI Models in. NET Web Applications Using Azure Kubernetes Service (AKS). In Spectrum of Research 5."},{"key":"e_1_3_2_1_57_1","volume-title":"Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv:1909.08053 [cs.CV]","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv:1909.08053 [cs.CV]"},{"key":"e_1_3_2_1_58_1","volume-title":"Proceedings of Machine Learning and Systems.","author":"Su Qidong","unstructured":"Qidong Su, Wei Zhao, Xin Li, Muralidhar Andoorveedu, Chenhao Jiang, and Zhanda Zhu et al. 2025. Seesaw: High-throughput LLM Inference via Model Re-sharding. In Proceedings of Machine Learning and Systems."},{"key":"e_1_3_2_1_59_1","volume-title":"STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training. In SC'22: The International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Sun Xiaoyang","year":"2022","unstructured":"Xiaoyang Sun, Wei Wang, S. Qiu, R. Yang, S. Huang, J. Xu, and Z. Wang. 2022. 
STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training. In SC'22: The International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"crossref","unstructured":"Robert Tinn Hao Cheng Yu Gu Naoto Usuyama Xiaodong Liu Tristan Naumann Jianfeng Gao and Hoifung Poon. 2023. Fine-tuning large neural language models for biomedical natural language processing. Patterns. In Patterns 4 no. 4.","DOI":"10.1016\/j.patter.2023.100729"},{"key":"e_1_3_2_1_61_1","volume-title":"Llama: Open and efficient foundation language models. arXiv:2302.13971 [cs.CV]","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, and Baptiste Rozi\u00e8re et al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971 [cs.CV]"},{"key":"e_1_3_2_1_62_1","volume-title":"Proceedings of the 23rd ACM SIGPLAN symposium on principles and practice of parallel programming. 41\u201353","author":"Wang Linnan","unstructured":"Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, and Ang Li et al. 2018. Superneurons: Dynamic GPU memory management for training deep neural networks. In Proceedings of the 23rd ACM SIGPLAN symposium on principles and practice of parallel programming. 41\u201353."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3689031.3717455"},{"key":"e_1_3_2_1_64_1","volume-title":"Opt: Open pre-trained transformer language models. arXiv:2205.01068 [cs.CV]","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, and Christopher Dewan et al. 2022. Opt: Open pre-trained transformer language models. arXiv:2205.01068 [cs.CV]"},{"key":"e_1_3_2_1_65_1","volume-title":"Proceedings of the ACM on Management of Data 3, no. 1. 
1\u201328","author":"Zhao Pinxue","unstructured":"Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, and Fang Yang an Yuanbo Peng et al. 2025. MEMO: Fine-grained Tensor Management For Ultralong Context LLM Training. In Proceedings of the ACM on Management of Data 3, no. 1. 1\u201328."},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614248"},{"key":"e_1_3_2_1_67_1","volume-title":"18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)","author":"Zhong Yinmin","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, and Xuanzhe Liu et al. 2024. DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 193\u2013210."}],"event":{"name":"SoCC '25: ACM Symposium on Cloud Computing","location":"Online USA","acronym":"SoCC '25","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Proceedings of the 2025 ACM Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3772052.3772236","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T16:19:47Z","timestamp":1768321187000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3772052.3772236"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,19]]},"references-count":67,"alternative-id":["10.1145\/3772052.3772236","10.1145\/3772052"],"URL":"https:\/\/doi.org\/10.1145\/3772052.3772236","relation":{},"subject":[],"published":{"date-parts":[[2025,11,19]]},"assertion":[{"value":"2026-01-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}