{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T04:36:51Z","timestamp":1768106211733,"version":"3.49.0"},"reference-count":80,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,4]]},"abstract":"<jats:p>With the breakthrough of Transformer-based pre-trained models, the demand for fine-tuning (FT) to adapt the base pre-trained models to downstream applications continues to grow, so it is essential for service providers to reduce the cost of processing FT requests. Low-rank adaption (LoRA) is a widely used FT technique that only trains small-scale adapters and keeps the base model unaltered, conveying the possibility of processing multiple FT tasks by jointly training different LoRA adapters with a shared base model.<\/jats:p>\n          <jats:p>Nevertheless, through in-depth analysis, we reveal the efficiency of joint FT is dampened by two heterogeneity issues in the training data \u2014 the sequence length variation and skewness. To tackle these issues, we develop LobRA, a brand new framework that supports processing multiple FT tasks by jointly training LoRA adapters. Two innovative designs are introduced. Firstly, LobRA deploys the FT replicas (i.e., model replicas for FT) with heterogeneous resource usages and parallel configurations, matching the diverse workloads caused by the sequence length variation. Secondly, for each training step, LobRA takes account of the sequence length skewness and dispatches the training data among the heterogeneous FT replicas to achieve workload balance. We conduct experiments to assess the performance of LobRA, validating that it significantly reduces the GPU seconds required for joint FT by 45.03%-60.67%.<\/jats:p>","DOI":"10.14778\/3742728.3742752","type":"journal-article","created":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T13:32:53Z","timestamp":1756906373000},"page":"2616-2625","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["LobRA: Multi-Tenant Fine-Tuning over Heterogeneous Data"],"prefix":"10.14778","volume":"18","author":[{"given":"Sheng","family":"Lin","sequence":"first","affiliation":[{"name":"Peking University"}]},{"given":"Fangcheng","family":"Fu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Haoyang","family":"Li","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Hao","family":"Ge","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Xuanyu","family":"Wang","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Jiawen","family":"Niu","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Yaofeng","family":"Tu","sequence":"additional","affiliation":[{"name":"ZTE Corporation"}]},{"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University"}]}],"member":"320","published-online":{"date-parts":[[2025,9,3]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2009. Optimization with PuLP. https:\/\/coin-or.github.io\/pulp\/."},{"key":"e_1_2_1_2_1","unstructured":"2023. The Ymir Proejct: Dataset and Workload. https:\/\/sites.google.com\/view\/ymir-project#h.dw77b5uw44tb."},{"key":"e_1_2_1_3_1","unstructured":"2024. 
SCIP: Solving Constraint Integer Programs. https:\/\/www.scipopt.org\/."},{"key":"e_1_2_1_4_1","unstructured":"2025. Full Version (with Appendix) of LobRA. https:\/\/github.com\/ccchengff\/LobRA\/blob\/main\/LobRA_Full_Version_with_Appendix.pdf."},{"key":"e_1_2_1_5_1","volume-title":"LongAlign: A Recipe for Long Context Alignment of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024 (EMNLP Findings","author":"Bai Yushi","year":"2024","unstructured":"Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, and Juanzi Li. 2024. LongAlign: A Recipe for Long Context Alignment of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024 (EMNLP Findings, 2024). 1376\u20131395."},{"key":"e_1_2_1_6_1","first-page":"72","article-title":"Adaptive load balancing for parameter servers in distributed machine learning over heterogeneous networks","volume":"21","author":"Cai Weibo","year":"2023","unstructured":"Weibo Cai, Shulin Yang, Gang Sun, Qiming Zhang, and Hongfang Yu. 2023. Adaptive load balancing for parameter servers in distributed machine learning over heterogeneous networks. ZTE Communications 21, 1 (2023), 72.","journal-title":"ZTE Communications"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of Machine Learning and Systems 2024 (MLSys","author":"Chen Lequn","year":"2024","unstructured":"Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, and Arvind Krishnamurthy. 2024. Punica: Multi-Tenant LoRA Serving. In Proceedings of Machine Learning and Systems 2024 (MLSys 2024)."},{"key":"e_1_2_1_8_1","volume-title":"International Conference on Learning Representations 2024 (ICLR","author":"Dao Tri","year":"2024","unstructured":"Tri Dao. 2024. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. In International Conference on Learning Representations 2024 (ICLR 2024)."},{"key":"e_1_2_1_9_1","volume-title":"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. In Annual Conference on Neural Information Processing Systems 2022 (NeurIPS","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher R\u00e9. 2022. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. In Annual Conference on Neural Information Processing Systems 2022 (NeurIPS 2022)."},{"key":"e_1_2_1_10_1","unstructured":"DataBricks. 2025. Documents for Foundation Model Fine-tuning. https:\/\/docs.databricks.com\/aws\/en\/large-language-models\/foundation-model-training."},{"key":"e_1_2_1_11_1","unstructured":"Harm de Vries. 2023. In the long (context) run. https:\/\/www.harmdevries.com\/post\/context-length\/."},{"key":"e_1_2_1_12_1","volume-title":"Deep learning-based semantic feature extraction: A literature review and future directions. ZTE communications 21, 2","author":"Deng Letian","year":"2023","unstructured":"Letian Deng and Yanru Zhao. 2023. Deep learning-based semantic feature extraction: A literature review and future directions. ZTE communications 21, 2 (2023), 11."},{"key":"e_1_2_1_13_1","volume-title":"Fewer Truncations Improve Language Modeling. In International Conference on Machine Learning 2024 (ICML","author":"Ding Hantian","year":"2024","unstructured":"Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, and Stefano Soatto. 2024. Fewer Truncations Improve Language Modeling. 
In International Conference on Machine Learning 2024 (ICML 2024)."},{"key":"e_1_2_1_14_1","volume-title":"Smith","author":"Dodge Jesse","year":"2020","unstructured":"Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, and Noah A. Smith. 2020. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping. CoRR abs\/2002.06305 (2020)."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL","author":"Dong Guanting","year":"2024","unstructured":"Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, and Jingren Zhou. 2024. How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). 177\u2013198."},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Fei Du Xin-Jian Ma Jing-Ru Yang Yi Liu Chao-Ran Luo Xue-Bin Wang Hai-Ou Jiang and Xiang Jing. 2024. A Survey of LLM Datasets: From Autoregressive Model to AI Chatbot. J. Comput. Sci. Technol. (2024).","DOI":"10.1007\/s11390-024-3767-3"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP","author":"Edunov Sergey","year":"2018","unstructured":"Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding Back-Translation at Scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018). 489\u2013500."},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","first-page":"804","DOI":"10.14778\/3503585.3503590","article-title":"BAGUA: Scaling up Distributed Learning with System Relaxations","volume":"15","author":"Gan Shaoduo","year":"2021","unstructured":"Shaoduo Gan, Xiangru Lian, Rui Wang, Jianbin Chang, Chengjun Liu, Hongmei Shi, Shengzhuo Zhang, Xianghong Li, Tengxu Sun, Jiawei Jiang, Binhang Yuan, Sen Yang, Ji Liu, and Ce Zhang. 2021. BAGUA: Scaling up Distributed Learning with System Relaxations. Proc. VLDB Endow. 15, 4 (2021), 804\u2013813.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_19_1","volume-title":"Yu","author":"Gan Wensheng","year":"2023","unstructured":"Wensheng Gan, Shicheng Wan, and Philip S. Yu. 2023. Model-as-a-Service (MaaS): A Survey. CoRR abs\/2311.05804 (2023)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Lei Guan Dong-Sheng Li Jiye Liang Wen-Jian Wang Ke-shi Ge and Xicheng Lu. 2024. Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview. J. Comput. Sci. Technol. (2024).","DOI":"10.1007\/s11390-024-3872-3"},{"key":"e_1_2_1_21_1","volume-title":"LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations 2022 (ICLR","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations 2022 (ICLR 2022)."},{"key":"e_1_2_1_22_1","volume-title":"First Conference on Language Modeling (COLM","author":"Huang Chengsong","year":"2024","unstructured":"Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, and Min Lin. 2024. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition. 
In First Conference on Language Modeling (COLM 2024)."},{"key":"e_1_2_1_23_1","volume-title":"Annual Conference on Neural Information Processing Systems 2019 (NeurIPS","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In Annual Conference on Neural Information Processing Systems 2019 (NeurIPS 2019). 103\u2013112."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP","author":"Jang Insu","year":"2023","unstructured":"Insu Jang, Zhenning Yang, Zhen Zhang, Xin Jin, and Mosharaf Chowdhury. 2023. Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates. In Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP 2023). 382\u2013395."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of Machine Learning and Systems 2019 (MLSys","author":"Jia Zhihao","year":"2019","unstructured":"Zhihao Jia, Matei Zaharia, and Alex Aiken. 2019. Beyond Data and Model Parallelism for Deep Neural Networks. In Proceedings of Machine Learning and Systems 2019 (MLSys 2019)."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI","author":"Jiang Youhe","year":"2023","unstructured":"Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, and Bin Cui. 2023. OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023). 2142\u20132150."},{"key":"e_1_2_1_27_1","volume-title":"Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations 2015 (ICLR","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations 2015 (ICLR 2015)."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of Machine Learning and Systems 2023 (MLSys","author":"Korthikanti Vijay","year":"2023","unstructured":"Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, and Bryan Catanzaro. 2023. Reducing Activation Recomputation in Large Transformer Models. In Proceedings of Machine Learning and Systems 2023 (MLSys 2023)."},{"key":"e_1_2_1_29_1","volume-title":"Efficient sequence packing without cross-contamination: Accelerating large language models without impacting performance. CoRR abs\/2107.02027","author":"Krell Mario Michael","year":"2021","unstructured":"Mario Michael Krell, Matej Kosec, Sergio P Perez, and Andrew Fitzgibbon. 2021. Efficient sequence packing without cross-contamination: Accelerating large language models without impacting performance. CoRR abs\/2107.02027 (2021)."},{"key":"e_1_2_1_30_1","volume-title":"Laura Wynter, Raghu Kiran Ganti, and Mayank Mishra.","author":"Kundu Achintya","year":"2024","unstructured":"Achintya Kundu, Rhui Dih Lee, Laura Wynter, Raghu Kiran Ganti, and Mayank Mishra. 2024. Enhancing Training Efficiency Using Packing with Flash Attention. 
CoRR abs\/2407.09105 (2024)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","first-page":"1939","DOI":"10.14778\/3659437.3659449","article-title":"GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization","volume":"17","author":"Lao Jiale","year":"2024","unstructured":"Jiale Lao, Yibo Wang, Yufei Li, Jianping Wang, Yunjia Zhang, Zhiyuan Cheng, Wanghu Chen, Mingjie Tang, and Jianguo Wang. 2024. GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization. Proc. VLDB Endow. 17, 8 (2024), 1939\u20131952.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_32_1","first-page":"27","article-title":"End-to-end chinese entity recognition based on bert-bilstm-att-crf","volume":"20","author":"Li Daiyi","year":"2022","unstructured":"Daiyi Li, Yaofeng Tu, Xiangsheng Zhou, Yangming Zhang, and Zongmin Ma. 2022. End-to-end chinese entity recognition based on bert-bilstm-att-crf. ZTE Communications 20, S1 (2022), 27.","journal-title":"ZTE Communications"},{"key":"e_1_2_1_33_1","volume-title":"Hetu v2: A General and Scalable Deep Learning System with Hierarchical and Heterogeneous Single Program Multiple Data Annotations. CoRR abs\/2504.20490","author":"Li Haoyang","year":"2025","unstructured":"Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Xupeng Miao, and Bin Cui. 2025. Hetu v2: A General and Scalable Deep Learning System with Hierarchical and Heterogeneous Single Program Multiple Data Annotations. CoRR abs\/2504.20490 (2025)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","first-page":"3005","DOI":"10.14778\/3415478.3415530","article-title":"PyTorch Distributed: Experiences on Accelerating Data Parallel Training","volume":"13","author":"Li Shen","year":"2020","unstructured":"Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, and Soumith Chintala. 2020. PyTorch Distributed: Experiences on Accelerating Data Parallel Training. Proc. VLDB Endow. 13, 12 (2020), 3005\u20133018.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP","author":"Liu Yang","year":"2019","unstructured":"Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019). 3728\u20133738."},{"key":"e_1_2_1_36_1","volume-title":"Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations 2019 (ICLR","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations 2019 (ICLR 2019)."},{"key":"e_1_2_1_37_1","volume-title":"A Survey on LoRA of Large Language Models. CoRR abs\/2407.11046","author":"Mao Yuren","year":"2024","unstructured":"Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, and Yunjun Gao. 2024. A Survey on LoRA of Large Language Models. CoRR abs\/2407.11046 (2024)."},{"key":"e_1_2_1_38_1","volume-title":"Hetu: a highly efficient automatic parallel distributed deep learning system. Sci. China Inf. Sci. 
66","author":"Miao Xupeng","year":"2023","unstructured":"Xupeng Miao, Xiaonan Nie, Hailin Zhang, Tong Zhao, and Bin Cui. 2023. Hetu: a highly efficient automatic parallel distributed deep learning system. Sci. China Inf. Sci. 66 (2023)."},{"key":"e_1_2_1_39_1","volume-title":"FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning. CoRR abs\/2402.18789","author":"Miao Xupeng","year":"2024","unstructured":"Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, and Zhihao Jia. 2024. FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning. CoRR abs\/2402.18789 (2024)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","first-page":"470","DOI":"10.14778\/3570690.3570697","article-title":"Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism","volume":"16","author":"Miao Xupeng","year":"2022","unstructured":"Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, and Bin Cui. 2022. Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. Proc. VLDB Endow. 16, 3 (2022), 470\u2013479.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP","author":"Narayanan Deepak","year":"2019","unstructured":"Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, and Matei Zaharia. 2019. PipeDream: generalized pipeline parallelism for DNN training. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP 2019). 1\u201315."},{"key":"e_1_2_1_42_1","volume-title":"Memory-Efficient Pipeline-Parallel DNN Training. In International Conference on Machine Learning 2021 (ICML","volume":"139","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-Efficient Pipeline-Parallel DNN Training. In International Conference on Machine Learning 2021 (ICML 2021), Vol. 139. 7937\u20137947."},{"key":"e_1_2_1_43_1","volume-title":"International Conference for High Performance Computing, Networking 2021 (SC","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, and Matei Zaharia. 2021. Efficient large-scale language model training on GPU clusters using megatron-LM. In International Conference for High Performance Computing, Networking 2021 (SC 2021). 58."},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","first-page":"3781","DOI":"10.14778\/3611540.3611564","article-title":"Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent","volume":"16","author":"Nie Xiaonan","year":"2023","unstructured":"Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, and Bin Cui. 2023. Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent. Proc. VLDB Endow. 16, 12 (2023), 3781\u20133794.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_45_1","unstructured":"NVIDIA. 2024. NVIDIA Collective Communications Library (NCCL). https:\/\/developer.nvidia.com\/nccl."},{"key":"e_1_2_1_46_1","unstructured":"OpenAI. 2022. ChatGPT: Optimizing Language Models for Dialogue. 
https:\/\/openai.com\/blog\/chatgpt."},{"key":"e_1_2_1_47_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023)."},{"key":"e_1_2_1_48_1","unstructured":"OpenAI. 2024. OpenAI Platform: Fine-tuning. https:\/\/platform.openai.com\/docs\/guides\/fine-tuning\/."},{"key":"e_1_2_1_49_1","volume-title":"Aafaq Iqbal khan, and Arsalan Shahid","author":"Parthasarathy Venkatesh Balavadhani","year":"2024","unstructured":"Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Iqbal khan, and Arsalan Shahid. 2024. The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities. CoRR abs\/2408.13296 (2024)."},{"key":"e_1_2_1_50_1","volume-title":"Y-Lan Boureau, and Jason Weston.","author":"Roller Stephen","year":"2020","unstructured":"Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric Michael Smith, Y-Lan Boureau, and Jason Weston. 2020. Recipes for building an open-domain chatbot. CoRR abs\/2004.13637 (2020)."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of Machine Learning and Systems 2024 (MLSys","author":"Sheng Ying","year":"2024","unstructured":"Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, and Ion Stoica. 2024. SLoRA: Scalable Serving of Thousands of LoRA Adapters. In Proceedings of Machine Learning and Systems 2024 (MLSys 2024)."},{"key":"e_1_2_1_52_1","volume-title":"In-Context Pretraining: Language Modeling Beyond Document Boundaries. In International Conference on Learning Representations 2024 (ICLR","author":"Shi Weijia","year":"2024","unstructured":"Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Wen-tau Yih, and Mike Lewis. 2024. In-Context Pretraining: Language Modeling Beyond Document Boundaries. In International Conference on Learning Representations 2024 (ICLR 2024)."},{"key":"e_1_2_1_53_1","volume-title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs\/1909.08053","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs\/1909.08053 (2019)."},{"key":"e_1_2_1_54_1","volume-title":"A Study of Optimizations for Fine-tuning Large Language Models. CoRR abs\/2406.02290","author":"Singh Arjun","year":"2024","unstructured":"Arjun Singh, Nikhil Pandey, Anup Shirgaonkar, Pavan Manoj, and Vijay Aski. 2024. A Study of Optimizations for Fine-tuning Large Language Models. CoRR abs\/2406.02290 (2024)."},{"key":"e_1_2_1_55_1","unstructured":"Snowflake. 2025. Fine-tuning (Snowflake Cortex). https:\/\/docs.snowflake.com\/en\/user-guide\/snowflake-cortex\/cortex-finetuning."},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM","author":"Song Zijian","year":"2024","unstructured":"Zijian Song, Wenhan Zhang, Lifang Deng, Jiandong Zhang, Kaigui Bian, and Bin Cui. 2024. MultiLoRA: Multi-Directional Low Rank Adaptation for Multi-Domain Recommendation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024). 
2148\u20132157."},{"key":"e_1_2_1_57_1","volume-title":"Piper: Multidimensional Planner for DNN Parallelization. In Annual Conference on Neural Information Processing Systems 2021 (NeurIPS","author":"Tarnawski Jakub","year":"2021","unstructured":"Jakub Tarnawski, Deepak Narayanan, and Amar Phanishayee. 2021. Piper: Multidimensional Planner for DNN Parallelization. In Annual Conference on Neural Information Processing Systems 2021 (NeurIPS 2021). 24829\u201324840."},{"key":"e_1_2_1_58_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. CoRR abs\/2307.09288 (2023)."},{"key":"e_1_2_1_59_1","volume-title":"Annual Conference on Neural Information Processing Systems 2017 (NeurIPS","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Annual Conference on Neural Information Processing Systems 2017 (NeurIPS 2017)."},{"key":"e_1_2_1_60_1","volume-title":"Exploring the State of Instruction Tuning on Open Resources. In Annual Conference on Neural Information Processing Systems 2023 (NeurIPS","author":"Wang Yizhong","year":"2023","unstructured":"Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, and Hannaneh Hajishirzi. 2023. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. In Annual Conference on Neural Information Processing Systems 2023 (NeurIPS 2023)."},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","first-page":"3906","DOI":"10.1109\/TKDE.2024.3370614","article-title":"Improving Automatic Parallel Training via Balanced Memory Workload Optimization","volume":"36","author":"Wang Yujie","year":"2024","unstructured":"Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, and Bin Cui. 2024. Improving Automatic Parallel Training via Balanced Memory Workload Optimization. IEEE Trans. Knowl. Data Eng. 36, 8 (2024), 3906\u20133920.","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"e_1_2_1_62_1","volume-title":"MultiLoRA: Democratizing LoRA for Better Multi-Task Learning. CoRR abs\/2311.11501","author":"Wang Yiming","year":"2023","unstructured":"Yiming Wang, Yu Lin, Xiaodong Zeng, and Guannan Zhang. 2023. MultiLoRA: Democratizing LoRA for Better Multi-Task Learning. CoRR abs\/2311.11501 (2023)."},{"key":"e_1_2_1_63_1","volume-title":"Mixture of LoRA Experts. 
In The Twelfth International Conference on Learning Representations 2024 (ICLR","author":"Wu Xun","year":"2024","unstructured":"Xun Wu, Shaohan Huang, and Furu Wei. 2024. Mixture of LoRA Experts. In The Twelfth International Conference on Learning Representations 2024 (ICLR 2024)."},{"key":"e_1_2_1_64_1","volume-title":"Efficient Multi-task LLM Quantization and Serving for Multiple LoRA Adapters. In Annual Conference on Neural Information Processing Systems 2024 (NeurIPS","author":"Xia Yifei","year":"2024","unstructured":"Yifei Xia, Fangcheng Fu, Wentao Zhang, Jiawei Jiang, and Bin Cui. 2024. Efficient Multi-task LLM Quantization and Serving for Multiple LoRA Adapters. In Annual Conference on Neural Information Processing Systems 2024 (NeurIPS 2024)."},{"key":"e_1_2_1_65_1","unstructured":"An Yang Baosong Yang Binyuan Hui Bo Zheng Bowen Yu Chang Zhou Chengpeng Li Chengyuan Li Dayiheng Liu Fei Huang Guanting Dong Haoran Wei Huan Lin Jialong Tang Jialin Wang Jian Yang Jianhong Tu Jianwei Zhang Jianxin Ma Jianxin Yang Jin Xu Jingren Zhou Jinze Bai Jinzheng He Junyang Lin Kai Dang Keming Lu Keqin Chen Kexin Yang Mei Li Mingfeng Xue Na Ni Pei Zhang Peng Wang Ru Peng Rui Men Ruize Gao Runji Lin Shijie Wang Shuai Bai Sinan Tan Tianhang Zhu Tianhao Li Tianyu Liu Wenbin Ge Xiaodong Deng Xiaohuan Zhou Xingzhang Ren Xinyu Zhang Xipin Wei Xuancheng Ren Xuejing Liu Yang Fan Yang Yao Yichang Zhang Yu Wan Yunfei Chu Yuqiong Liu Zeyu Cui Zhenru Zhang Zhifang Guo and Zhihao Fan. 2024. Qwen2 Technical Report. CoRR abs\/2407.10671 (2024)."},{"key":"e_1_2_1_66_1","volume-title":"mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs. CoRR abs\/2312.02515","author":"Ye Zhengmao","year":"2023","unstructured":"Zhengmao Ye, Dengchun Li, Zetao Hu, Tingfeng Lan, Jian Sha, Sicong Zhang, Lei Duan, Jie Zuo, Hui Lu, Yuanchun Zhou, and Mingjie Tang. 2023. mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs. CoRR abs\/2312.02515 (2023)."},{"key":"e_1_2_1_67_1","volume-title":"Deep learning for code generation: a survey. Sci. China Inf. Sci. 67","author":"Zhang Huangzhao","year":"2024","unstructured":"Huangzhao Zhang, Kechi Zhang, Zhuo Li, Jia Li, Jia Li, Yongmin Li, Yunfei Zhao, Yuqi Zhu, Fang Liu, Ge Li, and Zhi Jin. 2024. Deep learning for code generation: a survey. Sci. China Inf. Sci. 67 (2024)."},{"key":"e_1_2_1_68_1","volume-title":"PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In International conference on machine learning (ICML","volume":"119","author":"Zhang Jingqing","year":"2020","unstructured":"Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu. 2020. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In International conference on machine learning (ICML 2020), Vol. 119. 11328\u201311339."},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL 2020 Demo). 270\u2013278","author":"Zhang Yizhe","year":"2020","unstructured":"Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2020. DIALOGPT : Large-Scale Generative Pre-training for Conversational Response Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL 2020 Demo). 
270\u2013278."},{"key":"e_1_2_1_70_1","volume-title":"Explicit Behavior Interaction with Heterogeneous Graph for Multi-behavior Recommendation. Data Sci. Eng. 9","author":"Zhang Zhongping","year":"2024","unstructured":"Zhongping Zhang, Yin Jia, Yuehan Hou, and Xinlu Yu. 2024. Explicit Behavior Interaction with Heterogeneous Graph for Multi-behavior Recommendation. Data Sci. Eng. 9 (2024)."},{"key":"e_1_2_1_71_1","doi-asserted-by":"crossref","first-page":"37","DOI":"10.14778\/3561261.3561265","article-title":"MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud","volume":"16","author":"Zhang Zhen","year":"2022","unstructured":"Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, and Xin Jin. 2022. MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud. Proc. VLDB Endow. 16, 1 (2022), 37\u201350.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_72_1","volume-title":"Proceedings of the 2025 ACM International Conference on Management of Data (SIGMOD","author":"Zhao Pinxue","year":"2025","unstructured":"Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, and Bin Cui. 2025. Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs. In Proceedings of the 2025 ACM International Conference on Management of Data (SIGMOD 2025)."},{"key":"e_1_2_1_73_1","doi-asserted-by":"crossref","first-page":"3848","DOI":"10.14778\/3611540.3611569","article-title":"PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel","volume":"16","author":"Zhao Yanli","year":"2023","unstructured":"Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, and Shen Li. 2023. PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Proc. VLDB Endow. 16, 12 (2023), 3848\u20133860.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_74_1","volume-title":"Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI","author":"Zheng Lianmin","year":"2022","unstructured":"Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, and Ion Stoica. 2022. Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2022). 559\u2013578."},{"key":"e_1_2_1_75_1","volume-title":"Proceedings of the 53rd International Conference on Parallel Processing, (ICPP","author":"Zheng Ying","year":"2024","unstructured":"Ying Zheng, Lei Jiao, Han Yang, Lulu Chen, Ying Liu, Yuxiao Wang, Yuedong Xu, Xin Wang, and Zongpeng Li. 2024. Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks. In Proceedings of the 53rd International Conference on Parallel Processing, (ICPP 2024). 357\u2013366."},{"key":"e_1_2_1_76_1","volume-title":"Multi-LoRA Composition for Image Generation. CoRR abs\/2402.16843","author":"Zhong Ming","year":"2024","unstructured":"Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, and Weizhu Chen. 2024. Multi-LoRA Composition for Image Generation. 
CoRR abs\/2402.16843 (2024)."},{"key":"e_1_2_1_77_1","doi-asserted-by":"crossref","first-page":"2514","DOI":"10.14778\/3675034.3675043","article-title":"D-Bot: Database Diagnosis System using Large Language Models","volume":"17","author":"Zhou Xuanhe","year":"2024","unstructured":"Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, and Guoyang Zeng. 2024. D-Bot: Database Diagnosis System using Large Language Models. Proc. VLDB Endow. 17, 10 (2024), 2514\u20132527.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_78_1","volume-title":"DB-GPT: Large Language Model Meets Database. Data Sci. Eng. 9","author":"Zhou Xuanhe","year":"2024","unstructured":"Xuanhe Zhou, Zhaoyan Sun, and Guoliang Li. 2024. DB-GPT: Large Language Model Meets Database. Data Sci. Eng. 9 (2024)."},{"key":"e_1_2_1_79_1","volume-title":"PetS: A Unified Framework for Parameter-Efficient Transformers Serving. In 2022 USENIX Annual Technical Conference (ATC","author":"Zhou Zhe","year":"2022","unstructured":"Zhe Zhou, Xuechao Wei, Jiejing Zhang, and Guangyu Sun. 2022. PetS: A Unified Framework for Parameter-Efficient Transformers Serving. In 2022 USENIX Annual Technical Conference (ATC 2022). 489\u2013504."},{"key":"e_1_2_1_80_1","volume-title":"Findings of the Association for Computational Linguistics: NAACL (NAACL Findings","author":"Zhu Wenhao","year":"2024","unstructured":"Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, and Lei Li. 2024. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis. In Findings of the Association for Computational Linguistics: NAACL (NAACL Findings 2024). 2765\u20132781."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3742728.3742752","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T13:37:55Z","timestamp":1756906675000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3742728.3742752"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4]]},"references-count":80,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,4]]}},"alternative-id":["10.14778\/3742728.3742752"],"URL":"https:\/\/doi.org\/10.14778\/3742728.3742752","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,4]]},"assertion":[{"value":"2025-09-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
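
The object above is a standard Crossref REST API work record, so it can be re-fetched or post-processed programmatically. Below is a minimal Python sketch of doing so; it assumes only that the third-party `requests` package is installed, and the field names it reads (`title`, `author`, `page`, `reference-count`) are taken directly from the record shown above rather than from any other documentation.

```python
# Minimal sketch: fetch this Crossref work record and extract a few fields.
# Assumption: the third-party `requests` package is available; the public
# Crossref REST API (https://api.crossref.org/works/{DOI}) returns the same
# {"status": ..., "message": {...}} envelope shown in the record above.
import requests

DOI = "10.14778/3742728.3742752"  # LobRA, Proc. VLDB Endow. 18(8)

resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]  # the "message" object is the work record

# Pull out the same fields that appear in the record above.
title = work["title"][0]
authors = ", ".join(
    f'{a.get("given", "")} {a.get("family", "")}'.strip()
    for a in work.get("author", [])
)
print(title)                        # LobRA: Multi-Tenant Fine-Tuning over Heterogeneous Data
print(authors)                      # Sheng Lin, Fangcheng Fu, ..., Bin Cui
print(work.get("page"))             # 2616-2625
print(work.get("reference-count"))  # 80
```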