{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T09:36:33Z","timestamp":1761989793314,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,12,2]],"date-time":"2024-12-02T00:00:00Z","timestamp":1733097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,12,2]]},"DOI":"10.1145\/3652892.3700758","type":"proceedings-article","created":{"date-parts":[[2024,11,27]],"date-time":"2024-11-27T19:36:13Z","timestamp":1732736173000},"page":"185-198","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Menos: Split Fine-Tuning Large Language Models with Efficient GPU Memory Sharing"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-0653-2817","authenticated-orcid":false,"given":"Chenghao","family":"Hu","sequence":"first","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2404-0974","authenticated-orcid":false,"given":"Baochun","family":"Li","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]}],"member":"320","published-online":{"date-parts":[[2024,12,2]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Training Deep Nets with Sublinear Memory Cost. arXiv preprint arXiv:1604.06174","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training Deep Nets with Sublinear Memory Cost. arXiv preprint arXiv:1604.06174 (2016)."},{"key":"e_1_3_2_1_2_1","unstructured":"Tim Dettmers Mike Lewis Younes Belkada and Luke Zettlemoyer. 2022. GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale. In Advances in Neural Information Processing Systems Alice H. Oh Alekh Agarwal Danielle Belgrave and Kyunghyun Cho (Eds.). https:\/\/openreview.net\/forum?id=dXiGWqBoxaD"},{"key":"e_1_3_2_1_3_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=shpkpVXzo3h","author":"Dettmers Tim","year":"2022","unstructured":"Tim Dettmers, Mike Lewis, Sam Shleifer, and Luke Zettlemoyer. 2022. 8-bit Optimizers via Block-wise Quantization. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=shpkpVXzo3h"},{"key":"e_1_3_2_1_4_1","volume-title":"QLoRA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314","author":"Dettmers Tim","year":"2023","unstructured":"Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314 (2023)."},{"key":"e_1_3_2_1_5_1","volume-title":"GPTQ: Accurate Quantization for Generative Pre-trained Transformers. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=tcbBPnfwxS","author":"Frantar Elias","year":"2023","unstructured":"Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. 2023. GPTQ: Accurate Quantization for Generative Pre-trained Transformers. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=tcbBPnfwxS"},{"key":"e_1_3_2_1_6_1","volume-title":"Tiresias: A GPU Cluster Manager for Distributed Deep Learning. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19)","author":"Gu Juncheng","year":"2019","unstructured":"Juncheng Gu, Mosharaf Chowdhury, Kang G Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo. 2019. Tiresias: A GPU Cluster Manager for Distributed Deep Learning. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). 485--500."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2018.05.003"},{"key":"e_1_3_2_1_8_1","volume-title":"International Conference on Machine Learning. PMLR, 2790--2799","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning. PMLR, 2790--2799."},{"key":"e_1_3_2_1_9_1","volume-title":"LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations.","author":"Hu Edward J","year":"2021","unstructured":"Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2021. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_10_1","volume-title":"Lidong Bing, and Soujanya Poria.","author":"Hu Zhiqiang","year":"2023","unstructured":"Zhiqiang Hu, Yihuai Lan, Lei Wang, Wanyu Xu, Ee-Peng Lim, Roy Ka-Wei Lee, Lidong Bing, and Soujanya Poria. 2023. LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models. arXiv preprint arXiv:2304.01933 (2023)."},{"key":"e_1_3_2_1_11_1","volume-title":"Elastic Resource Sharing for Distributed Deep Learning. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)","author":"Hwang Changho","year":"2021","unstructured":"Changho Hwang, Taehyun Kim, Sunghyun Kim, Jinwoo Shin, and KyoungSoo Park. 2021. Elastic Resource Sharing for Distributed Deep Learning. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 721--739."},{"key":"e_1_3_2_1_12_1","volume-title":"Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA, 947--960. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/jeon"},{"key":"e_1_3_2_1_13_1","unstructured":"Andrej Karpathy. 2015. char-rnn. https:\/\/github.com\/karpathy\/charrnn."},{"key":"e_1_3_2_1_14_1","volume-title":"Proc. NAACL-HLT. 4171--4186","author":"Ming-Wei Chang Jacob Devlin","year":"2019","unstructured":"Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. NAACL-HLT. 4171--4186."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of Machine Learning and Systems 5","author":"Korthikanti Vijay Anand","year":"2023","unstructured":"Vijay Anand Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, and Bryan Catanzaro. 2023. Reducing Activation Recomputation in Large Transformer Models. Proceedings of Machine Learning and Systems 5 (2023)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.353"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583212"},{"key":"e_1_3_2_1_18_1","volume-title":"Analyzing Leakage of Personally Identifiable Information in Language Models. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 346--363","author":"Lukas Nils","year":"2023","unstructured":"Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-B\u00e9guelin. 2023. Analyzing Leakage of Personally Identifiable Information in Language Models. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 346--363."},{"key":"e_1_3_2_1_19_1","volume-title":"PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods. https:\/\/github.com\/huggingface\/peft.","author":"Mangrulkar Sourab","year":"2022","unstructured":"Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. 2022. PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods. https:\/\/github.com\/huggingface\/peft."},{"key":"e_1_3_2_1_20_1","unstructured":"Brendan McMahan Eider Moore Daniel Ramage Seth Hampson and Blaise Aguera y Arcas. 2017. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics. PMLR 1273--1282."},{"key":"e_1_3_2_1_21_1","unstructured":"Stephen Merity Caiming Xiong James Bradbury and Richard Socher. 2016. Pointer Sentinel Mixture Models. arXiv:1609.07843 [cs.CL]"},{"key":"e_1_3_2_1_22_1","volume-title":"Mixed Precision Training. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=r1gs9JgRZ","author":"Micikevicius Paulius","year":"2018","unstructured":"Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. 2018. Mixed Precision Training. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=r1gs9JgRZ"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.932708"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512153"},{"key":"e_1_3_2_1_25_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In NIPSW."},{"key":"e_1_3_2_1_26_1","volume-title":"Split Learning for Collaborative Deep Learning in Healthcare. arXiv preprint arXiv:1912.12115","author":"Poirot Maarten G","year":"2019","unstructured":"Maarten G Poirot, Praneeth Vepakomma, Ken Chang, Jayashree Kalpathy-Cramer, Rajiv Gupta, and Ramesh Raskar. 2019. Split Learning for Collaborative Deep Learning in Healthcare. arXiv preprint arXiv:1912.12115 (2019)."},{"key":"e_1_3_2_1_27_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et al. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_1_28_1","volume-title":"Hashimoto","author":"Taori Rohan","year":"2023","unstructured":"Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto . 2023. Stanford Alpaca: An Instruction-following LLaMA model. https:\/\/github.com\/tatsu-lab\/stanford_alpaca."},{"key":"e_1_3_2_1_29_1","volume-title":"Does Fine-Tuning GPT-3 with the OpenAI API Leak Personally-Identifiable Information? arXiv preprint arXiv:2307.16382","author":"Sun Albert Yu","year":"2023","unstructured":"Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, and Vaikkunth Mugunthan. 2023. Does Fine-Tuning GPT-3 with the OpenAI API Leak Personally-Identifiable Information? arXiv preprint arXiv:2307.16382 (2023)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i8.20825"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2019.8891040"},{"key":"e_1_3_2_1_32_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open Foundation and Fine-tuned Chat Models. arXiv preprint arXiv:2307.09288 (2023)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLOUD53861.2021.00038"},{"key":"e_1_3_2_1_34_1","volume-title":"Split Learning for Health: Distributed Deep Learning without Sharing Raw Patient Data. arXiv preprint arXiv:1812.00564","author":"Vepakomma Praneeth","year":"2018","unstructured":"Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, and Ramesh Raskar. 2018. Split Learning for Health: Distributed Deep Learning without Sharing Raw Patient Data. arXiv preprint arXiv:1812.00564 (2018)."},{"key":"e_1_3_2_1_35_1","volume-title":"Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. 2022. Emergent Abilities of Large Language Models. Transactions on Machine Learning Research (2022). https:\/\/openreview.net\/forum?id=yzkSU5zdwD Survey Certification."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369583.3392679"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-short.1"},{"key":"e_1_3_2_1_38_1","volume-title":"Xi Victoria Lin, et al","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. OPT: Open Pre-Trained Transformer Language Models. arXiv preprint arXiv:2205.01068 (2022)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2023.3280405"}],"event":{"name":"Middleware '24: 25th International Middleware Conference","sponsor":["IFIP","Usenix"],"location":"Hong Kong Hong Kong","acronym":"Middleware '24"},"container-title":["Proceedings of the 25th International Middleware Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652892.3700758","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3652892.3700758","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:53:57Z","timestamp":1750287237000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652892.3700758"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,2]]},"references-count":39,"alternative-id":["10.1145\/3652892.3700758","10.1145\/3652892"],"URL":"https:\/\/doi.org\/10.1145\/3652892.3700758","relation":{},"subject":[],"published":{"date-parts":[[2024,12,2]]},"assertion":[{"value":"2024-12-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}