{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T23:10:50Z","timestamp":1771888250065,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T00:00:00Z","timestamp":1674777600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,27]]},"DOI":"10.1145\/3575693.3575703","type":"proceedings-article","created":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T22:56:55Z","timestamp":1675119415000},"page":"489-501","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers"],"prefix":"10.1145","author":[{"given":"Yangyang","family":"Feng","sequence":"first","affiliation":[{"name":"Tsinghua University, China"}]},{"given":"Minhui","family":"Xie","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"given":"Zijie","family":"Tian","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"given":"Shuo","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"given":"Youyou","family":"Lu","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"given":"Jiwu","family":"Shu","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]}],"member":"320","published-online":{"date-parts":[[2023,1,30]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n. d.]. Amazon EC2 P3 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/p3\/ \t\t\t\t  [n. d.]. Amazon EC2 P3 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/p3\/"},{"key":"e_1_3_2_1_2_1","unstructured":"[n. d.]. Amazon EC2 P4 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/p4\/ \t\t\t\t  [n. d.]. Amazon EC2 P4 Instances. https:\/\/aws.amazon.com\/ec2\/instance-types\/p4\/"},{"key":"e_1_3_2_1_3_1","unstructured":"[n. d.]. DeepSpeed. https:\/\/github.com\/microsoft\/DeepSpeed \t\t\t\t  [n. d.]. DeepSpeed. https:\/\/github.com\/microsoft\/DeepSpeed"},{"key":"e_1_3_2_1_4_1","unstructured":"[n. d.]. DGX-2. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-2\/ \t\t\t\t  [n. d.]. DGX-2. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-2\/"},{"key":"e_1_3_2_1_5_1","unstructured":"[n. d.]. DGX A100. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-a100\/ \t\t\t\t  [n. d.]. DGX A100. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-a100\/"},{"key":"e_1_3_2_1_6_1","unstructured":"[n. d.]. EleutherAI\/gpt-j-6B. https:\/\/huggingface.co\/EleutherAI\/gpt-j-6B \t\t\t\t  [n. d.]. EleutherAI\/gpt-j-6B. https:\/\/huggingface.co\/EleutherAI\/gpt-j-6B"},{"key":"e_1_3_2_1_7_1","unstructured":"[n. d.]. GEFORCE RTX 3090 Family. https:\/\/www.nvidia.com\/en-us\/geforce\/graphics-cards\/30-series\/rtx-3090-3090ti\/ \t\t\t\t  [n. d.]. GEFORCE RTX 3090 Family. https:\/\/www.nvidia.com\/en-us\/geforce\/graphics-cards\/30-series\/rtx-3090-3090ti\/"},{"key":"e_1_3_2_1_8_1","unstructured":"[n. d.]. GPU cloud servers. https:\/\/en.immers.cloud\/gpu\/ \t\t\t\t  [n. d.]. GPU cloud servers. https:\/\/en.immers.cloud\/gpu\/"},{"key":"e_1_3_2_1_9_1","unstructured":"[n. d.]. GPUDirect. 
{"key":"e_1_3_2_1_10_1","unstructured":"[n. d.]. Gurobi. https:\/\/www.gurobi.com"},{"key":"e_1_3_2_1_11_1","unstructured":"[n. d.]. NVIDIA A100 TENSOR CORE GPU. https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/"},{"key":"e_1_3_2_1_12_1","unstructured":"[n. d.]. NVLink and NVSwitch. https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink\/"},{"key":"e_1_3_2_1_13_1","unstructured":"[n. d.]. OpenAI\u2019s GPT-3 Language Model: A Technical Overview. https:\/\/lambdalabs.com\/blog\/demystifying-gpt-3\/"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519584"},{"key":"e_1_3_2_1_15_1","volume-title":"FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In 19th USENIX Conference on File and Storage Technologies (FAST 21)","author":"Bae Jonghyun","year":"2021","unstructured":"Jonghyun Bae, Jongsung Lee, Yunho Jin, Sam Son, Shine Kim, Hakbeom Jang, Tae Jun Ham, and Jae W Lee. 2021. FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In 19th USENIX Conference on File and Storage Technologies (FAST 21). 387\u2013401."},{"key":"e_1_3_2_1_16_1","volume-title":"Language models are few-shot learners. Advances in neural information processing systems, 33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, and Amanda Askell. 2020. Language models are few-shot learners. Advances in neural information processing systems, 33 (2020), 1877\u20131901."},{"key":"e_1_3_2_1_17_1","unstructured":"Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174."},{"key":"e_1_3_2_1_18_1","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, and Sylvain Gelly. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929."},
{"key":"e_1_3_2_1_19_1","volume-title":"2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Eliad Saar","year":"2021","unstructured":"Saar Eliad, Ido Hakimi, Alon De Jagger, Mark Silberstein, and Assaf Schuster. 2021. Fine-tuning giant neural networks on commodity hardware with automatic pipeline model parallelism. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). USENIX Association, 381\u2013396. isbn:978-1-939133-23-6 https:\/\/www.usenix.org\/conference\/atc21\/presentation\/eliad"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441593"},{"key":"e_1_3_2_1_21_1","volume-title":"Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. https:\/\/doi.org\/10.48550\/ARXIV.2101.03961","author":"Fedus William","year":"2021","unstructured":"William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. https:\/\/doi.org\/10.48550\/ARXIV.2101.03961"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1031"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378530"},{"key":"e_1_3_2_1_24_1","volume-title":"Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems, 32","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, and Yonghui Wu. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems, 32 (2019)."},{"key":"e_1_3_2_1_25_1","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361."},{"key":"e_1_3_2_1_26_1","unstructured":"Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, and Sungwoong Kim. 2020. torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models. arxiv:2004.09910."},
{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476145"},{"key":"e_1_3_2_1_28_1","unstructured":"Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, and Pritam Damania. 2020. Pytorch distributed: Experiences on accelerating data parallel training. arXiv preprint arXiv:2006.15704."},{"key":"e_1_3_2_1_29_1","unstructured":"Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843."},{"key":"e_1_3_2_1_30_1","unstructured":"Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, and Ganesh Venkatesh. 2017. Mixed precision training. arXiv preprint arXiv:1710.03740."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_2_1_32_1","volume-title":"International Conference on Machine Learning. 7937\u20137947","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-efficient pipeline-parallel dnn training. In International Conference on Machine Learning. 7937\u20137947."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, and Bryan Catanzaro. 2021. Efficient large-scale language model training on gpu clusters using megatron-lm. 1\u201315.","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378505"},{"key":"e_1_3_2_1_35_1","volume-title":"ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 262\u2013277","author":"Rajbhandari Samyam","year":"2020","unstructured":"Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. 2020. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 262\u2013277."},
{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476205"},{"key":"e_1_3_2_1_37_1","volume-title":"ZeRO-Offload: Democratizing Billion-Scale Model Training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Ren Jie","year":"2021","unstructured":"Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 551\u2013564."},{"key":"e_1_3_2_1_38_1","unstructured":"Or Sharir, Barak Peleg, and Yoav Shoham. 2020. The cost of training nlp models: A concise overview. arXiv preprint arXiv:2004.08900."},{"key":"e_1_3_2_1_39_1","volume-title":"Mesh-tensorflow: Deep learning for supercomputers. Advances in neural information processing systems, 31","author":"Shazeer Noam","year":"2018","unstructured":"Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, and Cliff Young. 2018. Mesh-tensorflow: Deep learning for supercomputers. Advances in neural information processing systems, 31 (2018)."},{"key":"e_1_3_2_1_40_1","volume-title":"Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053.","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053."},{"key":"e_1_3_2_1_41_1","volume-title":"Attention is all you need. Advances in neural information processing systems, 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, 30 (2017)."},
{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178491"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303953"},{"key":"e_1_3_2_1_44_1","volume-title":"Petuum: A new platform for distributed machine learning on big data. IEEE transactions on Big Data, 1, 2","author":"Xing Eric P","year":"2015","unstructured":"Eric P Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, and Yaoliang Yu. 2015. Petuum: A new platform for distributed machine learning on big data. IEEE transactions on Big Data, 1, 2 (2015), 49\u201367."},{"key":"e_1_3_2_1_45_1","first-page":"269","article-title":"Pipemare: Asynchronous pipeline parallel dnn training","volume":"3","author":"Yang Bowen","year":"2021","unstructured":"Bowen Yang, Jian Zhang, Jonathan Li, Christopher R\u00e9, Christopher Aberger, and Christopher De Sa. 2021. Pipemare: Asynchronous pipeline parallel dnn training. Proceedings of Machine Learning and Systems, 3 (2021), 269\u2013296.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_46_1","volume-title":"Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, and Xi Victoria Lin. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068."}
],"event":{"name":"ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2","location":"Vancouver BC Canada","acronym":"ASPLOS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","SIGOPS ACM Special Interest Group on Operating Systems","SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575703","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3575693.3575703","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:52Z","timestamp":1750272232000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575703"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,27]]},"references-count":46,"alternative-id":["10.1145\/3575693.3575703","10.1145\/3575693"],"URL":"https:\/\/doi.org\/10.1145\/3575693.3575703","relation":{},"subject":[],"published":{"date-parts":[[2023,1,27]]},"assertion":[{"value":"2023-01-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}