{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T12:18:44Z","timestamp":1773317924073,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,16]]},"DOI":"10.1145\/3712285.3759855","type":"proceedings-article","created":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T16:05:39Z","timestamp":1762963539000},"page":"1409-1428","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-7317-1166","authenticated-orcid":false,"given":"Zhouyang","family":"Li","sequence":"first","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1878-0199","authenticated-orcid":false,"given":"Yuliang","family":"Liu","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5071-489X","authenticated-orcid":false,"given":"Wei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6119-8829","authenticated-orcid":false,"given":"Tailing","family":"Yuan","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7953-1651","authenticated-orcid":false,"given":"Bin","family":"Chen","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3826-8436","authenticated-orcid":false,"given":"Chengru","family":"Song","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,11,15]]},"reference":[{"key":"e_1_3_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Joshua Ainslie James Lee-Thorp Michiel De\u00a0Jong Yury Zemlyanskiy Federico Lebr\u00f3n and Sumit Sanghai. 2023. Gqa: Training generalized multi-query transformer models from multi-head checkpoints. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.13245 (2023).","DOI":"10.18653\/v1\/2023.emnlp-main.298"},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.naacl-long.454"},{"key":"e_1_3_3_2_4_2","unstructured":"Olivier Beaumont Lionel Eyraud-Dubois Julien Hermann Alexis Joly and Alena Shilova. 2019. Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1911.13214 (2019)."},{"key":"e_1_3_3_2_5_2","unstructured":"Amanda Bertsch Maor Ivgi Emily Xiao Uri Alon Jonathan Berant Matthew\u00a0R Gormley and Graham Neubig. 2024. In-context learning with long-context models: An in-depth exploration. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.00200 (2024)."},{"key":"e_1_3_3_2_6_2","unstructured":"William Brandon Aniruddha Nrusimha Kevin Qian Zachary Ankner Tian Jin Zhiye Song and Jonathan Ragan-Kelley. 2023. Striped attention: Faster ring attention for causal transformers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.09431 (2023)."},{"key":"e_1_3_3_2_7_2","unstructured":"Tom\u00a0B. Brown Benjamin Mann Nick Ryder Melanie Subbiah J. Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger T. Henighan R. Child A. Ramesh Daniel\u00a0M. 
Ziegler Jeff Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin S. Gray B. Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford I. Sutskever and Dario Amodei. 2020. Language Models are Few-Shot Learners. Neural Information Processing Systems (2020)."},{"key":"e_1_3_3_2_8_2","unstructured":"Tianqi Chen Bing Xu Chiyuan Zhang and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1604.06174 (2016)."},{"key":"e_1_3_3_2_9_2","unstructured":"Sharan Chetlur Cliff Woolley Philippe Vandermersch Jonathan Cohen John Tran Bryan Catanzaro and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1410.0759 (2014)."},{"key":"e_1_3_3_2_10_2","unstructured":"Zihang Dai Zhilin Yang Yiming Yang Jaime Carbonell Quoc\u00a0V Le and Ruslan Salakhutdinov. 2019. Transformer-xl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1901.02860 (2019)."},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Tri Dao Dan Fu Stefano Ermon Atri Rudra and Christopher R\u00e9. 2022. Flashattention: Fast and memory-efficient exact attention with io-awareness. 
Advances in Neural Information Processing Systems 35 (2022) 16344\u201316359.","DOI":"10.52202\/068431-1189"},{"key":"e_1_3_3_2_12_2","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan Anirudh Goyal Anthony Hartshorn Aobo Yang Archi Mitra Archie Sravankumar Artem Korenev Arthur Hinsvark Arun Rao Aston Zhang Aurelien Rodriguez Austen Gregerson Ava Spataru Baptiste Roziere Bethany Biron Binh Tang Bobbie Chern Charlotte Caucheteux Chaya Nayak Chloe Bi Chris Marra Chris McConnell Christian Keller Christophe Touret Chunyang Wu Corinne Wong Cristian\u00a0Canton Ferrer Cyrus Nikolaidis Damien Allonsius Daniel Song Danielle Pintz Danny Livshits David Esiobu Dhruv Choudhary Dhruv Mahajan Diego Garcia-Olano Diego Perino Dieuwke Hupkes Egor Lakomkin Ehab AlBadawy Elina Lobanova Emily Dinan Eric\u00a0Michael Smith Filip Radenovic Frank Zhang Gabriel Synnaeve Gabrielle Lee Georgia\u00a0Lewis Anderson Graeme Nail Gregoire Mialon Guan Pang Guillem Cucurell Hailey Nguyen Hannah Korevaar Hu Xu Hugo Touvron Iliyan Zarov Imanol\u00a0Arrieta Ibarra Isabel Kloumann Ishan Misra Ivan Evtimov Jade Copet Jaewon Lee Jan Geffert Jana Vranes Jason Park Jay Mahadeokar Jeet Shah Jelmer van\u00a0der Linde Jennifer Billock Jenny Hong Jenya Lee Jeremy Fu Jianfeng Chi Jianyu Huang Jiawen Liu Jie Wang Jiecao Yu Joanna Bitton Joe Spisak Jongsoo Park Joseph Rocca Joshua Johnstun Joshua Saxe Junteng Jia Kalyan\u00a0Vasuden Alwala Kartikeya Upasani Kate Plawiak Ke Li Kenneth Heafield Kevin Stone Khalid El-Arini Krithika Iyer Kshitiz Malik Kuenley Chiu Kunal Bhalla Lauren Rantala-Yeary Laurens van\u00a0der Maaten Lawrence Chen Liang Tan Liz Jenkins Louis Martin Lovish Madaan Lubo Malo Lukas Blecher Lukas Landzaat Luke de Oliveira Madeline Muzzi Mahesh Pasupuleti Mannat Singh Manohar Paluri Marcin Kardas Mathew Oldham Mathieu Rita Maya Pavlova Melanie Kambadur Mike Lewis Min Si Mitesh\u00a0Kumar Singh 
Mona Hassan Naman Goyal Narjes Torabi Nikolay Bashlykov Nikolay Bogoychev Niladri Chatterji Olivier Duchenne Onur \u00c7elebi Patrick Alrassy Pengchuan Zhang Pengwei Li Petar Vasic Peter Weng Prajjwal Bhargava Pratik Dubal Praveen Krishnan Punit\u00a0Singh Koura Puxin Xu Qing He Qingxiao Dong Ragavan Srinivasan Raj Ganapathy Ramon Calderer Ricardo\u00a0Silveira Cabral Robert Stojnic Roberta Raileanu Rohit Girdhar Rohit Patel Romain Sauvestre Ronnie Polidoro Roshan Sumbaly Ross Taylor Ruan Silva Rui Hou Rui Wang Saghar Hosseini Sahana Chennabasappa Sanjay Singh Sean Bell Seohyun\u00a0Sonia Kim Sergey Edunov Shaoliang Nie Sharan Narang Sharath Raparthy Sheng Shen Shengye Wan Shruti Bhosale Shun Zhang Simon Vandenhende Soumya Batra Spencer Whitman Sten Sootla Stephane Collot Suchin Gururangan Sydney Borodinsky Tamar Herman Tara Fowler Tarek Sheasha Thomas Georgiou Thomas Scialom Tobias Speckbacher Todor Mihaylov Tong Xiao Ujjwal Karn Vedanuj Goswami Vibhor Gupta Vignesh Ramanathan Viktor Kerkez Vincent Gonguet Virginie Do Vish Vogeti Vladan Petrovic Weiwei Chu Wenhan Xiong Wenyin Fu Whitney Meers Xavier Martinet Xiaodong Wang Xiaoqing\u00a0Ellen Tan Xinfeng Xie Xuchao Jia Xuewei Wang Yaelle Goldschlag Yashesh Gaur Yasmine Babaei Yi Wen Yiwen Song Yuchen Zhang Yue Li Yuning Mao Zacharie\u00a0Delpierre Coudert Zheng Yan Zhengxing Chen Zoe Papakipos Aaditya Singh Aaron Grattafiori Abha Jain Adam Kelsey Adam Shajnfeld Adithya Gangidi Adolfo Victoria Ahuva Goldstand Ajay Menon Ajay Sharma Alex Boesenberg Alex Vaughan Alexei Baevski Allie Feinstein Amanda Kallet Amit Sangani Anam Yunus Andrei Lupu Andres Alvarado Andrew Caples Andrew Gu Andrew Ho Andrew Poulton Andrew Ryan Ankit Ramchandani Annie Franco Aparajita Saraf Arkabandhu Chowdhury Ashley Gabriel Ashwin Bharambe Assaf Eisenman Azadeh Yazdan Beau James Ben Maurer Benjamin Leonhardi Bernie Huang Beth Loyd Beto\u00a0De Paola Bhargavi Paranjape Bing Liu Bo Wu Boyu Ni Braden Hancock Bram Wasti Brandon Spence Brani 
Stojkovic Brian Gamido Britt Montalvo Carl Parker Carly Burton Catalina Mejia Changhan Wang Changkyu Kim Chao Zhou Chester Hu Ching-Hsiang Chu Chris Cai Chris Tindal Christoph Feichtenhofer Damon Civin Dana Beaty Daniel Kreymer Daniel Li Danny Wyatt David Adkins David Xu Davide Testuggine Delia David Devi Parikh Diana Liskovich Didem Foss Dingkang Wang Duc Le Dustin Holland Edward Dowling Eissa Jamil Elaine Montgomery Eleonora Presani Emily Hahn Emily Wood Erik Brinkman Esteban Arcaute Evan Dunbar Evan Smothers Fei Sun Felix Kreuk Feng Tian Firat Ozgenel Francesco Caggioni Francisco Guzm\u00e1n Frank Kanayet Frank Seide Gabriela\u00a0Medina Florez Gabriella Schwarz Gada Badeer Georgia Swee Gil Halpern Govind Thattai Grant Herman Grigory Sizov Guangyi Zhang Guna Lakshminarayanan Hamid Shojanazeri Han Zou Hannah Wang Hanwen Zha Haroun Habeeb Harrison Rudolph Helen Suk Henry Aspegren Hunter Goldman Ibrahim Damlaj Igor Molybog Igor Tufanov Irina-Elena Veliche Itai Gat Jake Weissman James Geboski James Kohli Japhet Asher Jean-Baptiste Gaya Jeff Marcus Jeff Tang Jennifer Chan Jenny Zhen Jeremy Reizenstein Jeremy Teboul Jessica Zhong Jian Jin Jingyi Yang Joe Cummings Jon Carvill Jon Shepard Jonathan McPhie Jonathan Torres Josh Ginsburg Junjie Wang Kai Wu Kam\u00a0Hou U Karan Saxena Karthik Prasad Kartikay Khandelwal Katayoun Zand Kathy Matosich Kaushik Veeraraghavan Kelly Michelena Keqian Li Kun Huang Kunal Chawla Kushal Lakhotia Kyle Huang Lailin Chen Lakshya Garg Lavender A Leandro Silva Lee Bell Lei Zhang Liangpeng Guo Licheng Yu Liron Moshkovich Luca Wehrstedt Madian Khabsa Manav Avalani Manish Bhatt Maria Tsimpoukelli Martynas Mankus Matan Hasson Matthew Lennie Matthias Reso Maxim Groshev Maxim Naumov Maya Lathi Meghan Keneally Michael\u00a0L. 
Seltzer Michal Valko Michelle Restrepo Mihir Patel Mik Vyatskov Mikayel Samvelyan Mike Clark Mike Macey Mike Wang Miquel\u00a0Jubert Hermoso Mo Metanat Mohammad Rastegari Munish Bansal Nandhini Santhanam Natascha Parks Natasha White Navyata Bawa Nayan Singhal Nick Egebo Nicolas Usunier Nikolay\u00a0Pavlovich Laptev Ning Dong Ning Zhang Norman Cheng Oleg Chernoguz Olivia Hart Omkar Salpekar Ozlem Kalinli Parkin Kent Parth Parekh Paul Saab Pavan Balaji Pedro Rittner Philip Bontrager Pierre Roux Piotr Dollar Polina Zvyagina Prashant Ratanchandani Pritish Yuvraj Qian Liang Rachad Alao Rachel Rodriguez Rafi Ayub Raghotham Murthy Raghu Nayani Rahul Mitra Raymond Li Rebekkah Hogan Robin Battey Rocky Wang Rohan Maheswari Russ Howes Ruty Rinott Sai\u00a0Jayesh Bondu Samyak Datta Sara Chugh Sara Hunt Sargun Dhillon Sasha Sidorov Satadru Pan Saurabh Verma Seiji Yamamoto Sharadh Ramaswamy Shaun Lindsay Shaun Lindsay Sheng Feng Shenghao Lin Shengxin\u00a0Cindy Zha Shiva Shankar Shuqiang Zhang Shuqiang Zhang Sinong Wang Sneha Agarwal Soji Sajuyigbe Soumith Chintala Stephanie Max Stephen Chen Steve Kehoe Steve Satterfield Sudarshan Govindaprasad Sumit Gupta Sungmin Cho Sunny Virk Suraj Subramanian Sy Choudhury Sydney Goldman Tal Remez Tamar Glaser Tamara Best Thilo Kohler Thomas Robinson Tianhe Li Tianjun Zhang Tim Matthews Timothy Chou Tzook Shaked Varun Vontimitta Victoria Ajayi Victoria Montanez Vijai Mohan Vinay\u00a0Satish Kumar Vishal Mangla V\u00edtor Albiero Vlad Ionescu Vlad Poenaru Vlad\u00a0Tiberiu Mihailescu Vladimir Ivanov Wei Li Wenchen Wang Wenwen Jiang Wes Bouaziz Will Constable Xiaocheng Tang Xiaofang Wang Xiaojian Wu Xiaolan Wang Xide Xia Xilun Wu Xinbo Gao Yanjun Chen Ye Hu Ye Jia Ye Qi Yenda Li Yilin Zhang Ying Zhang Yossi Adi Youngjin Nam Yu Wang Yuchen Hao Yundi Qian Yuzi He Zach Rait Zachary DeVito Zef Rosnbrick Zhaoduo Wen Zhenyu Yang and Zhiwei Zhao. 2024. The Llama 3 Herd of Models. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.21783 (2024)."},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441593"},{"key":"e_1_3_3_2_14_2","unstructured":"William Fedus Barret Zoph and Noam Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23 120 (2022) 1\u201339."},{"key":"e_1_3_3_2_15_2","unstructured":"Trevor Gale Deepak Narayanan Cliff Young and Matei Zaharia. 2023. Megablocks: Efficient sparse training with mixture-of-experts. Proceedings of Machine Learning and Systems 5 (2023) 288\u2013304."},{"key":"e_1_3_3_2_16_2","unstructured":"Yanping Huang Youlong Cheng Ankur Bapna Orhan Firat Dehao Chen Mia Chen HyoukJoong Lee Jiquan Ngiam Quoc\u00a0V Le Yonghui Wu et\u00a0al. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_3_2_17_2","unstructured":"Changho Hwang Wei Cui Yifan Xiong Ziyue Yang Ze Liu Han Hu Zilong Wang Rafael Salas Jithin Jose Prabhat Ram et\u00a0al. 2023. Tutel: Adaptive mixture-of-experts at scale. Proceedings of Machine Learning and Systems 5 (2023) 269\u2013287."},{"key":"e_1_3_3_2_18_2","unstructured":"Sam\u00a0Ade Jacobs Masahiro Tanaka Chengming Zhang Minjia Zhang Shuaiwen\u00a0Leon Song Samyam Rajbhandari and Yuxiong He. 2023. Deepspeed ulysses: System optimizations for enabling training of extreme long sequence transformer models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.14509 (2023)."},{"key":"e_1_3_3_2_19_2","first-page":"497","volume-title":"Proceedings of Machine Learning and Systems","volume":"2","author":"Jain Paras","year":"2020","unstructured":"Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, and Joseph\u00a0E. Gonzalez. 2020. Checkmate: Breaking the memory wall with optimal tensor rematerialization. 
In Proceedings of Machine Learning and Systems , Vol.\u00a02. 497\u2013511."},{"key":"e_1_3_3_2_20_2","unstructured":"Albert\u00a0Q Jiang Alexandre Sablayrolles Antoine Roux Arthur Mensch Blanche Savary Chris Bamford Devendra\u00a0Singh Chaplot Diego de\u00a0las Casas Emma\u00a0Bou Hanna Florian Bressand et\u00a0al. 2024. Mixtral of experts. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2401.04088 (2024)."},{"key":"e_1_3_3_2_21_2","first-page":"16639","volume-title":"International Conference on Machine Learning","author":"Kim Taebum","year":"2023","unstructured":"Taebum Kim, Hyoungjoo Kim, Gyeong-In Yu, and Byung-Gon Chun. 2023. BPIPE: memory-balanced pipeline parallelism for training large language models. In International Conference on Machine Learning. 16639\u201316653."},{"key":"e_1_3_3_2_22_2","unstructured":"Marisa Kirisame Steven Lyubomirsky Altan Haan Jennifer Brennan Mike He Jared Roesch Tianqi Chen and Zachary Tatlock. 2020. Dynamic Tensor Rematerialization. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2006.09616 (2020)."},{"key":"e_1_3_3_2_23_2","unstructured":"Vijay\u00a0Anand Korthikanti Jared Casper Sangkug Lym Lawrence McAfee Michael Andersch Mohammad Shoeybi and Bryan Catanzaro. 2023. Reducing activation recomputation in large transformer models. Proceedings of Machine Learning and Systems 5 (2023) 341\u2013353."},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_3_2_25_2","unstructured":"Dmitry Lepikhin HyoukJoong Lee Yuanzhong Xu Dehao Chen Orhan Firat Yanping Huang Maxim Krikun Noam Shazeer and Zhifeng Chen. 2020. Gshard: Scaling giant models with conditional computation and automatic sharding. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2006.16668 (2020)."},{"key":"e_1_3_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.134"},{"key":"e_1_3_3_2_27_2","first-page":"6543","volume-title":"International Conference on Machine Learning","author":"Li Zhuohan","year":"2021","unstructured":"Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, and Ion Stoica. 2021. Terapipe: Token-level pipeline parallelism for training large-scale language models. In International Conference on Machine Learning. 6543\u20136552."},{"key":"e_1_3_3_2_28_2","unstructured":"Hao Liu Matei Zaharia and Pieter Abbeel. 2023. Ring attention with blockwise transformers for near-infinite context. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.01889 (2023)."},{"key":"e_1_3_3_2_29_2","unstructured":"Nelson\u00a0F Liu Kevin Lin John Hewitt Ashwin Paranjape Michele Bevilacqua Fabio Petroni and Percy Liang. 2023. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.03172 (2023)."},{"key":"e_1_3_3_2_30_2","unstructured":"Sam McCandlish Jared Kaplan Dario Amodei and OpenAI\u00a0Dota Team. 2018. An empirical model of large-batch training. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1812.06162 (2018)."},{"key":"e_1_3_3_2_31_2","unstructured":"Maxim Milakov and Natalia Gimelshein. 2018. Online normalizer calculation for softmax. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1805.02867 (2018)."},{"key":"e_1_3_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_3_2_33_2","first-page":"7937","volume-title":"International Conference on Machine Learning","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-efficient pipeline-parallel dnn training. In International Conference on Machine Learning. 
PMLR, 7937\u20137947."},{"key":"e_1_3_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_3_2_35_2","unstructured":"Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga et\u00a0al. 2019. Pytorch: An imperative style high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_3_2_36_2","first-page":"17573","volume-title":"Proceedings of the 39th International Conference on Machine Learning","volume":"162","author":"Patil Shishir\u00a0G.","year":"2022","unstructured":"Shishir\u00a0G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, and Joseph\u00a0E. Gonzalez. 2022. POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging. In Proceedings of the 39th International Conference on Machine Learning , Vol.\u00a0162. 17573\u201317583."},{"key":"e_1_3_3_2_37_2","unstructured":"Reiner Pope Sholto Douglas Aakanksha Chowdhery Jacob Devlin James Bradbury Jonathan Heek Kefan Xiao Shivani Agrawal and Jeff Dean. 2023. Efficiently scaling transformer inference. Proceedings of Machine Learning and Systems 5 (2023) 606\u2013624."},{"key":"e_1_3_3_2_38_2","unstructured":"Ofir Press and Lior Wolf. 2016. Using the output embedding to improve language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1608.05859 (2016)."},{"key":"e_1_3_3_2_39_2","unstructured":"Penghui Qi Xinyi Wan Nyamdavaa Amar and Min Lin. 2024. Pipeline Parallelism with Controllable Memory. arxiv:https:\/\/arXiv.org\/abs\/2405.15362\u00a0[cs.LG] https:\/\/arxiv.org\/abs\/2405.15362"},{"key":"e_1_3_3_2_40_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Qi Penghui","year":"2024","unstructured":"Penghui Qi, Xinyi Wan, Guangxing Huang, and Min Lin. 2024. Zero Bubble (Almost) Pipeline Parallelism. 
In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_2_41_2","first-page":"18332","volume-title":"International conference on machine learning","author":"Rajbhandari Samyam","year":"2022","unstructured":"Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza\u00a0Yazdani Aminabadi, Ammar\u00a0Ahmad Awan, Jeff Rasley, and Yuxiong He. 2022. Deepspeed-moe: Advancing mixture-of-experts inference and training to power next-generation ai scale. In International conference on machine learning. PMLR, 18332\u201318346."},{"key":"e_1_3_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195660"},{"key":"e_1_3_3_2_45_2","doi-asserted-by":"crossref","unstructured":"David\u00a0E Rumelhart Geoffrey\u00a0E Hinton and Ronald\u00a0J Williams. 1986. Learning representations by back-propagating errors. nature 323 6088 (1986) 533\u2013536.","DOI":"10.1038\/323533a0"},{"key":"e_1_3_3_2_46_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1909.08053 (2019)."},{"key":"e_1_3_3_2_47_2","first-page":"200","volume-title":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Shriram SB","year":"2019","unstructured":"SB Shriram, Anshuj Garg, and Purushottam Kulkarni. 2019. Dynamic memory management for GPU-based training of deep neural networks. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 200\u2013209."},{"key":"e_1_3_3_2_48_2","unstructured":"Gemini Team Rohan Anil Sebastian Borgeaud Jean-Baptiste Alayrac Jiahui Yu Radu Soricut Johan Schalkwyk Andrew\u00a0M. 
Dai Anja Hauth Katie Millican David Silver Melvin Johnson Ioannis Antonoglou Julian Schrittwieser Amelia Glaese Jilin Chen Emily Pitler Timothy Lillicrap Angeliki Lazaridou Orhan Firat James Molloy Michael Isard Paul\u00a0R. Barham Tom Hennigan Benjamin Lee Fabio Viola Malcolm Reynolds Yuanzhong Xu Ryan Doherty Eli Collins Clemens Meyer Eliza Rutherford Erica Moreira Kareem Ayoub Megha Goel Jack Krawczyk Cosmo Du Ed Chi Heng-Tze Cheng Eric Ni Purvi Shah Patrick Kane Betty Chan Manaal Faruqui Aliaksei Severyn Hanzhao Lin YaGuang Li Yong Cheng Abe Ittycheriah Mahdis Mahdieh Mia Chen Pei Sun Dustin Tran Sumit Bagri Balaji Lakshminarayanan Jeremiah Liu Andras Orban Fabian G\u00fcra Hao Zhou Xinying Song Aurelien Boffy Harish Ganapathy Steven Zheng HyunJeong Choe \u00c1goston Weisz Tao Zhu Yifeng Lu Siddharth Gopal Jarrod Kahn Maciej Kula Jeff Pitman Rushin Shah Emanuel Taropa Majd\u00a0Al Merey Martin Baeuml Zhifeng Chen Laurent\u00a0El Shafey Yujing Zhang Olcan Sercinoglu George Tucker Enrique Piqueras Maxim Krikun Iain Barr Nikolay Savinov Ivo Danihelka Becca Roelofs Ana\u00efs White Anders Andreassen Tamara von Glehn Lakshman Yagati Mehran Kazemi Lucas Gonzalez Misha Khalman Jakub Sygnowski Alexandre Frechette Charlotte Smith Laura Culp Lev Proleev Yi Luan Xi Chen James Lottes Nathan Schucher Federico Lebron Alban Rrustemi Natalie Clay Phil Crone Tomas Kocisky Jeffrey Zhao Bartek Perz Dian Yu Heidi Howard Adam Bloniarz Jack\u00a0W. 
Rae Han Lu Laurent Sifre Marcello Maggioni Fred Alcober Dan Garrette Megan Barnes Shantanu Thakoor Jacob Austin Gabriel Barth-Maron William Wong Rishabh Joshi Rahma Chaabouni Deeni Fatiha Arun Ahuja Gaurav\u00a0Singh Tomar Evan Senter Martin Chadwick Ilya Kornakov Nithya Attaluri I\u00f1aki Iturrate Ruibo Liu Yunxuan Li Sarah Cogan Jeremy Chen Chao Jia Chenjie Gu Qiao Zhang Jordan Grimstad Ale\u00a0Jakse Hartman Xavier Garcia Thanumalayan\u00a0Sankaranarayana Pillai Jacob Devlin Michael Laskin Diego de Las\u00a0Casas Dasha Valter Connie Tao Lorenzo Blanco Adri\u00e0\u00a0Puigdom\u00e8nech Badia David Reitter Mianna Chen Jenny Brennan Clara Rivera Sergey Brin Shariq Iqbal Gabriela Surita Jane Labanowski Abhi Rao Stephanie Winkler Emilio Parisotto Yiming Gu Kate Olszewska Ravi Addanki Antoine Miech Annie Louis Denis Teplyashin Geoff Brown Elliot Catt Jan Balaguer Jackie Xiang Pidong Wang Zoe Ashwood Anton Briukhov Albert Webson Sanjay Ganapathy Smit Sanghavi Ajay Kannan Ming-Wei Chang Axel Stjerngren Josip Djolonga Yuting Sun Ankur Bapna Matthew Aitchison Pedram Pejman Henryk Michalewski Tianhe Yu Cindy Wang Juliette Love Junwhan Ahn Dawn Bloxwich Kehang Han Peter Humphreys Thibault Sellam James Bradbury Varun Godbole Sina Samangooei Bogdan Damoc Alex Kaskasoli S\u00e9bastien M.\u00a0R. 
Arnold Vijay Vasudevan Shubham Agrawal Jason Riesa Dmitry Lepikhin Richard Tanburn Srivatsan Srinivasan Hyeontaek Lim Sarah Hodkinson Pranav Shyam Johan Ferret Steven Hand Ankush Garg Tom\u00a0Le Paine Jian Li Yujia Li Minh Giang Alexander Neitz Zaheer Abbas Sarah York Machel Reid Elizabeth Cole Aakanksha Chowdhery Dipanjan Das Dominika Rogozi\u0144ska Vitaliy Nikolaev Pablo Sprechmann Zachary Nado Lukas Zilka Flavien Prost Luheng He Marianne Monteiro Gaurav Mishra Chris Welty Josh Newlan Dawei Jia Miltiadis Allamanis Clara\u00a0Huiyi Hu Raoul de Liedekerke Justin Gilmer Carl Saroufim Shruti Rijhwani Shaobo Hou Disha Shrivastava Anirudh Baddepudi Alex Goldin Adnan Ozturel Albin Cassirer Yunhan Xu Daniel Sohn Devendra Sachan Reinald\u00a0Kim Amplayo Craig Swanson Dessie Petrova Shashi Narayan Arthur Guez Siddhartha Brahma Jessica Landon Miteyan Patel Ruizhe Zhao Kevin Villela Luyu Wang Wenhao Jia Matthew Rahtz Mai Gim\u00e9nez Legg Yeung James Keeling Petko Georgiev Diana Mincu Boxi Wu Salem Haykal Rachel Saputro Kiran Vodrahalli James Qin Zeynep Cankara Abhanshu Sharma Nick Fernando Will Hawkins Behnam Neyshabur Solomon Kim Adrian Hutter Priyanka Agrawal Alex Castro-Ros George van\u00a0den Driessche Tao Wang Fan Yang Shuo yiin Chang Paul Komarek Ross McIlroy Mario Lu\u010di\u0107 Guodong Zhang Wael Farhan Michael Sharman Paul Natsev Paul Michel Yamini Bansal Siyuan Qiao Kris Cao Siamak Shakeri Christina Butterfield Justin Chung Paul\u00a0Kishan Rubenstein Shivani Agrawal Arthur Mensch Kedar Soparkar Karel Lenc Timothy Chung Aedan Pope Loren Maggiore Jackie Kay Priya Jhakra Shibo Wang Joshua Maynez Mary Phuong Taylor Tobin Andrea Tacchetti Maja Trebacz Kevin Robinson Yash Katariya Sebastian Riedel Paige Bailey Kefan Xiao Nimesh Ghelani Lora Aroyo Ambrose Slone Neil Houlsby Xuehan Xiong Zhen Yang Elena Gribovskaya Jonas Adler Mateo Wirth Lisa Lee Music Li Thais Kagohara Jay Pavagadhi Sophie Bridgers Anna Bortsova Sanjay Ghemawat Zafarali Ahmed Tianqi Liu Richard 
Powell Vijay Bolina Mariko Iinuma Polina Zablotskaia James Besley Da-Woon Chung Timothy Dozat Ramona Comanescu Xiance Si Jeremy Greer Guolong Su Martin Polacek Rapha\u00ebl\u00a0Lopez Kaufman Simon Tokumine Hexiang Hu Elena Buchatskaya Yingjie Miao Mohamed Elhawaty Aditya Siddhant Nenad Tomasev Jinwei Xing Christina Greer Helen Miller Shereen Ashraf Aurko Roy Zizhao Zhang Ada Ma Angelos Filos Milos Besta Rory Blevins Ted Klimenko Chih-Kuan Yeh Soravit Changpinyo Jiaqi Mu Oscar Chang Mantas Pajarskas Carrie Muir Vered Cohen Charline\u00a0Le Lan Krishna Haridasan Amit Marathe Steven Hansen Sholto Douglas Rajkumar Samuel Mingqiu Wang Sophia Austin Chang Lan Jiepu Jiang Justin Chiu Jaime\u00a0Alonso Lorenzo Lars\u00a0Lowe Sj\u00f6sund S\u00e9bastien Cevey Zach Gleicher Thi Avrahami Anudhyan Boral Hansa Srinivasan Vittorio Selo Rhys May Konstantinos Aisopos L\u00e9onard Hussenot Livio\u00a0Baldini Soares Kate Baumli Michael\u00a0B. Chang Adri\u00e0 Recasens Ben Caine Alexander Pritzel Filip Pavetic Fabio Pardo Anita Gergely Justin Frye Vinay Ramasesh Dan Horgan Kartikeya Badola Nora Kassner Subhrajit Roy Ethan Dyer V\u00edctor\u00a0Campos Campos Alex Tomala Yunhao Tang Dalia\u00a0El Badawy Elspeth White Basil Mustafa Oran Lang Abhishek Jindal Sharad Vikram Zhitao Gong Sergi Caelles Ross Hemsley Gregory Thornton Fangxiaoyu Feng Wojciech Stokowiec Ce Zheng Phoebe Thacker \u00c7a\u011flar \u00dcnl\u00fc Zhishuai Zhang Mohammad Saleh James Svensson Max Bileschi Piyush Patil Ankesh Anand Roman Ring Katerina Tsihlas Arpi Vezer Marco Selvi Toby Shevlane Mikel Rodriguez Tom Kwiatkowski Samira Daruki Keran Rong Allan Dafoe Nicholas FitzGerald Keren Gu-Lemberg Mina Khan Lisa\u00a0Anne Hendricks Marie Pellat Vladimir Feinberg James Cobon-Kerr Tara Sainath Maribeth Rauh Sayed\u00a0Hadi Hashemi Richard Ives Yana Hasson Eric Noland Yuan Cao Nathan Byrd Le Hou Qingze Wang Thibault Sottiaux Michela Paganini Jean-Baptiste Lespiau Alexandre Moufarek Samer Hassan Kaushik Shivakumar Joost 
van Amersfoort Amol Mandhane Pratik Joshi Anirudh Goyal Matthew Tung Andrew Brock Hannah Sheahan Vedant Misra Cheng Li Nemanja Raki\u0107evi\u0107 Mostafa Dehghani Fangyu Liu Sid Mittal Junhyuk Oh Seb Noury Eren Sezener Fantine Huot Matthew Lamm Nicola\u00a0De Cao Charlie Chen Sidharth Mudgal Romina Stella Kevin Brooks Gautam Vasudevan Chenxi Liu Mainak Chain Nivedita Melinkeri Aaron Cohen Venus Wang Kristie Seymore Sergey Zubkov Rahul Goel Summer Yue Sai Krishnakumaran Brian Albert Nate Hurley Motoki Sano Anhad Mohananey Jonah Joughin Egor Filonov Tomasz K\u0119pa Yomna Eldawy Jiawern Lim Rahul Rishi Shirin Badiezadegan Taylor Bos Jerry Chang Sanil Jain Sri Gayatri\u00a0Sundara Padmanabhan Subha Puttagunta Kalpesh Krishna Leslie Baker Norbert Kalb Vamsi Bedapudi Adam Kurzrok Shuntong Lei Anthony Yu Oren Litvin Xiang Zhou Zhichun Wu Sam Sobell Andrea Siciliano Alan Papir Robby Neale Jonas Bragagnolo Tej Toor Tina Chen Valentin Anklin Feiran Wang Richie Feng Milad Gholami Kevin Ling Lijuan Liu Jules Walter Hamid Moghaddam Arun Kishore Jakub Adamek Tyler Mercado Jonathan Mallinson Siddhinita Wandekar Stephen Cagle Eran Ofek Guillermo Garrido Clemens Lombriser Maksim Mukha Botu Sun Hafeezul\u00a0Rahman Mohammad Josip Matak Yadi Qian Vikas Peswani Pawel Janus Quan Yuan Leif Schelin Oana David Ankur Garg Yifan He Oleksii Duzhyi Anton \u00c4lgmyr Timoth\u00e9e Lottaz Qi Li Vikas Yadav Luyao Xu Alex Chinien Rakesh Shivanna Aleksandr Chuklin Josie Li Carrie Spadine Travis Wolfe Kareem Mohamed Subhabrata Das Zihang Dai Kyle He Daniel von Dincklage Shyam Upadhyay Akanksha Maurya Luyan Chi Sebastian Krause Khalid Salama Pam\u00a0G Rabinovitch Pavan Kumar\u00a0Reddy M Aarush Selvan Mikhail Dektiarev Golnaz Ghiasi Erdem Guven Himanshu Gupta Boyi Liu Deepak Sharma Idan\u00a0Heimlich Shtacher Shachi Paul Oscar Akerlund Fran\u00e7ois-Xavier Aubet Terry Huang Chen Zhu Eric Zhu Elico Teixeira Matthew Fritze Francesco Bertolini Liana-Eleonora Marinescu Martin B\u00f6lle Dominik 
Paulus Khyatti Gupta Tejasi Latkar Max Chang Jason Sanders Roopa Wilson Xuewei Wu Yi-Xuan Tan Lam\u00a0Nguyen Thiet Tulsee Doshi Sid Lall Swaroop Mishra Wanming Chen Thang Luong Seth Benjamin Jasmine Lee Ewa Andrejczuk Dominik Rabiej Vipul Ranjan Krzysztof Styrc Pengcheng Yin Jon Simon Malcolm\u00a0Rose Harriott Mudit Bansal Alexei Robsky Geoff Bacon David Greene Daniil Mirylenka Chen Zhou Obaid Sarvana Abhimanyu Goyal Samuel Andermatt Patrick Siegler Ben Horn Assaf Israel Francesco Pongetti Chih-Wei\u00a0\"Louis\" Chen Marco Selvatici Pedro Silva Kathie Wang Jackson Tolins Kelvin Guu Roey Yogev Xiaochen Cai Alessandro Agostini Maulik Shah Hung Nguyen Noah\u00a0\u00d3 Donnaile S\u00e9bastien Pereira Linda Friso Adam Stambler Adam Kurzrok Chenkai Kuang Yan Romanikhin Mark Geller ZJ Yan Kane Jang Cheng-Chun Lee Wojciech Fica Eric Malmi Qijun Tan Dan Banica Daniel Balle Ryan Pham Yanping Huang Diana Avram Hongzhi Shi Jasjot Singh Chris Hidey Niharika Ahuja Pranab Saxena Dan Dooley Srividya\u00a0Pranavi Potharaju Eileen O\u2019Neill Anand Gokulchandran Ryan Foley Kai Zhao Mike Dusenberry Yuan Liu Pulkit Mehta Ragha Kotikalapudi Chalence Safranek-Shrader Andrew Goodman Joshua Kessinger Eran Globen Prateek Kolhar Chris Gorgolewski Ali Ibrahim Yang Song Ali Eichenbaum Thomas Brovelli Sahitya Potluri Preethi Lahoti Cip Baetu Ali Ghorbani Charles Chen Andy Crawford Shalini Pal Mukund Sridhar Petru Gurita Asier Mujika Igor Petrovski Pierre-Louis Cedoz Chenmei Li Shiyuan Chen Niccol\u00f2\u00a0Dal Santo Siddharth Goyal Jitesh Punjabi Karthik Kappaganthu Chester Kwak Pallavi LV Sarmishta Velury Himadri Choudhury Jamie Hall Premal Shah Ricardo Figueira Matt Thomas Minjie Lu Ting Zhou Chintu Kumar Thomas Jurdi Sharat Chikkerur Yenai Ma Adams Yu Soo Kwak Victor \u00c4hdel Sujeevan Rajayogam Travis Choma Fei Liu Aditya Barua Colin Ji Ji\u00a0Ho Park Vincent Hellendoorn Alex Bailey Taylan Bilal Huanjie Zhou Mehrdad Khatir Charles Sutton Wojciech Rzadkowski Fiona Macintosh 
Konstantin Shagin Paul Medina Chen Liang Jinjing Zhou Pararth Shah Yingying Bi Attila Dankovics Shipra Banga Sabine Lehmann Marissa Bredesen Zifan Lin John\u00a0Eric Hoffmann Jonathan Lai Raynald Chung Kai Yang Nihal Balani Arthur Bra\u017einskas Andrei Sozanschi Matthew Hayes H\u00e9ctor\u00a0Fern\u00e1ndez Alcalde Peter Makarov Will Chen Antonio Stella Liselotte Snijders Michael Mandl Ante K\u00e4rrman Pawe\u0142 Nowak Xinyi Wu Alex Dyck Krishnan Vaidyanathan Raghavender R Jessica Mallet Mitch Rudominer Eric Johnston Sushil Mittal Akhil Udathu Janara Christensen Vishal Verma Zach Irving Andreas Santucci Gamaleldin Elsayed Elnaz Davoodi Marin Georgiev Ian Tenney Nan Hua Geoffrey Cideron Edouard Leurent Mahmoud Alnahlawi Ionut Georgescu Nan Wei Ivy Zheng Dylan Scandinaro Heinrich Jiang Jasper Snoek Mukund Sundararajan Xuezhi Wang Zack Ontiveros Itay Karo Jeremy Cole Vinu Rajashekhar Lara Tumeh Eyal Ben-David Rishub Jain Jonathan Uesato Romina Datta Oskar Bunyan Shimu Wu John Zhang Piotr Stanczyk Ye Zhang David Steiner Subhajit Naskar Michael Azzam Matthew Johnson Adam Paszke Chung-Cheng Chiu Jaume\u00a0Sanchez Elias Afroz Mohiuddin Faizan Muhammad Jin Miao Andrew Lee Nino Vieillard Jane Park Jiageng Zhang Jeff Stanway Drew Garmon Abhijit Karmarkar Zhe Dong Jong Lee Aviral Kumar Luowei Zhou Jonathan Evens William Isaac Geoffrey Irving Edward Loper Michael Fink Isha Arkatkar Nanxin Chen Izhak Shafran Ivan Petrychenko Zhe Chen Johnson Jia Anselm Levskaya Zhenkai Zhu Peter Grabowski Yu Mao Alberto Magni Kaisheng Yao Javier Snaider Norman Casagrande Evan Palmer Paul Suganthan Alfonso Casta\u00f1o Irene Giannoumis Wooyeol Kim Miko\u0142aj Rybi\u0144ski Ashwin Sreevatsa Jennifer Prendki David Soergel Adrian Goedeckemeyer Willi Gierke Mohsen Jafari Meenu Gaba Jeremy Wiesner Diana\u00a0Gage Wright Yawen Wei Harsha Vashisht Yana Kulizhskaya Jay Hoover Maigo Le Lu Li Chimezie Iwuanyanwu Lu Liu Kevin Ramirez Andrey Khorlin Albert Cui Tian LIN Marcus Wu Ricardo Aguilar Keith 
Pallo Abhishek Chakladar Ginger Perng Elena\u00a0Allica Abellan Mingyang Zhang Ishita Dasgupta Nate Kushman Ivo Penchev Alena Repina Xihui Wu Tom van\u00a0der Weide Priya Ponnapalli Caroline Kaplan Jiri Simsa Shuangfeng Li Olivier Dousse Fan Yang Jeff Piper Nathan Ie Rama Pasumarthi Nathan Lintz Anitha Vijayakumar Daniel Andor Pedro Valenzuela Minnie Lui Cosmin Paduraru Daiyi Peng Katherine Lee Shuyuan Zhang Somer Greene Duc\u00a0Dung Nguyen Paula Kurylowicz Cassidy Hardin Lucas Dixon Lili Janzer Kiam Choo Ziqiang Feng Biao Zhang Achintya Singhal Dayou Du Dan McKinnon Natasha Antropova Tolga Bolukbasi Orgad Keller David Reid Daniel Finchelstein Maria\u00a0Abi Raad Remi Crocker Peter Hawkins Robert Dadashi Colin Gaffney Ken Franko Anna Bulanova R\u00e9mi Leblond Shirley Chung Harry Askham Luis\u00a0C. Cobo Kelvin Xu Felix Fischer Jun Xu Christina Sorokin Chris Alberti Chu-Cheng Lin Colin Evans Alek Dimitriev Hannah Forbes Dylan Banarse Zora Tung Mark Omernick Colton Bishop Rachel Sterneck Rohan Jain Jiawei Xia Ehsan Amid Francesco Piccinno Xingyu Wang Praseem Banzal Daniel\u00a0J. 
Mankowitz Alex Polozov Victoria Krakovna Sasha Brown MohammadHossein Bateni Dennis Duan Vlad Firoiu Meghana Thotakuri Tom Natan Matthieu Geist Sertan Girgin Hui Li Jiayu Ye Ofir Roval Reiko Tojo Michael Kwong James Lee-Thorp Christopher Yew Danila Sinopalnikov Sabela Ramos John Mellor Abhishek Sharma Kathy Wu David Miller Nicolas Sonnerat Denis Vnukov Rory Greig Jennifer Beattie Emily Caveness Libin Bai Julian Eisenschlos Alex Korchemniy Tomy Tsai Mimi Jasarevic Weize Kong Phuong Dao Zeyu Zheng Frederick Liu Fan Yang Rui Zhu Tian\u00a0Huey Teh Jason Sanmiya Evgeny Gladchenko Nejc Trdin Daniel Toyama Evan Rosen Sasan Tavakkol Linting Xue Chen Elkind Oliver Woodman John Carpenter George Papamakarios Rupert Kemp Sushant Kafle Tanya Grunina Rishika Sinha Alice Talbert Diane Wu Denese Owusu-Afriyie Cosmo Du Chloe Thornton Jordi Pont-Tuset Pradyumna Narayana Jing Li Saaber Fatehi John Wieting Omar Ajmeri Benigno Uria Yeongil Ko Laura Knight Am\u00e9lie H\u00e9liou Ning Niu Shane Gu Chenxi Pang Yeqing Li Nir Levine Ariel Stolovich Rebeca Santamaria-Fernandez Sonam Goenka Wenny Yustalim Robin Strudel Ali Elqursh Charlie Deck Hyo Lee Zonglin Li Kyle Levin Raphael Hoffmann Dan Holtmann-Rice Olivier Bachem Sho Arora Christy Koh Soheil\u00a0Hassas Yeganeh Siim P\u00f5der Mukarram Tariq Yanhua Sun Lucian Ionita Mojtaba Seyedhosseini Pouya Tafti Zhiyu Liu Anmol Gulati Jasmine Liu Xinyu Ye Bart Chrzaszcz Lily Wang Nikhil Sethi Tianrun Li Ben Brown Shreya Singh Wei Fan Aaron Parisi Joe Stanton Vinod Koverkathu Christopher\u00a0A. 
Choquette-Choo Yunjie Li TJ Lu Abe Ittycheriah Prakash Shroff Mani Varadarajan Sanaz Bahargam Rob Willoughby David Gaddy Guillaume Desjardins Marco Cornero Brona Robenek Bhavishya Mittal Ben Albrecht Ashish Shenoy Fedor Moiseev Henrik Jacobsson Alireza Ghaffarkhah Morgane Rivi\u00e8re Alanna Walton Cl\u00e9ment Crepy Alicia Parrish Zongwei Zhou Clement Farabet Carey Radebaugh Praveen Srinivasan Claudia van\u00a0der Salm Andreas Fidjeland Salvatore Scellato Eri Latorre-Chimoto Hanna Klimczak-Pluci\u0144ska David Bridson Dario de Cesare Tom Hudson Piermaria Mendolicchio Lexi Walker Alex Morris Matthew Mauger Alexey Guseynov Alison Reid Seth Odoom Lucia Loher Victor Cotruta Madhavi Yenugula Dominik Grewe Anastasia Petrushkina Tom Duerig Antonio Sanchez Steve Yadlowsky Amy Shen Amir Globerson Lynette Webb Sahil Dua Dong Li Surya Bhupatiraju Dan Hurt Haroon Qureshi Ananth Agarwal Tomer Shani Matan Eyal Anuj Khare Shreyas\u00a0Rammohan Belle Lei Wang Chetan Tekur Mihir\u00a0Sanjay Kale Jinliang Wei Ruoxin Sang Brennan Saeta Tyler Liechty Yi Sun Yao Zhao Stephan Lee Pandu Nayak Doug Fritz Manish\u00a0Reddy Vuyyuru John Aslanides Nidhi Vyas Martin Wicke Xiao Ma Evgenii Eltyshev Nina Martin Hardie Cate James Manyika Keyvan Amiri Yelin Kim Xi Xiong Kai Kang Florian Luisier Nilesh Tripuraneni David Madras Mandy Guo Austin Waters Oliver Wang Joshua Ainslie Jason Baldridge Han Zhang Garima Pruthi Jakob Bauer Feng Yang Riham Mansour Jason Gelman Yang Xu George Polovets Ji Liu Honglong Cai Warren Chen XiangHai Sheng Emily Xue Sherjil Ozair Christof Angermueller Xiaowei Li Anoop Sinha Weiren Wang Julia Wiesinger Emmanouil Koukoumidis Yuan Tian Anand Iyer Madhu Gurumurthy Mark Goldenson Parashar Shah MK Blake Hongkun Yu Anthony Urbanowicz Jennimaria Palomaki Chrisantha Fernando Ken Durden Harsh Mehta Nikola Momchev Elahe Rahimtoroghi Maria Georgaki Amit Raul Sebastian Ruder Morgan Redshaw Jinhyuk Lee Denny Zhou Komal Jalan Dinghua Li Blake Hechtman Parker Schuh Milad Nasr Kieran 
Milan Vladimir Mikulik Juliana Franco Tim Green Nam Nguyen Joe Kelley Aroma Mahendru Andrea Hu Joshua Howland Ben Vargas Jeffrey Hui Kshitij Bansal Vikram Rao Rakesh Ghiya Emma Wang Ke Ye Jean\u00a0Michel Sarr Melanie\u00a0Moranski Preston Madeleine Elish Steve Li Aakash Kaku Jigar Gupta Ice Pasupat Da-Cheng Juan Milan Someswar Tejvi M. Xinyun Chen Aida Amini Alex Fabrikant Eric Chu Xuanyi Dong Amruta Muthal Senaka Buthpitiya Sarthak Jauhari Nan Hua Urvashi Khandelwal Ayal Hitron Jie Ren Larissa Rinaldi Shahar Drath Avigail Dabush Nan-Jiang Jiang Harshal Godhia Uli Sachs Anthony Chen Yicheng Fan Hagai Taitelbaum Hila Noga Zhuyun Dai James Wang Chen Liang Jenny Hamer Chun-Sung Ferng Chenel Elkind Aviel Atias Paulina Lee V\u00edt List\u00edk Mathias Carlen Jan van\u00a0de Kerkhof Marcin Pikus Krunoslav Zaher Paul M\u00fcller Sasha Zykova Richard Stefanec Vitaly Gatsko Christoph Hirnschall Ashwin Sethi Xingyu\u00a0Federico Xu Chetan Ahuja Beth Tsai Anca Stefanoiu Bo Feng Keshav Dhandhania Manish Katyal Akshay Gupta Atharva Parulekar Divya Pitta Jing Zhao Vivaan Bhatia Yashodha Bhavnani Omar Alhadlaq Xiaolin Li Peter Danenberg Dennis Tu Alex Pine Vera Filippova Abhipso Ghosh Ben Limonchik Bhargava Urala Chaitanya\u00a0Krishna Lanka Derik Clive Yi Sun Edward Li Hao Wu Kevin Hongtongsak Ianna Li Kalind Thakkar Kuanysh Omarov Kushal Majmundar Michael Alverson Michael Kucharski Mohak Patel Mudit Jain Maksim Zabelin Paolo Pelagatti Rohan Kohli Saurabh Kumar Joseph Kim Swetha Sankar Vineet Shah Lakshmi Ramachandruni Xiangkai Zeng Ben Bariach Laura Weidinger Tu Vu Alek Andreev Antoine He Kevin Hui Sheleem Kashem Amar Subramanya Sissie Hsiao Demis Hassabis Koray Kavukcuoglu Adam Sadovsky Quoc Le Trevor Strohman Yonghui Wu Slav Petrov Jeffrey Dean and Oriol Vinyals. 2024. Gemini: A Family of Highly Capable Multimodal Models. 
arxiv:https:\/\/arXiv.org\/abs\/2312.11805\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2312.11805"},{"key":"e_1_3_3_2_49_2","unstructured":"Gemma Team Thomas Mesnard Cassidy Hardin Robert Dadashi Surya Bhupatiraju Shreya Pathak Laurent Sifre Morgane Rivi\u00e8re Mihir\u00a0Sanjay Kale Juliette Love et\u00a0al. 2024. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.08295 (2024)."},{"key":"e_1_3_3_2_50_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aur\u00e9lien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2302.13971 (2023)."},{"key":"e_1_3_3_2_51_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2302.13971 (2023)."},{"key":"e_1_3_3_2_52_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit\u00a0Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric\u00a0Michael Smith Ranjan Subramanian Xiaoqing\u00a0Ellen Tan Binh Tang Ross Taylor Adina Williams Jian\u00a0Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama\u00a02: Open foundation and fine-tuned chat models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.09288 (2023)."},{"key":"e_1_3_3_2_53_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan\u00a0N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_3_2_54_2","doi-asserted-by":"publisher","unstructured":"Eric\u00a0P. Xing Qirong Ho Wei Dai Jin\u00a0Kyu Kim Jinliang Wei Seunghak Lee Xun Zheng Pengtao Xie Abhimanu Kumar and Yaoliang Yu. 2015. Petuum: A New Platform for Distributed Machine Learning on Big Data. IEEE Transactions on Big Data 1 2 (2015) 49\u201367. 
10.1109\/TBDATA.2015.2472014","DOI":"10.1109\/TBDATA.2015.2472014"},{"key":"e_1_3_3_2_55_2","first-page":"545","volume-title":"2024 USENIX Annual Technical Conference (USENIX ATC 24)","author":"Yuan Tailing","year":"2024","unstructured":"Tailing Yuan, Yuliang Liu, Xucheng Ye, Shenglong Zhang, Jianchao Tan, Bin Chen, Chengru Song, and Di Zhang. 2024. Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism. In 2024 USENIX Annual Technical Conference (USENIX ATC 24). 545\u2013561."},{"key":"e_1_3_3_2_56_2","unstructured":"Barret Zoph Irwan Bello Sameer Kumar Nan Du Yanping Huang Jeff Dean Noam Shazeer and William Fedus. 2022. St-moe: Designing stable and transferable sparse expert models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2202.08906 (2022)."}],"event":{"name":"SC '25: The International Conference for High Performance Computing, Networking, Storage and Analysis","location":"St. Louis MO USA","acronym":"SC '25","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing"]},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and 
Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3712285.3759855","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T18:31:30Z","timestamp":1773253890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712285.3759855"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,15]]},"references-count":55,"alternative-id":["10.1145\/3712285.3759855","10.1145\/3712285"],"URL":"https:\/\/doi.org\/10.1145\/3712285.3759855","relation":{},"subject":[],"published":{"date-parts":[[2025,11,15]]},"assertion":[{"value":"2025-11-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}