{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:26Z","timestamp":1750309526067,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":22,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T00:00:00Z","timestamp":1739923200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science and Technology Council","award":["111-2221-E-007-064-MY3"],"award-info":[{"award-number":["111-2221-E-007-064-MY3"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,2,19]]},"DOI":"10.1145\/3712031.3712325","type":"proceedings-article","created":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T12:28:34Z","timestamp":1743078514000},"page":"142-152","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Libra: A Python-Level Tensor Re-Materialization Strategy for Reducing Deep Learning GPU Memory Usage"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-9129-5999","authenticated-orcid":false,"given":"Ling-Sung","family":"Wang","sequence":"first","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0531-3054","authenticated-orcid":false,"given":"Sao-Hsuan","family":"Lin","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7851-1140","authenticated-orcid":false,"given":"Jerry","family":"Chou","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2025,3,27]]},"reference":[{"key":"e_1_3_3_1_2_2","first-page":"265","volume-title":"12th USENIX symposium on operating systems design and implementation (OSDI 16)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et\u00a0al. 2016. { TensorFlow} : a system for { Large-Scale} machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16). 265\u2013283."},{"key":"e_1_3_3_1_3_2","series-title":"Proceedings of Machine Learning Research","first-page":"822","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97","author":"Brutzkus Alon","year":"2019","unstructured":"Alon Brutzkus and Amir Globerson. 2019. Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem. In Proceedings of the 36th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a097), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 822\u2013830. https:\/\/proceedings.mlr.press\/v97\/brutzkus19b.html"},{"key":"e_1_3_3_1_4_2","first-page":"1","volume-title":"Proceedings of the 36th ACM International Conference on Supercomputing","author":"Hu Zhongzhe","year":"2022","unstructured":"Zhongzhe Hu, Junmin Xiao, Zheye Deng, Mingyi Li, Kewei Zhang, Xiaoyang Zhang, Ke Meng, Ninghui Sun, and Guangming Tan. 2022. MegTaiChi: Dynamic tensor-based memory management optimization for DNN training. In Proceedings of the 36th ACM International Conference on Supercomputing. 1\u201313."},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378530"},{"key":"e_1_3_3_1_6_2","unstructured":"Yanping Huang Youlong Cheng Ankur Bapna Orhan Firat Dehao Chen Mia Chen HyoukJoong Lee Jiquan Ngiam Quoc\u00a0V Le Yonghui Wu et\u00a0al. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Sagar Imambi Kolla\u00a0Bhanu Prakash and GR Kanagachidambaresan. 2021. PyTorch. Programming with TensorFlow: Solution for Edge Computing Applications (2021) 87\u2013104.","DOI":"10.1007\/978-3-030-57077-4_10"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00070"},{"key":"e_1_3_3_1_9_2","unstructured":"Paras Jain Ajay Jain Aniruddha Nrusimha Amir Gholami Pieter Abbeel Joseph Gonzalez Kurt Keutzer and Ion Stoica. 2020. Checkmate: Breaking the memory wall with optimal tensor rematerialization. Proceedings of Machine Learning and Systems 2 (2020) 497\u2013511."},{"key":"e_1_3_3_1_10_2","unstructured":"Jianjin Liao Mingzhen Li Qingxiao Sun Jiwei Hao Fengwei Yu Shengdong Chen Ye Tao Zicheng Zhang Hailong Yang Zhongzhi Luan et\u00a0al. 2022. Mimose: An input-aware checkpointing planner for efficient training on gpu. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2209.02478 (2022)."},{"key":"e_1_3_3_1_11_2","first-page":"2615","volume-title":"2022 IEEE 38th International Conference on Data Engineering (ICDE)","author":"Nie Xiaonan","year":"2022","unstructured":"Xiaonan Nie, Xupeng Miao, Zhi Yang, and Bin Cui. 2022. Tsplit: Fine-grained gpu memory management for efficient dnn training via tensor splitting. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2615\u20132628."},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378505"},{"key":"e_1_3_3_1_13_2","unstructured":"Jie Ren Samyam Rajbhandari Reza\u00a0Yazdani Aminabadi Olatunji Ruwase Shuangyan Yang Minjia Zhang Dong Li and Yuxiong He. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. arxiv:https:\/\/arXiv.org\/abs\/2101.06840\u00a0[cs.DC]"},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195660"},{"key":"e_1_3_3_1_15_2","unstructured":"Aashaka Shah Chao-Yuan Wu Jayashree Mohan Vijay Chidambaram and Philipp Kr\u00e4henb\u00fchl. 2021. Memory Optimization for Deep Networks. arxiv:https:\/\/arXiv.org\/abs\/2010.14501\u00a0[cs.LG]"},{"key":"e_1_3_3_1_16_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1909.08053 (2019)."},{"key":"e_1_3_3_1_17_2","unstructured":"Benoit Steiner Mostafa Elhoushi Jacob Kahn and James Hegarty. 2022. Olla: Optimizing the lifetime and location of arrays to reduce the memory usage of neural networks. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.12924 (2022)."},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00010"},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178491"},{"key":"e_1_3_3_1_21_2","unstructured":"Chiyuan Zhang Samy Bengio Moritz Hardt Benjamin Recht and Oriol Vinyals. 2017. Understanding deep learning requires rethinking generalization. arxiv:https:\/\/arXiv.org\/abs\/1611.03530\u00a0[cs.LG]"},{"key":"e_1_3_3_1_22_2","first-page":"42018","volume-title":"International Conference on Machine Learning","author":"Zhao Xunyi","year":"2023","unstructured":"Xunyi Zhao, Th\u00e9otime Le\u00a0Hellard, Lionel Eyraud-Dubois, Julia Gusak, and Olivier Beaumont. 2023. Rockmate: an efficient, fast, automatic and generic tool for re-materialization in pytorch. In International Conference on Machine Learning. PMLR, 42018\u201342045."},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00204"}],"event":{"name":"HPCASIA '25: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","acronym":"HPCASIA '25","location":"Hsinchu Taiwan"},"container-title":["Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712031.3712325","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3712031.3712325","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:10Z","timestamp":1750295890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712031.3712325"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,19]]},"references-count":22,"alternative-id":["10.1145\/3712031.3712325","10.1145\/3712031"],"URL":"https:\/\/doi.org\/10.1145\/3712031.3712325","relation":{},"subject":[],"published":{"date-parts":[[2025,2,19]]},"assertion":[{"value":"2025-03-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}