{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:12:41Z","timestamp":1750219961874,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T00:00:00Z","timestamp":1671148800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"name":"Education and Research"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>\n            Memory efficiency is crucial in training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Despite the option of keeping those dependencies in memory until they are reused in backpropagation, some forward tensors can be discarded and recomputed later from saved tensors, so-called\n            <jats:italic>checkpoints<\/jats:italic>\n            . This allows, in particular, for resource-constrained heterogeneous environments to make use of all available compute devices. Unfortunately, the definition of these checkpoints is a non-trivial problem and poses a challenge to the programmer\u2014improper or excessive recomputations negate the benefit of checkpointing.\n          <\/jats:p>\n          <jats:p\/>\n          <jats:p>\n            \u00a0\u00a0 In this article, we present XEngine, an approach that schedules network operators to heterogeneous devices in low memory environments by determining checkpoints and recomputations of tensors. Our approach selects suitable resources per timestep and operator and optimizes the end-to-end time for neural networks taking the memory limitation of each device into account. For this, we formulate a mixed-integer quadratic program (MIQP) to schedule operators of deep learning networks on heterogeneous systems. We compare our MIQP solver XEngine against Checkmate\u00a0[\n            <jats:xref ref-type=\"bibr\">12<\/jats:xref>\n            ], a mixed-integer linear programming (MILP) approach that solves recomputation on a single device. Our solver finds solutions that are up to 22.5% faster than the fastest Checkmate schedule in which the network is computed exclusively on a single device. 
We also find valid schedules for networks making use of both central processing units and graphics processing units if memory limitations do not allow scheduling exclusively to the graphics processing unit.\n          <\/jats:p>","DOI":"10.1145\/3568956","type":"journal-article","created":{"date-parts":[[2022,10,20]],"date-time":"2022-10-20T11:50:34Z","timestamp":1666266634000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8598-3410","authenticated-orcid":false,"given":"Manuela","family":"Schuler","sequence":"first","affiliation":[{"name":"Deutsches Forschungszentrum f\u00fcr K\u00fcnstliche Intelligenz (DFKI), Saarland Informatics Campus, Saarbr\u00fccken,  Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9979-7579","authenticated-orcid":false,"given":"Richard","family":"Membarth","sequence":"additional","affiliation":[{"name":"Technische Hochschule Ingolstadt, Research Institute AImotion Bavaria, Ingolstadt, Germany and Deutsches Forschungszentrum f\u00fcr K\u00fcnstliche Intelligenz (DFKI), Saarland Informatics Campus, Saarbr\u00fccken, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2189-2429","authenticated-orcid":false,"given":"Philipp","family":"Slusallek","sequence":"additional","affiliation":[{"name":"Deutsches Forschungszentrum f\u00fcr K\u00fcnstliche Intelligenz (DFKI), Saarland Informatics Campus, Saarbr\u00fccken, Germany and Saarland University, Saarland Informatics Campus, Saarbr\u00fccken, Germany"}]}],"member":"320","published-online":{"date-parts":[[2022,12,16]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1","article-title":"Optimal checkpointing for heterogeneous chains: How to train deep neural networks with limited memory","volume":"1911","author":"Beaumont Olivier","year":"2019","unstructured":"Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Alexis Joly, and Alena Shilova. 2019. Optimal checkpointing for heterogeneous chains: How to train deep neural networks with limited memory. CoRR abs\/1911.13214 (2019), 1\u201327. arxiv:1911.13214","journal-title":"CoRR"},{"key":"e_1_3_2_3_2","first-page":"23844","volume-title":"Proceeding of the Advances in Neural Information Processing Systems (NeurIPS)","author":"Beaumont Olivier","year":"2021","unstructured":"Olivier Beaumont, Lionel Eyraud-Dubois, and Alena Shilova. 2021. Efficient combination of rematerialization and offloading for training DNNs. In Proceeding of the Advances in Neural Information Processing Systems (NeurIPS), Marc.Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). Curran Associates, Inc., 23844\u201323857."},{"key":"e_1_3_2_4_2","first-page":"1","article-title":"Training deep nets with sublinear memory cost","volume":"1604","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. CoRR abs\/1604.06174 (2016), 1\u201312. arxiv:1604.06174","journal-title":"CoRR"},{"key":"e_1_3_2_5_2","first-page":"2211","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Gomez Aidan N.","year":"2017","unstructured":"Aidan N. Gomez, Mengye Ren, Raquel Urtasun, and Roger B. Grosse. 2017. 
The reversible residual network: Backpropagation without storing activations. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917), Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). Curran Associates Inc., 2211\u20132221."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1080\/10556789208805505"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/347837.347846"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157559"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2450153.2450158"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524059.3532394"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378530"},{"key":"e_1_3_2_13_2","first-page":"497","volume-title":"Proceedings of Machine Learning and Systems (MLSys\u201920)","author":"Jain Paras","year":"2020","unstructured":"Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Joseph Gonzalez, Kurt Keutzer, and Ion Stoica. 2020. Checkmate: Breaking the memory wall with optimal tensor rematerialization. In Proceedings of Machine Learning and Systems (MLSys\u201920), Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.). mlsys.org, 497\u2013511."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2019.2941492"},{"key":"e_1_3_2_15_2","first-page":"1","volume-title":"9th International Conference on Learning Representations (ICLR\u201921)","author":"Kirisame Marisa","year":"2021","unstructured":"Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, and Zachary Tatlock. 2021. Dynamic tensor rematerialization. In 9th International Conference on Learning Representations (ICLR\u201921). OpenReview.net, 1\u201331."},{"key":"e_1_3_2_16_2","first-page":"15146","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS\u201919)","author":"Kumar Ravi","year":"2019","unstructured":"Ravi Kumar, Manish Purohit, Zoya Svitkina, Erik Vee, and Joshua Wang. 2019. Efficient rematerialization for deep networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS\u201919), Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). Curran Associates, Inc., 15146\u201315155."},{"key":"e_1_3_2_17_2","first-page":"1161","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS\u201919)","author":"Kusumoto Mitsuru","year":"2019","unstructured":"Mitsuru Kusumoto, Takuya Inoue, Gentaro Watanabe, Takuya Akiba, and Masanori Koyama. 2019. A graph theoretic framework of recomputation algorithms for memory-efficient backpropagation. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS\u201919), Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). Curran Associates, Inc., 1161\u20131170."},{"key":"e_1_3_2_18_2","unstructured":"Jianjin Liao Mingzhen Li Qingxiao Sun Jiwei Hao Fengwei Yu Shengdong Chen Ye Tao Zicheng Zhang Hailong Yang Zhongzhi Luan and Depei Qian. 2022. Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU. 1\u201313. 
CoRR abs\/2209.02478 (2022) arXiv:2209.02478."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2015.2504091"},{"key":"e_1_3_2_20_2","first-page":"17573","volume-title":"Proceedings of the 39th International Conference on Machine Learning (ICML\u201922)","author":"Patil Shishir G.","year":"2022","unstructured":"Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, and Joseph E. Gonzalez. 2022. POET: Training neural networks on tiny devices with integrated rematerialization and paging. In Proceedings of the 39th International Conference on Machine Learning (ICML\u201922), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.). PMLR, 17573\u201317583."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378505"},{"key":"e_1_3_2_22_2","unstructured":"Jie Ren Samyam Rajbhandari Reza Yazdani Aminabadi Olatunji Ruwase Shuangyan Yang Minjia Zhang Dong Li and Yuxiong He. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. 1\u201314. CoRR abs\/2101.06840 (2021) arXiv:2101.06840."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_24_2","first-page":"1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915), Yoshua Bengio and Yann LeCun (Eds.). 1\u201314."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1080\/10556788.2018.1459621"},{"key":"e_1_3_2_26_2","unstructured":"Yu Tang Chenyu Wang Yufan Zhang Yuliang Liu Xingcheng Zhang Linbo Qiao Zhiquan Lai and Dongsheng Li. 2022. DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation. 1\u201312. 
CoRR abs\/2203.15980 (2022) arXiv:2203.15980."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOMW.2018.8406881"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3200691.3178491"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS53621.2022.00101"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568956","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3568956","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:10Z","timestamp":1750182550000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568956"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,16]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3568956"],"URL":"https:\/\/doi.org\/10.1145\/3568956","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2022,12,16]]},"assertion":[{"value":"2022-05-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-05","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
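The abstract in the record above describes tensor rematerialization: some forward activations are discarded under a memory budget and later recomputed from saved checkpoints during backpropagation. As a rough, self-contained illustration of that idea only (a toy two-layer example with names and shapes chosen here for illustration; this is not XEngine's MIQP scheduler or Checkmate's MILP), the following sketch drops an intermediate activation after the forward pass and recomputes it from the saved input when the backward pass needs it, trading extra compute for lower peak memory.

# Minimal sketch (assumption: toy example, not the paper's formulation) of
# rematerialization: discard an intermediate activation after the forward pass
# and recompute it from a saved "checkpoint" during backpropagation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # input batch, kept as the checkpoint
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 2))

def forward(x, W1, W2, keep_activations=True):
    h = np.maximum(x @ W1, 0.0)     # intermediate activation (ReLU)
    y = h @ W2
    # Under a memory budget, h is discarded instead of being kept for backprop.
    return (y, h) if keep_activations else (y, None)

def backward(x, W1, W2, y, h):
    # Recompute the discarded activation from the checkpoint x if necessary.
    if h is None:
        h = np.maximum(x @ W1, 0.0)  # rematerialization step
    dy = y                           # gradient of loss = 0.5 * ||y||^2
    dW2 = h.T @ dy
    dh = dy @ W2.T
    dW1 = x.T @ (dh * (h > 0))       # ReLU gradient mask
    return dW1, dW2

# Both paths yield identical gradients; the second trades compute for memory.
y, h = forward(x, W1, W2, keep_activations=True)
g_keep = backward(x, W1, W2, y, h)
y, _ = forward(x, W1, W2, keep_activations=False)
g_remat = backward(x, W1, W2, y, None)
assert all(np.allclose(a, b) for a, b in zip(g_keep, g_remat))

The paper's contribution, per the abstract, is deciding automatically which tensors to keep, discard, or recompute, and on which device each operator runs, by solving a mixed-integer quadratic program; the sketch above only demonstrates the recomputation mechanism that such a schedule would exploit.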