{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T09:18:51Z","timestamp":1758705531540,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":23,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["2018016, 2119677, 2118737"],"award-info":[{"award-number":["2018016, 2119677, 2118737"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH (National Institutes of Health)","doi-asserted-by":"publisher","award":["R41EB032722"],"award-info":[{"award-number":["R41EB032722"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,3,19]]},"DOI":"10.1145\/3497776.3517766","type":"proceedings-article","created":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T17:28:13Z","timestamp":1647624493000},"page":"104-116","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7787-6460","authenticated-orcid":false,"given":"Yufan","family":"Xu","sequence":"first","affiliation":[{"name":"University of Utah, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3294-1481","authenticated-orcid":false,"given":"Saurabh","family":"Raje","sequence":"additional","affiliation":[{"name":"University of Utah, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4556-4937","authenticated-orcid":false,"given":"Atanas","family":"Rountev","sequence":"additional","affiliation":[{"name":"Ohio State University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8672-4071","authenticated-orcid":false,"given":"Gerald","family":"Sabin","sequence":"additional","affiliation":[{"name":"RNET Technologies, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4062-0293","authenticated-orcid":false,"given":"Aravind","family":"Sukumaran-Rajam","sequence":"additional","affiliation":[{"name":"Washington State University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4737-2034","authenticated-orcid":false,"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"University of Utah, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,3,18]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. 
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783725"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-04747-4_23"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Matthias Boehm Berthold Reinwald Dylan Hutchison Alexandre V Evfimievski and Prithviraj Sen. 2018. On optimizing operator fusion plans for large-scale machine learning in systemml. arXiv preprint arXiv:1801.00829.","DOI":"10.14778\/3229863.3229865"},{"key":"e_1_3_2_1_5_1","unstructured":"Chi-Chung Chen Chia-Lin Yang and Hsiang-Yun Cheng. 2018. Efficient and robust parallel dnn training through model parallelism on multi-gpu platform. 
arXiv preprint arXiv:1809.02839."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-21467-y"},{"key":"e_1_3_2_1_7_1","unstructured":"Tianqi Chen Bing Xu Chiyuan Zhang and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174."},{"key":"e_1_3_2_1_8_1","unstructured":"Torch Contributors. 2018. Periodic checkpointing in pytorch. https:\/\/pytorch.org\/docs\/stable\/checkpoint.html"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00031"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2834892.2834897"},{"key":"e_1_3_2_1_11_1","unstructured":"Russell J Hewett and Thomas J Grady II. 2020. A linear algebraic approach to model parallelism in deep learning. 
arXiv preprint arXiv:2006.03108."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00049"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid.2015.105"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41374-021-00579-5"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3453483.3454083"},{"volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","key":"e_1_3_2_1_16_1","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024\u20138035. https:\/\/doi.org\/10.5555\/3454287.3455008"},{"key":"e_1_3_2_1_17_1","unstructured":"Joseph Redmon. 2013\u20132016. 
Darknet: Open Source Neural Networks in C. http:\/\/pjreddie.com\/darknet\/"},{"key":"e_1_3_2_1_18_1","volume-title":"Esteban Meneses, Leonardo Bautista-Gomez, and Rosa M. Badia.","author":"Rojas Elvis","year":"2020","unstructured":"Elvis Rojas, Albert Njoroge Kahira, Esteban Meneses, Leonardo Bautista-Gomez, and Rosa M. Badia. 2020. A Study of Checkpointing in Large Scale Training of Deep Neural Networks. CoRR, abs\/2012.00825 (2020), arXiv:2012.00825."},{"key":"e_1_3_2_1_19_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556."},{"volume-title":"XLA: Optimizing Compiler for TensorFlow. https:\/\/www.tensorflow.org\/xla","year":"2019","key":"e_1_3_2_1_20_1","unstructured":"TensorFlow. 2019. XLA: Optimizing Compiler for TensorFlow. https:\/\/www.tensorflow.org\/xla"},{"key":"e_1_3_2_1_21_1","volume-title":"Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, and Satoshi Matsuoka.","author":"Wahib Mohamed","year":"2020","unstructured":"Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, and Satoshi Matsuoka. 2020. Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA. CoRR, abs\/2008.11421 (2020), https:\/\/doi.org\/10.5555\/3433701.3433726 arXiv:2008.11421. 
"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2858384"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-59719-1_37"}],"event":{"name":"CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"],"location":"Seoul South Korea","acronym":"CC '22"},"container-title":["Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3497776.3517766","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3497776.3517766","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3497776.3517766","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:26Z","timestamp":1750193366000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3497776.3517766"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":23,"alternative-id":["10.1145\/3497776.3517766","10.1145\/3497776"],"URL":"https:\/\/doi.org\/10.1145\/3497776.3517766","relation":{},"subject":[],"published":{"date-parts":[[2022,3,18]]},"assertion":[{"value":"2022-03-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}