{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:59:46Z","timestamp":1772909986269,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T00:00:00Z","timestamp":1674777600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,27]]},"DOI":"10.1145\/3575693.3575736","type":"proceedings-article","created":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T22:56:55Z","timestamp":1675119415000},"page":"207-221","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["DeepUM: Tensor Migration and Prefetching in Unified Memory"],"prefix":"10.1145","author":[{"given":"Jaehoon","family":"Jung","sequence":"first","affiliation":[{"name":"Moreh, South Korea"}]},{"given":"Jinpyo","family":"Kim","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]},{"given":"Jaejin","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, South Korea"}]}],"member":"320","published-online":{"date-parts":[[2023,1,30]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. 
Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.1996.501191"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2001.937443"},{"key":"e_1_3_2_1_4_1","volume-title":"Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes. In 2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 136\u2013150","author":"Ausavarungnirun Rachata","year":"2017","unstructured":"Rachata Ausavarungnirun, Joshua Landgraf, Vance Miller, Saugata Ghose, Jayneel Gandhi, Christopher J. Rossbach, and Onur Mutlu. 2017. 
Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes. In 2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 136\u2013150."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2018.00024"},{"key":"e_1_3_2_1_6_1","volume-title":"FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In 19th USENIX Conference on File and Storage Technologies (FAST 21)","author":"Bae Jonghyun","unstructured":"Jonghyun Bae, Jongsung Lee, Yunho Jin, Sam Son, Shine Kim, Hakbeom Jang, Tae Jun Ham, and Jae W. Lee. 2021. FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In 19th USENIX Conference on File and Storage Technologies (FAST 21). USENIX Association, 387\u2013401. isbn:978-1-939133-20-5 https:\/\/www.usenix.org\/conference\/fast21\/presentation\/bae"},{"key":"e_1_3_2_1_7_1","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger T. J. Henighan Rewon Child Aditya Ramesh Daniel M. 
Ziegler Jeff Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165."},{"key":"e_1_3_2_1_8_1","unstructured":"Tianqi Chen Bing Xu Chiyuan Zhang and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/143365.143486"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00080"},{"key":"e_1_3_2_1_11_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems -","volume":"2","author":"Courbariaux Matthieu","year":"2015","unstructured":"Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. 
BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (NIPS\u201915). MIT Press, Cambridge, MA, USA. 3123\u20133131."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-1423"},{"key":"e_1_3_2_1_13_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv, abs\/1810.04805","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv, abs\/1810.04805 (2019)."},{"key":"e_1_3_2_1_14_1","unstructured":"Priya Goyal Piotr Doll\u00e1r Ross Girshick Pieter Noordhuis Lukasz Wesolowski Aapo Kyrola Andrew Tulloch Yangqing Jia and Kaiming He. 2017. Accurate large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677."},{"key":"e_1_3_2_1_15_1","unstructured":"Khronos group. 2022. OpenCL Overview - The Khronos Group Inc. 
https:\/\/www.khronos.org\/opencl\/"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157559"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning -","volume":"37","author":"Gupta Suyog","year":"2015","unstructured":"Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep Learning with Limited Numerical Precision. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (ICML\u201915). JMLR.org, 1737\u20131746."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems -","volume":"1","author":"Han Song","unstructured":"Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1 (NIPS\u201915). MIT Press, Cambridge, MA, USA. 1135\u20131143."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378465"},{"key":"e_1_3_2_1_22_1","unstructured":"HIOKI. 2022. 
AC\/DC POWER HiTESTER 3334. https:\/\/www.hioki.com\/global\/products\/power-meters\/single-phase-ac-dc\/id_6045"},{"key":"e_1_3_2_1_23_1","volume-title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. ArXiv, abs\/1704.04861","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. ArXiv, abs\/1704.04861 (2017)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378530"},{"key":"e_1_3_2_1_25_1","unstructured":"Intel. 2022. oneAPI Programming Model. https:\/\/www.oneapi.com\/"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00070"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.752653"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926294"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374531"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374531"},{"key":"e_1_3_2_1_31_1","unstructured":"Khronos. 2021. The OpenCL Specification. 
https:\/\/www.khronos.org\/registry\/OpenCL\/specs\/opencl-2.1.pdf#page=174"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378529"},{"key":"e_1_3_2_1_33_1","volume-title":"TFLMS: Large Model Support in TensorFlow by Graph Rewriting. ArXiv, abs\/1807.02037","author":"Le Tung D.","year":"2018","unstructured":"Tung D. Le, Haruki Imai, Yasushi Negishi, and Kiyokuni Kawachiya. 2018. TFLMS: Large Model Support in TensorFlow by Graph Rewriting. ArXiv, abs\/1807.02037 (2018)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304044"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00035"},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.). 2, 336\u2013349","author":"Mattson Peter","year":"2020","unstructured":"Peter Mattson, Christine Cheng, Gregory Diamos, Cody Coleman, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debo Dutta, Udit Gupta, Kim Hazelwood, Andy Hock, Xinyuan Huang, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St John, Carole-Jean Wu, Lingjie Xu, Cliff Young, and Matei Zaharia. 2020. MLPerf Training Benchmark. In Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.). 2, 336\u2013349. 
https:\/\/proceedings.mlsys.org\/paper\/2020\/file\/02522a2b2726fb0a03bb19f2d8d9524d-Paper.pdf"},{"key":"e_1_3_2_1_37_1","unstructured":"Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Mikhail Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. ArXiv abs\/1906.00091 (2019)."},{"key":"e_1_3_2_1_38_1","unstructured":"NVIDIA. 2021. Unified Memory Programming. 
https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html#um-unified-memory-programming-hd"},{"key":"e_1_3_2_1_39_1","unstructured":"NVIDIA. 2022. Artificial Intelligence Architecture | NVIDIA Volta. https:\/\/www.nvidia.com\/en-us\/data-center\/volta-gpu-architecture\/"},{"key":"e_1_3_2_1_40_1","unstructured":"NVIDIA. 2022. CUDA Parallel Computing Platform. https:\/\/developer.nvidia.com\/cuda-zone"},{"key":"e_1_3_2_1_41_1","unstructured":"NVIDIA. 2022. NVIDIA H100 Tensor Core GPU Architecture. https:\/\/nvdam.widen.net\/s\/9bz6dw7dqr\/gtc22-whitepaper-hopper"},{"key":"e_1_3_2_1_42_1","unstructured":"NVIDIA. 2022. Pascal GPU Architecture. https:\/\/www.nvidia.com\/en-us\/data-center\/pascal-gpu-architecture\/"},{"key":"e_1_3_2_1_43_1","unstructured":"OpenACC-standard.org. 2022. OpenAcc. 
https:\/\/www.openacc.org\/"},{"key":"e_1_3_2_1_44_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024\u20138035. http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378505"},{"key":"e_1_3_2_1_46_1","unstructured":"PyTorch. 2022. PyTorch Examples. 
https:\/\/github.com\/pytorch\/examples"},{"key":"e_1_3_2_1_47_1","volume-title":"4th International Conference on Learning Representations, ICLR","author":"Radford Alec","year":"2016","unstructured":"Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). arxiv:1511.06434"},{"key":"e_1_3_2_1_48_1","unstructured":"Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00057"},{"key":"e_1_3_2_1_51_1","volume-title":"Keckler","author":"Rhu Minsoo","year":"2016","unstructured":"Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, and Stephen W. Keckler. 2016. vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design. 
In The 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-49). IEEE Press, Article 18, 13 pages."},{"key":"e_1_3_2_1_52_1","unstructured":"Nikolay Sakharnykh. 2017. Maximizing Unified Memory Performance in CUDA. https:\/\/developer.nvidia.com\/blog\/maximizing-unified-memory-performance-cuda\/"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2002.1003576"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358307"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178491"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_1_57_1","unstructured":"Yang You Jing Li Sashank Reddi Jonathan Hseu Sanjiv Kumar Srinadh Bhojanapalli Xiaodan Song James Demmel Kurt Keutzer and Cho-Jui Hsieh. 2019. Large batch optimization for deep learning: Training BERT in 76 minutes. 
arXiv preprint arXiv:1904.00962."}],"event":{"name":"ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2","location":"Vancouver BC Canada","acronym":"ASPLOS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","SIGOPS ACM Special Interest Group on Operating Systems","SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575736","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3575693.3575736","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:20Z","timestamp":1750182680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575736"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,27]]},"references-count":57,"alternative-id":["10.1145\/3575693.3575736","10.1145\/3575693"],"URL":"https:\/\/doi.org\/10.1145\/3575693.3575736","relation":{},"subject":[],"published":{"date-parts":[[2023,1,27]]},"assertion":[{"value":"2023-01-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}