{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T01:13:58Z","timestamp":1780708438972,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":74,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T00:00:00Z","timestamp":1674777600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"IBM-ILLINOIS C3SR","award":[""],"award-info":[{"award-number":[""]}]},{"name":"IBM-ILLINOIS Discovery Accelerator Institute","award":[""],"award-info":[{"award-number":[""]}]},{"DOI":"10.13039\/100007065","name":"Nvidia","doi-asserted-by":"publisher","award":[""],"award-info":[{"award-number":[""]}],"id":[{"id":"10.13039\/100007065","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,27]]},"DOI":"10.1145\/3575693.3575748","type":"proceedings-article","created":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T22:56:55Z","timestamp":1675119415000},"page":"325-339","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":44,"title":["GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture"],"prefix":"10.1145","author":[{"given":"Zaid","family":"Qureshi","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA \/ NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vikram Sharma","family":"Mailthody","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA \/ NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Isaac","family":"Gelado","sequence":"additional","affiliation":[{"name":"NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Seungwon","family":"Min","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA \/ NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Amna","family":"Masood","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA \/ AMD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jeongmin","family":"Park","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jinjun","family":"Xiong","sequence":"additional","affiliation":[{"name":"University at Buffalo, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"C. J.","family":"Newburn","sequence":"additional","affiliation":[{"name":"NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dmitri","family":"Vainbrand","sequence":"additional","affiliation":[{"name":"NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"I-Hsin","family":"Chung","sequence":"additional","affiliation":[{"name":"IBM Research, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"Garland","sequence":"additional","affiliation":[{"name":"NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"William","family":"Dally","sequence":"additional","affiliation":[{"name":"NVIDIA, USA \/ Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wen-mei","family":"Hwu","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA \/ NVIDIA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,1,30]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921)","author":"Acun B.","unstructured":"B. Acun , M. Murphy , X. Wang , J. Nie , C. Wu , and K. Hazelwood . 2021. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale . In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921) . IEEE Computer Society, Los Alamitos, CA, USA. 802\u2013814. B. Acun, M. Murphy, X. Wang, J. Nie, C. Wu, and K. Hazelwood. 2021. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921). IEEE Computer Society, Los Alamitos, CA, USA. 802\u2013814."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830794"},{"key":"e_1_3_2_1_3_1","unstructured":"AMD. 2021. RADEON-SSG API Manual. https:\/\/www.amd.com\/system\/files\/documents\/ssg-api-user-manual.pdf \t\t\t\t  AMD. 2021. RADEON-SSG API Manual. https:\/\/www.amd.com\/system\/files\/documents\/ssg-api-user-manual.pdf"},{"key":"e_1_3_2_1_4_1","unstructured":"Jens Axboe. 2020. Efficient IO with io_uring. \t\t\t\t  Jens Axboe. 2020. Efficient IO with io_uring."},{"key":"e_1_3_2_1_5_1","unstructured":"2022. BaM GitHub Repository. https:\/\/github.com\/ZaidQureshi\/bam \t\t\t\t  2022. BaM GitHub Repository. https:\/\/github.com\/ZaidQureshi\/bam"},{"key":"e_1_3_2_1_6_1","volume-title":"Scalable and Fast Lazy Persistency on GPUs. In 2020 IEEE International Symposium on Workload Characterization (IISWC). 252\u2013263","author":"Baskara Yudha Ardhi Wiratama","year":"2020","unstructured":"Ardhi Wiratama Baskara Yudha , Keiji Kimura , Huiyang Zhou , and Yan Solihin . 2020 . Scalable and Fast Lazy Persistency on GPUs. In 2020 IEEE International Symposium on Workload Characterization (IISWC). 252\u2013263 . Ardhi Wiratama Baskara Yudha, Keiji Kimura, Huiyang Zhou, and Yan Solihin. 2020. Scalable and Fast Lazy Persistency on GPUs. In 2020 IEEE International Symposium on Workload Characterization (IISWC). 252\u2013263."},{"key":"e_1_3_2_1_7_1","volume-title":"Patterson","author":"Beamer Scott","year":"2015","unstructured":"Scott Beamer , Krste Asanovic , and David A . Patterson . 2015 . The GAP Benchmark Suite. CoRR , abs\/1508.03619 (2015), arxiv:1508.03619. arxiv:1508.03619 Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP Benchmark Suite. CoRR, abs\/1508.03619 (2015), arxiv:1508.03619. arxiv:1508.03619"},{"key":"e_1_3_2_1_8_1","first-page":"38","volume-title":"SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs. In 2017 USENIX Annual Technical Conference (USENIX ATC 17)","author":"Bergman Shai","year":"2017","unstructured":"Shai Bergman , Tanya Brokhman , Tzachi Cohen , and Mark Silberstein . 2017 . SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs. In 2017 USENIX Annual Technical Conference (USENIX ATC 17) . USENIX Association, Santa Clara, CA. 167\u2013179. isbn:978-1-93 1971- 38 - 36 Shai Bergman, Tanya Brokhman, Tzachi Cohen, and Mark Silberstein. 2017. SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX Association, Santa Clara, CA. 167\u2013179. isbn:978-1-931971-38-6"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.587"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963488"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_3_2_1_12_1","volume-title":"GAIA: An OS Page Cache for Heterogeneous Systems. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Brokhman Tanya","year":"2019","unstructured":"Tanya Brokhman , Pavel Lifshits , and Mark Silberstein . 2019 . GAIA: An OS Page Cache for Heterogeneous Systems. In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . USENIX Association, Renton, WA. 661\u2013674. Tanya Brokhman, Pavel Lifshits, and Mark Silberstein. 2019. GAIA: An OS Page Cache for Heterogeneous Systems. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA. 661\u2013674."},{"key":"e_1_3_2_1_13_1","unstructured":"2022. CDW. https:\/\/www.cdw.com \t\t\t\t  2022. CDW. https:\/\/www.cdw.com"},{"key":"e_1_3_2_1_14_1","volume-title":"A Paging Experiment With The Multics System","author":"Corbato F. J.","unstructured":"F. J. Corbato . 1968. A Paging Experiment With The Multics System . Technical Report , Massachusetts Institute of Technology , Cambridge, Project MAC. F. J. Corbato. 1968. A Paging Experiment With The Multics System. Technical Report, Massachusetts Institute of Technology, Cambridge, Project MAC."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2931088.2931091"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359637"},{"key":"e_1_3_2_1_18_1","volume-title":"Big Data Analytics Market | 2021 Size","author":"Insights Fortune Business","year":"2028","unstructured":"Fortune Business Insights . 2021. Big Data Analytics Market | 2021 Size , Growth Insights, Share , COVID-19 Impact, Emerging Technologies, Key Players, Competitive Landscape, Regional and Global Forecast to 2028 . https:\/\/tinyurl.com\/2p8a8sbx Fortune Business Insights. 2021. Big Data Analytics Market | 2021 Size, Growth Insights, Share, COVID-19 Impact, Emerging Technologies, Key Players, Competitive Landscape, Regional and Global Forecast to 2028. https:\/\/tinyurl.com\/2p8a8sbx"},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery","author":"Gelado Isaac","unstructured":"Isaac Gelado , John E. Stone , Javier Cabezas , Sanjay Patel , Nacho Navarro , and Wen-mei W. Hwu . 2010. An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems . In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery , New York, NY, USA. 347\u2013358. Isaac Gelado, John E. Stone, Javier Cabezas, Sanjay Patel, Nacho Navarro, and Wen-mei W. Hwu. 2010. An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery, New York, NY, USA. 347\u2013358."},{"key":"e_1_3_2_1_20_1","volume-title":"The Architectural Implications of Facebook\u2019s DNN-Based Personalized Recommendation. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 488\u2013501","author":"Gupta Udit","year":"2020","unstructured":"Udit Gupta , Carole-Jean Wu , Xiaodong Wang , Maxim Naumov , Brandon Reagen , David Brooks , Bradford Cottel , Kim Hazelwood , Mark Hempstead , Bill Jia , Hsien-Hsin S. Lee , Andrey Malevich , Dheevatsa Mudigere , Mikhail Smelyanskiy , Liang Xiong , and Xuan Zhang . 2020 . The Architectural Implications of Facebook\u2019s DNN-Based Personalized Recommendation. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 488\u2013501 . Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Mark Hempstead, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, and Xuan Zhang. 2020. The Architectural Implications of Facebook\u2019s DNN-Based Personalized Recommendation. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 488\u2013501."},{"key":"e_1_3_2_1_21_1","unstructured":"2022. H3 Platform. https:\/\/www.h3platform.com \t\t\t\t  2022. H3 Platform. https:\/\/www.h3platform.com"},{"key":"e_1_3_2_1_22_1","unstructured":"Weihua Hu Matthias Fey Hongyu Ren Maho Nakata Yuxiao Dong and Jure Leskovec. 2021. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs. arXiv preprint arXiv:2103.09430. \t\t\t\t  Weihua Hu Matthias Fey Hongyu Ren Maho Nakata Yuxiao Dong and Jure Leskovec. 2021. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs. arXiv preprint arXiv:2103.09430."},{"key":"e_1_3_2_1_23_1","unstructured":"Intel. 2021. Intel\u00ae Optane\u2122 Technology. https:\/\/www.intel.com\/content\/www\/us\/en\/architecture-and-technology\/intel-optane-technology.html \t\t\t\t  Intel. 2021. Intel\u00ae Optane\u2122 Technology. https:\/\/www.intel.com\/content\/www\/us\/en\/architecture-and-technology\/intel-optane-technology.html"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO \u201912)","author":"Jablin Thomas B.","unstructured":"Thomas B. Jablin , James A. Jablin , Prakash Prabhu , Feng Liu , and David I. August . 2012. Dynamically Managed Data for CPU-GPU Architectures . In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO \u201912) . Association for Computing Machinery, New York, NY, USA. 165\u2013174. Thomas B. Jablin, James A. Jablin, Prakash Prabhu, Feng Liu, and David I. August. 2012. Dynamically Managed Data for CPU-GPU Architectures. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO \u201912). Association for Computing Machinery, New York, NY, USA. 165\u2013174."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359631"},{"key":"e_1_3_2_1_26_1","volume-title":"16th USENIX Conference on File and Storage Technologies (FAST 18)","author":"Kannan Sudarsun","year":"2018","unstructured":"Sudarsun Kannan , Andrea C. Arpaci-Dusseau , Remzi H. Arpaci-Dusseau , Yuangang Wang , Jun Xu , and Gopinath Palani . 2018 . Designing a True Direct-Access File System with DevFS . In 16th USENIX Conference on File and Storage Technologies (FAST 18) . Oakland, CA. Sudarsun Kannan, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Yuangang Wang, Jun Xu, and Gopinath Palani. 2018. Designing a True Direct-Access File System with DevFS. In 16th USENIX Conference on File and Storage Technologies (FAST 18). Oakland, CA."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230572"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477132.3483565"},{"key":"e_1_3_2_1_29_1","volume-title":"GPUnet: Networking Abstractions for GPU Programs. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)","author":"Kim Sangman","year":"2014","unstructured":"Sangman Kim , Seonggu Huh , Xinya Zhang , Yige Hu , Amir Wated , Emmett Witchel , and Mark Silberstein . 2014 . GPUnet: Networking Abstractions for GPU Programs. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) . USENIX Association, Broomfield, CO. 201\u2013216. Sangman Kim, Seonggu Huh, Xinya Zhang, Yige Hu, Amir Wated, Emmett Witchel, and Mark Silberstein. 2014. GPUnet: Networking Abstractions for GPU Programs. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). USENIX Association, Broomfield, CO. 201\u2013216."},{"key":"e_1_3_2_1_30_1","volume-title":"Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR\u201917)","author":"Thomas","unstructured":"Thomas N. Kipf and Max Welling. 2017 . Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR\u201917) . Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR\u201917)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132747.3132770"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007331"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.31"},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP \u201921)","author":"Liu Jing","unstructured":"Jing Liu , Anthony Rebello , Yifan Dai , Chenhao Ye , Sudarsun Kannan , Andrea C. Arpaci-Dusseau , and Remzi H . Arpaci-Dusseau. 2021. Scale and Performance in a Filesystem Semi-Microkernel . In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP \u201921) . Association for Computing Machinery, New York, NY, USA. 819\u2013835. Jing Liu, Anthony Rebello, Yifan Dai, Chenhao Ye, Sudarsun Kannan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2021. Scale and Performance in a Filesystem Semi-Microkernel. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP \u201921). Association for Computing Machinery, New York, NY, USA. 819\u2013835."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.49"},{"key":"e_1_3_2_1_36_1","volume-title":"Application Support And Adaptation For High-throughput Accelerator Orchestrated Fine-grain Storage Access. Ph. D. Dissertation","author":"Mailthody Vikram Sharma","unstructured":"Vikram Sharma Mailthody . 2022. Application Support And Adaptation For High-throughput Accelerator Orchestrated Fine-grain Storage Access. Ph. D. Dissertation . University of Illinois Urbana-Champaign. Vikram Sharma Mailthody. 2022. Application Support And Adaptation For High-throughput Accelerator Orchestrated Fine-grain Storage Access. Ph. D. Dissertation. University of Illinois Urbana-Champaign."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00035"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3462545"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425883"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Dheevatsa Mudigere Yuchen Hao Jianyu Huang Zhihao Jia Andrew Tulloch Srinivas Sridharan Xing Liu Mustafa Ozdal Jade Nie Jongsoo Park Liang Luo Jie Amy Yang Leon Gao Dmytro Ivchenko Aarti Basant Yuxi Hu Jiyan Yang Ehsan K. Ardestani Xiaodong Wang Rakesh Komuravelli Ching-Hsiang Chu Serhat Yilmaz Huayu Li Jiyuan Qian Zhuobo Feng Yinbin Ma Junjie Yang Ellie Wen Hong Li Lin Yang Chonglin Sun Whitney Zhao Dimitry Melts Krishna Dhulipala KR Kishore Tyler Graf Assaf Eisenman Kiran Kumar Matam Adi Gangidi Guoqiang Jerry Chen Manoj Krishnan Avinash Nayak Krishnakumar Nair Bharath Muthiah Mahmoud khorashadi Pallab Bhattacharya Petr Lapukhov Maxim Naumov Ajit Mathews Lin Qiao Mikhail Smelyanskiy Bill Jia and Vijay Rao. 2021. Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models. \t\t\t\t  Dheevatsa Mudigere Yuchen Hao Jianyu Huang Zhihao Jia Andrew Tulloch Srinivas Sridharan Xing Liu Mustafa Ozdal Jade Nie Jongsoo Park Liang Luo Jie Amy Yang Leon Gao Dmytro Ivchenko Aarti Basant Yuxi Hu Jiyan Yang Ehsan K. Ardestani Xiaodong Wang Rakesh Komuravelli Ching-Hsiang Chu Serhat Yilmaz Huayu Li Jiyuan Qian Zhuobo Feng Yinbin Ma Junjie Yang Ellie Wen Hong Li Lin Yang Chonglin Sun Whitney Zhao Dimitry Melts Krishna Dhulipala KR Kishore Tyler Graf Assaf Eisenman Kiran Kumar Matam Adi Gangidi Guoqiang Jerry Chen Manoj Krishnan Avinash Nayak Krishnakumar Nair Bharath Muthiah Mahmoud khorashadi Pallab Bhattacharya Petr Lapukhov Maxim Naumov Ajit Mathews Lin Qiao Mikhail Smelyanskiy Bill Jia and Vijay Rao. 2021. Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models.","DOI":"10.1145\/3470496.3533727"},{"key":"e_1_3_2_1_41_1","unstructured":"Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs\/1906.00091 (2019). \t\t\t\t  Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs\/1906.00091 (2019)."},{"key":"e_1_3_2_1_42_1","unstructured":"2016. State of GPUDirect Technologies. https:\/\/on-demand.gputechconf.com\/gtc\/2016\/presentation\/s6264-davide-rossetti-GPUDirect.pdf \t\t\t\t  2016. State of GPUDirect Technologies. https:\/\/on-demand.gputechconf.com\/gtc\/2016\/presentation\/s6264-davide-rossetti-GPUDirect.pdf"},{"key":"e_1_3_2_1_43_1","unstructured":"2019. How to make your life easier in the age of exascale computing using NVIDIA GPUDirect technologies. https:\/\/developer.download.nvidia.com\/video\/gputechconf\/gtc\/2019\/presentation\/s9653-how-to-make-your-life-easier-in-the-age-of-exascale-computing-using-nvidia-gpudirect-technologies.pdf \t\t\t\t  2019. How to make your life easier in the age of exascale computing using NVIDIA GPUDirect technologies. https:\/\/developer.download.nvidia.com\/video\/gputechconf\/gtc\/2019\/presentation\/s9653-how-to-make-your-life-easier-in-the-age-of-exascale-computing-using-nvidia-gpudirect-technologies.pdf"},{"key":"e_1_3_2_1_44_1","unstructured":"2020. NVIDIA DGX A100. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-dgx-a100-datasheet.pdf \t\t\t\t  2020. NVIDIA DGX A100. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-dgx-a100-datasheet.pdf"},{"key":"e_1_3_2_1_45_1","unstructured":"2020. NVIDIA Tesla A100 Tensor Core GPU Architecture. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf \t\t\t\t  2020. NVIDIA Tesla A100 Tensor Core GPU Architecture. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf"},{"key":"e_1_3_2_1_46_1","unstructured":"2021. CUDA RAPIDS: GPU-Accelerated Data Analytics and Machine Learning. https:\/\/developer.nvidia.com\/rapids \t\t\t\t  2021. CUDA RAPIDS: GPU-Accelerated Data Analytics and Machine Learning. https:\/\/developer.nvidia.com\/rapids"},{"key":"e_1_3_2_1_47_1","unstructured":"2022. GPUDirect Storage: A Direct Path Between Storage and GPU Memory. https:\/\/developer.nvidia.com\/blog\/gpudirect-storage\/ \t\t\t\t  2022. GPUDirect Storage: A Direct Path Between Storage and GPU Memory. https:\/\/developer.nvidia.com\/blog\/gpudirect-storage\/"},{"key":"e_1_3_2_1_48_1","unstructured":"2022. Unified Memory for CUDA Beginners. https:\/\/developer.nvidia.com\/blog\/unified-memory-cuda-beginners \t\t\t\t  2022. Unified Memory for CUDA Beginners. https:\/\/developer.nvidia.com\/blog\/unified-memory-cuda-beginners"},{"key":"e_1_3_2_1_49_1","volume-title":"Infrastructure to Enable and Exploit GPU Orchestrated High-Throughput Storage Access on GPUs. Ph. D. Dissertation","author":"Qureshi Zaid","unstructured":"Zaid Qureshi . 2022. Infrastructure to Enable and Exploit GPU Orchestrated High-Throughput Storage Access on GPUs. Ph. D. Dissertation . University of Illinois Urbana-Champaign. Zaid Qureshi. 2022. Infrastructure to Enable and Exploit GPU Orchestrated High-Throughput Storage Access on GPUs. Ph. D. Dissertation. University of Illinois Urbana-Champaign."},{"key":"e_1_3_2_1_50_1","volume-title":"Isaac Gelado, Seung Won Min, Amna Masood, Jeongmin Park, Jinjun Xiong, CJ Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William Dally, and Wen-mei Hwu.","author":"Qureshi Zaid","year":"2022","unstructured":"Zaid Qureshi , Vikram Sharma Mailthody , Isaac Gelado, Seung Won Min, Amna Masood, Jeongmin Park, Jinjun Xiong, CJ Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William Dally, and Wen-mei Hwu. 2022 . GPU-Orchestrated On-Demand High-Throughput Storage Access in the System Architecture . arXiv. arxiv:2203.04910 Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seung Won Min, Amna Masood, Jeongmin Park, Jinjun Xiong, CJ Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William Dally, and Wen-mei Hwu. 2022. GPU-Orchestrated On-Demand High-Throughput Storage Access in the System Architecture. arXiv. arxiv:2203.04910"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.7217356"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476205"},{"key":"e_1_3_2_1_53_1","volume-title":"CrossFS: A Cross-layered Direct-Access File System. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Ren Yujie","year":"2020","unstructured":"Yujie Ren , Changwoo Min , and Sudarsun Kannan . 2020 . CrossFS: A Cross-layered Direct-Access File System. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) . USENIX Association, 137\u2013154. isbn:978-1-939133-19-9 Yujie Ren, Changwoo Min, and Sudarsun Kannan. 2020. CrossFS: A Cross-layered Direct-Access File System. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 137\u2013154. isbn:978-1-939133-19-9"},{"key":"e_1_3_2_1_54_1","unstructured":"Samsung. 2021. Samsung 980 PRO SSD. https:\/\/www.samsung.com\/us\/computing\/memory-storage\/solid-state-drives\/980-pro-pcie-4-0-nvme-ssd-1tb-mz-v8p1t0b-am\/ \t\t\t\t  Samsung. 2021. Samsung 980 PRO SSD. https:\/\/www.samsung.com\/us\/computing\/memory-storage\/solid-state-drives\/980-pro-pcie-4-0-nvme-ssd-1tb-mz-v8p1t0b-am\/"},{"key":"e_1_3_2_1_55_1","unstructured":"2021. Samsung Z-NAND Technology Brief. https:\/\/www.samsung.com\/us\/labs\/pdfs\/collateral\/Samsung_Z-NAND_Technology_Brief_v5.pdf \t\t\t\t  2021. Samsung Z-NAND Technology Brief. https:\/\/www.samsung.com\/us\/labs\/pdfs\/collateral\/Samsung_Z-NAND_Technology_Brief_v5.pdf"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688526"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001200"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451169"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098057"},{"key":"e_1_3_2_1_60_1","unstructured":"The City of New York. 2021. TLC Trip Record Data. https:\/\/www1.nyc.gov\/site\/tlc\/about\/tlc-trip-record-data.page \t\t\t\t  The City of New York. 2021. TLC Trip Record Data. https:\/\/www1.nyc.gov\/site\/tlc\/about\/tlc-trip-record-data.page"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378528"},{"key":"e_1_3_2_1_62_1","unstructured":"Hung-Wei Tseng Yang Liu Mark Gahagan Jing Li Yanqin Jin and Steven Swanson. 2015. Gullfoss : Accelerating and Simplifying Data Movement among Heterogeneous Computing and Storage Resources. \t\t\t\t  Hung-Wei Tseng Yang Liu Mark Gahagan Jing Li Yanqin Jin and Steven Swanson. 2015. Gullfoss : Accelerating and Simplifying Data Movement among Heterogeneous Computing and Storage Resources."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001143"},{"key":"e_1_3_2_1_64_1","volume-title":"Generic System Calls for GPUs. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918)","author":"Vesel\u00fd J\u00e1n","unstructured":"J\u00e1n Vesel\u00fd , Arkaprava Basu , Abhishek Bhattacharjee , Gabriel H. Loh , Mark Oskin , and Steven K. Reinhardt . 2018 . Generic System Calls for GPUs. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918) . 843\u2013856. J\u00e1n Vesel\u00fd, Arkaprava Basu, Abhishek Bhattacharjee, Gabriel H. Loh, Mark Oskin, and Steven K. Reinhardt. 2018. Generic System Calls for GPUs. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918). 843\u2013856."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2015.4"},{"key":"e_1_3_2_1_66_1","volume-title":"Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. 93\u2013102","author":"Wang Bin","unstructured":"Bin Wang , Bo Wu , Dong Li , Xipeng Shen , Weikuan Yu , Yizheng Jiao , and Jeffrey S. Vetter . 2013. Exploring hybrid memory for GPU energy efficiency through software-hardware co-design . In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. 93\u2013102 . Bin Wang, Bo Wu, Dong Li, Xipeng Shen, Weikuan Yu, Yizheng Jiao, and Jeffrey S. Vetter. 2013. Exploring hybrid memory for GPU energy efficiency through software-hardware co-design. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. 93\u2013102."},{"key":"e_1_3_2_1_67_1","unstructured":"Weka.io. 2021. WekaFS Architecture Whitepaper. https:\/\/www.weka.io\/wp-content\/uploads\/files\/2017\/12\/Architectural_WhitePaper-W02R6WP201812-1.pdf \t\t\t\t  Weka.io. 2021. WekaFS Architecture Whitepaper. https:\/\/www.weka.io\/wp-content\/uploads\/files\/2017\/12\/Architectural_WhitePaper-W02R6WP201812-1.pdf"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"crossref","unstructured":"Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR. \t\t\t\t  Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR.","DOI":"10.1109\/ICDM.2012.138"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CloudCom.2017.14"},{"key":"e_1_3_2_1_70_1","volume-title":"NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 13\u201324","author":"Zhang Jie","year":"2015","unstructured":"Jie Zhang , David Donofrio , John Shalf , Mahmut T. Kandemir , and Myoungsoo Jung . 2015 . NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 13\u201324 . Jie Zhang, David Donofrio, John Shalf, Mahmut T. Kandemir, and Myoungsoo Jung. 2015. NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 13\u201324."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00090"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480107"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317827"},{"key":"e_1_3_2_1_74_1","volume-title":"Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In Third Conference on Machine Learning and Systems.","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao , Deping Xie , Ronglai Jia , Yulei Qian , Ruiquan Ding , Mingming Sun , and Ping Li . 2020 . Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In Third Conference on Machine Learning and Systems. Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In Third Conference on Machine Learning and Systems."}],"event":{"name":"ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2","location":"Vancouver BC Canada","acronym":"ASPLOS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","SIGOPS ACM Special Interest Group on Operating Systems","SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575748","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3575693.3575748","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:20Z","timestamp":1750182680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575748"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,27]]},"references-count":74,"alternative-id":["10.1145\/3575693.3575748","10.1145\/3575693"],"URL":"https:\/\/doi.org\/10.1145\/3575693.3575748","relation":{},"subject":[],"published":{"date-parts":[[2023,1,27]]},"assertion":[{"value":"2023-01-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}