{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T11:32:42Z","timestamp":1763811162252,"version":"3.41.0"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T00:00:00Z","timestamp":1687737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2023,6,26]]},"abstract":"<jats:p>Serverless platforms have been attracting applications from traditional platforms because infrastructure management responsibilities are shifted from users to providers. Many applications well-suited to serverless environments could leverage GPU acceleration to enhance their performance. Unfortunately, current serverless platforms do not expose GPUs to serverless applications.<\/jats:p>","DOI":"10.1145\/3606557.3606560","type":"journal-article","created":{"date-parts":[[2023,6,28]],"date-time":"2023-06-28T16:25:16Z","timestamp":1687969516000},"page":"10-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Disaggregated GPU Acceleration for Serverless Applications"],"prefix":"10.1145","volume":"57","author":[{"given":"Henrique","family":"Fingler","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiting","family":"Zhu","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Esther","family":"Yoon","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhipeng","family":"Jia","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emmett","family":"Witchel","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher J.","family":"Rossbach","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,6,28]]},"reference":[{"volume-title":"October","year":"2021","key":"e_1_2_1_1_1","unstructured":"ArcFace. (Accessed : October 2021 ). ArcFace. (Accessed: October 2021)."},{"key":"e_1_2_1_2_1","volume-title":"May","author":"Accessed Best","year":"2023","unstructured":"Best practices for GPU-accelerated instances. ( Accessed : May , 2023 ). Best practices for GPU-accelerated instances. (Accessed: May, 2023)."},{"key":"e_1_2_1_3_1","volume-title":"May","author":"Azure Container Deploy","year":"2023","unstructured":"Deploy GPU-enabled container instance - Azure Container Instances | Microsoft Learn . (Accessed : May , 2023 ). Deploy GPU-enabled container instance - Azure Container Instances | Microsoft Learn. (Accessed: May, 2023)."},{"key":"e_1_2_1_4_1","volume-title":"October","author":"Mware End Solutions","year":"2021","unstructured":"End-to- End Solutions for AI\/ML Workloads | V Mware . (Accessed : October , 2021 ). End-to-End Solutions for AI\/ML Workloads | VMware. (Accessed: October, 2021)."},{"key":"e_1_2_1_5_1","volume-title":"October","author":"Accessed NVIDIA GRID.","year":"2021","unstructured":"NVIDIA GRID. ( Accessed : October 2021 ). NVIDIA GRID. (Accessed: October 2021)."},{"key":"e_1_2_1_6_1","volume-title":"January","author":"Serverless Functions Made Simple S","year":"2021","unstructured":"OpenFaa S - Serverless Functions Made Simple . (Accessed : January 2021 ). OpenFaaS - Serverless Functions Made Simple. (Accessed: January 2021)."},{"volume-title":"A COVID-19 CT Scan Dataset Applicable in Machine Learning and Deep Learning. (Accessed","year":"2021","key":"e_1_2_1_7_1","unstructured":"ShahinSHH\/COVID-CT-MD : A COVID-19 CT Scan Dataset Applicable in Machine Learning and Deep Learning. (Accessed : October , 2021 ). ShahinSHH\/COVID-CT-MD : A COVID-19 CT Scan Dataset Applicable in Machine Learning and Deep Learning. (Accessed: October, 2021)."},{"key":"e_1_2_1_8_1","volume-title":"October","author":"Computing Resources Underutilizing Cloud","year":"2021","unstructured":"Underutilizing Cloud Computing Resources . (Accessed : October 2021 ). Underutilizing Cloud Computing Resources. (Accessed: October 2021)."},{"key":"e_1_2_1_9_1","first-page":"1","article-title":"Drmaestro: orchestrating disaggregated resources on virtualized datacenters","volume":"10","author":"Amaral M.","year":"2021","unstructured":"M. Amaral , Jord\u00e0 Polo , David Carrera , N. Gonzalez , Chih-Chieh Yang , Alessandro Morari , Bruce D. D'Amora , A. Youssef , and M. Steinder . Drmaestro: orchestrating disaggregated resources on virtualized datacenters . Journal of Cloud Computing , 10 : 1 -- 20 , 2021 . M. Amaral, Jord\u00e0 Polo, David Carrera, N. Gonzalez, Chih-Chieh Yang, Alessandro Morari, Bruce D. D'Amora, A. Youssef, and M. Steinder. Drmaestro: orchestrating disaggregated resources on virtualized datacenters. Journal of Cloud Computing, 10:1--20, 2021.","journal-title":"Journal of Cloud Computing"},{"key":"e_1_2_1_10_1","first-page":"499","volume-title":"14th USENIX OSDI 2020","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai , Zhen Zhang , Yibo Zhu , and Xin Jin . Pipeswitch : Fast pipelined context switching for deep learning applications . In 14th USENIX OSDI 2020 , pages 499 -- 514 . USENIX Association , November 2020 . Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. Pipeswitch: Fast pipelined context switching for deep learning applications. In 14th USENIX OSDI 2020, pages 499--514. USENIX Association, November 2020."},{"key":"e_1_2_1_11_1","first-page":"185","volume-title":"Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms","author":"Chekuri Chandra","year":"1999","unstructured":"Chandra Chekuri and Sanjeev Khanna . On multidimensional packing problems . In Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms , pages 185 -- 194 . Citeseer , 1999 . Chandra Chekuri and Sanjeev Khanna. On multidimensional packing problems. In Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, pages 185--194. Citeseer, 1999."},{"key":"e_1_2_1_12_1","volume-title":"CVPR 09","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . ImageNet : A large-scale hierarchical image database . In CVPR 09 . IEEE, 2009 . Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR 09. IEEE, 2009."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_2_1_14_1","volume-title":"Retinaface: Single-stage dense face localisation in the wild. In arxiv","author":"Deng Jiankang","year":"2019","unstructured":"Jiankang Deng , Jia Guo , Zhou Yuxiang , Jinke Yu , Irene Kotsia , and Stefanos Zafeiriou . Retinaface: Single-stage dense face localisation in the wild. In arxiv , 2019 . Jiankang Deng, Jia Guo, Zhou Yuxiang, Jinke Yu, Irene Kotsia, and Stefanos Zafeiriou. Retinaface: Single-stage dense face localisation in the wild. In arxiv, 2019."},{"key":"e_1_2_1_15_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 , 2018 . Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018."},{"key":"e_1_2_1_16_1","first-page":"947","volume-title":"Workshops and Phd Forum","author":"Diab K. M.","year":"2013","unstructured":"K. M. Diab , M. M. Rafique , and M. Hefeeda . Dynamic sharing of gpus in cloud systems. In 2013 IEEE ISPA , Workshops and Phd Forum , pages 947 -- 954 , 2013 . K. M. Diab, M. M. Rafique, and M. Hefeeda. Dynamic sharing of gpus in cloud systems. In 2013 IEEE ISPA, Workshops and Phd Forum, pages 947--954, 2013."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2012.01.020"},{"key":"e_1_2_1_18_1","volume-title":"Design and Implementation. In Workshop on I\/O Virtualization","author":"Dong Yaozu","year":"2008","unstructured":"Yaozu Dong , Zhao Yu , and Greg Rose . SR-IOV Networking in Xen: Architecture , Design and Implementation. In Workshop on I\/O Virtualization , 2008 . Yaozu Dong, Zhao Yu, and Greg Rose. SR-IOV Networking in Xen: Architecture, Design and Implementation. In Workshop on I\/O Virtualization, 2008."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618525.1618534"},{"key":"e_1_2_1_20_1","unstructured":"Dong Du Tianyi Yu Yubin Xia Binyu Zang Guanglu Yan Chenggang Qin Qixuan Wu and Haibo Chen. Catalyzer: Sub-millisecond startup for serverless com  Dong Du Tianyi Yu Yubin Xia Binyu Zang Guanglu Yan Chenggang Qin Qixuan Wu and Haibo Chen. Catalyzer: Sub-millisecond startup for serverless com"},{"key":"e_1_2_1_21_1","first-page":"1","volume-title":"Proceedings of the 2011 18th HIPC","author":"Duato Jos\u00e9","year":"2011","unstructured":"Jos\u00e9 Duato , Antonio J. Pena , Federico Silla , Juan C. Fernandez , Rafael Mayo , and Enrique S . Quintana-Orti. Enabling CUDA Acceleration Within Virtual Machines Using rCUDA . In Proceedings of the 2011 18th HIPC , pages 1 -- 10 , Washington, DC, USA , 2011 . IEEE Computer Society. Jos\u00e9 Duato, Antonio J. Pena, Federico Silla, Juan C. Fernandez, Rafael Mayo, and Enrique S. Quintana-Orti. Enabling CUDA Acceleration Within Virtual Machines Using rCUDA. In Proceedings of the 2011 18th HIPC, pages 1--10, Washington, DC, USA, 2011. IEEE Computer Society."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1109\/IPDPS53621.2022.00077","volume-title":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Fingler Henrique","year":"2022","unstructured":"Henrique Fingler , Zhiting Zhu , Esther Yoon , Zhipeng Jia , EmmettWitchel, and Christopher J. Rossbach . Dgsf: Disaggregated gpus for serverless functions . In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) , pages 739 -- 750 , 2022 . Henrique Fingler, Zhiting Zhu, Esther Yoon, Zhipeng Jia, EmmettWitchel, and Christopher J. Rossbach. Dgsf: Disaggregated gpus for serverless functions. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 739--750, 2022."},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1007\/978-3-642-15277-1_37","volume-title":"Euro-Par 2010-Parallel Processing","author":"Giunta G.","year":"2010","unstructured":"G. Giunta , R. Montella , G. Agrillo , and G. Coviello . A gpgpu transparent virtualization component for high performance computing clouds . Euro-Par 2010-Parallel Processing , pages 379 -- 391 , 2010 . G. Giunta, R. Montella, G. Agrillo, and G. Coviello. A gpgpu transparent virtualization component for high performance computing clouds. Euro-Par 2010-Parallel Processing, pages 379--391, 2010."},{"key":"e_1_2_1_24_1","first-page":"349","volume-title":"2019 IEEE 12th International CLOUD","author":"Guleria Anubhav","year":"2019","unstructured":"Anubhav Guleria , J Lakshmi , and Chakri Padala . Quadd : Quantifying accelerator disaggregated datacenter efficiency . In 2019 IEEE 12th International CLOUD , pages 349 -- 357 , 2019 . Anubhav Guleria, J Lakshmi, and Chakri Padala. Quadd: Quantifying accelerator disaggregated datacenter efficiency. In 2019 IEEE 12th International CLOUD, pages 349--357, 2019."},{"key":"e_1_2_1_25_1","first-page":"114","volume-title":"Proceedings of the ACM SoCC","author":"Guo Fan","year":"2019","unstructured":"Fan Guo , Yongkun Li , John C. S. Lui , and Yinlong Xu. Dcuda : Dynamic gpu scheduling with live migration support . In Proceedings of the ACM SoCC , page 114 -- 125 , New York, NY, USA , 2019 . Association for Computing Machinery. Fan Guo, Yongkun Li, John C. S. Lui, and Yinlong Xu. Dcuda: Dynamic gpu scheduling with live migration support. In Proceedings of the ACM SoCC, page 114--125, New York, NY, USA, 2019. Association for Computing Machinery."},{"key":"e_1_2_1_26_1","first-page":"17","volume-title":"Parthasarathy Ranganathan. GViM: GPU-accelerated Virtual Machines. In Proceedings of the 3rd ACM Workshop HPCVirt","author":"Gupta Vishakha","year":"2009","unstructured":"Vishakha Gupta , Ada Gavrilovska , Karsten Schwan , Harshvardhan Kharche , Niraj Tolia , Vanish Talwar , and Parthasarathy Ranganathan. GViM: GPU-accelerated Virtual Machines. In Proceedings of the 3rd ACM Workshop HPCVirt , pages 17 -- 24 , New York, NY, USA , 2009 . ACM. Vishakha Gupta, Ada Gavrilovska, Karsten Schwan, Harshvardhan Kharche, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. GViM: GPU-accelerated Virtual Machines. In Proceedings of the 3rd ACM Workshop HPCVirt, pages 17--24, New York, NY, USA, 2009. ACM."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_28_1","first-page":"1","volume-title":"2020 IEEE ISPASS","author":"Hu B.","year":"2020","unstructured":"B. Hu and C. J. Rossbach . Altis: Modernizing gpgpu benchmarks . In 2020 IEEE ISPASS , pages 1 -- 11 , 2020 . B. Hu and C. J. Rossbach. Altis: Modernizing gpgpu benchmarks. In 2020 IEEE ISPASS, pages 1--11, 2020."},{"key":"e_1_2_1_30_1","volume-title":"Thirty-second Conference on Neural Information Processing Systems","author":"Jain Paras","year":"2018","unstructured":"Paras Jain , Xiangxi Mo , Ajay Jain , Harikaran Subbaraj , Rehan Sohail Durrani , Alexey Tumanov , Joseph Gonzalez , and Ion Stoica . Dynamic space-time scheduling for GPU inference . In Thirty-second Conference on Neural Information Processing Systems , 2018 . Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. Dynamic space-time scheduling for GPU inference. In Thirty-second Conference on Neural Information Processing Systems, 2018."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Tahereh Javaheri Morteza Homayounfar Zohreh Amoozgar Reza Reiazi Fatemeh Homayounieh Engy Abbas Azadeh Laali Amir Reza Radmard Mohammad Hadi Gharib Seyed Ali Javad Mousavi Omid Ghaemi Rosa Babaei Hadi Karimi Mobin Mehdi Hosseinzadeh Rana Jahanban-Esfahlan Khaled Seidi Mannudeep K. Kalra Guanglan Zhang L. T. Chitkushev Benjamin Haibe-Kains Reza Malekzadeh and Reza Rawassizadeh. Covidctnet: an open-source deep learning approach to diagnose covid-19 using small cohort of ct images. npj Digital Medicine 4(1) December 2021.  Tahereh Javaheri Morteza Homayounfar Zohreh Amoozgar Reza Reiazi Fatemeh Homayounieh Engy Abbas Azadeh Laali Amir Reza Radmard Mohammad Hadi Gharib Seyed Ali Javad Mousavi Omid Ghaemi Rosa Babaei Hadi Karimi Mobin Mehdi Hosseinzadeh Rana Jahanban-Esfahlan Khaled Seidi Mannudeep K. Kalra Guanglan Zhang L. T. Chitkushev Benjamin Haibe-Kains Reza Malekzadeh and Reza Rawassizadeh. Covidctnet: an open-source deep learning approach to diagnose covid-19 using small cohort of ct images. npj Digital Medicine 4(1) December 2021.","DOI":"10.1038\/s41746-021-00399-3"},{"key":"e_1_2_1_32_1","series-title":"Applied Mechanics and Materials","first-page":"15","volume-title":"Information, Communication and Engineering","author":"Jo Hee Seung","year":"2013","unstructured":"Hee Seung Jo , Myung Ho Lee, and Dong Hoon Choi. Gpu virtualization using PCI direct pass-through . In Information, Communication and Engineering , volume 311 of Applied Mechanics and Materials , pages 15 -- 19 . Trans Tech Publications Ltd, 5 2013 . Hee Seung Jo, Myung Ho Lee, and Dong Hoon Choi. Gpu virtualization using PCI direct pass-through. In Information, Communication and Engineering, volume 311 of Applied Mechanics and Materials, pages 15--19. Trans Tech Publications Ltd, 5 2013."},{"key":"e_1_2_1_33_1","first-page":"445","volume-title":"Proceedings SoCC 2017","author":"Jonas Eric","year":"2017","unstructured":"Eric Jonas , Qifan Pu , Shivaram Venkataraman , Ion Stoica , and Benjamin Recht . Occupy the cloud: Distributed computing for the 99% . In Proceedings SoCC 2017 , pages 445 -- 451 , New York, NY, USA , 2017 . ACM. Eric Jonas, Qifan Pu, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. Occupy the cloud: Distributed computing for the 99%. In Proceedings SoCC 2017, pages 445--451, New York, NY, USA, 2017. ACM."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_2_1_35_1","first-page":"533","volume-title":"2018 26th Euromicro International Conference on Parallel, Distributed and Networkbased Processing (PDP)","author":"Kim Jaewook","year":"2018","unstructured":"Jaewook Kim , Tae Joon Jun , Daeyoun Kang , Dohyeun Kim , and Daeyoung Kim . Gpu enabled serverless computing framework . In 2018 26th Euromicro International Conference on Parallel, Distributed and Networkbased Processing (PDP) , pages 533 -- 540 , 2018 . Jaewook Kim, Tae Joon Jun, Daeyoun Kang, Dohyeun Kim, and Daeyoung Kim. Gpu enabled serverless computing framework. In 2018 26th Euromicro International Conference on Parallel, Distributed and Networkbased Processing (PDP), pages 533--540, 2018."},{"key":"e_1_2_1_36_1","first-page":"887","volume-title":"2018 HPCS","author":"Kurkure U.","year":"2018","unstructured":"U. Kurkure , H. Sivaraman , and L. Vu . Virtualized gpus in high performance datacenters . In 2018 HPCS , pages 887 -- 894 , 2018 . U. Kurkure, H. Sivaraman, and L. Vu. Virtualized gpus in high performance datacenters. In 2018 HPCS, pages 887--894, 2018."},{"key":"e_1_2_1_37_1","unstructured":"Kuan-Ching Li Keunsoo Kim WonW. Ro Tien-Hsiung Weng Che-Lun Hung Chen-Hao Ku Albert Cohen and  Kuan-Ching Li Keunsoo Kim WonW. Ro Tien-Hsiung Weng Che-Lun Hung Chen-Hao Ku Albert Cohen and"},{"key":"e_1_2_1_38_1","volume-title":"11th USENIX HotCloud 19","author":"Mohan Anup","year":"2019","unstructured":"Anup Mohan , Harshad Sane , Kshitij Doshi , Saikrishna Edupuganti , Naren Nayak , and Vadim Sukhomlinov . Agile cold starts for scalable serverless . In 11th USENIX HotCloud 19 , Renton, WA , July 2019 . USENIX Association. Anup Mohan, Harshad Sane, Kshitij Doshi, Saikrishna Edupuganti, Naren Nayak, and Vadim Sukhomlinov. Agile cold starts for scalable serverless. In 11th USENIX HotCloud 19, Renton, WA, July 2019. USENIX Association."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2020.01.004"},{"key":"e_1_2_1_40_1","first-page":"665","volume-title":"2018 USENIX ATC","author":"Peng Bo","year":"2018","unstructured":"Bo Peng , Haozhong Zhang , Jianguo Yao , Yaozu Dong , Yu Xu , and Haibing Guan . MDev-NVMe : a NVMe storage virtualization solution with mediated pass-through . In 2018 USENIX ATC , pages 665 -- 676 , 2018 . Bo Peng, Haozhong Zhang, Jianguo Yao, Yaozu Dong, Yu Xu, and Haibing Guan. MDev-NVMe: a NVMe storage virtualization solution with mediated pass-through. In 2018 USENIX ATC, pages 665--676, 2018."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2924433"},{"key":"e_1_2_1_42_1","volume-title":"100, 000+ questions for machine comprehension of text. CoRR, abs\/1606.05250","author":"Rajpurkar Pranav","year":"2016","unstructured":"Pranav Rajpurkar , Jian Zhang , Konstantin Lopyrev , and Percy Liang . Squad : 100, 000+ questions for machine comprehension of text. CoRR, abs\/1606.05250 , 2016 . Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100, 000+ questions for machine comprehension of text. CoRR, abs\/1606.05250, 2016."},{"key":"e_1_2_1_43_1","first-page":"217","volume-title":"Proceedings of the 20th HPDC","author":"Ravi Vignesh T.","year":"2011","unstructured":"Vignesh T. Ravi , Michela Becchi , Gagan Agrawal , and Srimat Chakradhar . Supporting gpu sharing in cloud environments with a transparent runtime consolidation framework . In Proceedings of the 20th HPDC , page 217 -- 228 , New York, NY, USA , 2011 . Association for Computing Machinery. Vignesh T. Ravi, Michela Becchi, Gagan Agrawal, and Srimat Chakradhar. Supporting gpu sharing in cloud environments with a transparent runtime consolidation framework. In Proceedings of the 20th HPDC, page 217--228, New York, NY, USA, 2011. Association for Computing Machinery."},{"key":"e_1_2_1_44_1","volume-title":"20th Annual International Conference on High Performance Computing, 0:1--10","author":"Rea\u00f1o Carlos","year":"2012","unstructured":"Carlos Rea\u00f1o , Antonio J. Pe\u00f1a , Federico Silla , Jos\u00e9 Duato , Rafael Mayo , and Enrique S . Quintana-Ort\u00ed. CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution . 20th Annual International Conference on High Performance Computing, 0:1--10 , 2012 . Carlos Rea\u00f1o, Antonio J. Pe\u00f1a, Federico Silla, Jos\u00e9 Duato, Rafael Mayo, and Enrique S. Quintana-Ort\u00ed. CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution. 20th Annual International Conference on High Performance Computing, 0:1--10, 2012."},{"key":"e_1_2_1_45_1","unstructured":"Vijay Janapa Reddi Christine Cheng David Kanter Peter Mattson Guenther Schmuelling Carole-Jean Wu Brian Anderson Maximilien Breughe Mark Charlebois William Chou Ramesh Chukka Cody Coleman Sam Davis Pan Deng Greg Diamos Jared Duke Dave Fick J. Scott Gardner Itay Hubara Sachin Idgunji Thomas B. Jablin Jeff Jiao Tom St. John Pankaj Kanwar David Lee Jeffery Liao Anton Lokhmotov Francisco Massa Peng Meng Paulius Micikevicius Colin Osborne Gennady Pekhimenko Arun Tejusve Raghunath Rajan Dilip Sequeira Ashish Sirasao Fei Sun Hanlin Tang Michael Thomson Frank Wei Ephrem Wu Lingjie Xu Koichi Yamada Bing Yu George Yuan Aaron Zhong Peizhao Zhang and Yuchen Zhou. Mlperf inference benchmark 2019.  Vijay Janapa Reddi Christine Cheng David Kanter Peter Mattson Guenther Schmuelling Carole-Jean Wu Brian Anderson Maximilien Breughe Mark Charlebois William Chou Ramesh Chukka Cody Coleman Sam Davis Pan Deng Greg Diamos Jared Duke Dave Fick J. Scott Gardner Itay Hubara Sachin Idgunji Thomas B. Jablin Jeff Jiao Tom St. John Pankaj Kanwar David Lee Jeffery Liao Anton Lokhmotov Francisco Massa Peng Meng Paulius Micikevicius Colin Osborne Gennady Pekhimenko Arun Tejusve Raghunath Rajan Dilip Sequeira Ashish Sirasao Fei Sun Hanlin Tang Michael Thomson Frank Wei Ephrem Wu Lingjie Xu Koichi Yamada Bing Yu George Yuan Aaron Zhong Peizhao Zhang and Yuchen Zhou. Mlperf inference benchmark 2019."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2015.03.014"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.112"},{"key":"e_1_2_1_48_1","volume-title":"KVM Forum","volume":"2014","author":"Song Jike","year":"2014","unstructured":"Jike Song , Zhiyuan Lv , and Kevin Tian . KVMGT : a Full GPU Virtualization Solution . In KVM Forum , volume 2014 , 2014 . Jike Song, Zhiyuan Lv, and Kevin Tian. KVMGT: a Full GPU Virtualization Solution. In KVM Forum, volume 2014, 2014."},{"key":"e_1_2_1_49_1","volume-title":"https:\/\/www.rightscale.com\/lp\/state-of-the-cloud. (Accessed","author":"State of the cloud report.","year":"2021","unstructured":"State of the cloud report. https:\/\/www.rightscale.com\/lp\/state-of-the-cloud. (Accessed : January , 2021 ). State of the cloud report. https:\/\/www.rightscale.com\/lp\/state-of-the-cloud. (Accessed: January, 2021)."},{"key":"e_1_2_1_50_1","first-page":"80","volume-title":"Proceedings SoCC 2017","author":"Suzuki Yusuke","year":"2017","unstructured":"Yusuke Suzuki , Hiroshi Yamada , Shinpei Kato , and Kenji Kono . Gloop : An event-driven runtime for consolidating gpgpu applications . In Proceedings SoCC 2017 , page 80 -- 93 , New York, NY, USA , 2017 . Association for Computing Machinery. Yusuke Suzuki, Hiroshi Yamada, Shinpei Kato, and Kenji Kono. Gloop: An event-driven runtime for consolidating gpgpu applications. In Proceedings SoCC 2017, page 80--93, New York, NY, USA, 2017. Association for Computing Machinery."},{"key":"e_1_2_1_51_1","first-page":"121","volume-title":"2014 USENIX ATC","author":"Tian Kun","year":"2014","unstructured":"Kun Tian , Yaozu Dong , and David Cowperthwaite . A Full GPU Virtualization Solution with Mediated Pass- Through . In 2014 USENIX ATC , pages 121 -- 132 . USENIX Association , June 2014 . Kun Tian, Yaozu Dong, and David Cowperthwaite. A Full GPU Virtualization Solution with Mediated Pass- Through. In 2014 USENIX ATC, pages 121--132. USENIX Association, June 2014."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the Third ACM Symposium on Cloud Computing","author":"Tumanov Alexey","year":"2012","unstructured":"Alexey Tumanov , James Cipar , Gregory R. Ganger , and Michael A. Kozuch . Alsched: Algebraic scheduling of mixed workloads in heterogeneous clouds . In Proceedings of the Third ACM Symposium on Cloud Computing , New York, NY, USA , 2012 . Association for Computing Machinery. Alexey Tumanov, James Cipar, Gregory R. Ganger, and Michael A. Kozuch. Alsched: Algebraic scheduling of mixed workloads in heterogeneous clouds. In Proceedings of the Third ACM Symposium on Cloud Computing, New York, NY, USA, 2012. Association for Computing Machinery."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901355"},{"key":"e_1_2_1_54_1","first-page":"1","volume-title":"Rishi Bidarkar. GPU Virtualization for High Performance General Purpose Computing on the ESX Hypervisor. In Proceedings of HPC Symposium","author":"Vu Lan","year":"2014","unstructured":"Lan Vu , Hari Sivaraman , and Rishi Bidarkar. GPU Virtualization for High Performance General Purpose Computing on the ESX Hypervisor. In Proceedings of HPC Symposium , pages 2: 1 -- 2 :8, 2014 . Lan Vu, Hari Sivaraman, and Rishi Bidarkar. GPU Virtualization for High Performance General Purpose Computing on the ESX Hypervisor. In Proceedings of HPC Symposium, pages 2:1--2:8, 2014."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618525.1618535"},{"key":"e_1_2_1_56_1","first-page":"124","volume-title":"Proceedings of the 12th IEEE\/ACM CCGrid","author":"Xiao Shucai","year":"2012","unstructured":"Shucai Xiao , Pavan Balaji , James Dinan , Qian Zhu , Rajeev Thakur , Susan Coghlan , Heshan Lin , Gaojin Wen , Jue Hong , and Wu-chun Feng. Transparent accelerator migration in a virtualized GPU environment . In Proceedings of the 12th IEEE\/ACM CCGrid , pages 124 -- 131 , 2012 . Shucai Xiao, Pavan Balaji, James Dinan, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, and Wu-chun Feng. Transparent accelerator migration in a virtualized GPU environment. In Proceedings of the 12th IEEE\/ACM CCGrid, pages 124--131, 2012."},{"key":"e_1_2_1_57_1","first-page":"595","volume-title":"13th USENIX 2018 OSDI","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao , Romil Bhardwaj , Ramachandran Ramjee , Muthian Sivathanu , Nipun Kwatra , Zhenhua Han , Pratyush Patel , Xuan Peng , Hanyu Zhao , Quanlu Zhang , Fan Yang , and Lidong Zhou . Gandiva : Introspective cluster scheduling for deep learning . In 13th USENIX 2018 OSDI , pages 595 -- 610 , Carlsbad, CA , October 2018 . USENIX Association. Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. Gandiva: Introspective cluster scheduling for deep learning. In 13th USENIX 2018 OSDI, pages 595--610, Carlsbad, CA, October 2018. USENIX Association."},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the 1st MOTA","author":"Yan Mengting","year":"2016","unstructured":"Mengting Yan , Paul Castro , Perry Cheng , and Vatche Ishakian . Building a chatbot with serverless computing . In Proceedings of the 1st MOTA , New York, NY, USA , 2016 . Association for Computing Machinery. Mengting Yan, Paul Castro, Perry Cheng, and Vatche Ishakian. Building a chatbot with serverless computing. In Proceedings of the 1st MOTA, New York, NY, USA, 2016. Association for Computing Machinery."},{"key":"e_1_2_1_59_1","first-page":"5525","volume-title":"2016 IEEE CVPR","author":"Yang Shuo","year":"2016","unstructured":"Shuo Yang , Ping Luo , Chen Change Loy , and Xiaoou Tang . Wider face : A face detection benchmark . In 2016 IEEE CVPR , pages 5525 -- 5533 , 2016 . Shuo Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In 2016 IEEE CVPR, pages 5525--5533, 2016."},{"key":"e_1_2_1_60_1","first-page":"807","volume-title":"International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)","author":"Yu Hangchen","year":"2020","unstructured":"Hangchen Yu , Arthur Michener Peters , Amogh Akshintala , and Christopher J. Rossbach . AvA: Accelerated virtualization of accelerators . In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pages 807 -- 825 . ACM, 2020 . Hangchen Yu, Arthur Michener Peters, Amogh Akshintala, and Christopher J. Rossbach. AvA: Accelerated virtualization of accelerators. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 807-- 825. ACM, 2020."},{"key":"e_1_2_1_61_1","volume-title":"ISCA","author":"Yu Hangchen","year":"2017","unstructured":"Hangchen Yu and Christopher J Rossbach . Full Virtualization for GPUs Reconsidered. In 14th WDDD , ISCA , 2017 . Hangchen Yu and Christopher J Rossbach. Full Virtualization for GPUs Reconsidered. In 14th WDDD, ISCA, 2017."},{"key":"e_1_2_1_62_1","first-page":"98","volume-title":"PLMR 20","volume":"2","author":"Yu Peifeng","year":"2020","unstructured":"Peifeng Yu and Mosharaf Chowdhury . Fine-grained gpu sharing primitives for deep learning applications. In I. Dhillon, D. Papailiopoulos, and V. Sze, editors , PLMR 20 , volume 2 , pages 98 -- 111 , 2020 . Peifeng Yu and Mosharaf Chowdhury. Fine-grained gpu sharing primitives for deep learning applications. In I. Dhillon, D. Papailiopoulos, and V. Sze, editors, PLMR 20, volume 2, pages 98--111, 2020."}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3606557.3606560","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3606557.3606560","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:51Z","timestamp":1750182531000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3606557.3606560"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,26]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,6,26]]}},"alternative-id":["10.1145\/3606557.3606560"],"URL":"https:\/\/doi.org\/10.1145\/3606557.3606560","relation":{},"ISSN":["0163-5980"],"issn-type":[{"type":"print","value":"0163-5980"}],"subject":[],"published":{"date-parts":[[2023,6,26]]},"assertion":[{"value":"2023-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}