{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T01:15:30Z","timestamp":1780708530336,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":62,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"US NSF","award":["CNS-1763929"],"award-info":[{"award-number":["CNS-1763929"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3419111.3421284","type":"proceedings-article","created":{"date-parts":[[2020,10,13]],"date-time":"2020-10-13T04:40:25Z","timestamp":1602564025000},"page":"492-506","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":123,"title":["GSLICE"],"prefix":"10.1145","author":[{"given":"Aditya","family":"Dhakal","sequence":"first","affiliation":[{"name":"University of California"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sameer G","family":"Kulkarni","sequence":"additional","affiliation":[{"name":"IIT, Gandhinagar"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"K. K.","family":"Ramakrishnan","sequence":"additional","affiliation":[{"name":"University of California"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2014. Data plane development kit. http:\/\/dpdk.org\/. [online].  2014. Data plane development kit. http:\/\/dpdk.org\/. [online]."},{"key":"e_1_3_2_2_2_1","unstructured":"2019. NVIDIA Container runtime for Docker. https:\/\/github.com\/NVIDIA\/nvidia-docker. (2019). [online].  2019. NVIDIA Container runtime for Docker. https:\/\/github.com\/NVIDIA\/nvidia-docker. (2019). [online]."},{"key":"e_1_3_2_2_3_1","unstructured":"2019. NVIDIA TensorRT Inference Server. https:\/\/github.com\/NVIDIA\/tensorrt-inference-server. [online].  2019. NVIDIA TensorRT Inference Server. https:\/\/github.com\/NVIDIA\/tensorrt-inference-server. [online]."},{"key":"e_1_3_2_2_4_1","unstructured":"2020. Clipper Github. https:\/\/github.com\/ucbrise\/clipper.  2020. Clipper Github. https:\/\/github.com\/ucbrise\/clipper."},{"key":"e_1_3_2_2_5_1","unstructured":"2020. CUBLAS LIBRARY. https:\/\/docs.nvidia.com\/cuda\/cublas\/index.html. Accessed: 2020-02-19.  2020. CUBLAS LIBRARY. https:\/\/docs.nvidia.com\/cuda\/cublas\/index.html. Accessed: 2020-02-19."},{"key":"e_1_3_2_2_6_1","unstructured":"2020. Metal Documentation. https:\/\/developer.apple.com\/documentation\/metal. Accessed: 2020-04-25.  2020. Metal Documentation. https:\/\/developer.apple.com\/documentation\/metal. Accessed: 2020-04-25."},{"key":"e_1_3_2_2_7_1","unstructured":"2020. NVIDIA Ampere MIG. https:\/\/www.nvidia.com\/en-us\/technologies\/multi-instance-gpu.  2020. NVIDIA Ampere MIG. https:\/\/www.nvidia.com\/en-us\/technologies\/multi-instance-gpu."},{"key":"e_1_3_2_2_8_1","volume-title":"NVIDIA Tesla V100 GPU Architecture","unstructured":"2020. NVIDIA Tesla V100 GPU Architecture . http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf. Accessed: 2020-02-01. 2020. NVIDIA Tesla V100 GPU Architecture. http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf. Accessed: 2020-02-01."},{"key":"e_1_3_2_2_9_1","unstructured":"2020. ROCm Github. https:\/\/github.com\/RadeonOpenCompute\/ROCml. Accessed: 2020-04-25.  2020. ROCm Github. https:\/\/github.com\/RadeonOpenCompute\/ROCml. Accessed: 2020-04-25."},{"key":"e_1_3_2_2_10_1","unstructured":"2020. tcpreplay Github. https:\/\/github.com\/appneta\/tcpreplay.  2020. tcpreplay Github. https:\/\/github.com\/appneta\/tcpreplay."},{"key":"e_1_3_2_2_11_1","volume-title":"Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 265--283.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , 2016 . Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 265--283. Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 265--283."},{"key":"e_1_3_2_2_12_1","volume-title":"Real-time video analytics: The killer app for edge computing. computer 50, 10","author":"Ananthanarayanan Ganesh","year":"2017","unstructured":"Ganesh Ananthanarayanan , Paramvir Bahl , Peter Bod\u00edk , Krishna Chintalapudi , Matthai Philipose , Lenin Ravindranath , and Sudipta Sinha . 2017. Real-time video analytics: The killer app for edge computing. computer 50, 10 ( 2017 ), 58--67. Ganesh Ananthanarayanan, Paramvir Bahl, Peter Bod\u00edk, Krishna Chintalapudi, Matthai Philipose, Lenin Ravindranath, and Sudipta Sinha. 2017. Real-time video analytics: The killer app for edge computing. computer 50, 10 (2017), 58--67."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3320060"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2877890"},{"key":"e_1_3_2_2_15_1","volume-title":"Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2015 . Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015). Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015)."},{"key":"e_1_3_2_2_16_1","unstructured":"Tianqi Chen Thierry Moreau Ziheng Jiang Lianmin Zheng Eddie Yan Haichen Shen Meghan Cowan Leyuan Wang Yuwei Hu Luis Ceze etal 2018. {TVM}: An automated end-to-end optimizing compiler for deep learning. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 578--594.  Tianqi Chen Thierry Moreau Ziheng Jiang Lianmin Zheng Eddie Yan Haichen Shen Meghan Cowan Leyuan Wang Yuwei Hu Luis Ceze et al. 2018. {TVM}: An automated end-to-end optimizing compiler for deep learning. In 13th { USENIX } Symposium on Operating Systems Design and Implementation ( { OSDI } 18) . 578--594."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001177"},{"key":"e_1_3_2_2_18_1","volume-title":"cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur , Cliff Woolley , Philippe Vandermersch , Jonathan Cohen , John Tran , Bryan Catanzaro , and Evan Shelhamer . 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 ( 2014 ). Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_3_2_2_19_1","volume-title":"NIPS Workshop.","author":"Collobert R.","unstructured":"R. Collobert , K. Kavukcuoglu , and C. Farabet . 2011. Torch7: A Matlab-like Environment for Machine Learning. In BigLearn , NIPS Workshop. R. Collobert, K. Kavukcuoglu, and C. Farabet. 2011. Torch7: A Matlab-like Environment for Machine Learning. In BigLearn, NIPS Workshop."},{"key":"e_1_3_2_2_20_1","volume-title":"Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux , Itay Hubara , Daniel Soudry , Ran El-Yaniv , and Yoshua Bengio . 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830 ( 2016 ). Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830 (2016)."},{"key":"e_1_3_2_2_21_1","volume-title":"Clipper: A low-latency online prediction serving system. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 613--627.","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw , Xin Wang , Guilio Zhou , Michael J Franklin , Joseph E Gonzalez , and Ion Stoica . 2017 . Clipper: A low-latency online prediction serving system. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 613--627. Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J Franklin, Joseph E Gonzalez, and Ion Stoica. 2017. Clipper: A low-latency online prediction serving system. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 613--627."},{"key":"e_1_3_2_2_22_1","volume-title":"Best practice guide","author":"Cuda C","year":"2018","unstructured":"C Cuda . 2018. Best practice guide , 2018 . C Cuda. 2018. Best practice guide, 2018."},{"key":"e_1_3_2_2_23_1","volume-title":"NetML: An NFV Platform with Efficient Support for Machine Learning Applications. In 2019 IEEE Conference on Network Softwarization (NetSoft). IEEE, 396--404","author":"Dhakal Aditya","unstructured":"Aditya Dhakal and K. K. Ramakrishnan . 2019 . NetML: An NFV Platform with Efficient Support for Machine Learning Applications. In 2019 IEEE Conference on Network Softwarization (NetSoft). IEEE, 396--404 . Aditya Dhakal and K. K. Ramakrishnan. 2019. NetML: An NFV Platform with Efficient Support for Machine Learning Applications. In 2019 IEEE Conference on Network Softwarization (NetSoft). IEEE, 396--404."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815675.2815692"},{"key":"e_1_3_2_2_25_1","volume-title":"Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 83--96","author":"Go Younghwan","year":"2017","unstructured":"Younghwan Go , Muhammad Jamshed , YoungGyoun Moon , Changho Hwang , and KyoungSoo Park . 2017 . APUNet: revitalizing GPU as packet processing accelerator . In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 83--96 . Younghwan Go, Muhammad Jamshed, YoungGyoun Moon, Changho Hwang, and KyoungSoo Park. 2017. APUNet: revitalizing GPU as packet processing accelerator. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 83--96."},{"key":"e_1_3_2_2_26_1","unstructured":"Allison Gray Chris Gottbrath Ryan Olson and Shashank Prasanna. 2017. Deploying deep neural networks with nvidia tensorrt. https:\/\/devblogs.nvidia.com\/deploying-deep-learning-nvidia-tensorrt\/.  Allison Gray Chris Gottbrath Ryan Olson and Shashank Prasanna. 2017. Deploying deep neural networks with nvidia tensorrt. https:\/\/devblogs.nvidia.com\/deploying-deep-learning-nvidia-tensorrt\/."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851182.1851207"},{"key":"e_1_3_2_2_28_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149","author":"Han Song","year":"2015","unstructured":"Song Han , Huizi Mao , and William J Dally . 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 ( 2015 ). Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015)."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_30_1","volume-title":"ZeroMQ: messaging for many applications. \"O'Reilly Media","author":"Hintjens Pieter","unstructured":"Pieter Hintjens . 2013. ZeroMQ: messaging for many applications. \"O'Reilly Media , Inc .\". Pieter Hintjens. 2013. ZeroMQ: messaging for many applications. \"O'Reilly Media, Inc.\"."},{"key":"e_1_3_2_2_31_1","volume-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861","author":"Howard Andrew G","year":"2017","unstructured":"Andrew G Howard , Menglong Zhu , Bo Chen , Dmitry Kalenichenko , Weijun Wang , Tobias Weyand , Marco Andreetto , and Hartwig Adam . 2017 . Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017). Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)."},{"key":"e_1_3_2_2_32_1","volume-title":"Squeezenet: Alexnet-level accuracy with 50x fewer parameters and 0.5 mb model size. arXiv preprint arXiv:1602.07360","author":"Iandola Forrest N","year":"2016","unstructured":"Forrest N Iandola , Song Han , Matthew W Moskewicz , Khalid Ashraf , William J Dally , and Kurt Keutzer . 2016 . Squeezenet: Alexnet-level accuracy with 50x fewer parameters and 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016). Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016)."},{"key":"e_1_3_2_2_33_1","unstructured":"IBM Corporation. 2018. PowerAI Vision Inference Server. https:\/\/www.ibm.com\/support\/knowledgecenter\/SSRU69_1.1.2\/base\/vision_pdf.pdf?view=kc. Accessed:2019-12-01.  IBM Corporation. 2018. PowerAI Vision Inference Server. https:\/\/www.ibm.com\/support\/knowledgecenter\/SSRU69_1.1.2\/base\/vision_pdf.pdf?view=kc. Accessed:2019-12-01."},{"key":"e_1_3_2_2_34_1","volume-title":"Alexey Tumanov, Joseph Gonzalez, and Ion Stoica.","author":"Jain Paras","year":"2018","unstructured":"Paras Jain , Xiangxi Mo , Ajay Jain , Harikaran Subbaraj , Rehan Sohail Durrani , Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. 2018 . Dynamic Space-Time Scheduling for GPU Inference . arXiv preprint arXiv:1901.00041 (2018). Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. 2018. Dynamic Space-Time Scheduling for GPU Inference. arXiv preprint arXiv:1901.00041 (2018)."},{"key":"e_1_3_2_2_35_1","unstructured":"Keon Jang Sangjin Han Seungyeop Han Sue B Moon and KyoungSoo Park. 2011. SSLShader: Cheap SSL Acceleration with Commodity Processors.. In NSDI.  Keon Jang Sangjin Han Seungyeop Han Sue B Moon and KyoungSoo Park. 2011. SSLShader: Cheap SSL Acceleration with Commodity Processors.. In NSDI."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741969"},{"key":"e_1_3_2_2_38_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.  Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105."},{"key":"e_1_3_2_2_39_1","volume-title":"Jasper: An End-to-End Convolutional Neural Acoustic Model. arXiv preprint arXiv:1904.03288","author":"Li Jason","year":"2019","unstructured":"Jason Li , Vitaly Lavrukhin , Boris Ginsburg , Ryan Leary , Oleksii Kuchaiev , Jonathan M Cohen , Huyen Nguyen , and Ravi Teja Gadde . 2019 . Jasper: An End-to-End Convolutional Neural Acoustic Model. arXiv preprint arXiv:1904.03288 (2019). Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M Cohen, Huyen Nguyen, and Ravi Teja Gadde. 2019. Jasper: An End-to-End Convolutional Neural Acoustic Model. arXiv preprint arXiv:1904.03288 (2019)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3213344.3213345"},{"key":"e_1_3_2_2_41_1","volume-title":"Themis: Fair and Efficient {GPU} Cluster Scheduling. In 17th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 20). 289--304.","author":"Mahajan Kshiteej","year":"2020","unstructured":"Kshiteej Mahajan , Arjun Balasubramanian , Arjun Singhvi , Shivaram Venkataraman , Aditya Akella , Amar Phanishayee , and Shuchi Chawla . 2020 . Themis: Fair and Efficient {GPU} Cluster Scheduling. In 17th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 20). 289--304. Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, and Shuchi Chawla. 2020. Themis: Fair and Efficient {GPU} Cluster Scheduling. In 17th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 20). 289--304."},{"key":"e_1_3_2_2_42_1","unstructured":"Mellanox Inc. 2018. AI composabilitity and Virtualization: Mellanox Network attached GPUs. http:\/\/www.mellanox.com\/related-docs\/solutions\/SB_ai_composability_virtualization.pdf. [online].  Mellanox Inc. 2018. AI composabilitity and Virtualization: Mellanox Network attached GPUs. http:\/\/www.mellanox.com\/related-docs\/solutions\/SB_ai_composability_virtualization.pdf. [online]."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_2_2_44_1","unstructured":"NVIDIA. 2019. TensorRT Developer Guide. https:\/\/docs.nvidia.com\/deeplearning\/sdk\/tensorrt-developer-guide\/index.html. [online].  NVIDIA. 2019. TensorRT Developer Guide. https:\/\/docs.nvidia.com\/deeplearning\/sdk\/tensorrt-developer-guide\/index.html. [online]."},{"key":"e_1_3_2_2_45_1","volume-title":"V100 GPU architecture. The world's most advanced data center GPU. Version WP-08608-001_v1.1. NVIDIA. Aug","author":"Tesla NVIDIA","year":"2017","unstructured":"NVIDIA , Tesla . 2017. V100 GPU architecture. The world's most advanced data center GPU. Version WP-08608-001_v1.1. NVIDIA. Aug ( 2017 ), 108. NVIDIA, Tesla. 2017. V100 GPU architecture. The world's most advanced data center GPU. Version WP-08608-001_v1.1. NVIDIA. Aug (2017), 108."},{"key":"e_1_3_2_2_46_1","unstructured":"NVIDIA Tesla. 2019. MULTI-PROCESS SERVICE. (2019).  NVIDIA Tesla. 2019. MULTI-PROCESS SERVICE. (2019)."},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775054.2694346"},{"key":"e_1_3_2_2_48_1","unstructured":"Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga etal 2019. PyTorch: An imperative style high-performance deep learning library In Advances in Neural Information Processing Systems. 8024--8035.  Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga et al. 2019. PyTorch: An imperative style high-performance deep learning library In Advances in Neural Information Processing Systems. 8024--8035."},{"key":"e_1_3_2_2_49_1","unstructured":"Joseph Redmon. 2013--2016. Darknet: Open Source Neural Networks in C. http:\/\/pjreddie.com\/darknet\/.  Joseph Redmon. 2013--2016. Darknet: Open Source Neural Networks in C. http:\/\/pjreddie.com\/darknet\/."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2945397"},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359658"},{"key":"e_1_3_2_2_53_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_54_1","volume-title":"OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3","author":"Stone John E","year":"2010","unstructured":"John E Stone , David Gohara , and Guochun Shi . 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 ( 2010 ), 66. John E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 (2010), 66."},{"key":"e_1_3_2_2_55_1","unstructured":"Giorgos Vasiliadis Lazaros Koromilas Michalis Polychronakis and Sotiris Ioannidis. 2014. {GASPP}: A GPU-Accelerated Stateful Packet Processing Framework. In 2014 {USENIX} Annual Technical Conference ({USENIX}{ATC} 14). 321--332.  Giorgos Vasiliadis Lazaros Koromilas Michalis Polychronakis and Sotiris Ioannidis. 2014. {GASPP}: A GPU-Accelerated Stateful Packet Processing Framework. In 2014 { USENIX } Annual Technical Conference ( { USENIX }{ ATC } 14). 321--332."},{"key":"e_1_3_2_2_56_1","unstructured":"Wikipedia Article. 2018. Diminishing returns. https:\/\/en.wikipedia.org\/wiki\/Diminishing_returns. [online].  Wikipedia Article. 2018. Diminishing returns. https:\/\/en.wikipedia.org\/wiki\/Diminishing_returns. [online]."},{"key":"e_1_3_2_2_57_1","unstructured":"Piotr Wojciechowski Purnendu Mukherjee and Siddharth Sharma. 2018. How to Speed Up Deep Learning Inference Using TensorRT. https:\/\/devblogs.nvidia.com\/speed-up-inference-tensorrt\/.  Piotr Wojciechowski Purnendu Mukherjee and Siddharth Sharma. 2018. How to Speed Up Deep Learning Inference Using TensorRT. https:\/\/devblogs.nvidia.com\/speed-up-inference-tensorrt\/."},{"key":"e_1_3_2_2_58_1","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey etal 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016).  Yonghui Wu Mike Schuster Zhifeng Chen Quoc V Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)."},{"key":"e_1_3_2_2_59_1","volume-title":"Gandiva: Introspective cluster scheduling for deep learning. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 595--610.","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao , Romil Bhardwaj , Ramachandran Ramjee , Muthian Sivathanu , Nipun Kwatra , Zhenhua Han , Pratyush Patel , Xuan Peng , Hanyu Zhao , Quanlu Zhang , 2018 . Gandiva: Introspective cluster scheduling for deep learning. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 595--610. Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, et al. 2018. Gandiva: Introspective cluster scheduling for deep learning. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 595--610."},{"key":"e_1_3_2_2_60_1","volume-title":"Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.).","volume":"2","author":"Yu Peifeng","year":"2020","unstructured":"Peifeng Yu and Mosharaf Chowdhury . 2020 . Fine-Grained GPU Sharing Primitives for Deep Learning Applications . In Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.). Vol. 2 . 98--111. https:\/\/proceedings.mlsys.org\/paper\/2020\/file\/f7177163c833dff4b38fc8d2872f1ec6-Paper.pdf Peifeng Yu and Mosharaf Chowdhury. 2020. Fine-Grained GPU Sharing Primitives for Deep Learning Applications. In Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze (Eds.). Vol. 2. 98--111. https:\/\/proceedings.mlsys.org\/paper\/2020\/file\/f7177163c833dff4b38fc8d2872f1ec6-Paper.pdf"},{"key":"e_1_3_2_2_61_1","volume-title":"15th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 18). 187--200.","author":"Zhang Kai","unstructured":"Kai Zhang , Bingsheng He , Jiayu Hu , Zeke Wang , Bei Hua , Jiayi Meng , and Lishan Yang . 2018. G-NET : Effective {GPU} Sharing in {NFV} Systems . In 15th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 18). 187--200. Kai Zhang, Bingsheng He, Jiayu Hu, Zeke Wang, Bei Hua, Jiayi Meng, and Lishan Yang. 2018. G-NET: Effective {GPU} Sharing in {NFV} Systems. In 15th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 18). 187--200."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2858384"}],"event":{"name":"SoCC '20: ACM Symposium on Cloud Computing","location":"Virtual Event USA","acronym":"SoCC '20","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the 11th ACM Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3419111.3421284","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3419111.3421284","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:32:05Z","timestamp":1750195925000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3419111.3421284"}},"subtitle":["controlled spatial sharing of GPUs for a scalable inference platform"],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":62,"alternative-id":["10.1145\/3419111.3421284","10.1145\/3419111"],"URL":"https:\/\/doi.org\/10.1145\/3419111.3421284","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}