{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:08:52Z","timestamp":1759133332816,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,11,26]],"date-time":"2018-11-26T00:00:00Z","timestamp":1543190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Army Research Laboratories","award":["W911NF-09-2-0053"],"award-info":[{"award-number":["W911NF-09-2-0053"]}]},{"name":"Conix SRC JUMP Center"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,11,26]]},"DOI":"10.1145\/3274808.3274813","type":"proceedings-article","created":{"date-parts":[[2019,2,13]],"date-time":"2019-02-13T18:41:21Z","timestamp":1550083281000},"page":"53-65","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Olympian"],"prefix":"10.1145","author":[{"given":"Yitao","family":"Hu","sequence":"first","affiliation":[{"name":"University of Southern California"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Swati","family":"Rallapalli","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bongjun","family":"Ko","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ramesh","family":"Govindan","sequence":"additional","affiliation":[{"name":"University of Southern California"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,11,26]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Nam Sung Kim, and Michael J. Schulte","author":"Adriaens Jacob T.","year":"2012","unstructured":"Jacob T. Adriaens , Katherine Compton , Nam Sung Kim, and Michael J. Schulte . 2012 . The case for GPGPU spatial multitasking. 79--90. Jacob T. Adriaens, Katherine Compton, Nam Sung Kim, and Michael J. Schulte. 2012. The case for GPGPU spatial multitasking. 79--90."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECRTS.2012.15"},{"key":"e_1_3_2_2_3_1","unstructured":"Caffe-tensorflow repository on GitHub 2018. https:\/\/github.com\/ethereon\/caffe-tensorflow\/. (2018).  Caffe-tensorflow repository on GitHub 2018. https:\/\/github.com\/ethereon\/caffe-tensorflow\/. (2018)."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018748"},{"key":"e_1_3_2_2_5_1","volume-title":"Clipper: A Low-Latency Online Prediction Serving System.. In NSDI. 613--627.","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw , Xin Wang , Guilio Zhou , Michael J Franklin , Joseph E Gonzalez , and Ion Stoica . 2017 . Clipper: A Low-Latency Online Prediction Serving System.. In NSDI. 613--627. Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J Franklin, Joseph E Gonzalez, and Ion Stoica. 2017. Clipper: A Low-Latency Online Prediction Serving System.. In NSDI. 613--627."},{"key":"e_1_3_2_2_6_1","unstructured":"CUDA Profiling Toolkit Interface 2017. http:\/\/docs.nvidia.com\/cuda\/cupti\/index.html. (2017).  CUDA Profiling Toolkit Interface 2017. http:\/\/docs.nvidia.com\/cuda\/cupti\/index.html. (2017)."},{"key":"e_1_3_2_2_7_1","volume-title":"High Performance Computing and Simulation (HPCS), 2010 International Conference on. Proceedings of the 2010 International Conference on High Performance Computing and Simulation, HPCS 2010","author":"Duato Jose","year":"2010","unstructured":"Jose Duato , Antonio Jos\u00e9 Pe\u00f1a , Federico Silla , Rafael Mayo , and Enrique S . Quintana-Ort. 2010. RCUDA: Reducing the number of GPU-based accelerators in high performance clusters , In High Performance Computing and Simulation (HPCS), 2010 International Conference on. Proceedings of the 2010 International Conference on High Performance Computing and Simulation, HPCS 2010 ( 2010 ), 224--231. Jose Duato, Antonio Jos\u00e9 Pe\u00f1a, Federico Silla, Rafael Mayo, and Enrique S. Quintana-Ort. 2010. RCUDA: Reducing the number of GPU-based accelerators in high performance clusters, In High Performance Computing and Simulation (HPCS), 2010 International Conference on. Proceedings of the 2010 International Conference on High Performance Computing and Simulation, HPCS 2010 (2010), 224--231."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1887695.1887738"},{"key":"e_1_3_2_2_9_1","unstructured":"GPU Preemption 2016. https:\/\/www.anandtech.com\/show\/10325\/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review\/10. (2016).  GPU Preemption 2016. https:\/\/www.anandtech.com\/show\/10325\/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review\/10. (2016)."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1519138.1519141"},{"key":"e_1_3_2_2_11_1","volume-title":"Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'11). USENIX Association","author":"Gupta Vishakha","year":"2011","unstructured":"Vishakha Gupta , Karsten Schwan , Niraj Tolia , Vanish Talwar , and Parthasarathy Ranganathan . 2011 . Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems . In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'11). USENIX Association , Berkeley, CA, USA, 3--3. http:\/\/dl.acm.org\/citation.cfm?id= 2002181.2002184 Vishakha Gupta, Karsten Schwan, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. 2011. Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems. In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'11). USENIX Association, Berkeley, CA, USA, 3--3. http:\/\/dl.acm.org\/citation.cfm?id=2002181.2002184"},{"key":"e_1_3_2_2_12_1","volume-title":"Estimating Mobile Application Energy Consumption Using Program Analysis. In 35th International Conference on Software Engineering (ICSE","author":"Hao Shuai","year":"2013","unstructured":"Shuai Hao , Ding Li , William G.J. Halfond , and Ramesh Govindan . 2013 . Estimating Mobile Application Energy Consumption Using Program Analysis. In 35th International Conference on Software Engineering (ICSE 2013). Shuai Hao, Ding Li, William G.J. Halfond, and Ramesh Govindan. 2013. Estimating Mobile Application Energy Consumption Using Program Analysis. In 35th International Conference on Software Engineering (ICSE 2013)."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_14_1","volume-title":"Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093","author":"Jia Yangqing","year":"2014","unstructured":"Yangqing Jia , Evan Shelhamer , Jeff Donahue , Sergey Karayev , Jonathan Long , Ross Girshick , Sergio Guadarrama , and Trevor Darrell . 2014 . Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014). Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014)."},{"key":"e_1_3_2_2_15_1","volume-title":"Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'11). USENIX Association.","author":"Kato Shinpei","year":"2011","unstructured":"Shinpei Kato , Karthik Lakshmanan , Ragunathan Rajkumar , and Yutaka Ishikawa . 2011 . TimeGraph: GPU Scheduling for Realtime Multi-tasking Environments . In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'11). USENIX Association. Shinpei Kato, Karthik Lakshmanan, Ragunathan Rajkumar, and Yutaka Ishikawa. 2011. TimeGraph: GPU Scheduling for Realtime Multi-tasking Environments. In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'11). USENIX Association."},{"key":"e_1_3_2_2_16_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.   Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254810.1254816"},{"key":"e_1_3_2_2_18_1","unstructured":"nvidia-smi Document 2018. https:\/\/developer.download.nvidia.com\/compute\/DCGM\/docs\/nvidia-smi-367.38.pdf. (2018).  nvidia-smi Document 2018. https:\/\/developer.download.nvidia.com\/compute\/DCGM\/docs\/nvidia-smi-367.38.pdf. (2018)."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451160"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2632216"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.112"},{"key":"e_1_3_2_2_22_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366180"},{"key":"e_1_3_2_2_24_1","volume-title":"2014 USENIX Annual Technical Conference (USENIX ATC 14)","author":"Suzuki Yusuke","year":"2014","unstructured":"Yusuke Suzuki , Shinpei Kato , Hiroshi Yamada , and Kenji Kono . 2014 . GPUvm: Why Not Virtualizing GPUs at the Hypervisor? . In 2014 USENIX Annual Technical Conference (USENIX ATC 14) . USENIX Association. Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2014. GPUvm: Why Not Virtualizing GPUs at the Hypervisor?. In 2014 USENIX Annual Technical Conference (USENIX ATC 14). USENIX Association."},{"volume-title":"Inception-ResNet and the Impact of Residual Connections on Learning. In ICLR 2016 Workshop. https:\/\/arxiv.org\/abs\/1602","author":"Szegedy Christian","key":"e_1_3_2_2_25_1","unstructured":"Christian Szegedy , Sergey Ioffe , Vincent Vanhoucke , and Alex A. Alemi . 2016. Inception-v4 , Inception-ResNet and the Impact of Residual Connections on Learning. In ICLR 2016 Workshop. https:\/\/arxiv.org\/abs\/1602 .07261 Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex A. Alemi. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In ICLR 2016 Workshop. https:\/\/arxiv.org\/abs\/1602.07261"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke Andrew Rabinovich and others. 2015. Going deeper with convolutions. Cvpr.  Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke Andrew Rabinovich and others. 2015. Going deeper with convolutions. Cvpr.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665702"},{"volume-title":"GraphOptions API 2018","year":"2018","key":"e_1_3_2_2_28_1","unstructured":"TensorFlow GraphOptions API 2018 . https:\/\/www.tensorflow.org\/api__docs\/python\/tf\/GraphOptions. ( 2018 ). TensorFlow GraphOptions API 2018. https:\/\/www.tensorflow.org\/api__docs\/python\/tf\/GraphOptions. (2018)."},{"key":"e_1_3_2_2_29_1","unstructured":"TensorFlow Website 2018. https:\/\/www.tensorflow.org\/. (2018).  TensorFlow Website 2018. https:\/\/www.tensorflow.org\/. (2018)."},{"key":"e_1_3_2_2_30_1","volume-title":"2014 USENIX Annual Technical Conference (USENIX ATC 14)","author":"Tian Kun","year":"2014","unstructured":"Kun Tian , Yaozu Dong , and David Cowperthwaite . 2014 . A Full GPU Virtualization Solution with Mediated Pass-Through . In 2014 USENIX Annual Technical Conference (USENIX ATC 14) . USENIX Association. Kun Tian, Yaozu Dong, and David Cowperthwaite. 2014. A Full GPU Virtualization Solution with Mediated Pass-Through. In 2014 USENIX Annual Technical Conference (USENIX ATC 14). USENIX Association."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037742"},{"key":"e_1_3_2_2_32_1","volume-title":"2016 USENIX Annual Technical Conference (USENIX ATC 16)","author":"Xue Mochi","year":"2016","unstructured":"Mochi Xue , Kun Tian , Yaozu Dong , Jiacheng Ma , Jiajun Wang , Zhengwei Qi , Bingsheng He , and Haibing Guan . 2016 . gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space . In 2016 USENIX Annual Technical Conference (USENIX ATC 16) . USENIX Association. Mochi Xue, Kun Tian, Yaozu Dong, Jiacheng Ma, Jiajun Wang, Zhengwei Qi, Bingsheng He, and Haibing Guan. 2016. gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space. In 2016 USENIX Annual Technical Conference (USENIX ATC 16). USENIX Association."},{"key":"e_1_3_2_2_33_1","volume-title":"Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling. CoRR abs\/1303.5164","author":"Zhong Jianlong","year":"2013","unstructured":"Jianlong Zhong and Bingsheng He . 2013 . Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling. CoRR abs\/1303.5164 (2013). Jianlong Zhong and Bingsheng He. 2013. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling. CoRR abs\/1303.5164 (2013)."}],"event":{"name":"Middleware '18: 19th International Middleware Conference","sponsor":["ACM Association for Computing Machinery","USENIX Assoc USENIX Assoc","IFIP International Federation for Information Processing"],"location":"Rennes France","acronym":"Middleware '18"},"container-title":["Proceedings of the 19th International Middleware Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3274808.3274813","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3274808.3274813","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:44:03Z","timestamp":1750207443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3274808.3274813"}},"subtitle":["Scheduling GPU Usage in a Deep Neural Network Model Serving System"],"short-title":[],"issued":{"date-parts":[[2018,11,26]]},"references-count":33,"alternative-id":["10.1145\/3274808.3274813","10.1145\/3274808"],"URL":"https:\/\/doi.org\/10.1145\/3274808.3274813","relation":{},"subject":[],"published":{"date-parts":[[2018,11,26]]},"assertion":[{"value":"2018-11-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}